Almost a year and a half ago, MLB Advanced Media announced a major update in its tracking of baseball games: a system that would supersede (and, eventually, replace) PITCHf/x and its non-public derivatives. We’ve had such amazing fun out here in the public with PITCHf/x that I think we’ve all been completely drooling over the possibilities that the new system, named Statcast, would be capable of providing us.

At Beyond the Box Score, Bryan Cole discussed the new system after information from it had started to be used in a limited number of games (it was installed in 3 parks last year, but all 30 this year):

How It Works: StatCast is actually a combination of two systems. The first, produced by TrackMan, tracks the trajectory of the ball as it is pitched, hit, and thrown. TrackMan’s system is based on Doppler radar and will be discussed more in the next section. The second, which tracks the players, is the brainchild of ChyronHego and uses two camera arrays spaced about 15 meters apart to capture stereoscopic video. This allows StatCast to judge three-dimensional distance to a player similar to the human visual system. Not much is known yet about the process that allows them to produce videos like this, but the combination of video and radar is memory-intensive; terabytes of raw signals will be produced for every game.

Taking a quick step back: the first priority for MLBAM has been fan experience of games, and reasonably so. PITCHf/x lets Gameday tell us what pitches were thrown, where, and how fast; that’s quite a lot more entertaining than following a game live on an app and seeing “ball” or “strike.” On television, we’ve gotten the same information — and I can’t be alone in relying on it a ton. Different ball parks have different center field camera angles, etc., and it’s nice to see what happened right at the plate, given that the catcher’s glove is a few feet behind it.

Despite everything we’ve been able to do in analysis after the fact with PITCHf/x data, the game experience aspect of everything is probably the reason we have access to any of this information. That’s the backdrop. “Terabytes of raw signals” also means that the raw data would be unworkable for most people; I don’t think I’ve ever seen a hard drive in person large enough (7TB+) to handle the data for a single game captured by the Statcast equipment.

But in a Grantland interview in March 2014 (also linked by Cole in the BtBS piece), MLBAM CEO Bob Bowman more than implied that getting access was not a pipe dream:

We’ve talked to baseball ops people, letting them know that the path we’re going down is to make it available to everybody, and we want that to be well understood by every team. It won’t be as granular as stats folks want initially, more like looking at the finished product rather than ingredients — though ops people will see that granular data right away…

We want to get it out in somewhat varnished form to the masses. But the goal is to also get it out in unvarnished form to people who want it eventually. That provides value, too, so that [analysts] can think of things that we haven’t thought of.

Hooray! This was back in early 2014, and the system has only been installed in every park for this season’s games. What we have gotten is batted ball velocity, which was provided to and made available by Daren Willman at Baseball Savant, a ridiculously helpful resource that makes the difference between possible and impossible for some analysis, including at this site. Other than that and some broadcast snippets, these new Statcast leaderboards are all that we’ve received so far. Each of the five menus has just the top 50 results for some particular metric, but other metrics are included for that top 50. Here’s a quick guide to what we can learn from each.

HR Distance: The 50 most explosive bombs this season, by measured distance. If you’ve ever hung out at ESPN’s Home Run Tracker, you’ve enjoyed their “true distance” estimates, speed measured off the bat, apex, elevation angle and a granular metric for where between the foul poles it landed. The Home Run Tracker also knows what turns into a home run (which involves not just dimensions but wall height; depending on trajectory, that can make a difference), and so at that site, you can also learn for a particular HR the number of parks in which it would have been a HR. Actively comparing HRs to the parks in which they were hit is helpful, because it’s sometimes nice to know how many “Just Enoughs” are among a particular player’s HRs. You could tell with the Home Run Tracker which players might be more affected by a change in home ball park than others, but only if a change would make it harder; there are no “not quite enoughs,” which kind of makes sense anyway. Here, we’ve used the information when trying to determine the extent to which home runs would be reduced through use of a humidor, which was a big help to that analysis (thanks, Greg Rybarczyk and ESPN!).

The Statcast leaderboard tells us more information about the top 50 home runs; unlike the Tracker, it matches the home run to pitch speed. That’s not really an advancement, for analysis purposes; it would have been a chore to match up pitch data from Savant or Brooks Baseball with the data from Home Run Tracker, but not an impossible one, especially if it were just for a top 50 list. Instead, this Statcast leaderboard really only tells us that the other information about home runs at the Home Run Tracker was pretty reliable (I’ve spot checked), but maybe a little off (sometimes a few feet or one degree). Otherwise, this leaderboard is just a way to gush over the players who made the top 50 multiple times (hey there, Giancarlo Stanton) and the long home runs that were unusual (how do you hit a homer 480 feet by hitting it 176.8 feet in the air, Hanley Ramirez?). So we can, for example, say that no D-backs hitter has one of the year’s longest 50 home runs; we can’t use this to say that no D-backs hitter has hit a home run longer than 450 feet.

Exit Velocity: Similar to the HR Distance leaderboard, but for all batted balls, and sorted by launch speed. That’s what we got at Savant, via Willman; my understanding, though, is that the files shared with Willman have been a little spotty, especially at the beginning of the season. It wasn’t a big difference, and as far as I can tell, there are only a few of these top 50 that are not at Savant (I actually only found one, but it’s hard to tell because of rounding). Distance is something we already had; it was important for broadcasts and any kind of spray chart you’ve ever seen to know how far the ball had traveled before it was fielded. Launch angle, though, well now that’s interesting. Right now, we make do with batted ball “types” for this information: line drives, fly balls, ground balls, popups. We don’t really need more granular information to gauge results, although I think that could help make some smaller samples a bit more meaningful; what it could do, however, is help us understand how accurate or precise a player is with the bat, and maybe even determine when a player’s skill level has changed with that (probably on the “it’s deteriorating” side).

A top 50, though, does little to help in analysis here, and just as with the HR list, it’s more about storytelling. This particular list, though, has four batted balls from Diamondbacks; we knew about the two Mark Trumbo had hit, and David Peralta has been on that list for some time. For what it’s worth, though, the list does include a second Peralta batted ball we didn’t have, a 116.1 mph line drive that never made it out of the infield and was caught for an out on April 27 this year. You can’t fake bat speed like that, friends.

Average Hit Velocity: Now we’re talking, because unlike the previous two lists that only addressed 50 batted balls each, this list is an analysis of all of a particular player’s batted balls, limited to the 50 players with the highest average speed. In other words, even though this only includes 50 players, it has some additional analysis (synthesis?) baked right in. Hit the ball hard, and good things happen. I’ve had some fun this season putting batted balls in buckets based on speed; batting average skyrockets once you get past 90 mph. This list includes some batted balls that aren’t at Savant; Paul Goldschmidt has “178 tracked events,” which includes 12 not at Savant. The differences aren’t too significant. The inclusion of that 116.1 mph lineout does bump Peralta into the top 50, with an average batted ball speed of 92.2 mph. Not bad, I says. Not bad at all. Launch angle, again, could be really cool: we might find that hitters with certain average launch angles match up better against certain types of pitches or pitchers. This is something that we can now at least start to test as a concept with specific hitters among these top 50, although it would be difficult to control for how good they are in the first place, etc., because this is not a representative sample.
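For anyone who wants to try the bucket approach, here is a minimal sketch of how it might look in Python. The records in `sample` are made-up placeholders, not Statcast or Savant data, and the function name and the 10 mph bucket width are my own assumptions rather than anything the leaderboards define.

```python
from collections import defaultdict


def hit_rate_by_speed_bucket(batted_balls, width=10):
    """Group batted balls into exit-speed buckets and compute the hit rate in each.

    batted_balls: iterable of (exit_speed_mph, is_hit) pairs.
    Returns {bucket_floor_mph: (batted_ball_count, hit_rate)}.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket floor -> [hits, batted balls]
    for exit_speed, is_hit in batted_balls:
        floor = int(exit_speed // width) * width
        totals[floor][0] += int(is_hit)
        totals[floor][1] += 1
    return {
        floor: (count, hits / count)
        for floor, (hits, count) in sorted(totals.items())
    }


# Hypothetical placeholder records, for illustration only.
sample = [
    (72.4, False), (78.0, True), (85.1, False), (88.9, False),
    (91.3, True), (96.7, True), (99.2, False), (104.5, True),
]

for floor, (count, rate) in hit_rate_by_speed_bucket(sample).items():
    print(f"{floor}-{floor + 10} mph: {count} batted balls, {rate:.3f} hit rate")
```

With real batted-ball records in place of `sample`, the same function would show where the jump in batting average happens and how many batted balls sit behind each bucket.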

Another really interesting thing: “Average Gen Velocity (mph),” which is, I take it, the average difference between the speed of the ball put in play by the hitter and the speed of the pitch. Since these are averages, I don’t know if they’re using something like an equivalent of “end speed” in the PITCHf/x data, but that’s what we’d want to calculate, probably. It’s kind of a curious thing, and it seems to match up well with good hitters; it passes the smell test, but since pitch speed shouldn’t matter a whole lot, it probably shouldn’t pass the smell test. Guys who crush the ball supply their own momentum in addition to that ball-restitution force that bounces it off the bat. We might suffer from some really awful lurking variables, however, both for average batted ball velo and for this “avg gen velocity.” Jarrod Saltalamacchia ranks third on the average launch speed list (!) at 95.0 mph, well ahead of Goldy’s 93.5 mph. Welington Castillo ranks 21st, at 93.2 mph. But if not-quite-so-called “gen velocity” is important (something of which we should be skeptical right now), then we need something in the middle. Even if batted ball speed is completely or nearly completely about the bat and not the speed of the pitch, Salty ranking that high but only having 4.0 mph gen velocity might be because he is only hitting fastballs in the first place; so the 95 mph average may be unfair. I’m not sure what to make of this, but that’s the point: we need to look under the hood here not only to be able to do more analysis, but to know which biases we may have a problem with.
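If my reading of “gen velocity” is right, the arithmetic is simple enough to sketch. Everything below assumes the stat is a plain per-batted-ball difference (exit speed minus pitch speed) averaged over the same set of events; the function names are hypothetical, and I don’t know which pitch speed reading Statcast actually uses.

```python
def average_gen_velocity(events):
    """Average of (exit speed - pitch speed) over a hitter's batted balls.

    events: list of (pitch_speed_mph, exit_speed_mph) pairs.
    Assumes "gen velocity" is this simple difference, which is a guess.
    """
    if not events:
        raise ValueError("need at least one batted ball")
    diffs = [exit_speed - pitch_speed for pitch_speed, exit_speed in events]
    return sum(diffs) / len(diffs)


def implied_average_pitch_speed(avg_exit_speed, avg_gen_velocity):
    """The mean of differences equals the difference of means over the same
    events, so the average pitch speed a hitter put in play falls out directly."""
    return avg_exit_speed - avg_gen_velocity


# Saltalamacchia's leaderboard figures from the paragraph above.
print(implied_average_pitch_speed(95.0, 4.0))  # 91.0
```

Under that assumption, Salty’s 95.0 mph average and 4.0 mph gen velocity would mean the pitches he put in play averaged 91.0 mph, which is at least consistent with the idea that he is mostly hitting fastballs.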

Fastest Pitches: What we learn here: Aroldis Chapman may be as inhuman as his pitches are inhumane. All 50 of this year’s fastest pitches were fired by Chapman. But, seriously, we have more that we know Statcast can do. We’ve already had spin rate for the ball via PITCHf/x, and granular information about release point (Brooks Baseball is still the best destination for that kind of thing, I think, but it’s all in the raw data). I’m not sure if the “extension” information here is front foot or pitch release, but the latter seems to match well with PITCHf/x — it’d be helpful to test that against, say, one other pitcher. Thanks for nothing, Aroldis.

Average Pitch Velocity: What Average Hit Velocity was for the Exit Velocity board, this is for the Fastest Pitches board. Now, we have far more than 50 pitches, and far more than 50 pitches for 50 pitchers. In the Average Hit Velocity leaderboard, it was possible for the same player to be there twice: Trumbo repped himself well both with the D-backs and the Mariners. You can do the same thing here, like Jumbo Diaz and Noah Syndergaard and Yordano Ventura have done, basically by having two different fastballs thrown very hard. The inclusion of the Avg Extension information is fun and helpful (oh you devil, Carter Capps, leaping to an 8.2-foot extension), but when applied to reach “Avg Perceived Speed,” we may or may not have inconsistent results.

This is all relative, right? It’s not that Chapman’s 100.2 mph actually looks like it’s 100.9 mph, and in truth it’s neither speed by the time it reaches the plate (maybe even 10 mph slower, probably 7 or so). This looks like they took the “extension” number and added their own kind of proportional speed into the mix based on how quickly it reaches a batter compared to someone with an average extension. It looks like extensions under 6.5 feet mean perceived speed goes down, and over 6.4 feet means perceived speed goes up. So there’s your average, something like 6.45 feet extension. What I don’t really understand is that in PITCHf/x, we got speed readings from the very same point for every pitcher, regardless of where the pitch was released. This looks like it’s measured at the release point, exactly — and if that’s the case, then yeah, we kind of do need some kind of stat for “perceived.” Which could just be, “speed ten feet from the mound” or something like that, or what PITCHf/x currently offers.
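One way to picture a proportional adjustment like that is below. To be clear, this is my guess at a model and not MLBAM’s published formula: it assumes perceived speed is the release speed scaled by the ratio of flight distances, using the 60.5 feet from rubber to plate and the roughly 6.45-foot average extension inferred above. The function and constant names are hypothetical.

```python
RUBBER_TO_PLATE_FT = 60.5   # 60 feet 6 inches, pitching rubber to home plate
AVG_EXTENSION_FT = 6.45     # rough average inferred from the leaderboard


def perceived_speed(release_speed_mph, extension_ft):
    """Speed that would give the same flight time from an average extension.

    Assumed model: flight time ~ distance / speed, so a longer extension
    (shorter flight) looks like a faster pitch from the average release point.
    """
    avg_distance = RUBBER_TO_PLATE_FT - AVG_EXTENSION_FT
    actual_distance = RUBBER_TO_PLATE_FT - extension_ft
    return release_speed_mph * avg_distance / actual_distance


def implied_extension(release_speed_mph, perceived_speed_mph):
    """Invert the assumed model to back out an extension from two speeds."""
    avg_distance = RUBBER_TO_PLATE_FT - AVG_EXTENSION_FT
    return RUBBER_TO_PLATE_FT - release_speed_mph * avg_distance / perceived_speed_mph


# Chapman's 100.2 mph average and 100.9 mph perceived speed from the leaderboard.
print(round(implied_extension(100.2, 100.9), 2))
```

Under this model, Chapman’s two numbers would imply an extension of roughly 6.8 feet. Comparing that kind of back-of-the-envelope figure with the listed Avg Extension for several pitchers would be a quick check on whether the adjustment really is proportional.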

Overall: this is a little underwhelming, but there’s enough here — especially with launch angle — to be even more tantalized. When we get more Statcast-derived information, we’ll learn more. If the increases are incremental, that still may not add up to much. Right now, it seems, MLB teams have access to the unvarnished data, or at least more of it. Just as MLBAM CEO Bowman was concerned about three teams getting a competitive advantage last season (because it was installed in just three parks), what is being done with the data available now could have big consequences. Every team — including the D-backs — presumably has someone working through the data made available to them, flagging the types of things that the data could be helpful for and how to use it. This extra teaser from Statcast makes it even more apparent that synthesizing the data that the system provides is not even necessarily about gaining a competitive advantage — it’s about making sure that you’re not left behind.


