Published by Stephany Allen. Modified over 9 years ago.
1
PitchFX: Sounds great!... Now, where do I get it? Daniel I. Brooks The University of Iowa
2
PitchFX
3
PitchFX: Tracks each and every pitch thrown in MLB – in real time. Provides all of the parameters necessary to very accurately model the flight of the baseball from the pitcher’s hand to the plate. Data includes accompanying play-by-play info. Available at a very affordable price. (Free!) Sounds great!... Now, where do I get it?
4
Overview – Prologue: Accessing PitchFX Data in 2007 – PitchFX Data Today – Part 1: How can interested fans get access to it? – Part 2: What can they get out of it? – Part 3: How could availability and analysis improve?
5
Prologue: How to get PitchFX Data Circa 2007
6
(The First?) Step by Step Guide Alan Nathan – Published August 6th, 2007 – http://webusers.npl.illinois.edu/~a-nathan/pob/tracking.htm Contains a section on “How to Download”. Ought to be straightforward enough, right?
7
How to Download MLB Extended Gameday Pitch Logs
I. How to Download. Downloading the data: Go to the web site http://gd2.mlb.com/components/game/mlb/. Click on the year, then on the month; on the next page click on the day; on the next page click on the specific game; on the next page click on pbp; on the next page click on pitchers. For the Baltimore vs. Boston game played on August 1, 2007, the full link is as follows: http://gd2.mlb.com/components/game/mlb/year_2007/month_08/day_01/gid_2007_08_01_balmlb_bosmlb_1/pbp/pitchers/ The above steps take you to a page with a bunch of links of the form zzzzzz.xml, where zzzzzz is a six-digit code for a specific pitcher (see section III). For the above game, click on 122201.xml, which will get you to the pitch logs of Paul Shuey, who pitched to two batters in the 7th inning.
There is no way you could have known this.
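The click path in the guide maps directly onto the URL, so it can be built in code instead of clicked through. The sketch below is only an illustration: the function name and parameters are hypothetical, the team-code pattern (three letters plus "mlb") is inferred from the single Baltimore vs. Boston example above, and the server may no longer serve these pages.

```python
from datetime import date

# Base address quoted in the guide; whether it still responds is not guaranteed.
BASE_URL = "http://gd2.mlb.com/components/game/mlb/"


def gameday_pitchers_url(game_date, away, home, game_number=1):
    """Build the pitchers directory URL following the pattern in the guide.

    `away` and `home` are assumed to be three-letter codes such as "bal"
    and "bos"; this is inferred from one example, not from a specification.
    """
    gid = f"gid_{game_date:%Y_%m_%d}_{away}mlb_{home}mlb_{game_number}"
    return (
        f"{BASE_URL}year_{game_date:%Y}/month_{game_date:%m}/"
        f"day_{game_date:%d}/{gid}/pbp/pitchers/"
    )


url = gameday_pitchers_url(date(2007, 8, 1), "bal", "bos")
print(url)
# A single pitcher's log is the six-digit code plus ".xml" under that directory.
print(url + "122201.xml")
```

Note that this only removes the clicking. You still have to know the pitcher's six-digit code in advance.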
8
Here’s What You Get: Error Message?
9
How to Download
You will then see a lot of numbers on the screen. Use whatever tools you have with your browser (e.g., "save page as") to save it as 122201.xml in some convenient folder. Now launch Excel. From the File menu, open the file you just saved. An Open XML box will pop up. Check the "As an XML list" box, then click OK, and the file will load. You should see columns A through AK (37 columns total) filled and with headers in the first row. Immediately save it as an Excel file. The number of columns may change depending on when the file was written, and there is no guarantee that the number will remain the same into the future. However, the header names will hopefully stay constant. In the next section, I will discuss the meaning of the important parameters in the database.
As long as your version of Excel supports this… earlier versions (and some Mac versions) do not.
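If Excel's XML import is not available, the same flattening can be done with a short script. This is a minimal sketch under stated assumptions: the sample XML, the `pitch` tag name, and the attribute names are made up for illustration, and a real file may be laid out differently. Because the guide warns that the number of columns can change, the header is built from whatever attributes actually appear.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample, not a real Gameday file.
SAMPLE_XML = """
<player id="122201">
  <atbat inning="7">
    <pitch id="1" type="B" start_speed="91.2"/>
    <pitch id="2" type="S" start_speed="84.5" pfx_x="-3.1"/>
  </atbat>
</player>
"""


def pitches_to_rows(xml_text, pitch_tag="pitch"):
    """Return one dict of attributes per element named `pitch_tag`."""
    root = ET.fromstring(xml_text.strip())
    return [dict(element.attrib) for element in root.iter(pitch_tag)]


def rows_to_csv(rows):
    """Write rows as CSV text, using the union of attribute names as headers."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = pitches_to_rows(SAMPLE_XML)
print(rows_to_csv(rows))
```

The resulting CSV opens in any spreadsheet program, including versions of Excel without XML list support.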
10
It gets harder… But that’s only if you want to look at two batters’ worth of data (pitched by Paul Shuey!). Want to look at multiple starts? Then you need a database. And you probably need Perl or PHP or some other scripting language with easy XML parsing. And then you need a database front-end, and you need to learn SQL to access your database…
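To give a sense of what the database step involves, here is a small sketch using Python's built-in sqlite3 module rather than Perl or PHP. The table layout, column names, and pitch rows are all hypothetical; a real setup would load rows parsed from the downloaded XML files.

```python
import sqlite3

# Hypothetical rows: (pitcher_id, game_id, pitch_type, start_speed in mph)
PITCHES = [
    (1, "game_a", "FF", 93.0),
    (1, "game_a", "CU", 77.0),
    (1, "game_b", "FF", 95.0),
    (1, "game_b", "FF", 94.0),
]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pitches ("
    " pitcher_id INTEGER, game_id TEXT, pitch_type TEXT, start_speed REAL)"
)
conn.executemany("INSERT INTO pitches VALUES (?, ?, ?, ?)", PITCHES)

# The kind of SQL needed to look across multiple starts:
# pitch count and average speed per pitch type for one pitcher.
summary = conn.execute(
    "SELECT pitch_type, COUNT(*), AVG(start_speed)"
    " FROM pitches WHERE pitcher_id = ?"
    " GROUP BY pitch_type ORDER BY pitch_type",
    (1,),
).fetchall()

for pitch_type, n, avg_speed in summary:
    print(f"{pitch_type}: {n} pitches, {avg_speed:.1f} mph average")
```

Even this toy version needs a schema, a loader, and a query, which is the technical barrier the slide describes.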
11
Accessing PitchFX data in 2007… Is really hard. Is really time consuming. Requires a high level of technical expertise. And that’s before you even get into what it means. What has changed to help remedy this problem?
12
Part 1: The Casual Sabermetrician or How can interested fans access PitchFX Data?
13
The Casual Sabermetrician A new group of baseball viewers: “Casual Sabermetricians”. This group is roughly made up of: – Bloggers – Forum Dwellers And… – Sportswriters – Major League Scouts
14
The Casual Sabermetrician The casual sabermetrician: – Wants to answer data-driven questions – Knows PitchFX Data is out there – Lacks expertise to access PitchFX data
15
How to Access PitchFX Data? There are now a few different ways these individuals can access PitchFX data: – Josh Kalk’s Website (now offline) – Fangraphs.com – BrooksBaseball.net
16
PitchFX Tools Fangraphs.com – Seasonal detail – Some game-by-game info – Lots of other sabermetric statistics handy BrooksBaseball.net – Lots of game-by-game detail – Easily view other pitchers from same game – Strikezone maps / Splits / Situational Graphs
17
PitchFX Tools These tools simplify getting information. They still require that you can interpret the data once you have it… …but they offload the busy work onto computers
18
Part 2: What can we get from the PitchFX data?
19
Let’s Pick a Pitcher Suppose we were interested in Jon Lester. Let’s generate a “scouting report”: – What does he throw? – How hard does he throw the ball? – What mix of pitches does he use in games? – Which pitches worked for him? – When does he throw different pitches?
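The scouting-report questions above come down to a few simple summaries once the pitch records are in hand. The sketch below uses made-up records and illustrative pitch-type codes (it is not Jon Lester's data) to show pitch mix, average speed per pitch, and pitch mix with two strikes.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical pitch records: (pitch_type, speed_mph, balls, strikes)
PITCHES = [
    ("FF", 93.0, 0, 0),
    ("FF", 94.0, 1, 0),
    ("CU", 76.0, 0, 2),
    ("FC", 89.0, 1, 2),
    ("FF", 95.0, 3, 1),
    ("CU", 78.0, 1, 2),
]


def pitch_mix(pitches):
    """What does he throw, and how often? Returns the fraction per pitch type."""
    counts = Counter(pitch[0] for pitch in pitches)
    total = len(pitches)
    return {pitch_type: n / total for pitch_type, n in counts.items()}


def average_speed(pitches):
    """How hard does he throw each pitch? Returns mean mph per pitch type."""
    speeds = defaultdict(list)
    for pitch_type, speed, _balls, _strikes in pitches:
        speeds[pitch_type].append(speed)
    return {pitch_type: mean(values) for pitch_type, values in speeds.items()}


def mix_with_two_strikes(pitches):
    """When does he throw different pitches? Pitch mix in two-strike counts."""
    return pitch_mix([pitch for pitch in pitches if pitch[3] == 2])


print(pitch_mix(PITCHES))
print(average_speed(PITCHES))
print(mix_with_two_strikes(PITCHES))
```

The sites shown on the following slides present these same kinds of summaries as tables and graphs, so no code is needed to use them.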
20
FanGraphs.com: How to Search
21
FanGraphs.com: Click “PitchFX”
22
FanGraphs.com: Pitch Selection
23
FanGraphs.com: Velocity Tracking
24
PitchFX through B-Ref
26
Searching BrooksBaseball.net
28
Jon Lester Pitch Clusters
31
Lefty/Righty Splits: Vs. LHH, Vs. RHH
32
Maintaining Velocity
33
Different Pitches in Different Counts
34
Smoltz Pitching Backwards
35
Strikezone Map
36
PitchFX Tools Using a combination of PitchFX tools we can get an incredible amount of information about how a pitcher has performed. Fangraphs: season-wide perspective BrooksBaseball: start-by-start perspective
37
PitchFX Tools Each tool provides other information that can help evaluate a pitcher: FanGraphs provides easy access to other sabermetric pitching statistics BrooksBaseball provides easy access to other pitcher detail from the same game
38
One More Case Study Aroldis Chapman “Aroldis Chapman has a tantalizing 100 mph fastball, but also question marks about his other pitches -- and his maturity.” -…also ESPN “He has a fastball clocked at 101 or 102 MPH, and a plus curveball and plus slider, to use the scouts' vernacular.” -ESPN “His fastball was clocked from anywhere between 97 and 100 mph.” -MLB.com "In order to become the best pitcher, I still need lots of things. I need to improve professionally. I need to work. I need to work with curveballs. I need to work with other kinds of pitches." -Chapman “He throws 100 and 101 mph… If he polishes up his changeup and tightens up his slider, he can be a young Randy Johnson.” -His Agent
39
Case Study: Aroldis Chapman
40
Can Aroldis Throw 100mph?
41
A “plus-slider and plus curveball”?
42
Case Study: Aroldis Chapman You can go do this at home. You need to know virtually nothing about computers, you just need to know who Aroldis Chapman is and when he might have pitched.
43
Part 3: Improving Availability and Analysis
44
The New Access Barrier The casual sabermetrician: – Wants to answer data-driven questions – Can easily access PitchFX data Problem Solved!... Right?
45
The New Access Barrier The casual sabermetrician: – Wants to answer data-driven questions – Can easily access PitchFX data Problem Solved!... Right? – Two existing problems: Data analysis is non-trivial. How trustworthy is the data?
46
Data Analysis is Non-Trivial Identifying pitches can be difficult at first – though it gets easier with practice. Sabermetricians are notoriously descriptive statisticians. You could read dozens of articles online and not find a single inferential statistical test or any measure of variability. This is exacerbated by a strange fascination with small sample sizes.
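As one illustration of reporting variability alongside an average, the sketch below attaches a rough 95% interval to a mean velocity. The velocities are made up, the function name is hypothetical, and the interval uses a normal approximation (z = 1.96), which is too narrow for very small samples; a t-based interval would be wider still.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_interval(values, z=1.96):
    """Return (mean, half-width) of an approximate 95% interval for the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate variability")
    standard_error = stdev(values) / sqrt(n)
    return mean(values), z * standard_error


# Hypothetical fastball velocities in mph.
small_sample = [92.0, 94.0, 96.0]
large_sample = small_sample * 12  # same values repeated: 36 observations

for label, sample in (("3 pitches", small_sample), ("36 pitches", large_sample)):
    center, half_width = mean_with_interval(sample)
    print(f"{label}: {center:.1f} mph, plus or minus {half_width:.2f}")
```

Both samples have the same average, but the interval around the three-pitch average is several times wider, which is the small-sample problem the slide points to.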
47
Problems with Trust Our tools purport to show lots of information. How accurate is this info? Scouts/Teams may feel that the data isn’t trustworthy enough to use to evaluate pitchers. They may feel that, due to obvious errors, the data is bad – consistent pitch classification is a huge problem. They may feel that, due to odd conventions, the data makes no sense.
48
That Graph From Earlier
49
Consistent Classification is an Issue
50
Problems with Trust Certain conventions that the community has adopted are strange, and educated fans/scouts get frustrated – Vertical Movement (rising fastballs, etc.) Certain results from the data are so counterintuitive that people get worried – The large majority of sinkers don’t really sink.
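A small calculation may help with the vertical-movement convention. It assumes the common convention that reported vertical movement is measured relative to the path a ball would take without spin, so a positive value means "drops less than a spinless ball", not "rises". The flight time and movement values are illustrative assumptions, and the model ignores drag and the ball's initial vertical velocity.

```python
G_FT_PER_S2 = 32.174  # standard gravity in feet per second squared


def gravity_drop_inches(flight_time_s):
    """Drop due to gravity alone over the flight time, in inches."""
    return 0.5 * G_FT_PER_S2 * flight_time_s ** 2 * 12.0


def net_vertical_change_inches(vertical_movement_in, flight_time_s):
    """Reported vertical movement minus the gravity drop.

    A negative result means the ball still falls relative to the straight
    line along its initial direction.
    """
    return vertical_movement_in - gravity_drop_inches(flight_time_s)


FLIGHT_TIME_S = 0.4  # assumed flight time for a hard fastball

# Hypothetical movement values in inches.
for label, movement in (("'rising' fastball", 10.0), ("sinker", 4.0)):
    net = net_vertical_change_inches(movement, FLIGHT_TIME_S)
    print(f"{label}: movement {movement:+.1f} in, net change {net:+.1f} in")
```

Under these assumptions both pitches fall. The "rising" fastball simply falls less, and a sinker with positive movement "sinks" only in comparison with that fastball.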
51
Existing Problems Existing problems in accessibility relate to: – Poor ability to analyze / understand data – Lack of trust of data With very basic instruction, both of these problems could be easily overcome
52
Conclusions
53
The New Access Barrier Rather than technical, the new access barrier is subject-driven: – How do different pitches move? – How do I identify pitches? – What kind of information can I extract from the PitchFX dataset?
54
The New Access Barrier My opinion: these kinds of limitations are better than technical ones. Sort of like running a complex statistical analysis: – hard and time consuming by hand… – but the problem is really in the interpretation
55
Conclusions But, for most people who are: – a) familiar with major league pitching – b) able to read a graph or table – c) familiar with very basic statistics Accessing & learning how to understand PitchFX data is greatly simplified. Instantly provides a wealth of previously unavailable information to the interested fan.
56
…since most of us don’t have these guys at our disposal
57
Thanks Sportvision The Many “Daves” of Fangraphs.com Alan Nathan Alex Clapp Good friends at sonsofsamhorn.net
58
Questions? For Webcasters - Feel free to email: Contact info: dan@brooksbaseball.net