Published by Katrina Marcia Snow. Modified over 9 years ago.
1
Baseball Statistics: Just for Fun!
2
2/16 Issues, Theory, and Data
Hypothesis: Do home run hitters record more strikeouts and four balls (walks), and fewer steals?
Data collection: Korea Baseball Organization and US Major League home pages.
Model: y1 = #strikeouts, y2 = #steals, y3 = #4Bs, x = #HRs. Regress y on a constant and x.
Hypothesis testing: Test the statistical significance of the regression slopes using t-tests.
3
3/16 2. Data Collection
KBO: http://www.koreabaseball.or.kr
US Major League Baseball: http://www.majorleaguebaseball.com
4
4/16 3. Model I
$$(\#\text{strikeouts}) = \alpha_1 + \beta_1\,(\#\text{HRs}) + \varepsilon$$
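Model I is a simple regression of strikeouts on a constant and home runs. A minimal sketch of how the slope and its t-value could be computed is given here, assuming ordinary least squares with the usual homoskedastic standard error. The function name `simple_ols` and the numbers are made up for illustration; they are not the KBO or MLB data used in the slides.

```python
import numpy as np

def simple_ols(x, y):
    """Regress y on a constant and x; return (intercept, slope, t-value of slope)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_dev = x - x.mean()
    sxx = np.sum(x_dev ** 2)
    slope = np.sum(x_dev * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    sigma2 = np.sum(resid ** 2) / (n - 2)  # error-variance estimate, n - 2 degrees of freedom
    se_slope = np.sqrt(sigma2 / sxx)       # standard error of the slope
    return intercept, slope, slope / se_slope

# Made-up numbers for illustration only (not KBO or MLB data)
home_runs = [5, 12, 20, 31, 8, 25]
strikeouts = [40, 62, 85, 110, 55, 90]
alpha1, beta1, t_beta1 = simple_ols(home_runs, strikeouts)
print(f"alpha1 = {alpha1:.2f}, beta1 = {beta1:.2f}, t-value = {t_beta1:.2f}")
```

The same function applies to Models II and III by swapping in steals or four balls as `y`.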
5
5/16 3. Model II
$$(\#\text{steals made}) = \alpha_2 + \beta_2\,(\#\text{HRs}) + \varepsilon$$
$$(\#\text{steals attempted}) = \alpha_3 + \beta_3\,(\#\text{HRs}) + \varepsilon$$
6
6/16 3. Model III
$$(\#\text{four balls}) = \alpha_4 + \beta_4\,(\#\text{HRs}) + \varepsilon$$
7
7/16 4. Hypothesis Testing
t-tests on the slopes:
$\hat\beta_1 = 0.84$, t-value = 2.89; $\beta_1 = ??$ (Significant)
$\hat\beta_4 = 0.51$, t-value = 2.50; $\beta_4 = ??$ (Significant)
$\hat\beta_2 = -0.12$, t-value = -0.94; $\hat\beta_3 = -0.18$, t-value = -1.14; $\beta_2, \beta_3 = ??$ (Insignificant)
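The significant/insignificant labels follow from comparing each t-value with a critical value. The slides state neither the sample size nor the significance level, so the worked check below assumes a two-sided test at the 5% level with the large-sample critical value of about 1.96.

$$t = \frac{\hat\beta}{\mathrm{se}(\hat\beta)}, \qquad \text{reject } H_0\colon \beta = 0 \ \text{ if } \ |t| > 1.96$$

$$|2.89| > 1.96, \quad |2.50| > 1.96 \quad \Rightarrow \quad \text{reject } H_0 \text{ (strikeouts, four balls)}$$

$$|-0.94| < 1.96, \quad |-1.14| < 1.96 \quad \Rightarrow \quad \text{do not reject } H_0 \text{ (steals)}$$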
8
8/16 4. Hypothesis Testing
(1) HR hitters strike out more!
(3) HR hitters draw more four balls!
(2) "HR hitters do not steal bases well because of their big bodies": Insignificant
9
Wait a minute! To prevent "spurious correlation" between #HRs and each of #strikeouts, #steals, and #4Balls, we need to control for the number of appearances in the batter's box. Right!
10
10/16 Multiple Regression: control for "# at bats"
Without controlling for # at bats, a hitter with more appearances would record a higher number in every category than others, generating "spurious correlation" between any pair of variables among #HRs, #strikeouts, #steals, and #four balls.
Two ways to control for # at bats:
1. Use a subsample of hitters with more than 100 appearances.
2. Use "# at bats" as a control variable in a multiple regression.
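The spurious-correlation argument can be illustrated with a small simulation. This is a sketch under an assumed data-generating process, not the slides' data: both home runs and strikeouts scale with at bats, but home runs have no direct effect on strikeouts. The helper `slopes` and all coefficients are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
at_bats = rng.integers(10, 600, size=n).astype(float)

# Assumed process: both counts grow with at bats,
# but home runs have NO direct effect on strikeouts.
home_runs = 0.04 * at_bats + rng.normal(0, 3, size=n)
strikeouts = 0.20 * at_bats + rng.normal(0, 10, size=n)

def slopes(y, *regressors):
    """OLS of y on a constant and the given regressors; returns the slopes only."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

naive = slopes(strikeouts, home_runs)[0]                # no control for at bats
controlled = slopes(strikeouts, home_runs, at_bats)[0]  # at bats as a control variable

print(f"slope on #HRs without control: {naive:.2f}")
print(f"slope on #HRs with control:    {controlled:.2f}")
```

In this simulated setting the uncontrolled slope is large even though there is no direct effect, while the controlled slope is close to zero, which is the distortion the slide warns about.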
11
11/16 Model I (extended)
$$(\#\text{strikeouts}) = \alpha_1 + \beta_1\,(\#\text{HRs}) + \beta_2\,(\#\text{at bats})$$
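A multiple regression like the extended model reports one coefficient and one t-value per regressor. The sketch below shows one way to compute them, assuming ordinary least squares via the normal equations with homoskedastic standard errors. The function name `ols_with_t` and the player numbers are made up for illustration and are not the slides' data.

```python
import numpy as np

def ols_with_t(y, X):
    """OLS of y on X (X must already contain a column of ones).
    Returns (coefficients, t-values)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)          # error-variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))   # standard error of each coefficient
    return coef, coef / se

# Made-up numbers for illustration only (not KBO or MLB data)
home_runs = np.array([2, 5, 12, 20, 31, 8, 25, 15], dtype=float)
at_bats = np.array([120, 210, 340, 480, 520, 260, 450, 400], dtype=float)
strikeouts = np.array([25, 40, 62, 85, 110, 55, 90, 70], dtype=float)

X = np.column_stack([np.ones_like(home_runs), home_runs, at_bats])
coef, t_values = ols_with_t(strikeouts, X)
for name, b, t in zip(["alpha1", "beta1", "beta2"], coef, t_values):
    print(f"{name} = {b:.2f} ({t:.2f})")  # estimate with t-value in parentheses
```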
12
12/16 Results (t-values in parentheses)
Using sub-sample:
  without # at bats: $\hat\beta_1 = 0.84$ (2.89)
  with # at bats: $\hat\beta_1 = 0.89$ (2.88), $\hat\beta_2 = -0.03$ (-0.49)
Using entire sample:
  without # at bats: $\hat\beta_1 = 2.40$ (11.64)
  with # at bats: $\hat\beta_1 = 0.63$ (3.11), $\hat\beta_2 = 0.14$ (12.53)
13
13/16 Interpretation
When using a sub-sample that is already fairly homogeneous in number of at bats, it makes little difference whether you control for # at bats or not. However, when using the entire sample, which comprises hitters who differ vastly in number of at bats, controlling for # at bats does matter: without it, the results are distorted.
Sub-sample: $\hat\beta_1 = 0.84$ (2.89) without control; $\hat\beta_1 = 0.89$ (2.88), $\hat\beta_2 = -0.03$ (-0.49) with control.
Entire sample: $\hat\beta_1 = 2.40$ (11.64) without control; $\hat\beta_1 = 0.63$ (3.11), $\hat\beta_2 = 0.14$ (12.53) with control.
14
14/16 Model II (extended)
$$(\#\text{4Balls}) = \alpha_1 + \beta_1\,(\#\text{HRs}) + \beta_2\,(\#\text{at bats})$$
15
15/16 Results (t-values in parentheses)
Using sub-sample:
  without # at bats: $\hat\beta_1 = 0.51$ (2.50)
  with # at bats: $\hat\beta_1 = 0.34$ (1.71), $\hat\beta_2 = 0.12$ (2.77)
Using entire sample:
  without # at bats: $\hat\beta_1 = 1.32$ (11.01)
  with # at bats: $\hat\beta_1 = 0.33$ (2.73), $\hat\beta_2 = 0.07$ (11.51)
16
The End Was it fun?