Studying the Effects of Aging in Major League Baseball Phil Birnbaum www.philbirnbaum.com.



Aging patterns in baseball
- How do players age?
- Is it different for hitters and pitchers?
- If you have a good player who's 31, how much do you expect him to decline over the next few years?
- Want a result like: "hitters decline X% between age 31 and 35"

Studies
- Bill James' classic aging study in the 1982 Baseball Abstract
- Work by Tom Tango
- Academic studies: Jim Albert, Ray C. Fair, and others
- (This presentation is based mostly on Tango, with a bit of James)

Previous findings
- The best batters peak at 27; that's when most of the major awards are won (James)
- Different skills peak at different times: speed early, HRs mid-career, BBs late (Tango)

A naive look
- What's the average performance of the various age cohorts?
- Fairly similar, it turns out, except at the extremes

Average Batting vs. Age

Average Pitching vs. Age

A naive look
- The flat curve is a statistical illusion
- The curve traces different groups of players at different ages
- Players at 25 are a cross-section of the league
- Players at 40 are former superstars
- The players at 40 were much better players when they were 25

Example
- Age 27: Player A 6.00, Player B 5.00, Player C 4.00. Average: 5.00
- Age 35: Player A 5.50, Player B 4.50, Player C released. Average: 5.00
- Age 40: Player A 5.00, Player B retired, Player C released. Average: 5.00
- All players decline with age, but the mean is still 5.00
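The example above can be checked with a few lines of Python. This is only a sketch of the slide's three-player illustration; the dictionary layout and function names are made up for the purpose, and `None` stands for "released" or "retired".

```python
# Ratings copied from the slide's three-player example.
# None means the player is no longer in the league at that age.
ratings = {
    "A": {27: 6.00, 35: 5.50, 40: 5.00},
    "B": {27: 5.00, 35: 4.50, 40: None},
    "C": {27: 4.00, 35: None, 40: None},
}


def cohort_mean(age):
    """Average rating of the players still active at the given age."""
    active = [r[age] for r in ratings.values() if r[age] is not None]
    return sum(active) / len(active)


def individual_change(player, start_age, end_age):
    """Change in one player's own rating between two ages."""
    return ratings[player][end_age] - ratings[player][start_age]


cohort_means = {age: cohort_mean(age) for age in (27, 35, 40)}
print(cohort_means)                      # the cohort average at each age
print(individual_change("A", 27, 40))    # Player A's own change
print(individual_change("B", 27, 35))    # Player B's own change
```

The cohort average stays flat only because the weaker players drop out of the sample; every player who is still measured has declined.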

Paired seasons
- "Paired seasons" method:
- Find all players who were 28 in season X
- See how they did in season X+1
- (Weight the average by playing time)
- The average difference reflects the effects of aging from 28 to 29
- Career path obtained by chaining (multiplying) single-year effects
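A minimal sketch of the paired-seasons steps, under stated assumptions: the season records are invented, the single-year effect is expressed as a ratio (so that chaining by multiplication makes sense), and each pair is weighted by playing time in season X+1. The slides do not say which season's playing time is used, so that choice is a guess.

```python
from collections import defaultdict

# Hypothetical records: (player, age, rate stat, playing time). Made-up numbers.
seasons = [
    ("p1", 28, 0.280, 600), ("p1", 29, 0.275, 580), ("p1", 30, 0.270, 500),
    ("p2", 28, 0.260, 400), ("p2", 29, 0.262, 450),
    ("p3", 29, 0.300, 650), ("p3", 30, 0.290, 620),
]


def paired_season_ratios(seasons):
    """Weighted average of (stat at age+1) / (stat at age), per age."""
    lookup = {(player, age): (stat, pt) for player, age, stat, pt in seasons}
    num = defaultdict(float)
    den = defaultdict(float)
    for (player, age), (stat, _pt) in lookup.items():
        following = lookup.get((player, age + 1))
        if following is None:
            continue  # no paired season: player did not appear at age + 1
        next_stat, next_pt = following
        weight = next_pt  # assumption: weight by playing time in season X+1
        num[age] += weight * (next_stat / stat)
        den[age] += weight
    return {age: num[age] / den[age] for age in num}


def chain(ratios, start_age, end_age):
    """Multiply single-year ratios into a career path relative to start_age."""
    level = 1.0
    path = {start_age: level}
    for age in range(start_age, end_age):
        level *= ratios[age]
        path[age + 1] = level
    return path


ratios = paired_season_ratios(seasons)
path = chain(ratios, 28, 30)
print(ratios)
print(path)
```

Players with no following season simply drop out of the pairing, which is where the selective-sampling problem enters.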

Paired seasons: Batting

Paired seasons: Pitching

Paired seasons: results biased
- The paired-seasons method shows big declines as players age
- But it suffers from a bias: selective sampling
- Players who were "lucky" in season X (large positive error term) get more playing time in season X+1
- Those "lucky" players will show bigger declines
- So big declines are over-represented

Example
- Three 37-year-olds, all of whom have skill of .250 this year, .240 next year
- This year, due to chance, they hit .200, .250, .300 respectively
- The .200 guy is forced to retire
- The .250 guy plays half time next year and loses 10 points (.250 → .240)
- The .300 guy plays full time next year and loses 60 points (.300 → .240)
- The weighted average loss is 43 points, not 10 points: (0.5 × 10 + 1.0 × 60) / 1.5 ≈ 43
- The decline is very much overestimated
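The arithmetic in this example can be reproduced directly. The sketch below assumes, as the slide implies, that the two surviving players hit exactly their .240 talent next year, and it measures playing time as a fraction of a full season; the variable names are illustrative.

```python
# (observed average this year, observed average next year, playing time next year)
# Playing time is a fraction of a full season; the .200 hitter retires.
players = [
    (0.200, None, 0.0),
    (0.250, 0.240, 0.5),
    (0.300, 0.240, 1.0),
]


def weighted_decline(players):
    """Playing-time-weighted average drop from this year to next year."""
    paired = [(this, nxt, w) for this, nxt, w in players if w > 0]
    total_weight = sum(w for _, _, w in paired)
    return sum(w * (this - nxt) for this, nxt, w in paired) / total_weight


decline_points = weighted_decline(players) * 1000
print(round(decline_points))  # observed decline in points of batting average
```

The true talent change is 10 points for every player, so the gap between that and the weighted figure comes entirely from who gets to play in season X+1.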

How can we eliminate this bias?
- Can try to estimate the "true" talent of the three players by regressing to the mean
- The .200 guy is "probably" .220
- The .250 guy is "probably" .250
- The .300 guy is "probably" .280
- Now the third guy declines only 40 points, not 60
- Average decline: 30 points, from (0.5 × 10 + 1.0 × 40) / 1.5
- More accurate than the previous estimate of 43 points
- If we regressed "perfectly" (all players to their talent of .250) we'd get the right answer (10 points)
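The regressed estimates on this slide (.220, .250, .280) correspond to pulling each observed average 40% of the way toward .250. A small sketch, with a hypothetical `regress` helper and the same playing-time weights as the earlier example, shows how the estimated decline moves with the regression fraction:

```python
def regress(observed, mean, fraction):
    """Pull an observed average toward the mean by the given fraction."""
    return observed - fraction * (observed - mean)


def estimated_decline(fraction, mean=0.250):
    """Weighted decline after regressing season X toward the mean.

    Uses the slide's two surviving players: (observed this year,
    observed next year, playing time next year as a fraction of a season).
    """
    survivors = [(0.250, 0.240, 0.5), (0.300, 0.240, 1.0)]
    total_weight = sum(w for _, _, w in survivors)
    return sum(
        w * (regress(this, mean, fraction) - nxt) for this, nxt, w in survivors
    ) / total_weight


for fraction in (0.0, 0.4, 1.0):
    print(fraction, round(estimated_decline(fraction) * 1000))
```

With no regression the estimate is the biased 43 points, with 40% it is the slide's 30 points, and with full regression to .250 it is the correct 10 points.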

Regressing season X
- How much to regress?
- Need to do some research to figure that out
- Can probably get a theoretical lower bound from the binomial (multinomial) distribution
- For now, consider 10% and 30%
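One way to write down what "regressing by 10% or 30%" means, using notation that is not in the slides: let x be the observed season-X figure, μ the mean being regressed toward, and r the regression fraction. The worked lines apply it to a hypothetical .300 hitter with μ = .250.

```latex
\[
\hat{t} = x - r\,(x - \mu)
\]
\[
r = 0.10:\quad \hat{t} = .300 - 0.10\,(.300 - .250) = .295
\]
\[
r = 0.30:\quad \hat{t} = .300 - 0.30\,(.300 - .250) = .285
\]
```

A larger r removes more of the "luck" from season X and therefore shrinks the measured decline into season X+1.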

Batting, regressed 10%

Batting, regressed 30%

Pitching, regressed 10%

Pitching, regressed 30%

Conclusions
- Results are sensitive to how much we regress
- Getting correct estimates of aging using the paired-seasons method depends on solving the selective sampling problem and/or figuring out how much to regress
- Alternative: can fit curves to careers (Albert, Fair)
- But this method requires a long career, which means only the most successful players are analyzed
- Some selective sampling issues there too

References
- "Looking For the Prime," 1982 Bill James Baseball Abstract, p. 191
- Tom Tango,
- Tom Tango, "Forecasting Pitchers – Adjacent Seasons,"
- Ray C. Fair, "Estimated Age Effects in Baseball,"