Eric Huggins, Ph.D. Fort Lewis College Durango, CO.

Presentation transcript:


Goals
1. To estimate, in advance, the probability that a football team will win a game.
2. To determine and use all relevant and significant leading information: point spread, over/under, home/away.
3. To develop the best-fitting equation for a large set of data.
10/14/2012, INFORMS 2012

References
Point Spreads = Efficient Markets: numerous.
How to Beat Point Spreads: numerous, questionable.
"On the Probability of Winning a Football Game" by Hal Stern, The American Statistician.

Stern's Paper
Data from the NFL 1981, 1983 and 1984 seasons; n = 224 per year.
Stern developed an equation to predict the probability that the favorite wins depending on the point spread p.
For p < 6, P(Favorite wins) ≈ 50% + 3%p.
Example: The Arizona Cardinals are a 4.5 point favorite later today, so there is approximately a 63.5% chance that they will win.
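The rule of thumb above is easy to check by hand or in code. A minimal sketch in Python; the function name and the decision to reject spreads of 6 or more are assumptions made here, since the slide only states the rule for p < 6.

```python
def stern_linear_estimate(point_spread: float) -> float:
    """Approximate P(favorite wins) as 50% plus 3% per point of spread.

    The rule is stated only for spreads under 6 points, so larger
    spreads are rejected instead of being extrapolated.
    """
    if point_spread < 0:
        raise ValueError("point spread is taken from the favorite's side, so it must be >= 0")
    if point_spread >= 6:
        raise ValueError("the approximation is stated only for spreads under 6 points")
    return 0.50 + 0.03 * point_spread


# The Arizona example from the slide: a 4.5 point favorite.
print(f"{stern_linear_estimate(4.5):.1%}")
```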

Extensions
1. The probability function is clearly non-linear:
   If the point spread is 0, the probability is 50%.
   As the point spread increases, the probability should approach 100%, non-linearly.
2. NFL point spreads are usually close, and there are only a couple hundred games per year. So, use NCAA college football instead:
   Wide variation among point spreads.
   Almost a thousand games per year.

Data Collection:
ARIZONA STATE (2010 Season) (SUR: 6-6 PSR: 10-2 O-U: 7-4)
S.04* PORT. ST. W
S.11* N. ARIZONA L o57
S.18 Wisconsin W u49
S.25* OREGON W o55
O.02 Oregon St. W +3' o54'
O.09* Washington W +1' u59
O.23 California L o51
O.30* WASH. ST.# W -21' 42-0 u57'
N.06* Southern Cal W +5' o60'
N.13* STANFORD W u59'
N.26 UCLA W -12' o48'
D.02* Arizona-OT W o56

Data Collection
Collected data from 7861 college football games, working back from 2011.
Stopped at 2001 since earlier data started getting sketchy.
Have data for the NFL, too; haven't run it yet.
Converted and cleaned the data into MS Excel format.
Data contains teams, date, point spread, over/under and actual score of the game.

Data Highlights
Lowest point spread = 0; highest point spread = 56.5 points (Louisiana Monroe at Florida, 2001).
Most common point spreads: 2.5 to 3.5, approximately a field goal; 6.5 to 7.5, approximately a touchdown.
Biggest upset: Stanford beat USC (2007) despite being a 40.5 point underdog.
Lowest over/under = 34 (Ohio State at Penn State, 2004); highest over/under = 83 (Tulsa at Rice, 2007).
Most common over/under: 47 points.

Point Spreads and Over/Under
The point spread (or Las Vegas odds) is a handicap for the underdog.
The casinos want half the betting on one side and half on the other.
The line can and does change.
The Arizona Cardinals are a 4.5 point favorite over the Buffalo Bills later today.
The over/under is a wager on the total points to be scored in a game.
The over/under in the Arizona/Buffalo game is 43 points.

Are the Point Spread and Actual Point Differential Correlated?
Highlighted games:
This team was favored by 34 and won by 72.
This team was a 29 point underdog but won by 14.

So, Let's Estimate!
Given the point spread and over/under, predict the probability that the favorite wins the game.
1. Use point spread alone.
2. Use some combination of point spread and over/under:
   Spread Percentage = point spread / (over/under).
   Point spread and over/under as separate variables.
3. What model will fit the curve? Picture the graph from p = 0 to very high p. Use logistic regression.
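Written out, the single-variable logistic model in item 3 has the form below. This is a sketch of the standard form; the symbol \beta for the fitted coefficient and x for the predictor (point spread or spread percentage) are notation chosen here, not taken from the slides.

```latex
P(\text{favorite wins}) = \frac{1}{1 + e^{-z}}, \qquad z = \beta x
```

Leaving out an intercept gives z = 0 at x = 0, so the probability there is exactly 1/2. With \beta > 0 the probability rises toward 1 as x grows, which is the shape the graph is expected to have.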

Point Spread vs. Probability of Win
Highlighted points:
n = 103, p = 21 points, prob = 94.2%
n = 7, p = 40.5 points, prob = 85.7%

Spread % vs. Probability of Win
Recall that the Spread % is the point spread divided by the over/under. So, in a game with a 5 point spread and an over/under of 50, the Spread % is 10%.
Highlighted points:
n = 212, s% = 10%, prob = 61.8%
n = 142, s% = 28%, prob = 80.3%
n = 16, s% = 67%, prob = 93.8%
The best-fitting curve, forcing the probability at x = 0 to be 50%, is y = 1/(1 + e^(-z)) with z = 5.916x. But just look at the graph, it fits!
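To see how the fitted curve compares with the points called out on these slides, it can be evaluated directly. A small sketch in Python: the slope 5.916 and the three bins come from the slides, while the function and variable names are made up for illustration.

```python
import math


def logistic_win_probability(spread_pct: float, slope: float = 5.916) -> float:
    """P(favorite wins) = 1 / (1 + e^(-z)) with z = slope * spread_pct.

    spread_pct is a fraction (0.10 for 10%). There is no intercept,
    so the curve passes through 50% at a spread percentage of 0.
    """
    return 1.0 / (1.0 + math.exp(-slope * spread_pct))


# (games in bin, spread %, observed win rate) as quoted on the slides
observed_bins = [(212, 0.10, 0.618), (142, 0.28, 0.803), (16, 0.67, 0.938)]

for n, s, observed in observed_bins:
    fitted = logistic_win_probability(s)
    print(f"s% = {s:.0%}  n = {n:3d}  observed = {observed:.1%}  fitted = {fitted:.1%}")
```

Only three bins are quoted here, so this checks the curve at those points; it does not reproduce the fit itself, which used the full data set.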

Which is Better?
Point Spread: r =
Spread %: r =

Two Variables: Point Spread and Over/Under
Set up a logistic regression with both point spread p and over/under o/u.
Not sure how to force the probability to 50% for p = 0 with two variables.
Over/under is not really significant, but included it anyway.
Tried several combinations: o/u, o/u – p, p/(o/u), etc.
Best fit: y = 1/(1 + e^(-z)), z = 0.128(p) (o/u)

Linear Estimator: Point Spread
Three lines:
p ≤ 10 → prob ≈ 50% + 3%p
10 < p ≤ 30 → prob ≈ 80% + 1%(p - 10)
p > 30 → prob ≈ 100%
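The three-line estimator can be written as one piecewise function. A sketch in Python, with one assumption stated plainly: the middle segment is read as 80% plus 1% for each point beyond 10, so that the lines meet at 80% (p = 10) and 100% (p = 30) and the estimate never exceeds 100%. The function name is hypothetical.

```python
def linear_estimate_from_spread(point_spread: float) -> float:
    """Piecewise-linear estimate of P(favorite wins) from the point spread."""
    if point_spread < 0:
        raise ValueError("point spread is taken from the favorite's side, so it must be >= 0")
    if point_spread <= 10:
        # First line: 50% plus 3% per point.
        return 0.50 + 0.03 * point_spread
    if point_spread <= 30:
        # Second line (assumed reading): 80% plus 1% per point beyond 10.
        return 0.80 + 0.01 * (point_spread - 10)
    # Third line: treated as a sure thing.
    return 1.0


for p in (0, 4.5, 10, 21, 30, 40.5):
    print(f"p = {p:>4}: {linear_estimate_from_spread(p):.1%}")
```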

Linear Estimator: Spread %
Two lines:
s% ≤ 50% → prob ≈ 50% + s%
50% < s% ≤ 100% → prob ≈ 100%
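The same idea for the Spread % estimator, again as a hedged Python sketch. The function names are invented for illustration, and the spread percentage is passed as a fraction (0.10 for 10%).

```python
def spread_percentage(point_spread: float, over_under: float) -> float:
    """Spread % = point spread divided by the over/under, as a fraction."""
    if over_under <= 0:
        raise ValueError("over/under must be positive")
    return point_spread / over_under


def linear_estimate_from_spread_pct(spread_pct: float) -> float:
    """Two-line estimate: 50% + s% up to s% = 50%, then 100%."""
    if not 0 <= spread_pct <= 1:
        raise ValueError("spread percentage must be between 0 and 1")
    return min(1.0, 0.50 + spread_pct)


# A 5 point spread with an over/under of 50 gives a Spread % of 10%.
s = spread_percentage(5, 50)
print(f"s% = {s:.0%}, estimate = {linear_estimate_from_spread_pct(s):.0%}")
```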

Further Research
Analysis on NFL data.
Factor in home/away games.
Compare probabilities to Las Vegas money lines: Accuracy? Advantage?
The Arizona Cardinals are at -220 to win today. (The Buffalo Bills are at +190.)
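For the money line comparison, a line first has to be turned into a probability. The sketch below uses the usual American-odds convention (a negative line is the stake needed to win 100, a positive line is the winnings on a 100 stake). That convention, the function name, and the step of rescaling the two sides so they sum to 100% are assumptions made here; the slides do not say how the comparison would be done.

```python
def implied_probability(money_line: int) -> float:
    """Break-even win probability implied by an American money line."""
    if -100 < money_line < 100:
        raise ValueError("American money lines are at most -100 or at least +100")
    if money_line < 0:
        stake = -money_line
        return stake / (stake + 100)
    return 100 / (money_line + 100)


favorite = implied_probability(-220)  # Arizona Cardinals
underdog = implied_probability(190)   # Buffalo Bills

# The two sides add up to more than 100%; the excess is the book's margin.
margin = favorite + underdog - 1.0

# One simple way to remove the margin: rescale both sides to sum to 100%.
fair_favorite = favorite / (favorite + underdog)

print(f"favorite implied: {favorite:.1%}, underdog implied: {underdog:.1%}")
print(f"margin: {margin:.1%}, rescaled favorite: {fair_favorite:.1%}")
```

The rescaled figure is the one to set against a point-spread estimate for the same game.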

Show Estimator in MS Excel