Examining factors that influence English Premier Soccer Results Using JMP® Pro 11
Huyen Nguyen, Dung Phan, and Girish Shirodkar
Oklahoma State University, Stillwater, OK 74078

Introduction
Soccer is the most popular sport in the world, with millions of players in over 200 countries. The English Premier League is broadcast in 212 territories to 643 million homes and a TV audience of 4.7 billion. It is therefore of broad interest to determine which attributes drive English Premier League game results. Very few concrete studies have explored the factors that influence soccer game results. This study, based on 10 seasons of English Premier League game data, explores these factors from the perspective of the home team. JMP® Pro 11 is used for data preparation, data analysis, and predictive modeling.

Figures: Fig. 1a: Forward Logistic Regression Confusion Matrix; Fig. 1b: Forward Logistic Regression Odds Ratios; Fig. 2a: Neural Network Confusion Matrix; Fig. 2b: Neural Network

Data Preparation
The English Premier League games dataset consists of observations on 23 variables. The target variable, Home Team Results, is derived from two variables, Full Time Home Goal and Full Time Away Goal. It is binary: 0 means the home team loses or draws, and 1 means the home team wins. The data were consolidated and prepared in JMP® Pro 11 before predictive modeling. Variables were selected using domain knowledge and statistical methods, and 21 key variables were retained.

Predictive Modeling
Four predictive models were built: a Stepwise Logistic Regression Model, a Forward Logistic Regression Model, a Decision Tree, and a Neural Network. The competing models were then analyzed and compared with each other.

Figures: Fig. 3a: Decision Tree; Fig. 3b: Decision Tree Confusion Matrix
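The target definition from Data Preparation and the misclassification rate used to compare the models are simple enough to show directly. The Python sketch below is an illustration under stated assumptions, not the authors' JMP® Pro 11 workflow: the function names, the (home goals, away goals) sample pairs, and the predicted labels are all hypothetical, and the 1 (home win) versus 0 (loss or draw) coding follows the definition given above.

```python
# Minimal sketch (not the authors' JMP workflow): derive the binary
# Home Team Results target and score a model's predictions against it.
# All names and sample values here are hypothetical.


def home_team_result(full_time_home_goals: int, full_time_away_goals: int) -> int:
    """Return 1 if the home team wins, 0 if it loses or draws."""
    return 1 if full_time_home_goals > full_time_away_goals else 0


def misclassification_rate(actual: list[int], predicted: list[int]) -> float:
    """Share of matches whose predicted label differs from the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = sum(a != p for a, p in zip(actual, predicted))
    return errors / len(actual)


if __name__ == "__main__":
    # Hypothetical (home goals, away goals) pairs, not data from the study.
    matches = [(2, 1), (1, 1), (0, 3), (3, 0)]
    actual = [home_team_result(h, a) for h, a in matches]
    predicted = [1, 1, 0, 1]  # hypothetical model output
    print(actual, misclassification_rate(actual, predicted))  # [1, 0, 0, 1] 0.25
```

A draw maps to 0 because the target only separates home wins from every other outcome, so a model that cannot tell draws from away wins is not penalized for that.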

Model                    Misclassification rate   Generalized R²   AICc     BIC
Logistic Regression 1    22.15%                   48.13%           1056.5
Logistic Regression 2    22.88%                   47.56%           1099.7
Decision Tree            21.78%                   46.58%           N/A
Neural Network           20.92%                   53.00%

Based on the misclassification rate criterion, the Stepwise Logistic Regression Model outperforms the other models, with a misclassification rate of 22.15%. It identifies Half Time Home Goal, Half Time Away Goal, Home Team Red Cards, Away Team Red Cards, Home Team Shots, and Away Team Shots as the most important predictors of English Premier League game results, and it yields a sensitivity of 86.20% and a specificity of 87.18%.

Figures: Fig. 4b and Fig. 4c: Stepwise Logistic Regression model results

Conclusion and Discussion
The Stepwise Logistic Regression Model is selected as the final model. Its most important predictors are half-time goals, red cards, and shots for both the home and away teams. Game results can be predicted with high accuracy once the first half has been played.

The effects of these factors on game results can be quantified. For each additional goal the away team scores by half time, its odds of winning rise by 264%, whereas for each additional goal the home team scores, its odds of losing or drawing fall by only 79.3%. The same pattern appears in the effect of red cards on the full-time result: an additional red card raises the home team's odds of losing or drawing by 122%, while the corresponding figure for the away team is 32.3%. These differences in how the factors drive game results can be attributed to the home-ground effect.
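The percentages above rest on two standard calculations: sensitivity and specificity from a 2×2 confusion matrix, and the conversion of a logistic-regression odds ratio into a percent change in the odds. The sketch below assumes the quoted figures (264%, 79.3%, 122%, 32.3%) are percent changes in odds derived from the stepwise model's odds ratios; the odds ratios in the usage loop are back-calculated from those percentages for illustration and are not values reported in the study, and the confusion-matrix counts are toy numbers. Which outcome counts as "positive" is also an assumption the sketch leaves to the caller.

```python
import math

# Minimal sketch of the standard calculations behind the reported percentages.
# Assumption: the quoted percentages are percent changes in odds taken from
# logistic-regression odds ratios; the study's fitted values are not reproduced.


def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def odds_ratio_from_coefficient(beta: float) -> float:
    """Unit odds ratio for a logistic-regression coefficient: exp(beta)."""
    return math.exp(beta)


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Toy confusion-matrix counts, not the study's.
    sens, spec = sensitivity_specificity(tp=80, fn=20, fp=10, tn=90)
    print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")  # sensitivity=80.00%, specificity=90.00%

    # Odds ratios back-calculated from the percentages in the text (illustrative only).
    for label, ratio in [("+264%", 3.64), ("-79.3%", 0.207), ("+122%", 2.22), ("+32.3%", 1.323)]:
        print(label, round(percent_change_in_odds(ratio), 1))
```

An odds ratio above 1 means the odds rise with each extra unit of the predictor and one below 1 means they fall, which is why a 79.3% decrease corresponds to an odds ratio of roughly 0.207.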
Although home teams have the advantage of playing in their own stadium, the quantified effects above suggest that the home team is also under more pressure, so the effects of half-time goals and red cards are diluted for the home team.

Acknowledgements
Dr. Goutam Chakraborty, founder of the SAS and OSU Business Analytics Program at Oklahoma State University, for his continued support and guidance.

Figure: Fig. 4a: Stepwise Logistic Regression ROC

