Published by Arline McDaniel. Modified over 9 years ago.
1
A Statistical Determination of the Characteristics of Playoff Teams in The National Hockey League
Dan Foehrenbach & Chris Claeys
DFoehrenbach@gmail.com & cmclaeys@gmail.com
UP-STAT 2013, April 6
2
Outline
Predicting which teams make the playoffs is a difficult task, yet it can be achieved using multivariate techniques
Two-dimensional plot of playoff teams and non-playoff teams
Unsupervised learning and supervised learning
Cluster analysis
Four distinct clusters
Different modeling techniques to predict whether or not a given team will make the playoffs
Linear discriminant analysis
Current season predictions
3
Statement of the Problem
The main purpose of this report is to analyze the data and apply the analysis to uncovering what makes a team successful
Discovery of common trends between good and bad teams
Data were collected over the past 11 years
The variables were chosen so that every aspect of hockey is represented in the analysis
Response variable and dummy variable (Playoffs, SC)
4
Variables
GG      Goals Scored Per Game
GAG     Goals Against Per Game
GFAR    Goals For/Goals Against Ratio
PP      Power Play Percentage
PK      Penalty Kill Percentage
S.G     Shots Per Game
SA.G    Shots Against Per Game
Sc1     Win Percentage When Scoring First
Tr1st   Win Percentage When Trailing After 1st Period
FO      Faceoff Win Percentage
SV      Save Percentage
5
Statistical Analysis
Exploratory Data Analysis
Gain an overall sense of the dataset: how playoff teams and Stanley Cup winners are similar or dissimilar to the rest of the teams
Multidimensional scaling (11 dimensions reduced to a 2D plot of the data)
Modeling Stanley Cup winners would be too difficult with only 10 data points
Also gain a sense of which variables best distinguish playoff teams from non-playoff teams in a univariate sense
6
2D Plot of Data After Multi-Dimensional Scaling
7
Boxplots of Explanatory Variables
8
Unsupervised Learning
Clustering Analysis
Determine what similarities, or dissimilarities, exist among teams without any prior knowledge of group membership
Clear statistical distinctions that can separate teams into 2 or more groupings: playoff and non-playoff
Hierarchical clustering using Ward’s method (to determine the cluster means)
Refinement of Ward clusters using kMeans clustering
Four clusters yielded the most useful interpretations
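The two-stage procedure described on this slide (Ward's method to obtain cluster means, then kMeans started from those means) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the twelve 2D points are made up rather than taken from the NHL data, and the slides do not say which software the authors used.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's merge cost, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(centroid(a), centroid(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the supplied centers."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster happens to lose all its members.
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Made-up, already-scaled 2D points forming four loose groups (NOT the NHL data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 0.0), (5.1, 0.2), (5.2, 0.1),
          (0.0, 5.0), (0.1, 5.2), (0.2, 5.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]

# Stage 1: Ward's method supplies the starting cluster means.
seeds = [centroid(c) for c in ward_clusters(points, 4)]
# Stage 2: kMeans refines the assignment starting from those means.
labels, centers = kmeans(points, seeds)
print(labels)
```

Seeding kMeans with the Ward means removes the dependence on random starting centers, which is one plausible reason for combining the two methods.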
9
Number of Clusters
10
Cluster Interpretations and Means
Cluster 1: Teams that have a low win percentage and are offensive oriented
Cluster 2: Teams that have a high win percentage and are offensive oriented
Cluster 3: Teams that have a low win percentage and are defensive oriented
Cluster 4: Teams that have a high win percentage and are defensive oriented
Cluster Means
11
kMeans Clusters Using Centroids from Ward Clusters
There is a clear distinction between the 4 clusters in only 2 dimensions
The majority of the playoff teams come from the clusters with higher win percentage
The playoff teams from clusters 2 and 3 tend to be borderline with cluster 1 or 4. This implies that these teams are likely “bubble” teams: teams ranked 7th/8th in their respective conferences
12
kMeans Clusters Using Centroids from Ward Clusters
13
Stanley Cup Winners
The Stanley Cup champions over the past 11 years and the subgroup each team belonged to
Stanley Cup Winners Over the Past 11 Years and Cluster Membership
Offensive Oriented          Defensive Oriented
Boston Bruins 2010          Pittsburgh Penguins 2008
Anaheim Ducks 2006          Chicago Blackhawks 2009
Detroit Red Wings 2007      Tampa Bay Lightning 2003
Carolina Hurricanes 2005    New Jersey Devils 2002
Colorado Avalanche 2000     Detroit Red Wings 2001
Los Angeles Kings 2011
14
Cluster Summary
There is a clear statistical distinction between teams with high and low win percentages, as well as between teams that are more offensive and more defensive oriented
Cluster membership also distinguishes very well between playoff teams and non-playoff teams
The largest discriminating criteria between clusters seem to be win percentage metrics, scoring/goals allowed metrics and offensive/defensive metrics
15
Supervised Learning
Model Comparisons
Seek to find a model that best predicts whether or not a given team will make the playoffs
Several different model types were considered:
◦ Linear Discriminant Analysis
◦ Logistic Analysis
◦ PCA Logistic Analysis
◦ Kernel Support Vector Machine
◦ Recursive Partitioning
◦ Random Forests
16
Training and Test Sets
The data were split into training and test sets: each model was built on the training set and predictions were then made on the test set
The accuracy of each model was calculated as the proportion of correct predictions made
Random data points were selected to maintain independence of observations
100 iterations of the training/test splits
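The repeated random training/test procedure on this slide can be sketched as follows. Everything here is an assumption made for illustration: the nearest-centroid classifier is a simple stand-in and not one of the models compared in the talk, the 30% test fraction is arbitrary, and the twenty data rows are made up.

```python
import random

def nearest_centroid_fit(X, y):
    """Stand-in model (hypothetical): one mean vector per class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def nearest_centroid_predict(model, x):
    """Predict the class whose mean vector is closest to x."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])))

def repeated_split_accuracy(X, y, n_iter=100, test_frac=0.3, seed=0):
    """Mean proportion of correct test-set predictions over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, int(round(test_frac * len(X))))
    accuracies = []
    for _ in range(n_iter):
        rng.shuffle(idx)  # a fresh random split on every iteration
        test, train = idx[:n_test], idx[n_test:]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / n_test)
    return sum(accuracies) / len(accuracies)

# Made-up rows of two features per team-season (NOT the NHL data):
# class 0 = missed the playoffs, class 1 = made the playoffs.
X = ([[0.70 + 0.02 * i, 2.0] for i in range(10)] +
     [[1.10 + 0.02 * i, 3.0] for i in range(10)])
y = [0] * 10 + [1] * 10

mean_acc = repeated_split_accuracy(X, y, n_iter=100)
print(mean_acc)
```

Averaging over many splits gives a steadier accuracy estimate than a single split, which is presumably why the comparison was repeated.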
17
Accuracy Comparison Between Different Models
18
Mean Accuracy
Several models have similar mean accuracy over the 9 iterations of training and test
Given that model and variable interpretation is difficult with kSVM, LDA was chosen as the most useful model going forward
Model      Mean Accuracy
LDA        0.89
Log        0.88
Log-PCA    0.89
kSVM       0.89
R.Part     0.80
RF         0.88
19
Linear Discriminant Analysis
The final LDA model was built on the entire data set, with the priors specified as Pr[0] = 14/30 and Pr[1] = 16/30 (since we know the exact number of teams that will make the playoffs each year)
Group Means for LDA Model
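For reference, the textbook two-class LDA rule shows where the specified priors enter. The symbols below (class means \mu_k, pooled covariance \Sigma, priors \pi_k) are notation introduced here, not taken from the slides.

```latex
\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k
            - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k
            + \log \pi_k,
\qquad k \in \{0, 1\},
\qquad \pi_0 = \tfrac{14}{30},\quad \pi_1 = \tfrac{16}{30}
```

A team with feature vector x is classified as a playoff team when \delta_1(x) > \delta_0(x), so the larger prior \pi_1 shifts the decision boundary slightly in favor of the playoff class.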
20
Linear Discriminant Coefficients
The results agree with the univariate exploratory analysis: GFAR, GAG, Sc1, and Tr1st show some of the largest coefficients and the most discriminating power between the two groups
21
Linear Discriminant Analysis in 2 Dimensions
The result was reduced to 2 dimensions to provide a visual aid
A discriminating line was added
22
Linear Discriminant Analysis in 2 Dimensions
23
Difficulty of Predicting Stanley Cup
To predict the Stanley Cup winner, an LDA model was built on the entire data set with the priors specified as Pr[0] = 29/30 and Pr[1] = 1/30 (again, we know that there will be exactly 1 Stanley Cup winner per year)
Without a training/test assessment, predictions on only 1 year of data (2010) illustrate the difficulty of the task:
24
Predictions for This Year's Data
The current season was shortened to 48 games by a lockout at the beginning of the year
Data were collected as of April 2, 2013, and the model was used to predict which teams will make the playoffs this year
The following teams were predicted to make the playoffs:
Anaheim Ducks
Boston Bruins
Chicago Blackhawks
Detroit Red Wings
Los Angeles Kings
Minnesota Wild
Montreal Canadiens
New York Rangers
Ottawa Senators
Pittsburgh Penguins
San Jose Sharks
St. Louis Blues
Tampa Bay Lightning
Toronto Maple Leafs
Vancouver Canucks
25
Predictions for This Year's Data
Only 15 teams were predicted, in line with the misclassification rate of around 8% seen in the results above (1 team is being wrongly classified as not making the playoffs)
The teams with the highest probabilities for playoff entry are the following:
26
Critical Evaluation and Future Plans
There are indications that certain techniques and strategies help a hockey team achieve success
With the data that were collected, predicting the Stanley Cup winner was not achieved (additional data might have produced a stronger result)
Future plans are to collect more data, and to follow this season and compare the actual results with the predictions
If all goes well and 89% of the predicted teams actually make the playoffs (especially in a shorter season), the model could prove lucrative in future endeavors
27
Questions?
DFoehrenbach@gmail.com