SVD, PCA, AND THE NFL
By: Andrew Zachary
Introduction
What is PCA? Principal Component Analysis
What is SVD? Singular Value Decomposition
What is the NFL? National Football League
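SVD
$A = U\Sigma V^*$
where $U$ and $V$ are unitary matrices and $\Sigma$ is diagonal, holding the nonnegative singular values of $A$.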
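As a quick numerical check of the factorization, the sketch below uses NumPy's `np.linalg.svd` (an assumption: the slides do not name a tool) on a small made-up matrix and confirms that multiplying the factors back together recovers $A$. NumPy returns $V^*$ directly (as `Vh`) and the singular values as a 1-D array.

```python
import numpy as np

# Hypothetical 4x3 real matrix, used only for illustration.
A = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 0.0],
              [2.0, 0.0, 5.0],
              [0.0, 1.0, 1.0]])

# Reduced ("economy") SVD: U is 4x3, s holds the 3 singular values, Vh is V* (3x3).
U, s, Vh = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)

# Reassemble A = U Sigma V* and check that it matches the original.
A_rebuilt = U @ Sigma @ Vh
print(np.allclose(A, A_rebuilt))  # True
print(s)                          # singular values, sorted in descending order
```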
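Computing the SVD Matrices
For real $A$ (so $A^T = A^*$), with $\Sigma$ square and diagonal (so $\Sigma^T = \Sigma$), and using $U^*U = I$ and $V^*V = I$, consider
$AA^T = (U\Sigma V^*)(U\Sigma V^*)^T = U\Sigma V^* V\Sigma U^* = U\Sigma^2 U^*$, so $AA^T U = U\Sigma^2$.
And,
$A^TA = (U\Sigma V^*)^T(U\Sigma V^*) = V\Sigma U^* U\Sigma V^* = V\Sigma^2 V^*$, so $A^TAV = V\Sigma^2$.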
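The two identities derived above can be verified numerically. This sketch (assuming NumPy and a random real matrix, so $V^* = V^T$) checks both $AA^TU = U\Sigma^2$ and $A^TAV = V\Sigma^2$ using the reduced SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # arbitrary real matrix for illustration

U, s, Vh = np.linalg.svd(A, full_matrices=False)
V = Vh.T                         # real case: V* = V^T
Sigma2 = np.diag(s**2)

# A A^T U = U Sigma^2  (columns of U are eigenvectors of A A^T)
left_ok = np.allclose(A @ A.T @ U, U @ Sigma2)
# A^T A V = V Sigma^2  (columns of V are eigenvectors of A^T A)
right_ok = np.allclose(A.T @ A @ V, V @ Sigma2)
print(left_ok, right_ok)  # True True
```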
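Eigenvalue Problem
The columns of $U$ are the eigenvectors of $AA^T$, and $\Sigma = \begin{pmatrix}\sigma_1 & 0\\ 0 & \sigma_2\end{pmatrix}$, where $\sigma_1^2$ and $\sigma_2^2$ are the eigenvalues in $AA^TU = U\Sigma^2$.
Similarly, the columns of $V$ are the eigenvectors of $A^TA$, with the same eigenvalues $\sigma_1^2$ and $\sigma_2^2$ in $A^TAV = V\Sigma^2$.
The singular values $\sigma_1$ and $\sigma_2$ are therefore the square roots of these eigenvalues.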
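Read in the other direction, the eigenvalue problem gives a way to build the SVD. The sketch below (assuming NumPy and a random full-rank matrix) solves the symmetric eigenproblem for $A^TA$, takes square roots of the eigenvalues as singular values, and recovers each column of $U$ as $u_i = Av_i/\sigma_i$. In practice `np.linalg.svd` is more accurate, because forming $A^TA$ squares the condition number; this is only meant to illustrate the connection.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))

# Symmetric eigenproblem for A^T A; eigh returns eigenvalues in ascending order.
evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]              # re-sort to descending, like the SVD
evals, V = evals[order], V[:, order]

sigma = np.sqrt(np.clip(evals, 0.0, None))   # singular values = sqrt(eigenvalues)
U = A @ V / sigma                            # u_i = A v_i / sigma_i (assumes sigma_i > 0)

s_ref = np.linalg.svd(A, compute_uv=False)
print(np.allclose(sigma, s_ref))                  # True
print(np.allclose(A, U @ np.diag(sigma) @ V.T))   # True
```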
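Covariance Matrix
$C_X = \frac{1}{n-1} X X^T$
where $X$ is the matrix of data, with each row holding one type of measurement (mean-subtracted, so each row has zero mean) and each column one data point, and $n$ is the number of data points.
Elements of the covariance matrix:
Diagonals → variance (dynamics)
Off-diagonals → covariance (redundancy)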
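A minimal sketch of the formula above, assuming NumPy and a small hypothetical data matrix with measurement types as rows and data points as columns. It builds $C_X$ by hand after mean-centering each row and compares the result with `np.cov`, which uses the same row-as-variable, $1/(n-1)$ convention by default.

```python
import numpy as np

# Hypothetical data: m = 3 measurement types (rows), n = 5 data points (columns).
X = np.array([[2.0, 4.0, 6.0, 8.0, 10.0],
              [1.0, 3.0, 2.0, 5.0, 4.0],
              [9.0, 7.0, 8.0, 6.0, 5.0]])
n = X.shape[1]

# Subtract each row's mean so every measurement has zero mean.
Xc = X - X.mean(axis=1, keepdims=True)

# C_X = X X^T / (n - 1): an m x m symmetric matrix.
C = Xc @ Xc.T / (n - 1)

print(np.allclose(C, np.cov(X)))  # True: np.cov treats rows as variables by default
print(np.diag(C))                 # variances of each row (10.0, 2.5, 2.5 for this data)
```

The off-diagonal entries of `C` are the covariances between rows; large magnitudes there indicate redundant measurements.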
PCA
PCA seeks the ideal basis for the data: one with no redundancy, which means choosing the basis that diagonalizes the covariance matrix.
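One way to see the diagonalization: if the mean-centered data is factored as $X = U\Sigma V^*$, then the change of basis $Y = U^*X = \Sigma V^*$ has covariance $C_Y = \frac{1}{n-1}\Sigma^2$, which is diagonal. The sketch below checks this on synthetic, deliberately correlated data (NumPy and the data-generating choices are assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical correlated data: 3 measurement types (rows) x 200 data points (columns).
base = rng.standard_normal((1, 200))
X = np.vstack([base,
               2.0 * base + 0.1 * rng.standard_normal((1, 200)),  # redundant with row 0
               rng.standard_normal((1, 200))])
n = X.shape[1]
Xc = X - X.mean(axis=1, keepdims=True)

# SVD of the centered data: Xc = U Sigma V^T.
U, s, Vh = np.linalg.svd(Xc, full_matrices=False)

# Change of basis Y = U^T Xc; its covariance should be diagonal.
Y = U.T @ Xc
C_Y = Y @ Y.T / (n - 1)

off_diag = C_Y - np.diag(np.diag(C_Y))
print(np.allclose(off_diag, 0.0))                 # True: no redundancy in the new basis
print(np.allclose(np.diag(C_Y), s**2 / (n - 1)))  # True: variances = sigma^2 / (n - 1)
```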
NFL
Ranking method → stats used
NFL ranking → a single stat
Fantasy ranking → several stats (position dependent)
PCA ranking → all available NFL stats
NFL Correlation Matrix
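Correlation vs. Covariance
Pearson coefficient:
$\rho(A,B) = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{A_i-\mu_A}{\sigma_A}\right)\left(\frac{B_i-\mu_B}{\sigma_B}\right) = \frac{\operatorname{cov}(A,B)}{\sigma_A\sigma_B}$
Correlation matrix:
$R = \begin{pmatrix}\rho(A,A) & \rho(A,B)\\ \rho(B,A) & \rho(B,B)\end{pmatrix}$
Because each variable is standardized by its mean $\mu$ and standard deviation $\sigma$, $R$ is the covariance matrix of the standardized data, so stats measured on very different scales contribute comparably.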
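The two forms of the Pearson coefficient can be compared directly. In this sketch (NumPy assumed; the helper `pearson` and both data columns are hypothetical), the standardized-product sum is checked against `np.corrcoef` and against $\operatorname{cov}(A,B)/(\sigma_A\sigma_B)$.

```python
import numpy as np

def pearson(a, b):
    """Pearson coefficient via the standardized-product formula (N - 1 normalization)."""
    n = len(a)
    za = (a - a.mean()) / a.std(ddof=1)   # ddof=1 matches the 1/(N-1) convention
    zb = (b - b.mean()) / b.std(ddof=1)
    return np.sum(za * zb) / (n - 1)

# Hypothetical stat columns on very different scales.
stat_a = np.array([1200.0, 950.0, 1430.0, 700.0, 1100.0])
stat_b = np.array([9.0, 6.0, 12.0, 4.0, 8.0])

r = pearson(stat_a, stat_b)
R = np.corrcoef(stat_a, stat_b)      # 2x2: [[rho(A,A), rho(A,B)], [rho(B,A), rho(B,B)]]
print(np.isclose(r, R[0, 1]))        # True

# Equivalent form: rho = cov(A, B) / (sigma_A * sigma_B)
cov_ab = np.cov(stat_a, stat_b)[0, 1]
print(np.isclose(r, cov_ab / (stat_a.std(ddof=1) * stat_b.std(ddof=1))))  # True
```

The diagonal of `R` is always 1, since every variable is perfectly correlated with itself.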
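Singular Values
PCA Projection: $X_p = U\Sigma_{\text{reduced}}$, where $\Sigma_{\text{reduced}}$ keeps only the largest singular values (the leading principal components).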
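A sketch of the reduced projection, under stated assumptions: NumPy, a synthetic table with samples as rows and stats as columns (so the per-sample coordinates are $U\Sigma$), columns standardized so the analysis works on correlations, and a hypothetical choice of $r = 2$ retained components. It also computes the share of variance each singular value carries, $\sigma_i^2/\sum_j \sigma_j^2$, which is one common way to decide how many components to keep.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical table: 50 samples (rows) x 6 stats (columns); purely synthetic.
X = rng.standard_normal((50, 6)) @ rng.standard_normal((6, 6))

# Standardize each column (zero mean, unit variance) so PCA acts on correlations.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

U, s, Vh = np.linalg.svd(Z, full_matrices=False)

# Fraction of total variance captured by each singular value: sigma_i^2 / sum(sigma^2).
energy = s**2 / np.sum(s**2)

r = 2                                  # number of components kept (assumed)
Sigma_reduced = np.diag(s[:r])
X_p = U[:, :r] @ Sigma_reduced         # projected data, one row per sample

# Same coordinates obtained by projecting Z onto the first r right singular vectors.
print(np.allclose(X_p, Z @ Vh[:r].T))  # True
print(energy[:r].sum())                # share of variance retained by r components
```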
Results
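Conclusions
PCA is a powerful data analysis tool.
PCA provides the ideal basis for a dataset: the one in which the covariance matrix is diagonal.
The SVD of the data and the eigendecomposition of its covariance matrix both yield PCA.
PCA has strong applications in sports.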