Presentation transcript:

Kurt Routley, Oliver Schulte, Tim Schwartz, Zeyu Zhao, Sajjad Gholami. Computing Science/Statistics, Simon Fraser University, Burnaby-Vancouver, Canada.

2/68
 North American Sports: $485 Billion.
 Sports Analytics:
◦ growing in industry ($72.5M investment in Hudl).
◦ growing in academia (#Sports Analytics papers = 7x #applied operations research papers).
 AI:
◦ modelling and learning game strategies.
◦ multi-agent systems.
◦ structured data.
Cochran, J. J. (2010), 'The emergence of sports analytics', Analytics. Coleman, B. J. (2012), 'Identifying the players in sports analytics research', Interfaces, 42.

3/68 Reinforcement Learning for Sports Analytics; on-line intro text by Sutton and Barto.

4/68

5/68 Sports Analytics:  Evaluate Player/Team Performance.  Predict Match Outcomes.  Identify strengths and weaknesses.  Advise on drafts and trades.

6/68 Evaluate Player/Team Performance:
 Action Value Counts.
 Latent Strength Models, e.g. Chess: Elo Rating; Gaming: MS TrueSkill.
 Issues: entails transitivity; interpretable?; considers final results only.

7/68 Olympics 2010 Golden Goal. Issues for action values:  Common scale for all actions.  Context-awareness.  Lookahead.

8/68  Sabermetrics in Baseball.  +/- Score in ice hockey (nhl.com, Advanced Stats).

Search 9/68

 Many areas of AI and optimization involve lookahead.  In AI this is called search.  Example: GPS route planning. 10/68

11/68  Backgammon.  AlphaGo!  Chess (chessbase.com/js/apps/MyGames/).

12/68  Markov Chain Demo.  Our NHL model has > 1M nodes.  Solving a Markov Decision Process: ◦ Value Iteration Demo.

13/68  How much does the action change the expected reward at the current state?  Example: how much does the action change the chance of winning at the current state? Action impact = (expected reward after action) − (expected reward before action).

14/68

 Transition graph with 5 parts: ◦ Players/Agents P ◦ States S ◦ Actions A ◦ Transition Probabilities T ◦ Rewards R.  Transitions and Rewards depend on the state and a tuple of actions, one for each agent. 15/68 Littman, M. L. (1994), 'Markov games as a framework for multi-agent reinforcement learning', in ICML.
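To make the five components concrete, here is a minimal Python sketch of such a container. The names and type encodings below (MarkovGame, tuples for states and actions) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A state could be a context vector plus recent action history, e.g.
# (goal_differential, manpower_differential, period, last_events).
State = Tuple
# An action in a(T, L) form: (action_type, team, zone).
Action = Tuple[str, str, str]

@dataclass
class MarkovGame:
    """Container for the five components named on the slide (sketch only)."""
    players: List[str] = field(default_factory=lambda: ["Home", "Away"])
    states: List[State] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)
    # transitions[(s, a)] maps successor state s' -> probability T(s' | s, a)
    transitions: Dict[Tuple[State, Action], Dict[State, float]] = field(default_factory=dict)
    # rewards[s] is the immediate reward R(s) of entering state s
    rewards: Dict[State, float] = field(default_factory=dict)
```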

16/68 GD = Goal Differential MP = ManPower PR = Period CV = chance that home team scores next goal

17/68

18/68 GD = Goal Differential MP = ManPower PR = Period CV = chance that home team scores next goal

19/68

20/68 GD = Goal Differential MP = ManPower PR = Period CV = chance that home team scores next goal

21/68

22/68

23/68

24/68

25/68

26/68

27/68

28/68

 Players in our Markov game = {Home, Away}.  Each team agent models an average (or random) player. 29/68

 Context Features ◦ Goal Differential GD ◦ Manpower Differential MD ◦ Period PR 30/68

31/68  13 Action Types: Blocked Shot, Faceoff, Giveaway, Goal, Hit, Missed Shot, Shot, Takeaway, ...  Action parameters: team, location. ◦ faceoff(Home,Neutral) ◦ shot(Home,Offensive) ◦ hit(Away,Defensive)

 Use action description notation (Levesque et al, 1998) ◦ Actions written in form a(T,L)  Action a  Team T  Location/Zone L ◦ faceoff(Home,Neutral) ◦ shot(Home,Offensive) ◦ hit(Away,Defensive) 32/68
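As a small illustration of the a(T, L) notation, the helper below parses strings such as shot(Home,Offensive) into (action, team, zone) triples; the parser itself is an assumption added for illustration, only the notation comes from the slide.

```python
import re

def parse_action(text: str):
    """Parse a(T, L) notation, e.g. 'shot(Home,Offensive)' -> ('shot', 'Home', 'Offensive')."""
    match = re.fullmatch(r"(\w+)\((\w+),(\w+)\)", text.replace(" ", ""))
    if match is None:
        raise ValueError(f"not in a(T, L) form: {text!r}")
    return match.groups()

print(parse_action("faceoff(Home,Neutral)"))  # ('faceoff', 'Home', 'Neutral')
print(parse_action("hit(Away, Defensive)"))   # ('hit', 'Away', 'Defensive')
```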

 Transition probabilities are estimated from observations in play-by-play data: ◦ Record occurrences of state s as Occ(s). ◦ Record occurrences of transition as Occ(s,s’). ◦ Parameter Learning: transition probabilities T estimated as Occ(s,s’) / Occ(s). 33/68
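A minimal sketch of the counting estimate Occ(s,s') / Occ(s) described above. The input format (one list of states per game) is an assumption; the actual data pipeline is not shown in the slides.

```python
from collections import Counter, defaultdict

def estimate_transitions(state_sequences):
    """Estimate T(s' | s) = Occ(s, s') / Occ(s) from observed state sequences.

    state_sequences: iterable of per-game lists of hashable states.
    Returns: dict mapping s -> {s': estimated probability}.
    """
    occ_s = Counter()    # Occ(s): times state s was left
    occ_ss = Counter()   # Occ(s, s'): observed transitions
    for seq in state_sequences:
        for s, s_next in zip(seq, seq[1:]):
            occ_s[s] += 1
            occ_ss[(s, s_next)] += 1

    T = defaultdict(dict)
    for (s, s_next), count in occ_ss.items():
        T[s][s_next] = count / occ_s[s]
    return dict(T)

# Toy usage with schematic states:
games = [["faceoff", "shot", "goal"], ["faceoff", "hit", "shot", "missed_shot"]]
print(estimate_transitions(games)["faceoff"])  # {'shot': 0.5, 'hit': 0.5}
```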

34/68
 Goals:
◦ R(s) = 1 if s corresponds to a goal(Home,*)
◦ R(s) = -1 if s corresponds to a goal(Away,*)
◦ R(s) = 0 otherwise
 Penalties:
◦ R(s) = 1 if s corresponds to a penalty(Home,*)
◦ R(s) = -1 if s corresponds to a penalty(Away,*)
◦ R(s) = 0 otherwise
 Wins:
◦ R(s) = 1 if s corresponds to a Win(Home)
◦ R(s) = -1 if s corresponds to a Win(Away)
◦ R(s) = 0 otherwise
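A sketch of the three reward settings above. It assumes a state carries its most recent event as a string such as 'goal(Home,Offensive)'; that representation, and the function name, are assumptions for illustration.

```python
def reward(last_event: str, objective: str = "goal") -> int:
    """R(s) under the slide's three settings ('goal', 'penalty', or 'win').

    last_event: the event that produced state s, e.g. 'goal(Home,Offensive)' or 'Win(Away)'.
    Returns +1 for the Home team, -1 for the Away team, 0 otherwise.
    """
    event = last_event.lower()
    if event.startswith(f"{objective}(home"):
        return 1
    if event.startswith(f"{objective}(away"):
        return -1
    return 0

print(reward("goal(Home,Offensive)", "goal"))        # 1
print(reward("penalty(Away,Defensive)", "penalty"))  # -1
print(reward("shot(Home,Offensive)", "goal"))        # 0
```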

35/68  Basketball Demo (open in Chrome).

The Data 36/68

 Complete Tracking: which player is where when. Plus the ball/puck. ★  Box Score: Action Counts.  Play-By-Play: Action/Event Sequence. 37/68

 Basketball Example from SportsVU.  Coming to the NHL? 38/68

 Oilers vs. Canucks. 39/68

 Successive Play Sequences. 40/68

41/68
          NHL.com (no locations)   SportLogiq (2015, action locations)
Teams     32                       32
Players   1,951                    2,233
Games     9,220                    446
Events    2,827,467                1,048,576
Sources: nhl.com; SportLogiq.

42/68  Basic question: What difference does an action make?  Quantify the effect of an action on the outcome (goal) = action value.  Player contribution = sum of the scores of the player’s actions. ◦ Schuckers and Curro (2013), McHale and Scarf (2005; soccer).  Example: +/- Score in ice hockey (nhl.com, Advanced Stats). Schuckers, M. & Curro, J. (2013), 'Total Hockey Rating (THoR): A comprehensive statistical rating of National Hockey League forwards and defensemen based upon all on-ice events', in 7th Annual MIT Sloan Sports Analytics Conference.

Computation 43/68

44/68  V(s) = Expected reward starting in state s.

Reward     | Absorbing States      | Q(s) represents
Win        | Game End              | Win Probability Differential
Goals      | Game End              | Expected Goal Differential
Goals      | Game End + Goals      | Next Goal Probability Differential
Penalties  | Game End              | Expected Penalty Differential
Penalties  | Game End + Penalties  | Next Penalty Probability Differential

45/68  Iterative value function computation (on policy) for i = 1, ..., h steps; h is the lookahead horizon.
V_i(s) = R(s) + Σ_a P(a|s) · Σ_{s'} T(s'|s,a) · V_{i-1}(s')
(immediate reward + probability of action × expected future reward given action and state)
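A minimal on-policy sketch of the iteration above, spelling out the three labelled terms (immediate reward, probability of the action in the state, expected future reward after the action). The dictionary interfaces are assumptions for illustration, not the authors' code.

```python
def value_iteration(states, rewards, action_probs, transitions, horizon):
    """Compute V_h(s) = R(s) + sum_a P(a|s) * sum_s' T(s'|s,a) * V_{h-1}(s'), with V_0 = 0.

    rewards:      dict s -> R(s)                      (immediate reward)
    action_probs: dict s -> {a: P(a | s)}             (probability of action, from data)
    transitions:  dict (s, a) -> {s': T(s' | s, a)}   (estimated transition model)
    horizon:      lookahead horizon h
    """
    V = {s: 0.0 for s in states}
    for _ in range(horizon):
        V_next = {}
        for s in states:
            future = 0.0
            for a, p_a in action_probs.get(s, {}).items():
                expected = sum(p * V.get(s2, 0.0)
                               for s2, p in transitions.get((s, a), {}).items())
                future += p_a * expected
            V_next[s] = rewards.get(s, 0.0) + future
        V = V_next
    return V

def q_value(s, a, rewards, transitions, V):
    """Q(s, a): immediate reward plus expected future value after taking a in s."""
    return rewards.get(s, 0.0) + sum(p * V.get(s2, 0.0)
                                     for s2, p in transitions.get((s, a), {}).items())
```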

46/68 Cervone, D.; D’Amour, A.; Bornn, L. & Goldsberry, K. (2014), POINTWISE: Predicting points and valuing decisions in real time with NBA optical tracking data, in MIT Sloan Sports Analytics Conference

Examples 47/68

48/68 Immediate reward + expected future reward given action and state.

We discretize locations by clustering the points at which a given action occurs. Example: 49/68
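A sketch of this discretization step using k-means from scikit-learn; the choice of algorithm and the number of regions are assumptions, not necessarily what was used for the actual model.

```python
import numpy as np
from sklearn.cluster import KMeans

def discretize_locations(xy_points, n_regions=10, seed=0):
    """Cluster the (x, y) rink coordinates at which one action type occurs.

    Returns a fitted model; model.predict(xy) maps new events to region ids,
    and model.cluster_centers_ gives a representative location per region.
    """
    xy = np.asarray(xy_points, dtype=float)
    return KMeans(n_clusters=n_regions, n_init=10, random_state=seed).fit(xy)

# Toy usage: random 'shot' locations in the offensive half of a 200 x 85 ft rink.
rng = np.random.default_rng(0)
shots = rng.uniform(low=[0.0, -42.5], high=[100.0, 42.5], size=(500, 2))
regions = discretize_locations(shots, n_regions=5)
print(regions.predict([[89.0, 0.0]]))  # region id of a shot taken near the net
```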

50/68

51/68  Average values of actions at each location, over all states and both teams. Figure: action = shot; chance of scoring the next goal; lookahead = 1.

52/68 Figures: chance of scoring the next goal (lookahead = 1) vs. chance of scoring the next goal after a shot (lookahead = 14).

53/68 Which is better? Figure by Shaun Kreider, Kreider Designs.

54/68 Figures: chance of scoring the next goal after a carry vs. after a dump-in.

55/68

56/68 Impact of an action = (expected reward after the action) − (expected reward before the action).

57/68 Players:
1. Apply the impact of an action to the player performing the action.
2. Sum the impact of his actions over a game to get his net game impact.
3. Sum the net game impact of a player over a single season to get his net season impact.
Teams: sum the impact of all players.
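A sketch of this aggregation: each event's impact Q(s, a) − V(s) is credited to the acting player and summed per game and per season, with team impact as the sum over a team's players. The event-record fields ('player', 'game', 'state', 'action') are assumptions about the data layout.

```python
from collections import defaultdict

def action_impact(s, a, Q, V):
    """Impact of action a in state s: expected reward after the action minus before it."""
    return Q[(s, a)] - V[s]

def aggregate_impacts(events, Q, V):
    """Sum action impacts to net game impact and net season impact per player.

    events: iterable of dicts with keys 'player', 'game', 'state', 'action'.
    Returns (per_game, per_season); a team's impact is the sum over its players.
    """
    per_game = defaultdict(float)    # (player, game) -> net game impact
    per_season = defaultdict(float)  # player -> net season impact
    for e in events:
        value = action_impact(e["state"], e["action"], Q, V)
        per_game[(e["player"], e["game"])] += value
        per_season[e["player"]] += value
    return dict(per_game), dict(per_season)
```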

Compare  average impact of team in game (our model)  average goal ratio of team in game (independent metric). 2-1 = 4-2 = 6-3  Correlation = 0.7! 58/68

59/68  Commonly used (e.g. Financial Times).  Correlation only …

60/68 (no location data)

61/68 Jason Spezza: high goal impact, low +/-. Plays very well on a poor team (Ottawa Senators). Requested a transfer for the season.

62/68 Correlation coefficient = … Follows Pettigrew (2015). Pettigrew, S. (2015), 'Assessing the offensive productivity of NHL players using in-game win probabilities', in 9th Annual MIT Sloan Sports Analytics Conference.

63/68 (no location data)

 Built state-space model of NHL dynamics.  The action-value function in reinforcement learning is just what we need.  Incorporates ◦ context ◦ lookahead  Familiar in AI, revolutionary in sports analytics! 64/68

 State-space Markov game model for ice hockey dynamics in the NHL.  A new context-aware method for evaluating locations, all actions, and players.  “We assert that most questions that coaches, players, and fans have about basketball, particularly those that involve the offense, can be phrased and answered in terms of EPV [i.e. the value function].” Cervone, Bornn et al. (2014). 65/68

Thank you – any questions? 66/68

67/68 Cervone, D.; D’Amour, A.; Bornn, L. & Goldsberry, K. (2014), 'POINTWISE: Predicting points and valuing decisions in real time with NBA optical tracking data', in MIT Sloan Sports Analytics Conference. Routley, K. & Schulte, O. (2015), 'A Markov Game Model for Valuing Player Actions in Ice Hockey', in Uncertainty in Artificial Intelligence (UAI).

68/68

69/68

70/68  No ground truth. ◦ Relate to predicting something (?) ◦ Break down into strong and weak contexts?  Compare apples-to-apples. ◦ Cluster players by position. ◦ Learn player clusters. ◦ Interesting ideas in Cervone et al. (2014). Cervone, D.; D'Amour, A.; Bornn, L. & Goldsberry, K. (2014), 'A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes', ArXiv e-prints.

 Extract patterns about which actions have the most impact when. 71/68

 Fit parameters for each player (cricket, baseball, basketball).  Smooth towards similar players when a player visits a state rarely.  Combine reinforcement learning with clustering agents? 72/68

 Game Clock.  Penalty Clock.  Player, puck location (eventually).  Can we take existing RL off the shelf? ◦ E.g. continuous finite-time horizon? ◦ Spatial planning? ◦ RL with both continuous time and space? 73/68