GameRank: Ranking and Analyzing Baseball Network Zifei Shan, Shiyingxue Li, Yafei Dai



Outline: Introduction; Algorithm and Evaluation; Analysis and Visualization; Future Work and Conclusion

INTRODUCTION

Background A baseball game: –Two teams take turns attacking and defending. –Players are batters in the attacking phase, and pitchers/fielders in the defending phase. Major League Baseball has the highest attendance of any sports league: more than 70 million fans. Most previous research focuses on game video analysis. Full game records are available on the Internet.

Questions How to rank baseball players? How to construct networks out of baseball games? What is special about baseball networks? What can we learn from baseball network analysis? How about other sports networks?

Ranking Assumption We rank players' pitching and batting abilities separately: –a player is good at batting if he wins over good pitchers; –a player is good at pitching if he wins over good batters. A good batter is not necessarily (and usually is not) a good pitcher.

Traditional Rankings Traditional baseball ranking: –Based on statistics. –Hard to reflect the relationships between players. –E.g. batting average (AVG): hits / at-bats.
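As a small illustration of why a pure statistic hides relationships, the sketch below computes batting average for two hypothetical batters. The record layout, the names `strong_pitcher` and `weak_pitcher`, and the numbers are assumptions made for illustration only; they are not taken from the data used in the slides.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """AVG = hits / at-bats; returns 0.0 when there are no at-bats."""
    return hits / at_bats if at_bats else 0.0

# Hypothetical season lines: (opposing pitcher, at-bats, hits).
batter_1 = [("strong_pitcher", 50, 33)]
batter_2 = [("weak_pitcher", 50, 33)]

def season_avg(records) -> float:
    """Aggregate at-bats and hits over all opponents, ignoring who they were."""
    at_bats = sum(ab for _, ab, _ in records)
    hits = sum(h for _, _, h in records)
    return batting_average(hits, at_bats)

avg_1 = season_avg(batter_1)
avg_2 = season_avg(batter_2)
print(avg_1, avg_2)
```

Both batters end up with the same AVG even though they faced opponents of different strength, which is the limitation a network model is meant to address.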

[Figure: two batters, each with AVG 0.66, labeled "Stronger" and "Weaker".]

So we want a model that takes the relationships between players into account: a network.

Network Construction Nodes: players –Two attributes: pitching ability, batting ability –A player can be a pitcher as well as a batter Links: win-lose relationships between players –Two types of links: Pitching link A->B: A wins over B when A is pitching. Batting link A->B: A wins over B when A is batting.
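The construction above can be sketched in a few lines. The play-record format `(pitcher, batter, winner)` and the function name `build_networks` are hypothetical; the slides do not say how a single play is decided as a win for the pitcher or for the batter, so that decision is assumed to be given.

```python
from collections import defaultdict

# Hypothetical play records: (pitcher, batter, winner), winner is "P" or "B".
plays = [
    ("A", "X", "P"),
    ("A", "Y", "B"),
    ("B", "X", "P"),
    ("B", "X", "B"),
]

def build_networks(plays):
    """Return (pitching_links, batting_links) as {winner: {loser: win count}}."""
    pitching = defaultdict(lambda: defaultdict(int))
    batting = defaultdict(lambda: defaultdict(int))
    for pitcher, batter, winner in plays:
        if winner == "P":
            pitching[pitcher][batter] += 1   # pitching link pitcher -> batter
        elif winner == "B":
            batting[batter][pitcher] += 1    # batting link batter -> pitcher
        else:
            raise ValueError(f"unknown winner flag: {winner!r}")
    return pitching, batting

pitching_links, batting_links = build_networks(plays)
```

Keeping the two link types in separate maps mirrors the two kinds of links on the slide; the counts can later serve as edge weights.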

[Figure: example network construction; labels: "P", "Offensive".]

Player Ranking: PageRank? Plain PageRank fails to separate the two abilities: it yields only one indicator per player. See the sample:

We get one PR value for each player. How can we see pitching and batting ability separately?

Player Ranking: Two PageRanks? Run PageRank separately on the two networks? This fails to capture the interplay between pitching and batting. See the following sample:

[Figure: sample network with players A, B, C, X, Y, Z.]

Cannot distinguish the batters’ abilities!

Player Ranking: HITS? We need a stronger ranking algorithm: HITS! –HITS: hubs and authorities on the Web. Good hubs link to good authorities. Good authorities are linked to by good hubs. –Similarly, in the baseball network: Good pitchers win over good batters. Good batters win over good pitchers.
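The mutual reinforcement idea above can be sketched as a HITS-style iteration. This is an illustrative adaptation, not the authors' algorithm: the function name, the input maps (`pitch_wins[p]` lists batters that pitcher `p` won over, `bat_wins[b]` lists pitchers that batter `b` won over), and the toy data are all assumptions.

```python
import math

def _normalize(scores):
    """Scale scores to unit Euclidean length (all zeros stay zeros)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: (v / norm if norm else 0.0) for k, v in scores.items()}

def hits_style_scores(pitch_wins, bat_wins, iterations=50):
    """A pitcher scores high by beating high-scoring batters, and vice versa."""
    pitchers = set(pitch_wins) | {p for ps in bat_wins.values() for p in ps}
    batters = set(bat_wins) | {b for bs in pitch_wins.values() for b in bs}
    pitch = {p: 1.0 for p in pitchers}
    bat = {b: 1.0 for b in batters}
    for _ in range(iterations):
        new_pitch = {p: sum(bat[b] for b in pitch_wins.get(p, ())) for p in pitchers}
        new_bat = {b: sum(new_pitch[p] for p in bat_wins.get(b, ())) for b in batters}
        pitch, bat = _normalize(new_pitch), _normalize(new_bat)
    return pitch, bat

# Hypothetical toy data.
pitch_wins = {"A": ["X", "Y"], "B": ["Y"]}
bat_wins = {"X": ["A", "B"], "Y": ["B"]}
pitch_scores, bat_scores = hits_style_scores(pitch_wins, bat_wins)
```

The resulting scores are relative weights after normalization, not probabilities.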

Why not use HITS? We want two indicators that have a sound probabilistic meaning. This calls for a random walk model like PageRank!

ALGORITHM: GAMERANK

GameRank: Overview We use the intuition of HITS, and build random walk models across the two (pitching and batting) networks.

Intuition: Random Walk Random walk in the baseball (teams) network: A baseball fan, Ellie, is trying to find the strongest player by watching single plays and following the win-over relations (pitching/batting links) between players. She starts at a random batter A and randomly picks a pitcher B who has won over A. She then picks a batter C who has won over pitcher B, and so on. If she reaches a batter (pitcher) X whom no one has won over, she jumps to a random pitcher (batter). Sometimes she gets bored with the batter (pitcher) she is currently watching and randomly picks another pitcher (batter). We can then calculate the probability that she is watching a given batter/pitcher after a long time, which equals the frequency with which she watches that player in the long run.
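Ellie's walk can be simulated directly to estimate those long-run frequencies. The sketch below is a Monte Carlo illustration under stated assumptions: the input maps (`lost_to_pitchers[b]` lists pitchers who won over batter `b`, `lost_to_batters[p]` lists batters who won over pitcher `p`), the boredom probability `beta=0.15`, and the toy data are hypothetical choices, and every player must appear as a key in the map for their role.

```python
import random
from collections import Counter

def simulate_walk(lost_to_pitchers, lost_to_batters, steps=20_000, beta=0.15, seed=0):
    """Estimate how often each batter and pitcher is watched during the walk."""
    rng = random.Random(seed)
    batters = sorted(lost_to_pitchers)
    pitchers = sorted(lost_to_batters)
    visits = {"B": Counter(), "P": Counter()}
    role, current = "B", rng.choice(batters)  # start at a random batter
    for _ in range(steps):
        visits[role][current] += 1
        if role == "B":
            winners, others, next_role = lost_to_pitchers[current], pitchers, "P"
        else:
            winners, others, next_role = lost_to_batters[current], batters, "B"
        if not winners or rng.random() < beta:
            current = rng.choice(others)   # nobody beat this player, or Ellie is bored
        else:
            current = rng.choice(winners)  # follow a win-over link
        role = next_role

    def freq(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()}

    return freq(visits["B"]), freq(visits["P"])

# Hypothetical toy network.
lost_to_pitchers = {"X": ["A", "B"], "Y": ["A"]}
lost_to_batters = {"A": ["Y"], "B": ["X", "Y"]}
bat_freq, pitch_freq = simulate_walk(lost_to_pitchers, lost_to_batters)
```

Frequencies are normalized within each role, so batter values sum to 1 and pitcher values sum to 1.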

[Figure: sample network with players A, B, X, Y, and a table of batting (B) and pitching (P) GameRank values for each player over iterations 1 to 3.]

Definition Our formula: β = 0.15

For Weighted Network Add edge weights –By modifying edge weights, we can make the rankings more precise with domain-specific knowledge

Formula for weighted network
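One plausible reading of edge weights, assumed here rather than taken from the slide's formula, is that the walker follows a link with probability proportional to its weight. The helper name `transition_probabilities` and the weights (win counts) below are hypothetical.

```python
def transition_probabilities(weighted_links):
    """Map {source: {target: weight}} to {source: {target: probability}}."""
    probs = {}
    for source, targets in weighted_links.items():
        total = sum(targets.values())
        # A source with no positive weight has no outgoing transitions.
        probs[source] = {t: w / total for t, w in targets.items()} if total > 0 else {}
    return probs

# Hypothetical weights: number of plays in which each pitcher won over a batter.
weights = {"X": {"A": 3, "B": 1}, "Y": {"A": 2}}
probs = transition_probabilities(weights)
```

Domain knowledge could enter by changing how the weights are assigned, for example by counting some kinds of plays more heavily than others.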

Computation Start from an initial distribution, then iteratively calculate GRB and GRP based on the above formula. The iteration converges no matter what the initial distribution looks like. It can be easily parallelized with the MapReduce model, similar to PageRank.
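The iteration can be sketched as alternating updates of GRB and GRP. The update rule below is a reconstruction from the random walk description (follow a win-over link with probability 1 - β, jump uniformly otherwise or when no one has won over the current player); it is an assumption and may differ from the authors' exact formula. The function names, input maps, and toy data are hypothetical.

```python
def _half_step(source, lost_to, targets, beta):
    """Move probability mass from one role (e.g. batters) to the other."""
    new = {t: 0.0 for t in targets}
    jump_mass = 0.0
    for player, score in source.items():
        winners = lost_to[player]
        if winners:
            share = (1.0 - beta) * score / len(winners)
            for w in winners:
                new[w] += share
            jump_mass += beta * score
        else:
            jump_mass += score  # nobody won over this player: always jump
    for t in targets:
        new[t] += jump_mass / len(targets)
    return new

def gamerank_sketch(lost_to_pitchers, lost_to_batters, beta=0.15,
                    grb0=None, tol=1e-12, max_iter=1000):
    """Iterate until the batting scores stop changing; returns (GRB, GRP)."""
    batters = sorted(lost_to_pitchers)
    pitchers = sorted(lost_to_batters)
    grb = dict(grb0) if grb0 else {b: 1.0 / len(batters) for b in batters}
    grp = {p: 1.0 / len(pitchers) for p in pitchers}
    for _ in range(max_iter):
        new_grp = _half_step(grb, lost_to_pitchers, pitchers, beta)
        new_grb = _half_step(new_grp, lost_to_batters, batters, beta)
        delta = sum(abs(new_grb[b] - grb[b]) for b in batters)
        grb, grp = new_grb, new_grp
        if delta < tol:
            break
    return grb, grp

# Hypothetical toy network.
lost_to_pitchers = {"X": ["A", "B"], "Y": ["A"]}
lost_to_batters = {"A": ["Y"], "B": ["X", "Y"]}
grb, grp = gamerank_sketch(lost_to_pitchers, lost_to_batters)
grb_alt, grp_alt = gamerank_sketch(lost_to_pitchers, lost_to_batters,
                                   grb0={"X": 1.0, "Y": 0.0})
```

Running from two different starting distributions, as in the last lines, is a quick way to check the convergence claim on a given network.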

EVALUATION

Evaluation We evaluate our ranking algorithm on real-world, open-source MLB game records from retrosheet.org. We compare our results to ESPN Ratings, a prestigious ranking system.

Network of MLB data Pick year 2011 for evaluation –1295 nodes –~80000 aggregated edges Generate rankings for pitchers and batters with GameRank for 2011. Get the ESPN ranks for 2011 from the Internet.

ESPN Ratings Algorithm ESPN Ratings uses a complex set of statistics. –E.g. the ESPN rating of batters includes the following factors: batting bases accumulated, runs produced, OBP, BA, HRs, RBIs, runs, hits, net steals, team win percentage, difficulty of defensive position, etc. –Hard to reflect relationships between players. Not every player gets an ESPN score.

Comparison: Ranked Players
Ranking Algorithm | Ranked Batters | Ranked Pitchers
GameRank | |
ESPN | 310 | 161

Comparison: top players Top batters and pitchers found by GR, and their ESPN ranks.

Comparison: Difference [Figure: scatter plots of the difference between GR and ESPN ranks; panels: Batting, Pitching.]

Comparison: Abs. Difference CDF
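An empirical CDF of absolute rank differences can be computed as sketched below. The function name and the rank dictionaries are hypothetical; only players ranked by both systems are compared.

```python
def abs_rank_diff_cdf(rank_a, rank_b):
    """Return [(d, fraction of shared players with |rank_a - rank_b| <= d)]."""
    common = rank_a.keys() & rank_b.keys()
    diffs = sorted(abs(rank_a[p] - rank_b[p]) for p in common)
    n = len(diffs)
    cdf = []
    for i, d in enumerate(diffs, start=1):
        if cdf and cdf[-1][0] == d:
            cdf[-1] = (d, i / n)  # same difference value: raise the step
        else:
            cdf.append((d, i / n))
    return cdf

# Hypothetical ranks from two systems.
gr_rank = {"p1": 1, "p2": 2, "p3": 3, "p4": 4}
espn_rank = {"p1": 2, "p2": 1, "p3": 3, "p4": 7, "p5": 5}
cdf = abs_rank_diff_cdf(gr_rank, espn_rank)
```

A curve that rises quickly means the two systems place most players at similar ranks.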

More comparison We have already seen that GR rankings achieve results similar to ESPN rankings. Now we want to show that GR gives better results than ESPN, based on the intuition that players with better rankings should have a higher probability of winning in games. –If a ranking system is good, then under this system: Pitchers with high ranks are more likely to win than pitchers with low ranks, and vice versa. Pitchers at similar ranks are more likely to win over batters with low ranks than over batters with high ranks.

Comparison: Winning Rate [Figure panels: GR Rank, ESPN Rank.] Frequency with which pitchers win over batters at different rank levels in GameRank/ESPN. Pitcher ranks are divided by 10; batter ranks are divided by 20.
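The winning frequency per rank level can be tabulated as sketched below. Reading "divided by 10" and "divided by 20" as integer division of the rank into levels is an assumption, as are the function name, the play-record format `(pitcher, batter, pitcher_won)`, and the toy data.

```python
from collections import defaultdict

def win_rate_by_level(plays, pitcher_rank, batter_rank, p_div=10, b_div=20):
    """Return {(pitcher level, batter level): fraction of plays the pitcher won}."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for pitcher, batter, pitcher_won in plays:
        if pitcher not in pitcher_rank or batter not in batter_rank:
            continue  # skip players the ranking system does not cover
        level = (pitcher_rank[pitcher] // p_div, batter_rank[batter] // b_div)
        total[level] += 1
        wins[level] += int(pitcher_won)
    return {level: wins[level] / total[level] for level in total}

# Hypothetical ranks and plays.
pitcher_rank = {"A": 3, "B": 15}
batter_rank = {"X": 5, "Y": 25}
plays = [
    ("A", "X", True), ("A", "X", False), ("A", "Y", True),
    ("B", "X", False), ("B", "Y", True), ("B", "Y", False),
]
rates = win_rate_by_level(plays, pitcher_rank, batter_rank)
```

Under a good ranking, rates should fall as the pitcher level number grows and rise as the batter level number grows.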

Evaluation: Conclusion GameRank achieves results at least comparable to ESPN rankings. GameRank is even better than ESPN in terms of batting rankings, if we use winning frequency as the criterion. GameRank can rank more (all) players. GameRank has a stronger model that considers relationships between players.

ANALYSIS / DATA MINING

Analysis conclusions We analyze the networks with GR ranks and find interesting results: –By studying the network’s out-degree distribution in different years, we found that recent players are closer to each other in skill than players were before. –By analyzing the pitchers’ GR batting values, we found that good pitchers are better than ordinary pitchers at batting. Some bottom pitchers are great batters, because they do not usually pitch.

Analysis: out-degree distribution
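An out-degree distribution for one season's network can be computed as sketched below. The function name, the `{winner: set of losers}` link format, and the toy data are hypothetical.

```python
from collections import Counter

def out_degree_distribution(links, nodes):
    """Return {out-degree: fraction of nodes with that out-degree}."""
    counts = Counter(len(links.get(node, ())) for node in nodes)
    return {degree: count / len(nodes) for degree, count in sorted(counts.items())}

# Hypothetical win-over links: winner -> set of players they won over.
links = {"A": {"B", "C", "D"}, "B": {"C"}, "C": {"D"}}
nodes = ["A", "B", "C", "D"]
distribution = out_degree_distribution(links, nodes)
```

Comparing such distributions across years is one way to look for the narrowing skill gap described in the analysis.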

Analysis: Pitchers’ batting ability Better pitchers bat better.

Analysis: bottom pitchers who bat well Among the bottom pitchers, there are 7 who bat really well. –We manually checked them and found that most of them do not have pitcher as their main fielding position, although they pitched at some point in the 2011 regular season.

VISUALIZATION: MLBILLUSTRATOR

Visualization We built a website, MLBillustrator, to visualize the network and the GameRank values of players: – We then do a simple, initial analysis based on the visualization.

Visualization

Visual Analysis In every year, the network consists of two large communities. –This is because MLB has the American League (AL) and the National League (NL), and the two clusters are almost exactly the AL and NL communities. Teams in both AL and NL play more within their own league and less across leagues. Players in the middle of the two communities changed teams across leagues during the year.
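A simple numeric check of the two-community observation is the share of links that stay within one league. The function name, the edge list, and the league labels below are hypothetical.

```python
def within_league_share(edges, league):
    """Fraction of links whose two endpoints belong to the same league."""
    if not edges:
        return 0.0
    same = sum(1 for a, b in edges if league[a] == league[b])
    return same / len(edges)

# Hypothetical links and league membership.
league = {"A": "AL", "X": "AL", "Y": "AL", "B": "NL", "Z": "NL"}
edges = [("A", "X"), ("A", "Y"), ("B", "Z"), ("B", "X")]
share = within_league_share(edges, league)
```

A share close to 1 is consistent with two clusters that mostly play among themselves.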

OTHER USE CASES / FUTURE WORK / CONCLUSION

Other Use Cases The GameRank algorithm is applicable to ranking in networks with multiple indicators that interplay with each other. Other sports networks –Soccer –Volleyball –Basketball

Future work More analysis: find players that are overvalued/undervalued, etc. Test the robustness of each team in the network of in-team supports. Put players and teams into one heterogeneous network, and discover relationships between players and teams. Use specific knowledge in baseball games to optimize the parameters (edge weights).

Contribution We propose a ranking algorithm for networks with multiple indicators interplaying with each other. We initially regard baseball games as a network, and rank the pitching and batting ability of players. We analyze the baseball network and find interesting results.