Get to know the rating system in the model

Glicko vs Elo: Get to know the rating system in the model. Paritosh Walvekar, Akshay Chopra

Elo Rating System

What is Elo? Named after its creator Arpad Elo, a Hungarian-American physics professor. The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess.

General Idea Every player gets a number. The system uses the difference between two players' numbers to predict the odds of one player beating the other.

If a player wins a game, he is assumed to have performed at a higher level than his opponent for that game. Conversely if he loses, he is assumed to have performed at a lower level. If the game is a draw, the two players are assumed to have performed at nearly the same level.

Major Assumption The central assumption was that the chess performance of each player in each game is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, Elo assumed that the mean value of the performances of any given player changes only slowly over time.

Elo waved his hands at several details. He did not specify exactly how close two performances ought to be to result in a draw rather than a decisive result.

How to compute Elo? Notations
R-A : rating of Player A
R-B : rating of Player B
E-A : expected outcome for Player A (expected score between 0 and 1: the win probability plus half the draw probability)
E-B : expected outcome for Player B (same scale)
K : weighting factor
S-A : outcome of the game for Player A (win = 1, draw = 0.5, loss = 0)
S-B : outcome of the game for Player B (same scale)
R’A : new rating of Player A
R’B : new rating of Player B

Steps Involved
1. Find E-A and E-B (the expected outcomes).
2. Find the new rating of both players.

Calculating the E-A and E-B Note :
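The formula on this slide did not survive in the transcript. As a minimal sketch, assuming the commonly published logistic form of the Elo expectation (the function name `expected_score` is illustrative, not from the slides), E-A and E-B could be computed like this:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B.

    Assumes the widely used logistic Elo curve with a 400-point scale;
    the slide's own formula is not available in the transcript.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Illustrative ratings: A is 200 points stronger than B.
e_a = expected_score(1600, 1400)
e_b = expected_score(1400, 1600)
print(round(e_a, 3), round(e_b, 3))
```

Under this form the two expected scores always sum to 1, so E-B can also be obtained as 1 - E-A.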

Calculating New Rating
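The update rule itself is also missing from the transcript. A hedged sketch, assuming the standard Elo update R’A = R-A + K (S-A - E-A) and the logistic expectation (the function names and the default K of 32 are illustrative choices):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B (assumed logistic Elo curve)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_ratings(r_a: float, r_b: float, s_a: float, k: float = 32.0):
    """Return (R'A, R'B) after one game.

    s_a is the outcome for player A: 1 for a win, 0.5 for a draw, 0 for a loss.
    Both players are updated with the same K here, which is a simplification.
    """
    e_a = expected_score(r_a, r_b)
    e_b = 1.0 - e_a
    s_b = 1.0 - s_a
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * (s_b - e_b)
    return new_a, new_b


# Illustrative upset: the 1600-rated player loses to the 1400-rated player.
new_a, new_b = update_ratings(1600, 1400, s_a=0.0)
print(round(new_a, 1), round(new_b, 1))
```

With a shared K the points lost by one player equal the points gained by the other, so the rating total is conserved.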

Different Values of K
Players below 2100: K-factor of 32 used.
Players between 2100 and 2400: K-factor of 24 used.
Players above 2400: K-factor of 16 used.
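The staggered K-factors above can be sketched as a small lookup. The slide does not say which tier the boundary ratings 2100 and 2400 belong to, so treating both as part of the middle tier is an assumption, and `k_factor` is a hypothetical helper name:

```python
def k_factor(rating: float) -> int:
    """K-factor tier from the slide's table.

    Assumption: ratings of exactly 2100 and 2400 fall in the middle tier.
    """
    if rating < 2100:
        return 32
    if rating <= 2400:
        return 24
    return 16


for r in (1500, 2250, 2600):
    print(r, k_factor(r))
```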

Issues with Elo Rating
Accurate distribution: the normal distribution used by Elo does not accurately represent the results achieved, particularly by low-rated players.
Most accurate K value: statistician Jeff Sonas believes that if the K-factor is set too large, ratings become too sensitive to just a few recent events, with a large number of points exchanged in each game. If the K-value is too low, the sensitivity is minimal and the system does not respond quickly enough to changes in a player's actual level of performance.

Can there be another rating system?

Glicko Rating System

Introduction to Glicko
Developed in 1995 by Mark Glickman.
Currently implemented on the FICS (Free Internet Chess Server).
Coincidentally, Elo turns out to be a special case of Glicko.
Why Glicko?
Elo balances out for the outcome.
Reliability of ratings.
Situations aren’t so extreme, but you get the intuition!
Glicko is a best guess along with an uncertainty measure.

Getting Started
Each player has both a rating and a rating deviation (RD).
The rating is affected by the game outcome.
The RD is affected by the game outcome as well as by how long the player has or has not been playing.
The rating change after a game is governed by both players' RDs.

How to compute Glicko? Algorithm: If the player is unrated, set the rating to 1500 and the RD to 350. These are reasonable default choices, but the RD of 350 in particular can be determined through optimizing predictability of game outcomes (not described here). Otherwise, use the player’s rating from the last period, and calculate the new RD from the RD at the last period (RDold) by the formula...
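The slide's formula is cut off in the transcript. The sketch below follows Glickman's publicly documented Glicko description as best understood here, not the slide itself: the RD grows with inactivity and is capped at 350, and the rating update weights each game by the opponent's RD. The constant `c = 34.6`, the function names, and the example opponents are illustrative assumptions.

```python
import math

Q = math.log(10) / 400.0   # scaling constant q in the Glicko formulas
MAX_RD = 350.0             # RD of an unrated player, used as the cap


def inflate_rd(rd_old: float, c: float, t: int = 1) -> float:
    """Grow the RD for t rating periods without play, capped at MAX_RD."""
    return min(math.sqrt(rd_old ** 2 + c ** 2 * t), MAX_RD)


def g(rd: float) -> float:
    """Down-weights games against opponents whose rating is uncertain."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected(r: float, r_j: float, rd_j: float) -> float:
    """Expected score against opponent j, softened by the opponent's RD."""
    return 1.0 / (1.0 + 10 ** (-g(rd_j) * (r - r_j) / 400.0))


def glicko_update(r: float, rd: float, games):
    """Update (rating, RD) from games given as (opp_rating, opp_rd, score)."""
    if not games:
        return r, rd
    info = 0.0      # accumulates 1 / (q^2 d^2)
    surprise = 0.0  # accumulates g(RD_j) * (s_j - E_j)
    for r_j, rd_j, s_j in games:
        e_j = expected(r, r_j, rd_j)
        info += g(rd_j) ** 2 * e_j * (1.0 - e_j)
        surprise += g(rd_j) * (s_j - e_j)
    precision = 1.0 / rd ** 2 + Q ** 2 * info   # 1/RD^2 + 1/d^2
    return r + (Q / precision) * surprise, math.sqrt(1.0 / precision)


# Illustrative inputs: one win and two losses in a single rating period.
games = [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)]
new_r, new_rd = glicko_update(1500, 200, games)
rd_idle = inflate_rd(200.0, c=34.6, t=4)
print(round(new_r, 1), round(new_rd, 1), round(rd_idle, 1))
```

The player's own RD enters through `precision` and each opponent's RD through `g`, which is one way to read the earlier point that both players' RDs govern the rating change.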

Can we replace Elo with Glicko?
Glicko solves the problem in a different way.
Communicating how Glicko works is more difficult than explaining Elo.
How can we prove that a rating system really reflects a player's actual ability?
Ratings are a touchy issue!
Will the GMs (top 100 players) still remain in the reckoning after we replace Elo with Glicko?

Should we really change the ranking system?

Thank you!