Artistic Robots through Interactive Genetic Algorithm with ELO rating system Andy Goetz, Camille Huffman, Kevin Riedl, Mathias Sunardi and Marek Perkowski.

Presentation transcript:

Artistic Robots through Interactive Genetic Algorithm with ELO rating system. Andy Goetz, Camille Huffman, Kevin Riedl, Mathias Sunardi and Marek Perkowski. Department of Electrical Engineering, Portland State University.

Portland Cyber Theatre

Making science out of robot theater?

How to make a science from robot theatre?

We want to evaluate sound, shape, motion, color, etc.

Behavior Generation and Verification [Diagram with blocks: Interactive Genetic Algorithm, behavior expression, behavior automaton, probabilistic automaton, behavior generator and verifier, robot, human evaluators]

Main Idea of this paper: A new approach to creating the fitness function for an Interactive Genetic Algorithm, in which (possibly) many humans evaluate robot motions via an Internet page. It is based on the ELO rating system known from chess. The robots use: 1. a genetic algorithm, 2. fuzzy logic, 3. probabilistic state machines, 4. a small set of functions for creating picture components, and 5. a user interface which allows Internet users to rate individual sequences.

Previous work on IEC systems: 1. Human-based genetic algorithm, 2. Interactive evolution strategy, 3. Interactive genetic programming, 4. Interactive genetic algorithm. Mostly for music composition and graphics. Usually weighted functions were used.

Ranking Systems in Sports: Rating systems for many sports award points in accordance with subjective evaluations of the 'greatness' of certain achievements. For example, winning an important golf tournament might be worth an arbitrarily chosen five times as many points as winning a lesser tournament. A statistical endeavor, by contrast, uses a model that relates the game results to underlying variables representing the ability of each player.

Elo rating system: The Elo rating system is a method for calculating the relative skill levels of players in two-player games such as chess. It is named after its creator Arpad Elo, a Hungarian-born American physics professor. The Elo system was invented as an improved chess rating system, but today it is also used in many other games. It is also used as a rating system for multiplayer competition in a number of video games. It has been adapted to team sports including association football, American college football, basketball, and Major League Baseball.

Previous works

Pairwise Comparison

Method: Compare each pair of candidates (players) head-to-head. Award each candidate one point for each head-to-head victory. The candidate with the most points wins. With N candidates this takes N(N-1)/2 comparisons.

Pairwise Comparison - Example: Selection of the best robot facial expression. 4 candidates {A, B, C, D}, each voter ranks all 4; 37 voters; 5 trials (columns). The table shows the rankings of the candidates (rows) and the number of voters (columns) that ranked the candidates that way.
# of Voters    14   10    8    4    1
1st             A    C    D    B    C
2nd             B    B    C    D    D
3rd             C    D    B    C    B
4th             D    A    A    A    A

Pairwise Comparison - Example: Compare candidates A & B: 14 voters ranked A higher than B, and 37 - 14 = 23 voters ranked B higher than A. So, B wins against A.

Pairwise Comparison - Example: Next, compare candidates A & C: 14 voters ranked A higher than C, and 37 - 14 = 23 voters ranked C higher than A. So, C wins against A. Continue for the next pairs: A vs. D, B vs. C, B vs. D, C vs. D. Exclude permutations (e.g. C vs. A = A vs. C) and comparison of a candidate with itself (e.g. A vs. A).

Pairwise Comparison - Example: Record points: win = 1, loss = 0 (37 voters in total). Cell values: number of voters that ranked the candidate (row) over the candidate (column).
      A    B    C    D    Wins over   Lost against   Points
A     -   14   14   14    -           B, C, D        0
B    23    -   18   28    A, D        C              2
C    23   19    -   25    A, B, D     -              3
D    23    9   12    -    A           B, C           1
C wins!

Pairwise Comparison - Example: Another way to calculate the winner: use half of the table (one triangle), mark the winner of each pair, and count the number of times each player appears.
      A    B    C    D    Wins over   Lost against   Points
A     -    B    C    D    -           B, C, D        0
B     -    -    C    B    A, D        C              2
C     -    -    -    C    A, B, D     -              3
D     -    -    -    -    A           B, C           1
C wins!
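The example above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The voter counts per column (14, 10, 8, 4, 1) are an assumption inferred from the pairwise totals quoted in the example (14, 23, 18, 28, 19, 25, 9, 12), and the half point for a tied pair is also an assumption, since the slides only define win = 1 and loss = 0.

```python
from itertools import combinations

# Preference schedule: (number of voters, ranking from 1st to 4th).
# The counts are inferred from the pairwise totals, not quoted from the slides.
ballots = [
    (14, "ABCD"),
    (10, "CBDA"),
    (8, "DCBA"),
    (4, "BDCA"),
    (1, "CDBA"),
]


def pairwise_points(ballots):
    """Award one point per head-to-head win (assumed half a point each on a tie)."""
    candidates = sorted(ballots[0][1])
    total_voters = sum(n for n, _ in ballots)
    points = {c: 0.0 for c in candidates}
    # N candidates give N(N-1)/2 unordered pairs.
    for x, y in combinations(candidates, 2):
        x_votes = sum(n for n, order in ballots if order.index(x) < order.index(y))
        y_votes = total_voters - x_votes
        if x_votes > y_votes:
            points[x] += 1.0
        elif y_votes > x_votes:
            points[y] += 1.0
        else:
            points[x] += 0.5
            points[y] += 0.5
    return points


points = pairwise_points(ballots)
winner = max(points, key=points.get)
print(points, "winner:", winner)
```

With an odd number of voters no pair can tie, so the tie branch is never taken for this schedule.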

Other possible scenario: A three-way tie. Inconsistency: A wins over B, B wins over C, C wins over A.
      A    B    C
A     -    A    C
B     -    -    B
C     -    -    -

ELO Rating System

Overview of ELO: A player's skill is assumed to follow a normal distribution; the true skill is around the mean. The Elo system gives two things: a player's expected chance of winning, and a method to update a player's Elo rating.

Basic Ideas of ELO: One cannot look at a sequence of moves and say, "That performance is 2039." Performance can only be inferred from wins, draws and losses. Therefore, if a player wins a game, he is assumed to have performed at a higher level than his opponent for that game. Conversely, if he loses, he is assumed to have performed at a lower level. If the game is a draw, the two players are assumed to have performed at nearly the same level.

Scores and ranking of players: A player's ranking is updated based on: (1) its expected value of winning (E), which depends on the ranking difference with the opponent; (2) the outcome of the match (S for 'score'): 1 = win, 0 = lose, 0.5 = draw.

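Expected scores in Elo rating. Expected score (E): $E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$ and $E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}$, where $E_A$, $E_B$ = expected score for player A and B, respectively, and $R_A$, $R_B$ = rating of player A and B, respectively. Remember: 1 = win, 0 = lose, 0.5 = draw.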
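A minimal Python sketch of the expected-score computation, assuming the standard logistic Elo formula with a scale of 400 (the "/400" fragment in the robot example suggests this is what the slide showed). The ratings 1600 and 1400 are hypothetical values chosen only for illustration.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B (standard Elo formula)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


# Hypothetical ratings, 200 points apart.
e_a = expected_score(1600, 1400)
e_b = expected_score(1400, 1600)
print(round(e_a, 3), round(e_b, 3), round(e_a + e_b, 3))
```

The two expected scores always sum to 1, and equal ratings give 0.5 to each player.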

Characteristics of ELO: A player with a higher Elo ranking than his opponent has a higher expected value (i.e. chance of winning), and vice versa. When both players have similar Elo rankings, the chance of having a draw is higher. After the match, both players' rankings are updated by the same amount, but the winner gains rank (rating) and the loser loses rank. If a higher ranking ('stronger') player wins against a weaker player, the rank changes are smaller than when the weaker player wins against the higher ranking player. The size of the change is scaled by a subjectively chosen value K.

Basic Assumptions of ELO: Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, Elo assumed that the mean value of the performances of any given player changes only slowly over time. A further assumption is necessary, because chess performance in the above sense is still not measurable. Our question: “Is ELO good for human evaluation of robot art (motion, behavior)?”

How ELO Works: A player's expected score is his probability of winning plus half his probability of drawing. Thus an expected score of 0.75 could represent a 75% chance of winning, 25% chance of losing, and 0% chance of drawing. On the other extreme it could represent a 50% chance of winning, 0% chance of losing, and 50% chance of drawing. The probability of drawing, as opposed to having a decisive result, is not specified in the Elo system. Instead a draw is considered half a win and half a loss.

How ELO Works: The relative difference in rating between two players determines an estimate for the expected score between them. Both the average and the spread of ratings can be arbitrarily chosen. Elo suggested scaling ratings so that a difference of 200 rating points in chess would mean that the stronger player has an expected score (which basically is an expected average score) of approximately 0.75. The USCF initially aimed for an average club player to have a rating of 1500.

Elo Rating - Example: Suppose a Robot Boxing league. The league has tens, hundreds, or more robots. Each robot has a ranking (higher number = higher rank). A robot's ranking is updated after each match, but the update can also be done after multiple matches. A match is a one-vs-one battle.

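Elo Rating Example: scores for robots. Expected score (E). Suppose: Robot A rank: 1500, Robot B rank: 1320. Then: $E_A = \frac{1}{1 + 10^{(1320 - 1500)/400}} \approx 0.738$ and $E_B = \frac{1}{1 + 10^{(1500 - 1320)/400}} \approx 0.262$. Robot A is expected to win.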

Elo Rating Example: Adjusting ratings after match. Next, the match is held. After the match, the ratings of both robots will be adjusted by: $R'_A = R_A + K (S_A - E_A)$, where $R'_A$ = Robot A's new rating, $R_A$ = Robot A's old/current rating, $K$ = some constant (for practical reasons we choose K = 24 in this example), $S_A$ = Robot A's score/match result (1 = win, 0 = lose, 0.5 = draw), and $E_A$ = Robot A's expected score. Similarly for Robot B.

Elo Rating Example: Adjusting scores after one match. Remember that before the match it was: Robot A rank: 1500, Robot B rank: 1320. Suppose the outcome of the match is one of three cases: Robot A wins, Robot B wins, or it's a draw.
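The three outcomes can be worked through with a short Python sketch. It assumes the standard update rule R' = R + K(S - E) with K = 24 as chosen in the example and the logistic expected score with a scale of 400. The function names are illustrative, not taken from the slides.

```python
def expected_score(r_a, r_b):
    """Expected score of A against B (standard Elo formula)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update(r_a, r_b, s_a, k=24):
    """Return both new ratings after one match; s_a is A's score (1, 0.5 or 0)."""
    e_a = expected_score(r_a, r_b)
    e_b = 1.0 - e_a
    s_b = 1.0 - s_a
    return r_a + k * (s_a - e_a), r_b + k * (s_b - e_b)


for label, s_a in [("Robot A wins", 1.0), ("Robot B wins", 0.0), ("Draw", 0.5)]:
    new_a, new_b = update(1500, 1320, s_a)
    print(f"{label}: A = {new_a:.1f}, B = {new_b:.1f}")
```

Because both robots use the same K, whatever one robot gains the other loses, so the sum of the two ratings stays at 2820 in all three cases. The favorite gains little for an expected win and loses more for an upset.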

Elo Rating Example: adjusting rankings after five matches. Suppose the rank update is done after 5 matches. Robot A current rank: 1500. [Table with columns: opponent/match, opponent rank ($R_B$), $E_A$, score/match outcome (1 = win, 0 = lose, 0.5 = draw), and a Total row]
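A batched update can be sketched as below. The opponents' ratings and the match results are hypothetical, since the values of the original table are not available. The sketch assumes the common convention of summing scores and expected scores over all matches and applying K once: R' = R + K(sum of S - sum of E).

```python
def expected_score(r_a, r_b):
    """Expected score of A against B (standard Elo formula)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


# Hypothetical data: (opponent rating, Robot A's score in that match).
matches = [
    (1320, 1.0),
    (1450, 0.5),
    (1580, 0.0),
    (1500, 1.0),
    (1410, 0.0),
]


def batch_update(rating, matches, k=24):
    """Apply one rating adjustment for a whole series of matches."""
    total_expected = sum(expected_score(rating, opp) for opp, _ in matches)
    total_score = sum(score for _, score in matches)
    return rating + k * (total_score - total_expected)


new_rating = batch_update(1500, matches)
print(f"Robot A: 1500 -> {new_rating:.1f}")
```

Here Robot A scores 2.5 points against an expectation of about 2.8, so its rating drops slightly. All five expected scores are computed from the rating held before the series, which is what distinguishes this from updating after every match.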

About K in chess: K is the rate of adjustment to one's rating, for example when Robot A wins (B loses). Some Elo implementations adjust K based on some criteria. For example:
FIDE (World Chess Federation): K = 30 for a player new to the rating list until s/he has completed events with a total of at least 30 games. K = 15 as long as a player's rating remains under 2400. K = 10 once a player's published rating has reached 2400 and s/he has also completed events with a total of at least 30 games. Thereafter it remains permanently at 10.
USCF (United States Chess Federation): Players below 2100: K-factor of 32 used. Players between 2100 and 2400: K-factor of 24 used. Players above 2400: K-factor of 16 used.
How about robot art?
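The FIDE rule listed above can be sketched as a small Python function. The function name and its parameters are hypothetical. The sketch assumes the threshold for K = 15 is 2400, as the following rule implies, and models "remains permanently at 10" with an explicit flag.

```python
def fide_k(games_played: int, rating: float, ever_reached_2400: bool = False) -> int:
    """Pick a K-factor following the FIDE rule as described on the slide."""
    if games_played < 30:
        return 30  # new to the rating list
    if ever_reached_2400 or rating >= 2400:
        return 10  # stays at 10 permanently once 2400 has been reached
    return 15


print(fide_k(5, 1500))          # newcomer
print(fide_k(100, 2300))        # established player under 2400
print(fide_k(100, 2450))        # established player at or above 2400
print(fide_k(100, 2350, True))  # dropped below 2400 after having reached it
```

A larger K lets new ratings converge quickly, while a smaller K keeps established ratings stable. Which schedule suits audience votes on robot art is left open by the slides.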

Picture Drawing Robots

Audience votes through a Webpage

ELO for art (motion) scoring Score of 194

ELO for art (motion) scoring Score of 0
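How Elo ratings from audience votes could serve as the fitness of an interactive genetic algorithm can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' system. Individuals are hypothetical lists of integers standing in for motion or drawing sequences, `simulated_vote` is a made-up stand-in for a web visitor, and the fuzzy logic and probabilistic state machines are not modeled.

```python
import random


def expected_score(r_a, r_b):
    """Expected score of A against B (standard Elo formula)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def simulated_vote(a, b):
    """Hypothetical voter: prefers the sequence with the larger sum."""
    if sum(a) == sum(b):
        return 0.5
    return 1.0 if sum(a) > sum(b) else 0.0


def rate_population(population, vote, rounds, rng, k=24):
    """Show random pairs to the voter and keep an Elo rating per individual."""
    ratings = [1500.0] * len(population)
    for _ in range(rounds):
        i, j = rng.sample(range(len(population)), 2)
        s_i = vote(population[i], population[j])
        e_i = expected_score(ratings[i], ratings[j])
        ratings[i] += k * (s_i - e_i)
        ratings[j] += k * ((1.0 - s_i) - (1.0 - e_i))
    return ratings


def next_generation(population, ratings, rng, mutation_rate=0.1):
    """Keep the better-rated half as parents; one-point crossover plus mutation."""
    pairs = sorted(zip(ratings, population), key=lambda p: p[0], reverse=True)
    parents = [ind for _, ind in pairs[: len(pairs) // 2]]
    children = []
    while len(children) < len(population):
        mom, dad = rng.sample(parents, 2)
        cut = rng.randrange(1, len(mom))
        child = mom[:cut] + dad[cut:]
        child = [rng.randrange(10) if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children


rng = random.Random(0)
population = [[rng.randrange(10) for _ in range(8)] for _ in range(10)]
for generation in range(5):
    ratings = rate_population(population, simulated_vote, rounds=200, rng=rng)
    population = next_generation(population, ratings, rng)
```

The point of the design is that voters only ever answer "which of these two is better?", and the Elo ratings turn those pairwise answers into a numeric fitness that selection can use.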

Physical Robot DERPY Derpy with a sharpie marker

Fuzzy/Probabilistic state Machine operates differently in dark and light areas. Image with dark and light areas. Examples of fuzzy variables.

Fuzzy and Probabilistic Machines Simple probabilistic machine of Derpy

“Robot art” on butcher paper located on a floor.

Another piece of art from Derpy

Now use Part 2 of slides

Auxiliary Slides

Microsoft TrueSkill

Addressing: Subjective K value: instead, the adjustment is based on the players' skill. Ranking of multiple players (>2). Can find “interesting” matches: balanced ones, where both players have a comparable chance of winning the match. Build “Leaderboards” (ranking of all players).

Microsoft TrueSkill: A player's skill is modeled as a normal distribution, with the mean as the player's “true skill” and the standard deviation as the uncertainty (about the player's skill). A player starts with some “mean skill” and uncertainty values. As the player plays more games/matches, the mean skill gets adjusted, and the uncertainty (i.e. std. dev.) decreases.

Microsoft TrueSkill: Updating mean and standard deviation. $\beta^2$ is unknown; it is the variance of the performance around the skill of each player.
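A rough Python sketch of a two-player TrueSkill update for a decisive result (no draw) follows. It is written from the update equations as commonly published for TrueSkill, which is an assumption here since the slide's own equations are not available. It omits the dynamics term that inflates σ between games, and the defaults μ = 25, σ = 25/3, β = 25/6 are the commonly quoted TrueSkill defaults, not values from the slides.

```python
import math


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def trueskill_win_update(mu_w, sigma_w, mu_l, sigma_l, beta=25.0 / 6.0, draw_margin=0.0):
    """Update (mu, sigma) of the winner and the loser of one decisive game."""
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l - draw_margin) / c
    v = pdf(t) / cdf(t)   # additive correction for the means
    w = v * (v + t)       # multiplicative correction for the variances
    new_mu_w = mu_w + (sigma_w ** 2 / c) * v
    new_mu_l = mu_l - (sigma_l ** 2 / c) * v
    new_sigma_w = sigma_w * math.sqrt(1.0 - (sigma_w ** 2 / c ** 2) * w)
    new_sigma_l = sigma_l * math.sqrt(1.0 - (sigma_l ** 2 / c ** 2) * w)
    return (new_mu_w, new_sigma_w), (new_mu_l, new_sigma_l)


(mu_w, sigma_w), (mu_l, sigma_l) = trueskill_win_update(25.0, 25.0 / 3.0, 25.0, 25.0 / 3.0)
print(round(mu_w, 3), round(sigma_w, 3), round(mu_l, 3), round(sigma_l, 3))
```

Unlike Elo's fixed K, the size of each step depends on the players' own uncertainties: a player with a large σ moves a lot, and σ shrinks after every game.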

Microsoft TrueSkill v and w

Microsoft TrueSkill

Microsoft TrueSkill (http://research.microsoft.com/en-us/projects/trueskill/Details.aspx#How_to_Update_Skills): Unbalanced matches (can't win or can't lose) are not interesting; balanced matches (even chance of winning) are interesting. Accommodates two or more players. A module to track the skills of all players based on game outcomes between players (update). A module to arrange interesting matches for its members (matchmaking). A module to recognize and potentially publish the skills of members (leaderboards). TrueSkill is a skill-based ranking system, so interesting matches can be reliably arranged within a league. It uses Bayesian inference for ranking.

Microsoft TrueSkill: The intuition is that the greater the difference between two players' μ values (assuming their σ values are similar), the greater the chance of the player with the higher μ value performing better in a game. This principle holds true in the TrueSkill ranking system. This does not mean that the players with the larger μ's are always expected to win, but rather that their chance of winning is higher than that of the players with the smaller μ's. The TrueSkill ranking system assumes that the performance in a single match varies around the skill of the player, and that the game outcome (relative ranking of all players participating in a game) is determined by their performance. Thus, the skill of a player in the TrueSkill ranking system can be thought of as the average performance of the player over a large number of games. The variation of the performance around the skill is, in principle, a configurable parameter of the TrueSkill ranking system.

Microsoft TrueSkill: μ and σ are updated based on the outcome of the game (win/lose); the score difference makes no impact.
1. Assume the skill of each player may have changed slightly between the previous and the current game, so σ is increased (by a configurable parameter). "It is this parameter that both allows the TrueSkill system to track skill improvements of gamers over time and ensures that the skill uncertainty σ never decreases to zero ("maintaining momentum")."
2. Determine the probability of the game outcome for given skills of the participating players, and weight it by the probability of the corresponding skill beliefs. That is, average over all possible performances (weighted by their probability, using Bayes' law) and derive the game outcome from the performances: the player with the highest performance is the winner, the second highest is the first runner-up, and so on.
3. If the players' performances are very close, TrueSkill considers the outcome to be a draw. The larger the draw margin defined in a league, the more likely a draw is to occur. The size of the margin is configurable and adjusted by game mode.

Measuring consistency

Measuring consistency in Pairwise Comparison: Can be done when the comparison is done with a “degree of importance”. E.g.: 1 = equally important, 2 = somewhat more important, 3 = more important, 4 = most important. Example: Determining important criteria in buying a car. Values in cells are importance with respect to the row item.
           Price   MPG   Comfort   Style
Price        -      3       2        2
MPG          -      -       -        -
Comfort      -      4       -        -
Style        -      4       2        -

Measuring consistency in Pairwise Comparison: Complete the values in the matrix. Example: Determining important criteria in buying a car. A criterion compared to itself is “equally important”. 1 = equally important, 2 = somewhat more important, 3 = more important, 4 = most important.
           Price   MPG   Comfort   Style
Price        1      3       2        2
MPG          -      1       -        -
Comfort      -      4       1        -
Style        -      4       2        1

Measuring consistency in Pairwise Comparison: Complete the values in the matrix. Example: Determining important criteria in buying a car. 1 = equally important, 2 = somewhat more important, 3 = more important, 4 = most important. The importance of the less important criterion is the reciprocal of the importance of the more important criterion. E.g.: Price vs. Style => Style is two times more important than Price (2). So, Price is one half as important as Style (1/2).
           Price   MPG   Comfort   Style
Price        1      3       2        2
MPG         1/3     1      1/4      1/4
Comfort     1/2     4       1       1/2
Style       1/2     4       2        1
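The completion steps above (1 on the diagonal, reciprocals for the missing half) can be sketched in Python. The `is_consistent` check is an added assumption: it tests the usual notion of perfect consistency, a[i][k] = a[i][j] * a[j][k] for every triple, which the slides do not spell out.

```python
from fractions import Fraction
from itertools import permutations

criteria = ["Price", "MPG", "Comfort", "Style"]

# Judgments stated in the example, keyed by (row, column).
judged = {
    ("Price", "MPG"): 3,
    ("Price", "Comfort"): 2,
    ("Price", "Style"): 2,
    ("Comfort", "MPG"): 4,
    ("Style", "MPG"): 4,
    ("Style", "Comfort"): 2,
}


def complete_matrix(criteria, judged):
    """Fill the diagonal with 1 and every missing cell with a reciprocal."""
    matrix = {}
    for row in criteria:
        for col in criteria:
            if row == col:
                matrix[row, col] = Fraction(1)
            elif (row, col) in judged:
                matrix[row, col] = Fraction(judged[row, col])
            else:
                # Raises KeyError if the pair was not judged in either direction.
                matrix[row, col] = 1 / Fraction(judged[col, row])
    return matrix


def is_consistent(criteria, matrix):
    """True only if a[i][k] == a[i][j] * a[j][k] for all distinct i, j, k."""
    return all(
        matrix[i, k] == matrix[i, j] * matrix[j, k]
        for i, j, k in permutations(criteria, 3)
    )


matrix = complete_matrix(criteria, judged)
for row in criteria:
    print(row.ljust(8), [str(matrix[row, col]) for col in criteria])
print("perfectly consistent:", is_consistent(criteria, matrix))
```

Exact fractions avoid rounding noise in the equality test. The car example is not perfectly consistent: the Price row gives MPG 3 and the MPG row gives Comfort 1/4, which would imply 3/4 for Comfort in the Price row, but the stated value is 2.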

Measuring consistency in Pairwise Comparison: Example: Determining important criteria in buying a car. Calculate the weights of each criterion:
           Price   MPG   Comfort   Style
Price        1      3       2        2
MPG         1/3     1      1/4      1/4
Comfort     1/2     4       1       1/2
Style       1/2     4       2        1

Evaluation Criteria for Ranking Methods The Method of Pairwise Comparisons satisfies the Majority Criterion. (A majority candidate will win every pairwise comparison.) The Method of Pairwise Comparisons satisfies the Condorcet Criterion. (A Condorcet candidate will win every pairwise comparison -- that's what a Condorcet candidate is!) The Method of Pairwise Comparisons satisfies the Public-Enemy Criterion. (If there is a public enemy, s/he will lose every pairwise comparison.) The Method of Pairwise Comparisons satisfies the Monotonicity Criterion. (Ranking Candidate X higher can only help X in pairwise comparisons.)

ELO

Agenda on ELO: Overview. How it works. Details. Mathematical details.

How it Works: Elo formulas. Expected value. Score. How to update the rank.