The Wisdom of Crowds in the Aggregation of Rankings

Mark Steyvers, Department of Cognitive Sciences, University of California, Irvine
Joint work with: Michael Lee, Brent Miller, Pernille Hemmer

Example ranking problem in our research
What is the correct chronological order? Items: James Garfield, Ulysses S. Grant, Rutherford B. Hayes, Andrew Johnson, Abraham Lincoln. Correct order along the time axis: Abraham Lincoln, Andrew Johnson, Ulysses S. Grant, Rutherford B. Hayes, James Garfield.
Notes: emphasize that humans provide the rankings, and talk more about the ordered list. Why study ranking? Ranking is a natural way for humans to express knowledge. Alice Healy and Roddy Roediger performed similar experiments to study serial recall with preexisting knowledge.

Rank aggregation problem
Goal: combine many different rank orderings of the same set of items in order to obtain a “better” ordering.
Example applications: combining voters’ rankings (social choice theory); aggregating search engine rankings*; multi-object tracking, e.g. Huang, Guestrin, Guibas (2009); Kondor, Howard, Jebara (2007).
Difficult problem: with N items there are N! orderings, so we need efficient models to explore the space of all permutations.
*e.g. Lebanon & Mao (2008); Klementiev, Roth et al. (2008; 2009); Dwork et al. (2001)
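To make the N! growth concrete, here is a minimal Python sketch (the helper name `num_orderings` is hypothetical) that counts the possible orderings for list sizes appearing in this talk:

```python
from math import factorial


def num_orderings(n_items: int) -> int:
    """Number of distinct rank orderings (permutations) of n_items items."""
    return factorial(n_items)


# List sizes that appear in this talk: toy example, Ten Commandments, NBA conference, presidents
for n in (4, 10, 15, 44):
    print(f"{n} items: {num_orderings(n):,} orderings")
```

Already at 44 presidents the count exceeds 10^54, so enumerating every ordering is out of the question.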

Wisdom of crowds phenomenon
Aggregating over individuals often leads to an estimate that is better than the majority of individual estimates. Galton’s ox (1907): the median of the individual weight estimates came close to the true answer.
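A minimal Python sketch of median-based aggregation, using made-up guesses (not Galton's data) around an assumed true value of 100. The hypothetical helper `fraction_beaten_by_median` reports what share of individuals are farther from the truth than the group median:

```python
from statistics import median


def fraction_beaten_by_median(estimates, truth):
    """Share of individual estimates whose absolute error is strictly larger
    than the absolute error of the group median."""
    group_error = abs(median(estimates) - truth)
    worse = sum(1 for e in estimates if abs(e - truth) > group_error)
    return worse / len(estimates)


# Made-up guesses of a quantity whose true value is assumed to be 100
guesses = [60, 85, 92, 97, 104, 110, 118, 130, 150]
print(median(guesses))                          # 104
print(fraction_beaten_by_median(guesses, 100))  # 7 of 9 guesses are worse than the median
```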

Aggregating ranking data
Figure: five individual orderings (D A B C; A B D C; B A D C; A C B D; A D B C) go into an aggregation algorithm; its group answer (A B C D) is compared with the ground truth (A B C D).

Modeling Approach
Generative model: a latent truth (an unknown ordering, ? ? ? ?) generates the observed orderings (D A B C; A B D C; B A D C; A C B D; A D B C). Insert psychology into the model, e.g.:
- noise processes
- individual differences
- cognitive processes

General Approach
- No communication between individuals
- There is always a true answer (ground truth), but the ground truth is only used in evaluation
- Unsupervised weighting of individuals*: exploit the relationship between expertise and consensus; experts tend to be closer to the truth and therefore produce more similar judgments
- No need to screen for expertise (similar to Bayesian truth serum)
* Klementiev, Roth et al. (2008, 2009); Dani, Madani, Pennock et al. (2006). Bayesian truth serum (Prelec, 2004); Cultural Consensus Theory (Batchelder and Romney, 1986)

Overview of talk
- Aggregating memory judgments
- Aggregating NBA predictions
- “Wisdom within”
- Effect of communication between individuals

Experiment: 26 individuals order all 44 US presidents
Presidents in chronological order: George Washington, John Adams, Thomas Jefferson, James Madison, James Monroe, John Quincy Adams, Andrew Jackson, Martin Van Buren, William Henry Harrison, John Tyler, James Knox Polk, Zachary Taylor, Millard Fillmore, Franklin Pierce, James Buchanan, Abraham Lincoln, Andrew Johnson, Ulysses S. Grant, Rutherford B. Hayes, James Garfield, Chester Arthur, Grover Cleveland (1st term), Benjamin Harrison, Grover Cleveland (2nd term), William McKinley, Theodore Roosevelt, William Howard Taft, Woodrow Wilson, Warren Harding, Calvin Coolidge, Herbert Hoover, Franklin D. Roosevelt, Harry S. Truman, Dwight Eisenhower, John F. Kennedy, Lyndon B. Johnson, Richard Nixon, Gerald Ford, James Carter, Ronald Reagan, George H.W. Bush, William Clinton, George W. Bush, Barack Obama
Original reference: Roediger & Crowder (1976) asked 159 college students to recall the names of all the US presidents, in either chronological or any order. Data analysis produced a classical serial position curve, with best performance at the beginning and end of the series. Except for extraordinarily high recall of Lincoln, the memorability of presidents was strongly related to their chronological position in history. The results extend the generality of the serial position effect to semantic memory and, if one seeks a general explanation of serial position effects in semantic and long-term episodic memory experiments, rule out several theoretical candidates. They appear most congruent with the hypothesis that the end points of a series serve as distinct positional cues around which memory search is begun.

Measuring performance
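Kendall’s tau (τ): the number of adjacent pairwise swaps needed to turn one ordering into the other. Example: the individual's ordering is A B E C D and the true order is A B C D E. Swapping E and C gives A B C E D (1 swap); swapping E and D gives A B C D E (1 more swap), so τ = 1 + 1 = 2.
Other distances: Spearman footrule, Hamming, Cayley, Ulam.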
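Counting adjacent swaps is equivalent to counting the item pairs that the two orderings put in opposite order (discordant pairs). A minimal Python sketch (the function name is hypothetical) that reproduces the A B E C D example:

```python
from itertools import combinations


def kendall_tau_distance(order, truth):
    """Number of discordant pairs, i.e. the minimum number of adjacent swaps
    needed to turn `order` into `truth` (both lists of the same items)."""
    pos = {item: i for i, item in enumerate(order)}
    # combinations(truth, 2) yields pairs (a, b) with a before b in the true order
    return sum(1 for a, b in combinations(truth, 2) if pos[a] > pos[b])


print(kendall_tau_distance(list("ABECD"), list("ABCDE")))  # 2
```

Checking all N(N-1)/2 pairs is quadratic, which is perfectly adequate for 44 items.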

Empirical Results
Figure: τ for each individual, with reference levels for the best possible τ = 0 and for random guessing, τ = N(N-1)/4. Note the individual differences. If Barack Obama were placed as the first president and all other presidents were in the correct order, τ would already be 43 (43 swaps are needed to move Barack Obama to the end of the list).
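The random-guessing baseline follows because each of the N(N-1)/2 item pairs is out of order with probability 1/2, giving N(N-1)/4 expected swaps. A Python sketch (helper names hypothetical) that checks this for 44 presidents with a small Monte Carlo simulation and reproduces the Obama-first example:

```python
import random
from itertools import combinations


def kendall_tau_distance(order, truth):
    """Number of item pairs that `order` ranks opposite to `truth`."""
    pos = {item: i for i, item in enumerate(order)}
    return sum(1 for a, b in combinations(truth, 2) if pos[a] > pos[b])


def expected_random_tau(n):
    # Each of the n(n-1)/2 pairs is discordant with probability 1/2
    return n * (n - 1) / 4


truth = list(range(1, 45))       # 44 presidents labeled by chronological position
obama_first = [44] + truth[:-1]  # Obama placed first, everyone else in order
print(expected_random_tau(44))                   # 473.0
print(kendall_tau_distance(obama_first, truth))  # 43

# Monte Carlo check: average tau of uniformly random orderings
rng = random.Random(0)
sims = [kendall_tau_distance(rng.sample(truth, len(truth)), truth) for _ in range(2000)]
print(sum(sims) / len(sims))  # close to 473
```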

Models for ranking data
- Statistical models: Mallows (1957); Fligner and Verducci (1986)
- Voting-theoretic methods, e.g. Borda count (1770)
- Psychological models: Thurstone (1927); perturbation model (Estes)
We will focus on Thurstonian models, implemented as graphical models with MCMC inference.

Thurstonian Model
Items: A. George Washington, B. James Madison, C. Andrew Jackson. Each item has a true coordinate on some dimension.

… but there is noise because of encoding errors or partial knowledge.

Each person’s mental encoding is based on a single sample from each distribution (one sample each for A, B, and C).

The observed ordering is based on the ordering of the samples: e.g., if the samples fall as A < C < B, the observed ordering is A, C, B; a different draw with A < B < C gives A, B, C.

Key extension: allow for individual-level differences in the accuracy of knowledge. Each person samples from normal distributions with means shared by the group but with a personal variance: the means are fixed, and the variance is fixed for the same person but differs between people.
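The generative direction of the extended Thurstonian model is easy to simulate. This Python sketch only draws orderings forward from assumed parameters; it does not perform the MCMC inference used to recover the latent truth. The coordinates (1, 4, 7, simply the presidents' chronological numbers), the σ values, and the function name are hypothetical:

```python
import random


def sample_ordering(mu, sigma, rng):
    """Draw one person's observed ordering under the extended Thurstonian model.

    mu: dict mapping item -> latent coordinate (shared by the group)
    sigma: this person's noise level (the same for every item)
    """
    # One mental sample per item from Normal(mu[item], sigma)
    samples = {item: rng.gauss(m, sigma) for item, m in mu.items()}
    # The observed ordering is simply the items sorted by their samples
    return sorted(samples, key=samples.get)


# Hypothetical coordinates: the presidents' chronological numbers
mu = {"A. George Washington": 1.0, "B. James Madison": 4.0, "C. Andrew Jackson": 7.0}
rng = random.Random(1)
expert = [sample_ordering(mu, 0.5, rng) for _ in range(5)]  # small personal sigma
novice = [sample_ordering(mu, 4.0, rng) for _ in range(5)]  # large personal sigma
for order in expert + novice:
    print(order)
```

With σ = 0.5 the samples almost never cross, so the "expert" nearly always reports A, B, C; with σ = 4 orderings such as A, C, B become common.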

Graphical Model of Extended Thurstonian Model
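Figure: graphical model with nodes for the latent truth, individual expertise, the mental samples, and the observed order, replicated over j individuals. Complication: the observed order is a deterministic output variable (a function of the mental samples). Inference uses MCMC.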

Inferred Distributions for 44 US Presidents
Presidents are listed in chronological order; the number in parentheses is each president's position in the inferred ordering: George Washington (1), John Adams (2), Thomas Jefferson (3), James Madison (4), James Monroe (6), John Quincy Adams (5), Andrew Jackson (7), Martin Van Buren (8), William Henry Harrison (21), John Tyler (10), James Knox Polk (18), Zachary Taylor (16), Millard Fillmore (11), Franklin Pierce (19), James Buchanan (13), Abraham Lincoln (9), Andrew Johnson (12), Ulysses S. Grant (17), Rutherford B. Hayes (20), James Garfield (22), Chester Arthur (15), Grover Cleveland 1 (23), Benjamin Harrison (14), Grover Cleveland 2 (25), William McKinley (24), Theodore Roosevelt (29), William Howard Taft (27), Woodrow Wilson (30), Warren Harding (26), Calvin Coolidge (28), Herbert Hoover (31), Franklin D. Roosevelt (32), Harry S. Truman (33), Dwight Eisenhower (34), John F. Kennedy (37), Lyndon B. Johnson (36), Richard Nixon (39), Gerald Ford (35), James Carter (38), Ronald Reagan (40), George H.W. Bush (41), William Clinton (42), George W. Bush (43), Barack Obama (44)
Error bars correspond to the median and the minimum σ.

Wisdom of crowds effect
Baseline (random guessing): τ = 44 × 43 / 4 = 473

Calibration of individuals
People who agree with each other are regarded as experts and are “up-weighted” in forming the aggregate order.
Figure: for each individual, τ (distance to ground truth) against σ (inferred noise level).
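As a rough illustration of the intuition that agreement signals expertise, this Python sketch computes each person's mean Kendall distance to everyone else, using the five orderings from the aggregation diagram. It is a hypothetical heuristic proxy (function names included), not the Bayesian estimate of σ that the model infers:

```python
from itertools import combinations


def kendall_tau_distance(order, other):
    """Number of item pairs ranked in opposite order by the two orderings."""
    pos = {item: i for i, item in enumerate(order)}
    return sum(1 for a, b in combinations(other, 2) if pos[a] > pos[b])


def consensus_scores(orderings):
    """Mean Kendall distance from each person to every other person.
    Lower means more agreement; this is a heuristic proxy only."""
    n = len(orderings)
    return [
        sum(kendall_tau_distance(o, p) for j, p in enumerate(orderings) if j != i) / (n - 1)
        for i, o in enumerate(orderings)
    ]


# The five individual orderings from the aggregation diagram
orderings = [list("DABC"), list("ABDC"), list("BADC"), list("ACBD"), list("ADBC")]
print(consensus_scores(orderings))  # [2.5, 1.5, 2.25, 3.0, 1.75]
```

Here A B D C agrees most with the group. Note that A C B D is only one swap from the truth A B C D yet agrees least with the others, a reminder that a pure consensus score can mislead with few individuals.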

Heuristic Models
Many heuristic methods come from voting theory, e.g. the Borda count method. Suppose we have 10 items:
- assign a count of 10 to the first item, 9 to the second item, etc.
- add the counts over individuals
- order the items by their Borda count
i.e., rank by average rank across people.
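A minimal Python sketch of the Borda count as described, applied to the five orderings from the aggregation diagram. The name `borda_aggregate` is hypothetical, and ties are broken by first appearance:

```python
from collections import defaultdict


def borda_aggregate(orderings):
    """Borda count: with n items, the first item gets n points, the second n - 1, and so on.
    Points are summed over individuals; items are returned from highest to lowest total.
    Ties keep the order in which items were first seen (sorted() is stable)."""
    totals = defaultdict(int)
    for order in orderings:
        n = len(order)
        for rank, item in enumerate(order):
            totals[item] += n - rank
    return sorted(totals, key=lambda item: -totals[item])


orderings = [list("DABC"), list("ABDC"), list("BADC"), list("ACBD"), list("ADBC")]
print(borda_aggregate(orderings))  # ['A', 'B', 'D', 'C']
```

On these orderings the Borda result is one swap away from the ground truth A B C D. Every individual contributes equally, which is the missing individual-differences component.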

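Model Comparison
Figure: τ for the aggregation methods, including the Borda count. Why is the Borda count worse? It has no individual differences: every person's ordering counts equally.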

Ordering Ten Amendments

Ordering Ten Commandments

Mixture Model to Extract Multiple Subjective Beliefs
Generative model: several latent beliefs (latent belief 1, 2, 3), each an unknown ordering, generate the observed orderings.

Two groups of individuals for the Ten Commandments
Figure: the two groups contain 82% and 18% of the individuals; their inferred orderings have τ = 7 and τ = 32.

Recollecting Order from Episodic Memory
Study this sequence of images

How good is your memory? Place the images in the correct sequence (by reading order)

Average results across 6 problems

Calibration of individuals
Figure: τ (distance to ground truth) against σ (inferred noise level) for each individual (pizza sequence; perturbation model).

Human forecasting experiment
Forecast the end-of-season rankings of the 15 NBA teams in each conference (Eastern and Western). Participants were college undergraduates with varied basketball knowledge: 172 individuals for the Eastern conference and 156 individuals for the Western conference. The experiment was conducted in February 2010, when teams had played a bit over half of their regular-season games.

Eastern conference: actual outcome and aggregate forecasts
Actual outcome: Cleveland, Orlando, Atlanta, Boston, Miami, Milwaukee, Charlotte, Chicago, Toronto, Indiana, New York, Detroit, Philadelphia, Washington, New Jersey
Borda: Boston, Cleveland, Orlando, Miami, Detroit, Chicago, Philadelphia, Atlanta, New York, New Jersey, Indiana, Washington, Toronto, Charlotte, Milwaukee
Thurstonian model: Cleveland, Boston, Orlando, Miami, Atlanta, Chicago, Detroit, Charlotte, Toronto, Philadelphia, Washington, Indiana, New York, Milwaukee, New Jersey

Figure: τ for the East (73%, 93%) and the West (87%, 94%). Note that the East leads to worse performance overall (perhaps there is less expertise overall).

Calibration Results
Figure: τ against σ for each individual, East and West.

Problem
The current model identifies expertise by consensus, but non-experts might use simple heuristics that also lead to agreement. Solution: apply a mixture model and assign each individual to a mixture component.

A solution with two components (East)
Figure: the two components contain 47% and 53% of the individuals; their inferred orderings have τ = 17 and τ = 65.

A solution with two components (West)
Figure: the two components contain 67% and 33% of the individuals; their inferred orderings have τ = 14 and τ = 60.

Figure: τ for the East (73%, 93%, 97%) and the West (87%, 94%, 97%).

Minimum Spanning Trees (MST)
Goal: create a network connecting all the points that minimizes the total length of the connections.
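The optimal network in an MST problem can be computed exactly, which gives a reference length against which a participant's network can be compared. The talk does not say how its optima were obtained; this Python sketch uses Prim's algorithm, one standard option, with a hypothetical `mst_length` helper:

```python
import math


def mst_length(points):
    """Total edge length of a Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the growing tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the outside point with the cheapest connection
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Update the cheapest connections through the newly added point
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


print(mst_length([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 3.0 for the corners of a unit square
```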

Wisdom of the Crowd Within
Vul and Pashler (2008): repeated responses from the same individual may also result in a wisdom of the crowd effect. We performed a “wisdom within” experiment on MST: subjects solved the same MST problem eight times, and each repetition involved a rotation/reflection of the original problem.

Example repetitions of the same MST problem
(random reflections and rotations applied to the original problem)

Solutions from one individual
(Subjects were not aware they were solving the same problem)

Comparing Between vs. Within Individual Aggregation
Large individual differences cause wide error bars for within-subjects POA measurement.

Overview of talk
- General knowledge tasks: reconstructing the order of US presidents
- Episodic memory: reconstructing the order of personally experienced events
- Forecasting NBA outcomes
- Traveling salesman problems
- Effect of communication between individuals

Influence of communication
Many researchers argue that the best aggregation is achieved by complete independence between individuals. But does sharing of information always lead to worse aggregates?

Iterated Learning Experiment: each individual refines the previous ordering
Figure: successive orderings of Abraham Lincoln, Andrew Johnson, Ulysses S. Grant, R. B. Hayes, and James Garfield, each produced by one individual refining the previous ordering. Related to work by Griffiths and colleagues on iterated learning.

Influence of information sharing
Comparing independent judgments and an iterated learning task.
Figure: performance as a function of the number of individuals.

Conclusions
Three psychological ideas:
- individual differences in expertise
- allowing for divergence in beliefs
- effect of communication
Prior knowledge: downweight individuals with “wrong” prior knowledge.
The goal is to build a better theory of the wisdom of crowds phenomenon, which is fundamentally a cognitive modeling problem.

Do the experiments yourself:

Predicting problem difficulty
Figure: τ (distance of the inferred truth to the actual truth) against std(σ), the dispersion of expertise, across problems such as city size rankings and ordering states geographically.

Summary
- Combine ordering / ranking data: going beyond numerical estimates or multiple-choice questions
- Incorporate individual differences: assume some individuals might be “experts”, going beyond models that treat every vote equally
- Incorporate prior knowledge: downweight individuals with “wrong” prior knowledge and correct judgments towards natural prior orderings

Effect of Group Size
Figure: τ as a function of group size. No need for more than 40 individuals.

Heuristic Approach
Idea: find tours with edges on which many individuals agree.
- Calculate the agreement matrix A, an n × n matrix where n is the number of cities and $a_{ij}$ is the number of participants that connect cities i and j.
- Use a non-linear transform function f() to emphasize high-agreement edges.
- Find the tour T that maximizes $\sum_{(i,j) \in T} f(a_{ij})$ (this itself is a non-Euclidean TSP problem).
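A Python sketch of the agreement-based heuristic, assuming each participant's answer is a closed tour given as a list of city indices. The transform f(x) = x² is a hypothetical choice (the talk only says f is non-linear), all function names are hypothetical, and the brute-force search over tours only works for a handful of cities; a proper TSP solver would be needed for realistic problem sizes:

```python
from itertools import permutations


def agreement_matrix(tours, n):
    """a[i][j] = number of participants whose closed tour uses the edge between cities i and j."""
    a = [[0] * n for _ in range(n)]
    for tour in tours:
        for k in range(len(tour)):
            i, j = tour[k], tour[(k + 1) % len(tour)]  # last city links back to the first
            a[i][j] += 1
            a[j][i] += 1
    return a


def tour_score(tour, a, f=lambda x: x ** 2):
    """Sum of f(a_ij) over the edges of a closed tour (f = square is an assumed choice)."""
    return sum(f(a[tour[k]][tour[(k + 1) % len(tour)]]) for k in range(len(tour)))


def best_tour(a):
    """Brute-force search for the agreement-maximizing tour; only feasible for tiny n."""
    n = len(a)
    candidates = ([0, *rest] for rest in permutations(range(1, n)))
    return max(candidates, key=lambda t: tour_score(t, a))


# Three hypothetical participants solving a 4-city problem
tours = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 2, 1, 3]]
a = agreement_matrix(tours, 4)
print(best_tour(a))  # [0, 1, 2, 3]
```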

Forecasting NCAA tournament (March Madness)
64 US college basketball teams are placed in four seeded brackets and play an elimination tournament.
Figure: the Midwest bracket (one of the four brackets).

Data
Predictions from 16,718 Yahoo users. Each individual predicts the winner of all games; we use the predictions for the first four rounds (60 games total). Two scoring systems:
- number of correct predictions
- points: 1 point per correct winner in the 1st round, 2 points in the 2nd, 4 points in the 3rd, and 8 points in the 4th
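Both scoring systems can be written down directly. A Python sketch with hypothetical names (`score_bracket`, string game ids as dictionary keys):

```python
# Points per correct winner in rounds 1-4 (the "points" scoring system)
ROUND_POINTS = {1: 1, 2: 2, 3: 4, 4: 8}


def score_bracket(predicted, actual, game_round):
    """Return (number correct, points) for one individual.

    predicted, actual: dict game_id -> winning team
    game_round: dict game_id -> round number (1-4)
    """
    correct = [g for g in actual if predicted.get(g) == actual[g]]
    return len(correct), sum(ROUND_POINTS[game_round[g]] for g in correct)


# Hypothetical three-game example
predicted = {"g1": "X", "g2": "Y", "g3": "Z"}
actual = {"g1": "X", "g2": "W", "g3": "Z"}
rounds = {"g1": 1, "g2": 2, "g3": 4}
print(score_bracket(predicted, actual, rounds))  # (2, 9)
```

Since rounds 1 to 4 contain 32, 16, 8, and 4 games, each round is worth at most 32 points under the points system, for a maximum of 128.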

Data and Results of Heuristic Strategies
Figure: performance of individuals on both scores, with reference marks for the heuristic strategies.
#correct predictions: majority rule 73%, Obama 83%, prior seeding 61%.
Points: majority rule 71%, prior seeding 66%, Obama 47%.

Thurstonian Model
Figure: teams A, B, and C on one dimension. Each team has a mean on a single “strength” dimension, and each person has a single variance.

Thurstonian Model
Figure: example draws in which B wins over A, and in which C wins over B.
The probability that a person will choose team A over team B is the probability that their sampled strength for team A is above their sampled strength for team B.
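Under these assumptions (independent normal samples, one σ per person shared across teams) the choice probability has a closed form: the difference of the two samples is normal with mean μ_A - μ_B and standard deviation σ√2, so P(A over B) = Φ((μ_A - μ_B)/(σ√2)). A Python sketch using only the standard library (function name and example strengths are hypothetical):

```python
import math


def prob_choose_a_over_b(mu_a, mu_b, sigma):
    """P(sampled strength of A > sampled strength of B) for one person.

    Both strengths are independent Normal draws with the person's single sigma,
    so their difference is Normal(mu_a - mu_b, sigma * sqrt(2)).
    """
    z = (mu_a - mu_b) / (sigma * math.sqrt(2))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z


# Hypothetical strengths: team A is one unit stronger than team B
print(prob_choose_a_over_b(2.0, 1.0, 0.5))  # small sigma: picks A most of the time
print(prob_choose_a_over_b(2.0, 1.0, 5.0))  # large sigma: close to a coin flip
```

A larger personal σ pulls every choice probability towards 0.5, which is how the model encodes lower expertise.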

Modeling Results
Figure: performance of individuals, with reference marks for each method.
#correct predictions: Thurstonian model with informative priors 90%, Thurstonian model 83%, majority rule 73%, prior seeding 61%.
Points: Thurstonian model with informative priors 81%, Thurstonian model 78%, majority rule 71%, prior seeding 66%.

Modeling Results
Figure: performance of individuals, with reference marks for each method.
#correct predictions: Thurstonian model 90%, majority rule 73%, prior seeding 61%.
Points: Thurstonian model 81%, majority rule 71%, prior seeding 66%.

Rank aggregation applications in this research
- Combining rankings related to general knowledge, e.g. orderings by time (US presidents) and orderings by length / size (famous rivers, countries)
- Combining forecasted team rankings in sports
- Combining multiple eyewitness testimonies: memory for the order of events

Find the shortest route between cities
Figure: tours by individuals 5, 60, and 83, and the optimal tour (problem B30-21).

Idea: analyze the agreement on edges that are part of the tour
Line thickness = agreement

Blue = Tour that maximizes agreement
Line thickness = agreement

Results averaged across 7 problems
Figure: performance of individuals compared with the aggregate.

Problem
What if we have only a small number of individuals? How can we guard against individuals with poor memory? Idea: incorporate prior knowledge in the aggregation process.

Measuring Prior Knowledge for Event Sequences
“Place the images in a sequence that looks natural to you” (images labeled A through J)

Modeling Approach
Figure: a prior, informed by the prior orderings of N individuals, constrains the latent studied event sequence, which generates the memory reconstructions of K individuals.

Results when picking K worst “witnesses”
Figure: τ as a function of the number of “witnesses” (K), without and with prior knowledge.

Results when randomly selecting individuals
Figure: τ as a function of group size, without and with prior knowledge.
