Mark Steyvers Department of Cognitive Sciences

Wisdom of Crowds in Human Memory: Reconstructing Events by Aggregating Memories across Individuals
Mark Steyvers
Department of Cognitive Sciences
University of California, Irvine
Joint work with: Brent Miller, Pernille Hemmer, Mike Yi, Michael Lee, Bill Batchelder, Paolo Napoletano

What is the correct chronological order?
Items to order: James Garfield, Ulysses S. Grant, Rutherford B. Hayes, Andrew Johnson, Abraham Lincoln
Correct order (along the time axis): Abraham Lincoln, Andrew Johnson, Ulysses S. Grant, Rutherford B. Hayes, James Garfield
Speaker note: talk more about the ordered list

Research goal: aggregating responses
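Individual responses: D A B C; A B D C; B A D C; A C B D; A D B C
An aggregation algorithm combines the individual responses into a group answer: A B C D
Question: does the group answer equal the ground truth (A B C D)?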

Task constraints
No communication between individuals
There is always a true answer (ground truth)
Aggregation algorithm is unsupervised: ground truth is only used for evaluation

Wisdom of crowds phenomenon
Group estimate often performs as well as or better than best individual in the group

Examples of wisdom of crowds phenomenon
Galton’s Ox (1907): median of individual estimates comes close to true answer
Who Wants to Be a Millionaire?

Relation to Cultural Consensus Theory (CCT)
Developed by Batchelder and Romney
CCT can recover the answer key of a multiple choice test by analyzing responses across individuals
Key assumption: questions vary in difficulty and individuals vary in ability
Our models will be similar to the ideas of CCT, but the emphasis is different:
Each problem studied has a ground truth
We focus on the “wisdom of crowds” phenomenon

Overview of talk
Ordering problems – general knowledge: what is the order of US presidents?
Ordering problems – episodic memory: what is the order of events you have experienced?
Matching problems: memory for pairs (what object was paired with what person?)
Recognition memory problems: what words were studied?

Experiment: 26 individuals order all 44 US presidents
George Washington, John Adams, Thomas Jefferson, James Madison, James Monroe, John Quincy Adams, Andrew Jackson, Martin Van Buren, William Henry Harrison, John Tyler, James Knox Polk, Zachary Taylor, Millard Fillmore, Franklin Pierce, James Buchanan, Abraham Lincoln, Andrew Johnson, Ulysses S. Grant, Rutherford B. Hayes, James Garfield, Chester Arthur, Grover Cleveland 1, Benjamin Harrison, Grover Cleveland 2, William McKinley, Theodore Roosevelt, William Howard Taft, Woodrow Wilson, Warren Harding, Calvin Coolidge, Herbert Hoover, Franklin D. Roosevelt, Harry S. Truman, Dwight Eisenhower, John F. Kennedy, Lyndon B. Johnson, Richard Nixon, Gerald Ford, James Carter, Ronald Reagan, George H.W. Bush, William Clinton, George W. Bush, Barack Obama
Original reference: Roediger & Crowder, 1976
Requested 159 college students to recall the names of all the US presidents, in either chronological or any order. Data analysis produced a classical serial position curve with best performance at the beginning and end of the series. Except for extraordinarily high recall of Lincoln, memorability of presidents was strongly related to their chronological position in history. Results extend generality of the serial position effect to semantic memory and, if one seeks a general explanation of serial position effects in semantic and long-term episodic memory experiments, rules out several theoretical candidates. It appears most congruent with the hypothesis that end points of a series serve as distinct positional cues around which memory search is begun.

Measuring performance
Kendall’s Tau: the number of adjacent pair-wise swaps
True order: A B C D E
Ordering by individual: A B E C D
Swapping E and C gives A B C E D (1 swap); swapping E and D gives A B C D E (1 + 1 = 2 swaps), so tau = 2
Other distances: Spearman Footrule, Hamming, Cayley, Ulam
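The swap count on this slide can be checked with a short sketch. It relies on the standard fact that the minimum number of adjacent swaps equals the number of item pairs that appear in opposite relative order. The function name kendall_tau_distance is an illustrative choice, not something from the talk.

```python
from itertools import combinations

def kendall_tau_distance(ordering, true_order):
    """Count item pairs whose relative order differs between the two lists."""
    position = {item: rank for rank, item in enumerate(ordering)}
    # combinations() yields pairs (a, b) with a before b in the true order,
    # so a pair is discordant when a comes after b in the individual's ordering.
    return sum(1 for a, b in combinations(true_order, 2)
               if position[a] > position[b])

tau = kendall_tau_distance(list("ABECD"), list("ABCDE"))
print(tau)  # 2
```

The two discordant pairs are (C, E) and (D, E), matching the two swaps in the slide's example.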

Empirical Results (τ per individual, with the random guessing level marked)
Best tau = 0
Tau based on random guessing = N * (N-1) / 4
Note the individual differences
If Barack Obama was placed as first president and all other presidents were in correct order, you already have a tau of 43 (43 swaps needed to get Barack Obama to the end of the list)
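Both numbers on this slide can be verified with a small sketch. It enumerates every ordering of a small list to confirm the N * (N-1) / 4 average, then counts the swaps for the "Obama first" example. The helper names are illustrative assumptions, and presidents are represented by their true positions 1 to 44.

```python
from itertools import combinations, permutations

def tau_to_sorted(ranks):
    """Number of adjacent swaps needed to sort ranks (count of inverted pairs)."""
    return sum(1 for i, j in combinations(range(len(ranks)), 2)
               if ranks[i] > ranks[j])

def random_guess_baseline(n):
    """Expected tau of a uniformly random ordering of n items."""
    return n * (n - 1) / 4

# Average tau over all 120 orderings of 5 items.
small_n = 5
all_taus = [tau_to_sorted(p) for p in permutations(range(small_n))]
mean_tau = sum(all_taus) / len(all_taus)
print(mean_tau, random_guess_baseline(small_n))  # 5.0 5.0

# Baseline for 44 presidents.
print(random_guess_baseline(44))  # 473.0

# Obama (true position 44) placed first, everyone else in correct order.
obama_first = [44] + list(range(1, 44))
obama_tau = tau_to_sorted(obama_first)
print(obama_tau)  # 43
```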

A Bayesian (generative) approach
Figure: an unknown latent “input” (?) passes through a separate model for each individual, producing the observed orderings D A B C; A B D C; B A D C; A C B D; …

Bayesian models
We extend two models:
Thurstone’s (1927) model
Estes’ (1972) perturbation model

Bayesian Thurstonian Approach
Each item has a true coordinate on some dimension

… but there is noise because of encoding and/or retrieval error (figure: Person 1’s distributions for items A, B, C)

Each person’s mental representation is based on (latent) samples of these distributions

The observed ordering is based on the ordering of the samples (Person 1, observed ordering: A < B < C)

People draw from distributions with common means but different variances (Person 1, observed ordering: A < B < C; Person 2, observed ordering: A < C < B)
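The generative story on these slides can be sketched in a few lines: each person draws one noisy sample per item around a shared mean and reports the items sorted by their samples. The Gaussian noise, the particular means and noise levels, and the function name are assumptions for illustration. This is only the forward simulation, not the Bayesian inference used in the talk.

```python
import random

def simulate_thurstonian_ordering(means, sigma, rng):
    """Draw one latent sample per item and return the items sorted by sample."""
    samples = {item: rng.gauss(mu, sigma) for item, mu in means.items()}
    return sorted(samples, key=samples.get)

rng = random.Random(0)

# Common item means shared by everyone (assumed values).
means = {"A": 1.0, "B": 2.0, "C": 3.0}

# Person-specific noise levels (assumed values): Person 2 is noisier.
sigmas = {"Person 1": 0.2, "Person 2": 1.5}

orderings = {person: simulate_thurstonian_ordering(means, sigma, rng)
             for person, sigma in sigmas.items()}
for person, ordering in orderings.items():
    print(person, " < ".join(ordering))
```

With a small noise level the samples rarely cross, so the reported order usually matches the true order; a larger noise level produces swaps such as A < C < B more often.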

Graphical Model Notation
Plate index: j = 1..3
shaded = observed
not shaded = latent
Complication: deterministic output variable

Graphical Model of Bayesian Thurstonian Model
Latent group means
Individual noise level
Mental representation
Observed ordering
Plate: j individuals
Complication: deterministic output variable

Inference
Need the posterior distribution
Markov chain Monte Carlo
Gibbs sampling on …
Metropolis-Hastings on … and …

Inferred Distributions for 44 US Presidents
George Washington (1), John Adams (2), Thomas Jefferson (3), James Madison (4), James Monroe (6), John Quincy Adams (5), Andrew Jackson (7), Martin Van Buren (8), William Henry Harrison (21), John Tyler (10), James Knox Polk (18), Zachary Taylor (16), Millard Fillmore (11), Franklin Pierce (19), James Buchanan (13), Abraham Lincoln (9), Andrew Johnson (12), Ulysses S. Grant (17), Rutherford B. Hayes (20), James Garfield (22), Chester Arthur (15), Grover Cleveland 1 (23), Benjamin Harrison (14), Grover Cleveland 2 (25), William McKinley (24), Theodore Roosevelt (29), William Howard Taft (27), Woodrow Wilson (30), Warren Harding (26), Calvin Coolidge (28), Herbert Hoover (31), Franklin D. Roosevelt (32), Harry S. Truman (33), Dwight Eisenhower (34), John F. Kennedy (37), Lyndon B. Johnson (36), Richard Nixon (39), Gerald Ford (35), James Carter (38), Ronald Reagan (40), George H.W. Bush (41), William Clinton (42), George W. Bush (43), Barack Obama (44)
Plot legend: median and minimum sigma

Model can predict individual performance
Axes: distance to ground truth vs. σ, the inferred noise level for each individual

(Weak) Wisdom of Crowds Effect
Baseline tau = 44 * 43 / 4 = 473
Model’s ordering is as good as best individual (but not better)

Extension of Estes (1972) Perturbation Model
Main idea: item order is perturbed locally
Our extension: perturbation noise varies between individuals and items
Figure: true order (A B C D E) mapped to recalled order (A B C D E)
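A rough sketch of local perturbation, under simplifying assumptions: over a fixed number of steps, each item swaps places with a neighbor with probability theta. The step count, the equal left/right choice, and the function name are illustrative assumptions. The sketch uses one theta per individual and does not implement the item-specific noise of the extended model in the talk.

```python
import random

def perturb_order(true_order, theta, n_steps, rng):
    """Return a recalled order after repeated local (adjacent) perturbations."""
    order = list(true_order)
    for _ in range(n_steps):
        for item in true_order:
            if rng.random() < theta:
                i = order.index(item)
                j = i + rng.choice([-1, 1])  # move one slot left or right
                if 0 <= j < len(order):      # moves past either end are ignored
                    order[i], order[j] = order[j], order[i]
    return order

rng = random.Random(0)
true_order = list("ABCDE")
accurate_person = perturb_order(true_order, theta=0.05, n_steps=3, rng=rng)
noisy_person = perturb_order(true_order, theta=0.5, n_steps=3, rng=rng)
print(accurate_person, noisy_person)
```

Because every move is to an adjacent position, items tend to stay near their true positions, which is the local character the slide describes.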

Modified Perturbation Model

Inferred Perturbation Matrix and Item Accuracy
Abraham Lincoln Richard Nixon James Carter

Strong wisdom of crowds effect
Baseline tau = 44 * 43 / 4 = 473
Perturbation model’s ordering is better than best individual

Alternative Heuristic Models
Many heuristic methods from voting theory
E.g., Borda count method
Suppose we have 10 items
Assign a count of 10 to first item, 9 for second item, etc.
Add counts over individuals
Order items by the Borda count
I.e., rank by average rank across people
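The Borda steps above can be sketched directly. The four example orderings and the alphabetical tie-break are assumptions made for illustration; the talk does not say how ties are handled.

```python
from collections import defaultdict

def borda_aggregate(orderings):
    """Order items by total Borda count (first place earns the most points)."""
    n_items = len(orderings[0])
    scores = defaultdict(int)
    for ordering in orderings:
        for position, item in enumerate(ordering):
            scores[item] += n_items - position  # n for first, n-1 for second, ...
    # Highest count first; ties broken alphabetically (an assumption).
    return sorted(scores, key=lambda item: (-scores[item], item))

responses = [list("ABCD"), list("ABDC"), list("BACD"), list("ABCD")]
group_order = borda_aggregate(responses)
print(group_order)  # ['A', 'B', 'C', 'D']
```

Every individual contributes equally to the counts, so an unusually accurate person carries no more weight than anyone else.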

Model Comparison (τ; includes Borda)
Why is the Borda count worse?
It does not model individual differences

Ordering Ten Amendments
Freedom of speech & religion (1), Right to bear arms (2), No quartering of soldiers (4), No unreasonable searches (3), Due process (5), Trial by Jury (6), Civil Trial by Jury (7), No cruel punishment (8), Right to non-specified rights (10), Power for the States & People (9)

Ordering Ten Commandments

Recollecting order from episodic memory
Correct answer: B A D C

Place scenes in correct order (serial recall)
Scenes are labeled A to D
Correct answer (in time order): B A D C

Recollecting Order from Episodic Memory
Study this sequence of images

Place the images in correct sequence (serial recall)
Images are labeled A to J

Average results across 6 problems
Mean

Example calibration result for individuals
Axes: distance to ground truth vs. σ, the inferred noise level (pizza sequence; perturbation model)

Study these combinations

Find all matching pairs
Match items A, B, C, D, E to items 1, 2, 3, 4, 5

Bayesian Matching Model
Proposed process:
Match “known” items
Guess between remaining ones
Individual differences:
Some items are easier to know
Some participants know more
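The proposed process can be simulated as a sketch: known items are matched correctly, and the leftover responses are shuffled among the unknown items. The logistic link between person ability, item easiness, and the probability of knowing is an assumption made here; the talk does not specify the functional form. All names and values are hypothetical.

```python
import math
import random

def prob_know(ability, easiness):
    # Assumed link: logistic of ability plus easiness (not stated in the talk).
    return 1.0 / (1.0 + math.exp(-(ability + easiness)))

def simulate_matching(answer_key, ability, easiness, rng):
    """Match known items correctly, then guess among the remaining responses."""
    known = {item for item in answer_key
             if rng.random() < prob_know(ability, easiness[item])}
    response = {item: answer_key[item] for item in known}
    unknown = [item for item in answer_key if item not in known]
    leftovers = [answer_key[item] for item in unknown]
    rng.shuffle(leftovers)  # random guess restricted to unused responses
    response.update(zip(unknown, leftovers))
    return response

rng = random.Random(0)
answer_key = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
easiness = {"A": 1.0, "B": 0.5, "C": 0.0, "D": -0.5, "E": -1.0}
print(simulate_matching(answer_key, ability=0.5, easiness=easiness, rng=rng))
```

Guessing only among the remaining responses means a person who knows four of five pairs gets the fifth right automatically, so observed accuracy overstates knowledge.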

Graphical Model
Item easiness
Person ability
Prob. of knowing
Latent answer key
Knowledge state
Observed matching
Plates: i items, j individuals

Results across 8 problems

General Knowledge Matching Problems
Languages: Dutch, Danish, Yiddish, Thai, Vietnamese, Chinese, Georgian, Russian, Japanese
Phrases (labeled A to I): godt nytår; gelukkig nieuwjaar; a gut yohr; С Новым Годом; สวัสดีปีใหม่; Chúc Mừng Nǎm Mới; გილოცავთ ახალ წელს

Modeling Results – General Knowledge Tasks

Systematic Errors and Biases
Some memory errors are systematic
When averaging over biased individuals, the group estimate will also be systematically biased
… unless the aggregation model can explain the bias

Listen to these words…

Associative structure influences false memories
bull calf herd cow graze pasture milk cattle

Experiment
Study list: 10 lists of 15 spoken words
Recognition memory test:
Targets (15 items)
Lure (1 item)
Related distractors (15 items)
Unrelated distractors (15 items)
Confidence ratings: 5-point confidence ratings
1 = definitely not on list; 2 = probably not on list; 3 = not sure; 4 = probably on list; 5 = sure it was on the list

Mean Confidence ratings for 12 individuals

Signal Detection Aggregation Model
Figure: distributions for new (z=0) and old (z=1) items, with confidence ratings 1 to 5 along the axis
Important: model needs to infer z, whether an item is old or new
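A minimal sketch of how one confidence rating could arise under an equal-variance Gaussian signal detection model: a familiarity value is drawn from the new or old distribution and binned by four criteria into ratings 1 to 5. The Gaussian form, the d_prime and criteria values, and the function name are illustrative assumptions, not the model fitted in the talk. The sketch covers only the forward direction; inferring z from ratings is the harder part.

```python
import random
from bisect import bisect_right

def simulate_rating(z, d_prime, criteria, rng):
    """Draw a familiarity value and convert it to a 1-5 confidence rating."""
    mean = d_prime if z == 1 else 0.0      # old items are shifted by d_prime
    familiarity = rng.gauss(mean, 1.0)
    # The rating is 1 plus the number of criteria at or below the familiarity.
    return 1 + bisect_right(criteria, familiarity)

rng = random.Random(0)
criteria = [-0.5, 0.3, 1.0, 1.8]           # assumed, must be sorted ascending
old_ratings = [simulate_rating(1, 1.5, criteria, rng) for _ in range(2000)]
new_ratings = [simulate_rating(0, 1.5, criteria, rng) for _ in range(2000)]
print(sum(old_ratings) / 2000, sum(new_ratings) / 2000)
```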

Incorporating Associative Structure
bull calf herd cow graze pasture milk cattle

Incorporating Associative “Boost”
Figure: distributions for new (z=0) and old (z=1) items, with confidence ratings 1 to 5 along the axis
Associative “boost” depends on:
The set of items that are considered “old”
The vulnerability of individuals to associative influences

Inferred target status over MCMC iterations

ROC Curves for SDT Aggregation Models

Performance of Individuals and Aggregate

Summary
Aggregation of combinatorially complex data: going beyond numerical estimates or multiple choice questions
Incorporate individual differences: going beyond models that treat every vote equally; assume some individuals might be “experts”
Take cognitive processes into account: going beyond mere statistical aggregation; allows us to correct for systematic errors and biases

Do the experiments yourself:
That’s all

Predictive Rankings: fantasy football
Australian Football League (29 people rank 16 teams)
South Australian Football League (32 people rank 9 teams)

Experiment
78 participants
17 ordering problems, each with 10 items
Chronological events
Physical measures
Purely ordinal problems, e.g. Ten Amendments, Ten Commandments

Ordering states west-east
Oregon (1), Utah (2), Nebraska (3), Iowa (4), Alabama (6), Ohio (5), Virginia (7), Delaware (8), Connecticut (9), Maine (10)

Question: How many individuals do we need to average over?

Effect of Group Size: random groups
No need for more than 40 individuals

How effective are small groups of experts?
Want to find experts endogenously – without feedback
Approach: select individuals with the smallest estimated noise levels based on previous tasks
We are identifying general expertise (“Spearman’s g”)

Group Composition based on prior performance
Panels by number of previous tasks: T = 0, T = 2, T = 8
Axes: τ vs. group size (best individuals first)

Endogenous: no feedback required
Exogenous: selecting people based on actual performance
(Vertical axis in each panel: τ)

Online Experiments
Experiment 1 (Prior knowledge)
Experiment 2a (Serial Recall): study sequence of still images
Experiment 2b (Serial Recall): study video

MDS solution of pairwise tau distances
distance to truth

MDS solution of pairwise tau distances

Thurstonian Model – stereotyped event sequences

Thurstonian Model – “random” videos

Heuristic Aggregation Approach
Combinatorial optimization problem: maximizes agreement in assigning N items to N responses
Hungarian algorithm:
Construct a count matrix M
M_ij = number of people that paired item i with response j
Find row and column permutations to maximize diagonal sum
O(n^3)
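The count matrix and the agreement objective can be sketched as follows. To stay dependency-free, this sketch finds the best assignment by brute force over all permutations, which is only practical for small N; it is a stand-in that reaches the same maximum agreement score the Hungarian algorithm would reach in O(n^3). The example matchings and function names are hypothetical.

```python
from itertools import permutations

def count_matrix(matchings, items, responses):
    """M[i][j] = number of people who paired items[i] with responses[j]."""
    column = {response: j for j, response in enumerate(responses)}
    M = [[0] * len(responses) for _ in items]
    for matching in matchings:
        for i, item in enumerate(items):
            M[i][column[matching[item]]] += 1
    return M

def best_assignment(M):
    """Brute-force search for the assignment with the largest total count."""
    n = len(M)
    best = max(permutations(range(n)),
               key=lambda perm: sum(M[i][perm[i]] for i in range(n)))
    return list(best)  # best[i] is the response index assigned to item i

items, responses = ["A", "B", "C"], [1, 2, 3]
matchings = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 3, "C": 2},
    {"A": 2, "B": 1, "C": 3},
]
M = count_matrix(matchings, items, responses)
assignment = best_assignment(M)
group_answer = {item: responses[j] for item, j in zip(items, assignment)}
print(M)             # [[2, 1, 0], [1, 1, 1], [0, 1, 2]]
print(group_answer)  # {'A': 1, 'B': 2, 'C': 3}
```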

Hungarian Algorithm Example
Legend: correct / incorrect

What are methods for finding experts?
1) Self-reported expertise: unreliable; has led to claims of “myth of expertise”
2) Based on explicit scores by comparing to ground truth, but ground truth might not be immediately available
3) Endogenously discover experts: use the crowd to discover experts; small groups of experts can be effective

Predicting problem difficulty
Figure: distance of group answer to ground truth (τ) plotted against std(σ), the dispersion of noise levels across individuals
Labeled problems include city size rankings and ordering states geographically

Mean p(“yes”)
Note: confidence ratings were converted to yes/no judgments. Yes = rating >= 3; No = rating < 3
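The conversion rule stated in the note is simple enough to write down as a sketch. The function name and the example ratings are hypothetical.

```python
def p_yes(ratings, threshold=3):
    """Proportion of 1-5 confidence ratings counted as "yes" (rating >= threshold)."""
    return sum(rating >= threshold for rating in ratings) / len(ratings)

example_ratings = [1, 2, 3, 4, 5]
print(p_yes(example_ratings))  # 0.6
```

Under this rule a rating of 3 ("not sure") counts as a "yes".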

Recollection of 9/11 Event Sequence (Altmann, 2003)
A = One plane hits the WTC
B = A second plane hits the WTC
C = One plane crashes into the Pentagon
D = One tower at the WTC collapses
E = One plane crashes in Pennsylvania
F = A second tower at the WTC collapses
157 subjects total
Tested Nov of 2001
Similar results when subjects are tested in Jan 2002
Figure labels: Most frequent response (i.e., mode); Correct; A C E B D F

Example tasks studied in our research
Ordering problems: what is the order of US presidents?
Matching problems: memory for pairs (what object was paired with what person?)
Recognition memory problems: what set of words were studied?
These are combinatorially complex inference problems
