Published by Patrick Shepherd. Modified over 9 years ago.
1
Hierarchical Bayesian Models for Aggregating Retrieved Memories across Individuals. Mark Steyvers, Department of Cognitive Sciences, University of California, Irvine. Joint work with: Michael Lee, Brent Miller, Pernille Hemmer, Bill Batchelder, Paolo Napoletano.
2
Ordering problem: what is the correct order (in time) of these Presidents? Thomas Jefferson, Andrew Jackson, James Monroe, George Washington, John Adams.
3
Goal: aggregating responses. Individual orderings: D A B C; A B D C; B A D C; A C B D; A D B C. An aggregation algorithm produces a group answer. Does the group answer equal the ground truth (A B C D)?
4
Bayesian Approach. Individual orderings: D A B C; A B D C; B A D C; A C B D; A D B C. Generative model: the ground truth (A B C D) is the latent common cause of the individual orderings.
5
Important notes: No communication between individuals. There is always a true answer (ground truth). The aggregation algorithm never has access to the ground truth; the ground truth is only used for evaluation.
6
Matching problem: match Rembrandt, Van Gogh, Monet, Renoir to A, B, C, D.
7
Wisdom of crowds phenomenon: the crowd estimate is often better than any individual in the crowd. (Think of independent noise influencing each individual.)
8
Examples of wisdom of crowds phenomenon: Who Wants to Be a Millionaire?; Galton’s Ox (1907): the median of individual estimates comes close to the true answer.
9
Limitations of Current “Wisdom of Crowds” Research: Studies restricted to numeric or categorical judgments. Simple averaging schemes: mode, median, mean. No treatment of individual differences: every “vote” is treated equally; downplayed role of expertise.
10
Cultural Consensus Theory (CCT), e.g., Romney, Batchelder, and Weller (1987): finds the “answer key” to multiple-choice questions when ground truth is lost; takes person and item differences into account. An informal version of CCT was also developed for ranking data.
11
Research Goals: Generalize the “wisdom of crowds” effect to more complex data. Aggregation of permutations: ranking data; matching (assignment) data.
12
Hierarchical Bayesian Models: Probability distributions over all permutations of items: with N items, there are N! possible orderings (e.g., when N = 44, there are 44! > 10^53 orderings). Approximate inference methods: MCMC. Cognitively plausible generative processes. Treatment of individual differences.
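To give a sense of why exact enumeration is impractical, the following minimal Python sketch computes the number of orderings for the 44-president task. The function name `num_orderings` is illustrative and not from the talk.

```python
import math

def num_orderings(n_items: int) -> int:
    """Number of distinct orderings (permutations) of n_items items."""
    return math.factorial(n_items)

# With 44 items the space is far too large to enumerate,
# which is why approximate inference (MCMC) is used instead.
print(num_orderings(44) > 10**53)  # True
```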
13
Part I: Ordering Problems
14
Experiment 1. Task: order all 44 US presidents. Methods: 26 participants (college undergraduates); names of presidents written on cards; cards could be shuffled on a large table.
15
Measuring performance. Kendall’s tau: the number of adjacent pairwise swaps needed to turn the participant's ordering into the ground truth. Example: participant ordering 1 2 5 3 4, ground truth 1 2 3 4 5; two swaps are needed (1 + 1), so tau = 2.
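The swap count described on this slide can be sketched in Python as a count of discordant pairs, which equals the minimum number of adjacent swaps. The function name and the quadratic loop are illustrative choices, not the authors' implementation.

```python
def kendall_tau_distance(ordering, truth):
    """Count item pairs whose relative order differs between two orderings.

    This equals the minimum number of adjacent pairwise swaps needed
    to turn `ordering` into `truth`.
    """
    position = {item: idx for idx, item in enumerate(truth)}
    ranks = [position[item] for item in ordering]
    swaps = 0
    for i in range(len(ranks)):
        for j in range(i + 1, len(ranks)):
            if ranks[i] > ranks[j]:  # this pair is out of order
                swaps += 1
    return swaps

# Item 5 is placed ahead of items 3 and 4: two discordant pairs.
print(kendall_tau_distance([1, 2, 5, 3, 4], [1, 2, 3, 4, 5]))  # 2
```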
16
Empirical Results (figure; reference level: random guessing)
17
Many approaches for analyzing rank data… Probabilistic models: Thurstone (1927), Mallows (1957), Plackett-Luce (1975), Lebanon-Mao (2008). Spectral methods: Diaconis (1989). Heuristic methods from voting theory: Borda count. … however, many of these approaches were developed for preference rankings.
18
Bayesian Thurstonian Approach: each item (A, B, C) has a true coordinate on some dimension.
19
Bayesian Thurstonian Approach: … but there is noise because of encoding and/or retrieval error (Person 1; items A, B, C).
20
Bayesian Thurstonian Approach: each person’s mental representation is based on (latent) samples of these distributions (Person 1; items A, B, C).
21
Bayesian Thurstonian Approach: the observed ordering is based on the ordering of the samples. Person 1, observed ordering: A < B < C.
22
Bayesian Thurstonian Approach: people draw from distributions with common mean but different variances. Person 1, observed ordering: A < B < C. Person 2, observed ordering: A < C < B.
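The generative process on these slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the noise is taken to be Gaussian, and the item coordinates, sigma values, and the name `simulate_ordering` are hypothetical, not taken from the talk.

```python
import random

def simulate_ordering(true_means, sigma, rng):
    """Simulate one person's observed ordering.

    Each item gets one latent sample drawn around its true coordinate
    (Gaussian noise is an assumption here); the observed ordering is
    the ordering of those samples.
    """
    samples = {item: rng.gauss(mu, sigma) for item, mu in true_means.items()}
    return sorted(samples, key=samples.get)

rng = random.Random(0)
true_means = {"A": 1.0, "B": 2.0, "C": 3.0}  # hypothetical coordinates

# Same means for everyone, but a person-specific noise level.
print(simulate_ordering(true_means, sigma=0.1, rng=rng))  # low-noise person
print(simulate_ordering(true_means, sigma=2.0, rng=rng))  # high-noise person
```

A person with a small sigma will usually reproduce the true order, while a person with a large sigma will often swap neighboring items.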
23
Graphical Model Notation: plate j = 1..3; shaded = observed; not shaded = latent.
24
Graphical Model of Bayesian Thurstonian Model. Nodes: latent ground truth; individual ability; mental representation; observed ordering. Plate: j individuals.
25
Inference: need the posterior distribution. Markov chain Monte Carlo: Gibbs sampling on …; Metropolis-Hastings on … and …. Draw 400 samples; group ordering based on the average of … across samples.
26
Wisdom of Crowds effect: the model’s ordering is as good as the best individual.
27
Inferred Distributions for 44 US Presidents (figure legend: median and minimum sigma): George Washington (1), John Adams (2), Thomas Jefferson (3), James Madison (4), James Monroe (6), John Quincy Adams (5), Andrew Jackson (7), Martin Van Buren (8), William Henry Harrison (21), John Tyler (10), James Knox Polk (18), Zachary Taylor (16), Millard Fillmore (11), Franklin Pierce (19), James Buchanan (13), Abraham Lincoln (9), Andrew Johnson (12), Ulysses S. Grant (17), Rutherford B. Hayes (20), James Garfield (22), Chester Arthur (15), Grover Cleveland 1 (23), Benjamin Harrison (14), Grover Cleveland 2 (25), William McKinley (24), Theodore Roosevelt (29), William Howard Taft (27), Woodrow Wilson (30), Warren Harding (26), Calvin Coolidge (28), Herbert Hoover (31), Franklin D. Roosevelt (32), Harry S. Truman (33), Dwight Eisenhower (34), John F. Kennedy (37), Lyndon B. Johnson (36), Richard Nixon (39), Gerald Ford (35), James Carter (38), Ronald Reagan (40), George H.W. Bush (41), William Clinton (42), George W. Bush (43), Barack Obama (44).
28
Model is calibrated: individuals with large sigma are far from the truth.
29
Alternative Models: many heuristic methods from voting theory, e.g., the Borda count method. Suppose we have 10 items: assign a count of 10 to the first item, 9 to the second item, etc.; add counts over individuals; order items by the Borda count (i.e., rank by average rank across people).
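The Borda count steps above can be sketched in Python as follows. The five four-item orderings are the illustrative responses from the earlier "Goal: aggregating responses" slide; the function name and the alphabetical tie-break are assumptions added for this sketch.

```python
from collections import defaultdict

def borda_order(orderings):
    """Aggregate orderings with the Borda count.

    With N items, the first item in each ordering gets N points,
    the second N-1, and so on; items are then ranked by total points.
    """
    n_items = len(orderings[0])
    counts = defaultdict(int)
    for ordering in orderings:
        for position, item in enumerate(ordering):
            counts[item] += n_items - position
    # Highest count first; ties broken alphabetically (an assumption).
    return sorted(counts, key=lambda item: (-counts[item], item))

responses = ["DABC", "ABDC", "BADC", "ACBD", "ADBC"]
print(borda_order(responses))  # ['A', 'B', 'D', 'C']
```

Here the totals are A = 18, B = 13, D = 12, C = 7, so the Borda ordering differs from A B C D in the last two positions.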
30
Model Comparison
31
Experiment 2: 78 participants; 17 problems, each with 10 items. Problem types: chronological events; physical measures; purely ordinal problems, e.g., Ten Amendments, Ten Commandments.
32
Ordering states west-east: Oregon (1), Utah (2), Nebraska (3), Iowa (4), Alabama (6), Ohio (5), Virginia (7), Delaware (8), Connecticut (9), Maine (10).
33
Ordering Ten Amendments
34
Ordering Ten Commandments: Worship any other God (1), Make a graven image (7), Take the Lord's name in vain (2), Break the Sabbath (3), Dishonor your parents (4), Murder (6), Commit adultery (8), Steal (5), Bear false witness (9), Covet (10).
35
Average results over 17 Problems (figure; series: Individuals, Mean, Thurstonian Model, Borda count, Mode)
36
Effect of Group Composition: how many individuals do we need to average over?
37
Effect of Group Size: random groups
38
Experts vs. Crowds: Can we find experts in the crowd? Can we form small groups of experts? Approach: form a group for some particular task; select individuals with the smallest sigma (“experts”) based on previous tasks; vary the number of previous tasks.
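The selection step can be sketched as below. This is only an illustration: averaging each person's inferred sigma over previous tasks is an assumption (the slide does not say how sigmas are combined across tasks), and `select_experts`, the person labels, and the numbers are hypothetical.

```python
def select_experts(sigma_history, group_size):
    """Pick the `group_size` people with the smallest mean inferred sigma.

    sigma_history maps each person to the list of sigma values inferred
    for them on previous tasks. People with no history are skipped.
    """
    mean_sigma = {
        person: sum(sigmas) / len(sigmas)
        for person, sigmas in sigma_history.items()
        if sigmas
    }
    # Smallest sigma first; ties broken by person label.
    ranked = sorted(mean_sigma, key=lambda person: (mean_sigma[person], person))
    return ranked[:group_size]

history = {"p1": [0.9, 1.1], "p2": [0.2, 0.4], "p3": [0.5, 0.5]}
print(select_experts(history, group_size=2))  # ['p2', 'p3']
```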
39
Group Composition based on prior performance (figure; panels: T = 0, T = 2, T = 8, where T = # previous tasks; x-axis: group size, best individuals first)
40
Methods for Selecting Experts. Endogenous: no feedback required. Exogenous: selecting people based on actual performance.
41
Model incorporating overall person ability (graphical model). Nodes: overall ability; task-specific ability. Plates: j individuals; m tasks.
42
Average results over 17 Problems (figure; series: Mean, new model)
43
Part II: Ordering Problems in Episodic Memory
44
Another ordering problem: order the stills A, B, C, D in time. Video: http://www.youtube.com/watch?v=29VGZtnCD30&feature=related
45
Experiment 3: 26 participants; 6 videos: 3 videos with stereotyped event sequences (e.g., wedding) and 3 “unpredictable” videos (e.g., the example video); 10 stills extracted from each video for testing. Method: study the video, followed by an immediate ordering test of the 10 items.
46
Bayesian Thurstonian Model (example; tau = 3)
47
Two other examples (tau = 1; tau = 0)
48
Overall Results (figure; series: Mean)
49
Part III: Matching Problems
50
Example Matching Problem (one-to-one). Languages: Dutch, Danish, Yiddish, Thai, Vietnamese, Chinese, Georgian, Russian, Japanese. Response labels: A, B, C, D, E, F, G, H, I. Phrases: godt nytår; gelukkig nieuwjaar; a gut yohr; С Новым Годом; สวัสดีปีใหม่; Chúc Mừng Nǎm Mới; გილოცავთ ახალ წელს.
51
Experiment: 17 participants; 8 matching problems, e.g., car logos and brand names; first and last names of philosophers; flags and countries; Greek symbols and letter names. The number of items varied between 10 and 24; with 24 items, we have 24! possibilities.
52
Overall Results
53
Heuristic Aggregation Approach: a combinatorial optimization problem that maximizes agreement in assigning N items to N responses. Hungarian algorithm: construct a count matrix M, where M_ij = number of people that paired item i with response j; find row and column permutations to maximize the diagonal sum; O(n^3).
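The count-matrix construction and the objective being maximized can be sketched in Python. Note the swap: this sketch finds the best assignment by brute force over all permutations, which is only feasible for a handful of items; it is not the Hungarian algorithm, which solves the same problem in O(n^3). Function names and the toy data are hypothetical.

```python
from itertools import permutations

def count_matrix(matchings, n):
    """M[i][j] = number of people who paired item i with response j."""
    M = [[0] * n for _ in range(n)]
    for matching in matchings:
        for item, response in enumerate(matching):
            M[item][response] += 1
    return M

def best_assignment(M):
    """Brute-force the one-to-one assignment with maximum total agreement.

    Stand-in for the Hungarian algorithm; only practical for small n.
    """
    n = len(M)
    return max(
        permutations(range(n)),
        key=lambda perm: sum(M[i][perm[i]] for i in range(n)),
    )

# Each list gives one person's answer: matching[i] is the response chosen for item i.
people = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
M = count_matrix(people, 3)
print(M)                   # [[2, 1, 0], [1, 1, 1], [0, 1, 2]]
print(best_assignment(M))  # (0, 1, 2)
```

For realistic problem sizes (10 to 24 items), a proper assignment solver would replace `best_assignment`.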
54
Hungarian Algorithm Example (figure; legend: correct, incorrect)
55
Hungarian Algorithm Results (2)
56
Bayesian Matching Model. Proposed process: match “known” items; guess between the remaining ones. Individual differences: some items are easier to know; some participants know more. Example: Dutch, Danish, Yiddish, Russian; godt nytår, gelukkig nieuwjaar, a gut yohr, С Новым Годом.
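The proposed process (match the known items, then guess among the rest) can be sketched as a forward simulation in Python. The per-item knowing probabilities are passed in directly; how they arise from person ability and item easiness is not specified on this slide, so `p_know` and the function name are assumptions for illustration.

```python
import random

def simulate_matching(n_items, p_know, rng):
    """Simulate one person's one-to-one matching.

    Ground truth is item i -> response i. Each item is known with
    probability p_know[i]; known items are matched correctly, and the
    remaining items are assigned the remaining responses at random.
    """
    answer = {i: i for i in range(n_items) if rng.random() < p_know[i]}
    unknown_items = [i for i in range(n_items) if i not in answer]
    remaining_responses = unknown_items[:]  # responses not used by known items
    rng.shuffle(remaining_responses)
    for item, response in zip(unknown_items, remaining_responses):
        answer[item] = response
    return [answer[i] for i in range(n_items)]

rng = random.Random(0)
print(simulate_matching(4, [0.9, 0.9, 0.2, 0.2], rng))
```

Because guesses are drawn only from the responses left over, a guess can still land on the correct response by chance.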
57
Graphical Model. Nodes: latent ground truth; observed matching; knowledge state; prob. of knowing; person ability; item easiness. Plates: i items; j individuals.
58
Overall Modeling Results
59
Calibration at level of items and people (for paintings problem). Panels: ITEMS; INDIVIDUALS.
60
How predictive are subject-provided confidence ratings? Panels: # guesses estimated by individual; # guesses estimated by model (based on variable A). Vertical axis: accuracy. Correlations: r = -.42; r = -.77.
61
Part IV: Open Issues
62
When do we get the wisdom of crowds effect? Independent errors: different people knowing different things. Population response centered around the ground truth. Some minimal number of individuals: 10-20 individuals are often sufficient.
63
What are methods for finding experts? 1) Self-reported expertise: unreliable; has led to claims of a “myth of expertise”. 2) Based on explicit scores, by comparing to ground truth, but ground truth might not be immediately available. 3) Endogenously discover experts: use the crowd to discover experts; small groups of experts can be effective.
64
What to do about systematic biases? In some tasks, individuals systematically distort the ground truth: spatial and temporal distortions; memory distortions (e.g., false memory); decision-making distortions. Does this diminish the wisdom of crowds effect? Maybe… but a model that predicts these systematic distortions might be able to “undo” them.
65
Can we build domain-specific models? The Thurstonian model was applied to a wide variety of problems. How about domain-specific models, e.g., applying serial recall models to serial recall? These could better specify sources of noise and model systematic biases.
66
That’s all. Do the experiments yourself: http://psiexp.ss.uci.edu/
67
Other slides
68
Results separated by problem
69
Notes. Noise in Thurstonian models: acquisition/encoding noise; retrieval noise. Link to the crowd within (Ed Vul): are our results due to the wisdom of crowds or of individuals? Probably a bit of both, and we cannot tell with our experiments. However, there is probably a fair amount of encoding noise that would not benefit from repeated measurements within individuals. Different individuals probably do know different things.
70
To Do: Compare the explicitly estimated number of guesses with latent confidence. Identifiability issue: fix mean A? Hierarchical model: test on small numbers of subjects. Model comparisons on small sets of subjects. Look at kurtosis of sigma distributions.
71
Modeling Group Serial Recall. Goal: infer a distribution over orderings of events given verbal reports, i.e., P(original order | verbal report). Many models for serial recall exist, e.g., Estes Perturbation model (1972), Shiffrin & Cook (1978), SOB (2002), SIMPLE (2007), but many of these models do not have a likelihood function p(item 1, item 2, …, item N | memory contents).
72
Bayesian Algorithm: not every person has equal weight (figure; legend: correct, incorrect)
73
Summary of Findings: Extended wisdom of crowds to combinatorial problems: approximate inference (MCMC) to infer probability distributions over permutations. Bayesian methods that are calibrated: we can tell who is likely to be accurate without having ground truth available.
74
Graphical Model. Nodes: latent ground truth; observed matching; knowledge state; prob. of knowing; item and person parameters. Plates: i items; j individuals.
75
When do we get the Wisdom of Crowds effect? Analyze model performance in a variety of tasks.
76
MDS solution of pairwise tau distances (figure; annotation: distance to truth)
77
MDS solution of pairwise tau distances
78
Modeling Performance Across Tasks: The current model is applied independently across tasks. Extend the hierarchical model with a random-effects approach to tasks: each person has an overall ability (Spearman’s “g”); ability in a specific task varies around the overall ability.