Henrik Singmann (University of Warwick)


A Bayesian and Frequentist Multiverse Pipeline for MPT models
Applications to Recognition Memory

Henrik Singmann (University of Warwick)
Daniel W. Heck (University of Mannheim)
Marius Barth (Universität zu Köln)
Julia Groß (University of Mannheim)
Beatrice G. Kuhlmann (University of Mannheim)

Multiverse Approach

Statistical analysis usually requires several (more or less) arbitrary decisions between reasonable alternatives:
- Data processing and preparation (e.g., exclusion criteria, aggregation levels)
- Analysis framework (e.g., statistical vs. cognitive model, frequentist vs. Bayes)
- Statistical analysis (e.g., testing vs. estimation, fixed vs. random effects, pooling)

The combination of decisions spans a multiverse of data and results (Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016):
- Usually only one path through the multiverse (or "garden of forking paths", Gelman & Loken, 2013) is reported.
- Valid conclusions cannot be contingent on arbitrary decisions.

Multiverse analysis attempts to explore the space of possible results:
- Reduces the problem of selective reporting by making the fragility or robustness of results transparent.
- Conclusions arising from many paths are more credible than conclusions arising from few paths.
- Helps identify the most consequential choices.

Limits of the multiverse approach:
- Implementation is not trivial.
- Results must be commensurable across the multiverse (e.g., estimation versus hypothesis testing).
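To make "combination of decisions spans a multiverse" concrete, the following minimal Python sketch enumerates every path through a small set of decisions. The decision names and options are hypothetical placeholders loosely based on the examples on the slide, not the choices used in the actual project (whose pipeline is implemented in R).

```python
from itertools import product

# Hypothetical decision axes (illustrative only, not the project's actual choices).
decisions = {
    "exclusion": ["none", "strict"],
    "framework": ["frequentist", "bayesian"],
    "pooling": ["complete", "none", "partial"],
}


def enumerate_multiverse(decisions):
    """Return one dict per path, i.e. one option chosen for every decision."""
    names = list(decisions)
    return [dict(zip(names, combo)) for combo in product(*decisions.values())]


paths = enumerate_multiverse(decisions)
print(len(paths))  # number of paths is the product of the option counts
for path in paths[:3]:
    print(path)
```

Reporting results for all paths, rather than a single one, is what makes the fragility or robustness of a conclusion visible.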

Current Project

DFG Scientific Network grant to Julia Groß and Beatrice Kuhlmann:
"Hierarchical MPT Modeling – Methodological Comparisons and Application Guidelines"
- 6 meetings over 3 years with 15 people plus external experts

Multinomial processing tree (MPT) models: class of discrete-state cognitive models for multinomial data (Riefer & Batchelder, 1988)
- MPT models are traditionally analysed with frequentist methods (i.e., $\chi^2$ / $G^2$) and aggregated data.
- Several hierarchical-Bayesian approaches exist. Do we need those?

Today: First results from a model for recognition memory
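For readers unfamiliar with the two frequentist fit statistics mentioned above, here is a minimal Python sketch of their standard textbook definitions for observed and expected category counts. The function names are illustrative and are not taken from any of the packages used in the project.

```python
import math


def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def pearson_chi_squared(observed, expected):
    """Pearson statistic X^2 = sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for two response categories.
obs = [12, 8]
exp = [10, 10]
print(g_squared(obs, exp), pearson_chi_squared(obs, exp))
```

Both statistics are zero when observed and expected counts coincide and usually take similar values when the model fits reasonably well.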

Our Multiverse

Statistical framework:
- Frequentist (i.e., maximum-likelihood)
- Bayesian (i.e., MCMC)

Pooling:
- Complete pooling (aggregated data)
- No pooling (individual-level data)
- Partial pooling (hierarchical modelling): individual-level parameters with group-level distribution

Results:
- Parameter point estimates: MLE and posterior mean
- Parameter uncertainty: ML-SE and MCMC-SE
- Model adequacy: $G^2$ p-value and posterior predictive p-value (Klauer, 2010)
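The difference between complete pooling and no pooling can be illustrated with a single binomial parameter instead of a full MPT model. The Python sketch below uses made-up counts; partial pooling requires a hierarchical model and is not shown.

```python
# Hypothetical data: (successes, trials) per participant.
data = [(45, 50), (30, 50), (12, 20)]


def complete_pooling(data):
    """Aggregate all participants first, then compute one maximum-likelihood estimate."""
    successes = sum(s for s, _ in data)
    trials = sum(t for _, t in data)
    return successes / trials


def no_pooling(data):
    """Compute one maximum-likelihood estimate per participant."""
    return [s / t for s, t in data]


pooled = complete_pooling(data)
individual = no_pooling(data)
mean_individual = sum(individual) / len(individual)
print(pooled, individual, mean_individual)
```

With unequal numbers of trials, the pooled estimate and the mean of the individual estimates generally differ, which is one reason the pooling decision is treated as part of the multiverse.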

Our Multiverse Pipeline (in R)

Frequentist (uses MPTinR; Singmann & Kellen, 2013):
(1) Traditional approach: Frequentist asymptotic complete pooling
(2) Frequentist asymptotic no pooling
(3) Frequentist no-pooling with parametric bootstrap
(4) Frequentist no-pooling with non-parametric bootstrap

Bayesian (uses TreeBUGS; Heck, Arnold, & Arnold, 2018):
(5) Bayesian complete pooling (custom C++ sampler)
(6) Bayesian no pooling (unique method, custom C++ sampler)
(7) Bayesian partial pooling I a, JAGS: Beta-MPT JAGS (Smith & Batchelder, 2010)
(8) Bayesian partial pooling I b, C++: Beta-MPT C++
(9) Bayesian partial pooling II: Latent trait MPT (Klauer, 2010; JAGS)
(10) Bayesian partial pooling III: Latent trait MPT w/o correlation parameters (JAGS)
(11) Latent-class approach (Klauer, 2006)

All implemented in R packages:
- MPTmultiverse, methods (1)-(10): https://cran.r-project.org/package=MPTmultiverse
- hmpt (for latent-class only): https://github.com/mpt-network/hmpt

Example Application: Recognition Memory

6-point ROCs
Total N = 459
Mean trials = 350
Still missing: 4 studies (reasons: more model variants possible, takes longer)

Data set                                      Sample N   Mean No. of Trials
Dube & Rotello (2012, E1, Pictures (P))       27         400
Dube & Rotello (2012, E1, Words (W))          22
Heathcote et al. (2006, Exp. 1)               16         560
Heathcote et al. (2006, Exp. 2)               23
Jaeger et al. (2012, Exp. 1, no cue)          63         120
Jang et al. (2009)                            33         140
Koen & Yonelinas (2010, pure study)           32         320
Koen & Yonelinas (2011)                       20         600
Koen et al. (2013, Exp. 2, full attention)    48         200
Koen et al. (2013, Exp. 4, immediate test)               300
Pratte et al. (2010)                          97         480
Smith & Duncan (2004, Exp. 2)                 30

2-high threshold model (2HTM) for 6-point confidence-rating data (e.g., Bröder et al., 2013)

3 core parameters:
- $D_o$: Probability to detect old item as old
- $D_n$: Probability to detect new item as new
- $g$: Probability to guess item is old (conditional on non-detection)

8 response mapping parameters (at least one needs to be equated for identifiability)

[Figure: pairwise comparison of parameter estimates across estimation methods]
Each data point is one study.
Value in corner: CCC = concordance correlation coefficient (measure of absolute agreement)
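As a reminder of how the CCC differs from an ordinary correlation, the following Python sketch implements the standard definition (Lin's concordance correlation coefficient, using 1/n moments). The input values are hypothetical and unrelated to the results in the figure.

```python
def concordance_correlation(x, y):
    """Lin's CCC: 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical estimates from two methods: perfectly correlated, but shifted by 1.
method_a = [1.0, 2.0, 3.0]
method_b = [2.0, 3.0, 4.0]
print(concordance_correlation(method_a, method_b))
```

A constant shift between two methods leaves the Pearson correlation at 1 but lowers the CCC, which is why it serves as a measure of absolute agreement.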

Conclusions

- Overall, methods appear to agree with each other (maximal difference ≈ .25).
- Agreement between estimation methods depends on the parameter.
- Even for structurally very similar core parameters ($D_o$ versus $D_n$) we see differences.
- Compared to latent trait methods, $D_n$ and $g$ are slightly underestimated by other methods.
- Effect of over-/under-estimation does not appear to be related to sample size.
- Parameters that show imprecision in estimation seem to show larger parameter trade-offs.

Recommendation: Use multiverse approach to take uncertainty of modeling framework into account.

Parameter trade-off/fungibility (based on latent-trait group-level posteriors)

2-high threshold model (2HTM) for 6-point confidence-rating data (e.g., Bröder et al., 2013)

Response mapping (same for $q$):
$r1 = r_1$
$r2 = (1 - r_1)\, r_2$
$r3 = (1 - r_1)(1 - r_2)$

$r6 = r_6$
$r5 = (1 - r_6)\, r_5$
$r4 = (1 - r_6)(1 - r_5)$

For $r$, $r_1$ and $r_6$ expected to have most data.
For $q$, $q_5$ and $q_2$ expected to have most data.

Data provides 10 independent data points, full model has 11 free parameters:
- 3 core parameters: $D_n$, $D_o$, and $g$
- 8 response mapping parameters

For parameter identifiability: at least one parameter needs to be equated.
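The response mapping above can be turned into predicted category probabilities with a short Python sketch. Several things here are assumptions not stated on the slide: rating 1 is taken to mean "sure new" and rating 6 "sure old", detected items are mapped through the $r$ parameters, guessed responses through the $q$ parameters, and all function and variable names are illustrative. The parameter values are arbitrary.

```python
def map_three(p_outer, p_inner):
    """Split one response side into three ratings: extreme, middle, innermost."""
    return (p_outer, (1 - p_outer) * p_inner, (1 - p_outer) * (1 - p_inner))


def two_htm(Do, Dn, g, r, q):
    """Predicted probabilities of ratings 1..6 for old and new items.

    r and q are dicts with keys 1, 2, 5, 6 (the 8 response mapping parameters).
    Assumed tree structure: detection -> r mapping; non-detection -> guess -> q mapping.
    """
    r1, r2, r3 = map_three(r[1], r[2])
    r6, r5, r4 = map_three(r[6], r[5])
    q1, q2, q3 = map_three(q[1], q[2])
    q6, q5, q4 = map_three(q[6], q[5])
    # Guessing state: "new" side with probability 1 - g, "old" side with probability g.
    guess = [(1 - g) * q1, (1 - g) * q2, (1 - g) * q3, g * q4, g * q5, g * q6]
    detect_old = [0.0, 0.0, 0.0, r4, r5, r6]
    detect_new = [r1, r2, r3, 0.0, 0.0, 0.0]
    old = [Do * d + (1 - Do) * u for d, u in zip(detect_old, guess)]
    new = [Dn * d + (1 - Dn) * u for d, u in zip(detect_new, guess)]
    return old, new


# Arbitrary illustrative parameter values.
r = {1: 0.6, 2: 0.7, 5: 0.7, 6: 0.6}
q = {1: 0.2, 2: 0.5, 5: 0.5, 6: 0.2}
old, new = two_htm(Do=0.5, Dn=0.4, g=0.5, r=r, q=q)

n_free_parameters = 3 + len(r) + len(q)   # 3 core + 8 response mapping
n_independent_data = 2 * (6 - 1)          # 2 item types, 5 free categories each
print(old, new, n_free_parameters, n_independent_data)
```

Because the predicted probabilities of each item type sum to 1, the six rating categories provide only five independent data points per item type, which gives the 10 versus 11 count on the slide.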

Identifiability Restrictions

Response mapping (same for $q$):
$r1 = r_1$
$r2 = (1 - r_1)\, r_2$
$r3 = (1 - r_1)(1 - r_2)$

$r6 = r_6$
$r5 = (1 - r_6)\, r_5$
$r4 = (1 - r_6)(1 - r_5)$

original Bröder et al. (2013) variant: $q_5 = q_2$, $r_5 = r_2$
only q-restricted:
only r-restricted: