Thinning Measurement Models and Questionnaire Design
Ricardo Silva, University College London
CSML PI Meeting, January 2013

The Joy of Questionnaires

Postulated Usage
- A measurement model for latent traits
- Information of interest is postulated to be in the latent variables

Task  Given a measurement model, select a subset of given size K that preserves information about latent traits  Criterion: “Longer surveys take more time to complete, tend to have more missing data, and have higher refusal rates than short surveys. Arguably, then, techniques to reducing the length of scales while maintaining psychometric quality are wortwhile.” (Stanton, 2002) X = latent variables, Y = observations, Y z = a subset of size K

Optimization Problem
- Solve for z* = argmax_z H_M(Y_z) − H(Y_z | X), subject to each z_i ∈ {0, 1} and Σ_i z_i = K, where H_M(Y_z) is the marginal entropy of the selected items
- Notice: the objective is the mutual information I(X; Y_z), so this is equivalent to minimising the conditional entropy H(X | Y_z)
- Other divergence functions could be used in principle

Issues  Hard combinatorial optimization problem  Even objective function cannot be calculated exactly  Focus: models where small marginals of Y can be calculated without major effort  Working example: multivariate probit model

An Initial Approach
- Gaussian approximation to the entropy term; maximize this by greedy search (see the sketch below)
- Covariance matrix: use the "exact" covariance matrix of the items
- Doable in models such as the probit without an Expectation-Propagation scheme
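A minimal sketch of the greedy search on a Gaussian surrogate of the objective. How the exact item covariance matrix and the per-item conditional entropy terms are obtained in the probit model is abstracted away; `item_cov` and `cond_entropies` are assumed inputs.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    d = cov.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def greedy_select(item_cov, cond_entropies, K):
    """Greedy forward selection of K items under a Gaussian surrogate of
    H_M(Y_z) - H(Y_z | X): `item_cov` is the ("exact") item covariance
    matrix and `cond_entropies[i]` stands in for H(Y_i | X)."""
    selected, remaining = [], list(range(item_cov.shape[0]))
    for _ in range(K):
        def score(j):
            idx = selected + [j]
            return (gaussian_entropy(item_cov[np.ix_(idx, idx)])
                    - sum(cond_entropies[i] for i in idx))
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```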

Alternatives  More complex approximations  But how far can we intertwine the approximation with the optimisation procedure?  i.e., use a single global approximation that can be quickly calculated to any choice of Z

Alternative: Using Entropy Bounds
- Exploit the bound H(Y) ≤ Σ_i H(Y_i | Y_{N(e, i)}), for a given ordering e of the variables and a choice of neighborhood N(e, i) among the predecessors of each item (Globerson and Jaakkola, 2007)
- Optimize the ordering using a permutation-optimization method to get the tightest bound, e.g. Teyssier and Koller (2005), Jaakkola et al. (2010)
- Notice the role of the previous assumption: each entropy term can be computed in a probit model for small N(e, i) (see the sketch below)
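The bound only needs entropies of small item subsets, which is where the probit assumption pays off. A hedged sketch, with `neighborhoods` and `joint_entropy` as assumed placeholders rather than the talk's actual implementation:

```python
def entropy_upper_bound(order, neighborhoods, joint_entropy):
    """Bound H(Y) <= sum_i H(Y_i | Y_{N(e,i)}), where N(e,i) is a small subset
    of the predecessors of item i under the ordering `order` (in the spirit of
    Globerson & Jaakkola, 2007).  `joint_entropy(subset)` returns the exact
    entropy of a small subset of items, assumed cheap in the probit model;
    it must return 0.0 for the empty subset."""
    bound = 0.0
    for i in order:
        parents = list(neighborhoods[i])          # N(e, i)
        # H(Y_i | Y_parents) = H(Y_i, Y_parents) - H(Y_parents)
        bound += joint_entropy(parents + [i]) - joint_entropy(parents)
    return bound
```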

Method I: Bounded Neighborhood
- Plug the approximate entropy term into the objective function
- ILP formulation: first define indicator variables over the power set of each neighborhood N(e, i); products of binary selection variables are linearized following Glover and Woolsey (1974) (see the sketch below)
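The power-set construction hinges on linearizing products of binary selection variables. Below is a minimal sketch of the Glover and Woolsey (1974) linearization; the use of PuLP and the toy neighborhood are illustrative assumptions, not part of the talk.

```python
import pulp

def add_product_var(prob, z_vars, name):
    """Glover & Woolsey (1974)-style linearization of a product of binaries:
    introduce w in [0, 1] with w <= z_j for every j and
    w >= sum_j z_j - (|z| - 1), so w equals prod_j z_j at any integer point."""
    w = pulp.LpVariable(name, lowBound=0, upBound=1)
    for z in z_vars:
        prob += w <= z
    prob += w >= pulp.lpSum(z_vars) - (len(z_vars) - 1)
    return w

# Toy usage: three selection variables, one product over a "neighborhood".
prob = pulp.LpProblem("thinning_ilp_sketch", pulp.LpMaximize)
z = [pulp.LpVariable(f"z_{i}", cat="Binary") for i in range(3)]
w_02 = add_product_var(prob, [z[0], z[2]], "w_02")   # behaves as z_0 * z_2
```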

Method I: Interpretation
- Entropy for the full model: that of a Bayesian network (DAG model) approximation
- Entropy for submodels (K items): that of a crude marginal approximation to the DAG approximation
[Figure: starting model; DAG approximation with |N| = 1, z = (1, 1, 1); marginal with |N| = 1, z = (1, 0, 1); actual marginal used, z = (1, 0, 1)]

Method II: Tree-structures
- Motivation: different selections for z don't wipe out marginal dependencies

Method II: Interpretation
- For a fixed ordering, find a directed tree-structured submodel of size K
[Figure: tree approximation for z = (1, 1, 1, 1); tree approximation for z = (1, 1, 0, 1) if Y_4 is more strongly associated with Y_2 than with Y_1; an alternative tree]
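One plausible reading of the tree construction in code: each selected item attaches to its most strongly associated, previously selected item under the fixed ordering. The association matrix and the attach-to-strongest rule are illustrative assumptions; the example reproduces the z = (1, 1, 0, 1) case from the figure caption.

```python
import numpy as np

def tree_submodel(order, selected, assoc):
    """For a fixed ordering, attach each selected item to the most strongly
    associated previously selected item, giving a directed tree whose entropy
    decomposes as H(Y_root) + sum_i H(Y_i | Y_parent(i)).
    `assoc` is a symmetric association matrix (e.g. absolute correlations)."""
    kept = [i for i in order if i in selected]
    parents = {kept[0]: None}                      # first kept item is the root
    for pos, i in enumerate(kept[1:], start=1):
        parents[i] = max(kept[:pos], key=lambda j: assoc[i, j])
    return parents

# Example: items 0..3 (Y_1..Y_4) in natural order, item 2 (Y_3) dropped,
# matching z = (1, 1, 0, 1); Y_4 is more associated with Y_2 than with Y_1.
assoc = np.array([[1.0, 0.6, 0.3, 0.2],
                  [0.6, 1.0, 0.4, 0.7],
                  [0.3, 0.4, 1.0, 0.5],
                  [0.2, 0.7, 0.5, 1.0]])
print(tree_submodel([0, 1, 2, 3], {0, 1, 3}, assoc))  # {0: None, 1: 0, 3: 1}
```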

Evaluation  Synthetic models, evaluation against a fairly good heuristic, the reliability score  Evaluation:  for different choices of K, obtain selection Y z for each of the four methods  given selection, evaluate how well latent posterior expectations are with respect to model using all Y

Evaluation

Evaluation: NHS data
- Model fit using 50,000 data points, with a pre-selection of 60+ items
- Structure follows the structure of the questionnaire
- Reported: relative improvement over the reliability score and absolute mean squared errors

Conclusion  How to combine the latent information criterion with other desirable criterion?  Integrating the order optimization step with the item selection step  Incorporating uncertainty in the model  Relation to adaptive methods in computer-based questionnaires