ISCTSC Workshop A7 Best Practices in Data Fusion.

Objectives
– Identify the state of the art and the state of practice
– Identify key research challenges and opportunities
– Identify tangible ways to accelerate methodological innovation and adoption in practice

What exactly is data fusion? Using more than one data source to estimate a parameter of interest
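
As a concrete illustration of this definition, the sketch below combines two independent estimates of the same parameter using inverse-variance weights. This is one standard way to fuse direct observations, not a method named in the slides; the function name `fuse_estimates` and all the numbers are hypothetical, and the sources are assumed to be unbiased and independent.

```python
def fuse_estimates(estimates):
    """Combine independent, unbiased estimates of one parameter.

    estimates: list of (value, variance) pairs, one per data source.
    Returns (fused_value, fused_variance) using inverse-variance weights.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    # Each source is weighted by its precision (1 / variance).
    weights = [1.0 / variance for _, variance in estimates]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    # The fused variance is never larger than the smallest input variance.
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical figures: mean daily trips per person from two sources.
survey = (3.2, 0.04)       # household travel survey (assumed variance)
gps_traces = (3.6, 0.16)   # passively collected traces (assumed variance)
fused_value, fused_variance = fuse_estimates([survey, gps_traces])
print(round(fused_value, 3), round(fused_variance, 3))
```

The fused estimate sits closer to the more precise source, and its variance is smaller than that of either source on its own, which is the sample size and precision benefit the definition implies.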

SOP & SOA (1): state of practice and state of the art
There is a long history of data fusion in transport, but it is very fragmented.
Examples:
– Synthetic population generation
– OD matrix updating
– Data enrichment in discrete choice model estimation
– Network state estimation
– Activity pattern feature extraction from trace data
– Use of multiple survey modes
– Activity and time use survey consolidation
– Population exposure modelling
– Public transport (e.g. UK bus) OD matrix estimation
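
To make the first example more tangible: one widely used technique for synthetic population generation is iterative proportional fitting (IPF), which fuses a small cross-classified sample with marginal totals from another source. The slides do not name a method, so IPF is an assumption here, and the seed table and targets below are invented for illustration.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Scale a seed table so its margins match target totals.

    seed: 2-D list of positive counts (e.g. from a sample survey).
    row_targets, col_targets: marginal totals from another source
    (e.g. a census); their sums must agree for IPF to converge.
    """
    table = [row[:] for row in seed]  # work on a copy
    for _ in range(iterations):
        # Step 1: rescale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [cell * target / row_sum for cell in table[i]]
        # Step 2: rescale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
    return table


# Hypothetical seed: household size (rows) by car ownership (columns).
seed = [[10.0, 20.0], [30.0, 40.0]]
row_targets = [400.0, 600.0]   # assumed census totals by household size
col_targets = [450.0, 550.0]   # assumed census totals by car ownership
fitted = ipf(seed, row_targets, col_targets)
fitted_rows = [sum(row) for row in fitted]
fitted_cols = [sum(row[j] for row in fitted) for j in range(2)]
```

The fitted table keeps the association pattern of the seed while matching both sets of margins, which is why the choice of seed data matters as much as the targets.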

Summary: SOP & SOA (2)
Problem types:
– Direct observation by multiple methods
  • Requires an error model
  • Does not in general require a system process model
– Direct and indirect observation
  • Requires an error model
  • Additionally requires a system process model to link indirect observations to the parameters of interest
Methods:
– ‘Record linking’ methods (e.g., statistical matching, data mining, imputation, fuzzy logic)
– Model-based inference (e.g., FIML, filtering, Bayesian inference)
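
The difference between the two problem types can be sketched with a scalar example. Each observation is modelled as y = h·θ + error, where h = 1 for a direct observation and h comes from a system process model for an indirect one. The weighted least-squares form below is a minimal sketch under assumed independent Gaussian errors; the function name, the value of h and all figures are hypothetical.

```python
def fuse_direct_and_indirect(observations):
    """Weighted least-squares estimate of a scalar parameter theta.

    observations: list of (y, h, variance) triples, where
      y        is the observed value,
      h        links theta to the observation (y = h * theta + error);
               h = 1.0 for a direct observation, otherwise h is
               supplied by a system process model,
      variance is the error-model variance of that observation.
    Returns (theta_hat, variance_of_theta_hat).
    """
    precision = sum(h * h / variance for _, h, variance in observations)
    if precision == 0:
        raise ValueError("observations carry no information about theta")
    weighted_sum = sum(h * y / variance for y, h, variance in observations)
    return weighted_sum / precision, 1.0 / precision


# Hypothetical OD flow theta (trips per day between two zones).
direct_survey = (1000.0, 1.0, 100.0 ** 2)  # expanded survey estimate, sd 100
# Indirect: a link count, where an assumed assignment model says
# 30% of the OD flow uses the counted link (h = 0.3), count sd 20.
link_count = (330.0, 0.3, 20.0 ** 2)
theta_hat, theta_var = fuse_direct_and_indirect([direct_survey, link_count])
print(round(theta_hat, 1), round(theta_var, 1))
```

Without the process model there is no way to relate the link count to the OD flow, so the indirect observation could not be used at all; with it, the estimate is more precise than the survey alone, but only as trustworthy as the assumed value of h.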

Research needs (1)
Enabling research
– Better metadata (survey/data collection process + context) to support informed fusion (especially important in the era of Web 2.0)
– More professional and disciplined protocols for reporting data treatments in published work
– Better techniques for disclosure management
– Understanding how to make the business case for data fusion
  • Benefits: sample size, precision
  • Barriers: perception of ‘made up data’, threat to incumbent data providers

Research needs (2)
Methodological research
– Detecting genuinely conflicting information that cannot be fused (a form of specification test)
– Better means of validating fused data
– Better methods for modelling the propagation of data and model uncertainty during data fusion, to enhance confidence in fused data
– Are deterministic/‘mean imputation’ approaches adequate? How seriously do they distort the covariance structure?
– Better re-sampling/Bayesian methods in high dimensions
– Integrate methods from small area estimation (SAE)
– Opportunities to reduce respondent burden through split designs and ex-post fusion (à la SP surveys and analysis), including question substitutability
– For record matching, what are the key connecting variables?
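
The question about mean imputation can be illustrated with a toy calculation. The sketch below fills missing values of y with the observed mean and compares the variance and covariance before and after. The data are invented, and the helper names are hypothetical; it only shows the direction of the distortion, not its size in real surveys.

```python
def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Sample covariance with the usual n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Invented data: y is missing (None) for the last two records.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, None, None]

# Complete cases only.
complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
x_cc = [xi for xi, _ in complete]
y_cc = [yi for _, yi in complete]

# Deterministic mean imputation: fill every gap with the observed mean.
y_mean = mean(y_cc)
y_imputed = [yi if yi is not None else y_mean for yi in y]

var_complete = sample_cov(y_cc, y_cc)
var_imputed = sample_cov(y_imputed, y_imputed)
cov_complete = sample_cov(x_cc, y_cc)
cov_imputed = sample_cov(x, y_imputed)

print(round(var_complete, 3), round(var_imputed, 3))
print(round(cov_complete, 3), round(cov_imputed, 3))
```

In this example both the variance of y and its covariance with x shrink after imputation, because the imputed records sit exactly at the mean and contribute nothing to either sum. Stochastic or multiple imputation is the usual way to avoid this, at the cost of added complexity.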

Research needs (3)
Research infrastructure
– Establish a more consistent and complete taxonomy of data fusion problems, methods and outcomes
– Establish reference datasets and reference ‘cases’