Functional Databases for Longitudinal Analyses and Tips of the Trade: The Case of the NPHS in Canada. Amélie Quesnel-Vallée McGill University Émilie Renahy.


Functional Databases for Longitudinal Analyses and Tips of the Trade: The Case of the NPHS in Canada. Amélie Quesnel-Vallée McGill University Émilie Renahy University of Toronto

Data matrix structures: “wide” and “long” formats
[Figure panels: Wide format; Long format]
Source:

Preparing data for longitudinal analyses
One basic, common variable naming rule for reshaping from wide to long: the marker for the time of data collection (cycle, calendar year, etc.) is
– a numerical stub,
– at the end of the variable name.
– Ex: VARNAME2012
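The naming rule above is what makes a mechanical reshape possible. The following is a minimal Python sketch, not the authors' tool: the records, the variable names (`health2010`, `health2012`) and the assumption that the time marker is a four-digit year are all hypothetical.

```python
import re

# Hypothetical wide-format records: one row per respondent, with the
# time marker as a numeric stub at the end of each variable name.
wide = [
    {"id": 1, "sex": "F", "health2010": 3, "health2012": 4},
    {"id": 2, "sex": "M", "health2010": 2, "health2012": None},
]

# Assumption: a time-varying name is a stem followed by a four-digit year.
TIME_VARYING = re.compile(r"^(?P<stem>.*\D)(?P<time>\d{4})$")


def wide_to_long(rows):
    """Return one record per respondent per time point."""
    long_rows = []
    for row in rows:
        constant = {}  # time-invariant variables, repeated on every record
        by_time = {}   # time marker -> {stem: value}
        for name, value in row.items():
            match = TIME_VARYING.match(name)
            if match:
                time = int(match["time"])
                by_time.setdefault(time, {})[match["stem"]] = value
            else:
                constant[name] = value
        for time in sorted(by_time):
            long_rows.append({**constant, "time": time, **by_time[time]})
    return long_rows


long_rows = wide_to_long(wide)
```

Because the time marker sits at the end of the name, a single pattern separates the stem from the time point; names that do not follow the rule are treated as time-invariant and copied onto every long record.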

The National Population Health Survey
“In the fall of 1991, the National Health Information Council recommended that an ongoing national survey of population health be conducted.”
– Motivated by “economic and fiscal pressures on the health care systems and the requirement for information with which to improve the health status of the population in Canada.”
In 1992, Statistics Canada received funding to carry out the NPHS.
It has three components: the Households, the Health Institutions, and the North components.
Source:

The Longitudinal Household Component of the NPHS
– Biennial, from 1994/ /11 (9 cycles)
– n=17,276 for the longitudinal household component (69.7% response rate in cycle 9)
– Multistage, stratified random sampling, designed to ensure adequate representation across major urban centers, smaller towns, and rural areas in all provinces.
– People living in Native reserves, military bases, institutions, and some remote areas of Ontario and Québec were excluded.
Source:

Preparing NPHS data for longitudinal analyses
NPHS variable naming rules: xxxCYCLEzzzz, where
– xxx refers to the questionnaire section
– CYCLE refers to the data collection cycle
– zzzz refers to the specific question
Two idiosyncratic challenges:
– Location: the cycle marker is positioned in the middle of the variable name
– Identifier: one character, either a number or a letter, depending on the period of data collection
  From 1994 to 2002, numbers are used (4, 6, 8, 0, or 2 respectively)
  From , letters (A-D) are used because numbers would not have provided unique cycle identifiers
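To make the two challenges concrete, here is a small Python sketch of moving the cycle marker from the middle of a name to the end. It is an illustration only, not the SAS macro: it assumes the section prefix is exactly three characters, it numbers the cycles 1 to 9 in the order the identifiers are listed above, and the variable names `ABC4WXYZ` and `ABCAWXYZ` are invented.

```python
# Cycle identifiers in the order given above: digits for the first five
# cycles, letters for the last four. Values are cycle numbers 1 to 9.
CYCLE_CODES = {"4": 1, "6": 2, "8": 3, "0": 4, "2": 5,
               "A": 6, "B": 7, "C": 8, "D": 9}

PREFIX_LEN = 3  # assumption: the section prefix "xxx" is three characters


def split_nphs_name(name):
    """Split an xxxCYCLEzzzz name into (name without cycle, cycle number)."""
    if len(name) <= PREFIX_LEN:
        raise ValueError(f"{name!r} is too short to contain a cycle marker")
    section = name[:PREFIX_LEN]
    code = name[PREFIX_LEN].upper()
    question = name[PREFIX_LEN + 1:]
    if code not in CYCLE_CODES:
        raise ValueError(f"{name!r} has no recognised cycle marker")
    return section + question, CYCLE_CODES[code]


def to_reshapable(name):
    """Rename so that a numeric cycle marker sits at the end of the name."""
    stem, cycle = split_nphs_name(name)
    # The underscore keeps the marker separate from question codes
    # that themselves end in a digit.
    return f"{stem}_{cycle}"


renamed = [to_reshapable(n) for n in ("ABC4WXYZ", "ABCAWXYZ")]
```

After renaming, both hypothetical variables share the stem `ABCWXYZ_` and differ only in a trailing number, which is the form that wide-to-long reshaping expects.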

Solution: Development of a SAS macro
Two options:
– User-specific list of variables: Recommended!
– Full data matrix: Time consuming and prone to errors with time-invariant variables in long format
Available in both official languages
To be made available to RDC users across Canada

Using the package, easy as 1, 2, 3
1. Read the important comments and warnings. For instance, if the variable was not measured in a given cycle, the macro will create a variable with all missing values.
2. Replace all XXX by the relevant info. Hint: use the 'Find' option (Ctrl+F) to find them all!
3. Run the macro in SAS: Select all (Ctrl+A) then click on the menu Run \ Submit (or F3 button).
-> Three pairs of wide and long format datasets will be created, allowing the use of any statistical software:
2 SAS datasets
2 Comma Separated Values files (.csv)
2 Tab Delimited Files (.txt)

WARNING!
It is the researcher's responsibility to verify:
1. Whether the question was asked in all cycles
2. Whether the response categories were the same across all cycles
To this end, consult the NPHS documentation.
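As a complement to the documentation, a quick screen of the reshaped data can flag cycles that deserve a closer look. The Python sketch below is illustrative: the long-format records and the variable name `smoker` are made up. A cycle with no observed values may mean the question was not asked, and differing sets of observed codes may point to changed response categories, but only the NPHS documentation can confirm either.

```python
from collections import defaultdict

# Hypothetical long-format records; None marks a missing value.
long_rows = [
    {"id": 1, "cycle": 1, "smoker": 1},
    {"id": 2, "cycle": 1, "smoker": 2},
    {"id": 1, "cycle": 2, "smoker": None},
    {"id": 2, "cycle": 2, "smoker": None},
    {"id": 1, "cycle": 3, "smoker": 1},
    {"id": 2, "cycle": 3, "smoker": 3},
]


def categories_by_cycle(rows, var):
    """Map each cycle to the set of non-missing codes observed for var."""
    observed = defaultdict(set)
    for row in rows:
        values = observed[row["cycle"]]  # registers the cycle even if empty
        if row[var] is not None:
            values.add(row[var])
    return dict(observed)


cats = categories_by_cycle(long_rows, "smoker")

# Cycles where every value is missing: possibly not asked in that cycle.
not_asked = sorted(cycle for cycle, values in cats.items() if not values)

# Codes seen in some cycles but not in others: possibly changed categories.
all_codes = set().union(*cats.values())
unseen = {cycle: all_codes - values for cycle, values in cats.items() if values}
```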

Summarizing longitudinal information
Using egen in Stata on a wide matrix
– anycount: Counts the number of events (e.g. poor health) experienced by a respondent over time
– anymatch: Detects presence or absence of an event over a time period
– concat: Creates a summary “trajectory” of events for an individual over a time period.
Source:

WARNING
Missing values are often turned into “0” by egen.
Always declare missing values on created variables.
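The logic of these summaries, and of the warning about missing values, can be sketched in Python. These functions are rough analogues written for illustration, not Stata's implementation. The variable names (`poor1` to `poor4`) and the coding (1 = poor health) are hypothetical, and returning `None` when every cycle is missing is one possible way to declare missing values on the created variable.

```python
def anycount(row, variables, values):
    """Count the variables whose value is in `values`.

    Returns None when every variable is missing, so that a respondent
    with no information is not recorded as having zero events.
    """
    observed = [row[v] for v in variables if row[v] is not None]
    if not observed:
        return None
    return sum(value in values for value in observed)


def anymatch(row, variables, values):
    """1 if any variable takes one of `values`, 0 if none, None if unknown."""
    count = anycount(row, variables, values)
    return None if count is None else int(count > 0)


def concat(row, variables, missing="."):
    """Join the values into a trajectory string, marking missing cycles."""
    return "".join(missing if row[v] is None else str(row[v]) for v in variables)


cycles = ["poor1", "poor2", "poor3", "poor4"]
respondent_a = {"poor1": 1, "poor2": 0, "poor3": 1, "poor4": None}
respondent_b = {"poor1": None, "poor2": None, "poor3": None, "poor4": None}

summary_a = (anycount(respondent_a, cycles, {1}),
             anymatch(respondent_a, cycles, {1}),
             concat(respondent_a, cycles))
summary_b = (anycount(respondent_b, cycles, {1}),
             anymatch(respondent_b, cycles, {1}),
             concat(respondent_b, cycles))
```

Note that a respondent with some missing cycles still gets a count based only on the observed ones, so partially observed trajectories need their own decision rule.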

Row* commands in egen
– rowmiss: Gives the number of missing values in varlist for each observation (row).
– rownonmiss: Gives the number of nonmissing values in varlist for each observation (row); this is the value used by rowmean() as the denominator in the mean calculation.
– rowmean, rowmedian, rowmax, rowmin: Create, respectively, the (row) means, medians, maxima and minima of the variables in varlist, ignoring missing values.
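The relationship between these row functions can be shown with a short Python sketch. It mirrors the descriptions above rather than Stata's code, and the variable names (`bmi1` to `bmi3`) are hypothetical.

```python
def rowmiss(row, variables):
    """Number of missing values among the listed variables."""
    return sum(row[v] is None for v in variables)


def rownonmiss(row, variables):
    """Number of nonmissing values among the listed variables."""
    return len(variables) - rowmiss(row, variables)


def rowmean(row, variables):
    """Mean of the nonmissing values; None when all values are missing."""
    values = [row[v] for v in variables if row[v] is not None]
    if not values:
        return None
    # The denominator is the nonmissing count, not the number of variables.
    return sum(values) / len(values)


measures = ["bmi1", "bmi2", "bmi3"]
respondent = {"bmi1": 22.0, "bmi2": None, "bmi3": 24.0}

n_missing = rowmiss(respondent, measures)
n_observed = rownonmiss(respondent, measures)
mean_bmi = rowmean(respondent, measures)
```

Because the mean uses only the observed values, it helps to keep the nonmissing count alongside it so that means based on very few cycles can be identified.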

Acknowledgements