LSS1 Longitudinal Studies Seminars: Longitudinal Analyses Using STATA
Stirling University, 20.7.04
Data and Variable Management
Paul Lambert


LSS2 Data Management for Longitudinal Data
1. The Nature of ‘Large and Complex’ Data
2. Data management & STATA – getting started
3. Longitudinal Data Types
4. Merging Datasets

LSS3 The nature of ‘large and complex’ longitudinal resources: complicating the variable by case matrix
[Figure: variable by case matrix, with Cases on one axis and Variables on the other]

LSS4 Large and complex = Complexity in:
- Multiple hierarchies of measurement
- Array of variables / operationalisations
- Relations between / subgroups of cases
- Multiple points of measurement
  - Balanced or unbalanced repeated contacts
  - Censored duration data
- Sample collection and weighting

LSS5 i) Multiple hierarchies (levels) of measurement
- Common examples:
  - Both individuals and households
  - Schools and pupils
  - People and local districts and regions
- Solutions:
  - Separate VxC (variable by case) matrix for each level, eg BHPS
  - Merged VxC matrix at lowest level

LSS6 Illustration: Hierarchical dataset
[Table: columns Cluster, Person, and person-level variables; n1=3, n2=8]

LSS7 ii) Array of variables
- Vast number of variable responses, eg 1K+
  - Recoding multiplies these up, eg dummies
  - Multiple response vars (‘all that apply’)
  - Categorisations / indexes (eg occupations)
- Implication:
  - Either separate files for separate var. groups
  - Or very long and difficult files…

LSS8 iii) Relations between cases
- All respondents in a household
- Husbands and wives both sampled
- Fellow school pupils sampled
- Longitudinal: differing relations with others at different times
- Outcomes: Link information between related cases

LSS9 iv) Multiple measurement points
- Longitudinal: information on same cases for multiple time points
  - Panel or cohort: several records via repeated contact for each individual
    - Problems of ‘unbalanced’ panels
  - Life history / retrospective:
    - Durations in spells: multistate / multiepisode, overlapping spells; time varying covariates
    - Left or right censoring of durations in spells

LSS10 v) Sample collection / weighting
- Multistage cluster particularly popular
- Sample may have been clustered, stratified
- Longitudinal: uneven inclusion of cases over time
- Sample weights designed to solve, but:
  - Complex in application
  - Not suited to all applications

LSS11 Data Management for Longitudinal Data
1. The Nature of ‘Large and Complex’ Data
2. Data management & STATA – getting started
3. Longitudinal Data Types
4. Merging Datasets

LSS12 STATA data management examples: see datmanag_part1.do
Claim: For data management, STATA is powerful, but not always well designed
- Batch files / interactive syntax / programs
- Data entry / browsing
- Variable labels
- Computing / recoding
- Missing values
- Weighting data
- Survey estimators (svy)

LSS13 Data Management for Longitudinal Data
1. The Nature of ‘Large and Complex’ Data
2. Data management & STATA – getting started
3. Longitudinal Data Types
4. Merging Datasets

LSS14 Typology of longitudinal data files
- 3 sets of contrasts:
  1. Repeated X-section / Panel / Cohort / Event History / Time Series
  2. Wide vs Long
  3. Discrete vs Continuous time
See datmanag_part2.do

LSS15 Contrast 1 Type A: Repeated x-sect data
[Table: columns Survey, Person, and person-level variables; N_s=3, N_c=8]

LSS16 C1 Type B: Panel dataset (Unbalanced)
[Table: columns Cases, Year, and variables; n1=3, n2=8]
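To make the idea of an unbalanced panel concrete, the sketch below lists the waves each case is missing. It is an illustration in plain Python rather than Stata, and the data, the `pid` and `year` field names, and the `missing_waves` helper are all hypothetical; the three cases and eight records are chosen only to echo the counts on the slide.

```python
from collections import defaultdict

# Hypothetical long-format panel: one record per case per year observed.
panel = [
    {"pid": 1, "year": 1991}, {"pid": 1, "year": 1992}, {"pid": 1, "year": 1993},
    {"pid": 2, "year": 1991}, {"pid": 2, "year": 1992},
    {"pid": 3, "year": 1991}, {"pid": 3, "year": 1992}, {"pid": 3, "year": 1993},
]

def years_observed(rows):
    """Collect the set of years in which each case appears."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["pid"]].add(row["year"])
    return seen

def missing_waves(rows):
    """Return {pid: [years missing]} for cases not observed at every wave."""
    seen = years_observed(rows)
    all_years = set().union(*seen.values())
    return {
        pid: sorted(all_years - years)
        for pid, years in seen.items()
        if all_years - years
    }

gaps = missing_waves(panel)
balanced = not gaps  # a balanced panel has no gaps for any case
```

Here case 2 has no 1993 record, so the panel counts as unbalanced under this definition (every case observed at every wave).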

LSS17 C1 Type C: Event history data analysis
- Alternative data sources:
  - Panel / cohort (more reliable)
  - Retrospective (cheaper, but recall errors)
- Aka: ‘Survival data analysis’; ‘Failure time analysis’; ‘hazards’; ‘risks’; ...
Focus shifts to length of time in a ‘state’; analyses determinants of time in state

LSS18 Key to event histories is ‘state space’

LSS19 C1 Type D: Time series data
Statistical summary of one particular concept, collected at repeated time points from one or more subjects
Examples:
- Unemployment rates by year in UK
- University entrance rates by year by country
Note: exact equivalence to panel data format

LSS20 Contrast 2: ‘Wide’ versus ‘Long’ format
Relevant to all types of dataset:
- ‘Wide’ = 1 record per case (person), additional vars for time points:
    Person 1  Sex  YoB  Var1_92  Var1_93  Var1_94  …
    Person 2  …
- ‘Long’ = 1 record per time point within person (as in the panel data example)
STATA: ‘reshape’ command allows transfer between the two formats
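The wide/long contrast can be sketched in a few lines of plain Python. This only illustrates the transformation that a command such as Stata's `reshape` performs; it is not how Stata implements it. The records, the `var1_` stub and the helper names `wide_to_long` and `long_to_wide` are hypothetical.

```python
# Hypothetical wide-format data: one record per person, one column per year.
wide = [
    {"pid": 1, "sex": "F", "yob": 1960, "var1_92": 10, "var1_93": 12, "var1_94": 15},
    {"pid": 2, "sex": "M", "yob": 1971, "var1_92": 7, "var1_93": None, "var1_94": 9},
]

def wide_to_long(rows, stub, id_var, fixed_vars):
    """Turn one record per person into one record per person-year."""
    value_name = stub.rstrip("_")
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name.startswith(stub):
                record = {id_var: row[id_var]}
                record.update({fixed: row[fixed] for fixed in fixed_vars})
                record["year"] = int(name[len(stub):])  # suffix is the time point
                record[value_name] = value
                long_rows.append(record)
    return long_rows

def long_to_wide(rows, stub, id_var, fixed_vars):
    """Collapse person-year records back to one record per person."""
    value_name = stub.rstrip("_")
    people = {}
    for row in rows:
        person = people.setdefault(
            row[id_var],
            {id_var: row[id_var], **{fixed: row[fixed] for fixed in fixed_vars}},
        )
        person[f"{stub}{row['year']}"] = row[value_name]
    return list(people.values())

long_rows = wide_to_long(wide, "var1_", "pid", ["sex", "yob"])
round_trip = long_to_wide(long_rows, "var1_", "pid", ["sex", "yob"])
```

Time-constant variables (sex, year of birth) are repeated on every long record, which is why long files grow quickly but suit panel models directly.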

LSS21 Contrast 3: Continuous vs Discrete time
Primarily in terms of event history datasets
- Continuous time (‘spell files’, ‘event oriented’)
  - One episode per case, time in case is a variable
- Discrete time
  - One episode per time unit, type of event and event occurrence as variables
- Analyses: Most packages can handle either format comfortably
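One way to see the difference between the two layouts is to expand a spell file into one record per time unit. The sketch below is a plain Python illustration under stated assumptions: time is measured in whole years, the end year is inclusive, and `event` is 1 if the spell ended in the event and 0 if it was right-censored. The data and the `expand_to_person_periods` helper are hypothetical.

```python
# Hypothetical spell file: one record per episode (continuous-time layout).
spells = [
    {"pid": 1, "start": 1990, "end": 1993, "event": 1},  # spell ends in an event
    {"pid": 2, "start": 1991, "end": 1993, "event": 0},  # right-censored spell
]

def expand_to_person_periods(spell_rows):
    """Expand each spell into one record per year at risk (discrete-time layout)."""
    periods = []
    for spell in spell_rows:
        for year in range(spell["start"], spell["end"] + 1):
            is_last = year == spell["end"]
            periods.append({
                "pid": spell["pid"],
                "year": year,
                "duration": year - spell["start"] + 1,  # years in state so far
                # The event can only occur in the final period of the spell.
                "event": spell["event"] if is_last else 0,
            })
    return periods

person_periods = expand_to_person_periods(spells)
```

In the expanded file the event indicator is 0 in every period except, possibly, the last one for each spell, which is the form a discrete-time hazard model (for example a logistic regression on person-period records) expects.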

LSS24 Data Management for Longitudinal Data
1. The Nature of ‘Large and Complex’ Data
2. Data management & STATA – getting started
3. Longitudinal Data Types
4. Merging Datasets

LSS25 Matching files
- Complex data inevitably involves more than one related data file
- A vital data analysis skill!!
- Link data between files by connecting them according to key linking variable(s)
- Eg, ‘person identifier’ variable ‘pid’
- Eg: See datmanag_part3.do

LSS26 Types of file matching
- Case-to-case matching
  - One-to-one link, eg two files with different sets of variables for same people
  - STATA: merge (adds variables for the same cases) or append (stacks cases from files holding the same variables)
- Table distribution
  - One-to-many link, eg one file has individuals, another has households, and match household info to the individuals
  - STATA: merge
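Table distribution can be sketched as a keyed lookup: each individual record picks up the variables of its household. This is a plain Python illustration of the logic, not Stata's `merge`; the data, the `hid` household identifier and the `distribute` helper are hypothetical.

```python
# Hypothetical individual-level file (many records per household).
individuals = [
    {"pid": 101, "hid": 1, "age": 34},
    {"pid": 102, "hid": 1, "age": 36},
    {"pid": 103, "hid": 2, "age": 58},
]

# Hypothetical household-level file (one record per household).
households = [
    {"hid": 1, "region": "Scotland", "hhsize": 2},
    {"hid": 2, "region": "Wales", "hhsize": 1},
]

def distribute(many_rows, one_rows, key):
    """Copy each 'one' record's variables onto every matching 'many' record."""
    lookup = {row[key]: row for row in one_rows}
    if len(lookup) != len(one_rows):
        raise ValueError(f"{key} is not unique in the one-side file")
    # Individuals with no matching household keep only their own variables.
    return [{**row, **lookup.get(row[key], {})} for row in many_rows]

matched = distribute(individuals, households, "hid")
```

The uniqueness check matters: if the linking variable repeats in the household file, the match is no longer one-to-many and the result would silently depend on record order.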

LSS27 Types of file matching ctd
- Aggregating
  - Summarise over multiple cases then link summaries back to cases
  - STATA: collapse
- Related cases matching
  - Link info from one related case to another case, eg info on spouse put on own case
  - STATA: merge or joinby
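Both ideas on this slide reduce to building a summary or lookup and attaching it back to each case. The plain Python sketch below illustrates the logic only; it does not reproduce Stata's `collapse`, `merge` or `joinby`. The data, the `spouse_pid` pointer variable and the `household_means` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical person-level file; spouse_pid points at a related case, if any.
people = [
    {"pid": 101, "hid": 1, "income": 200.0, "spouse_pid": 102},
    {"pid": 102, "hid": 1, "income": 300.0, "spouse_pid": 101},
    {"pid": 103, "hid": 2, "income": 150.0, "spouse_pid": None},
]

def household_means(rows, group_key, value_key):
    """Aggregate: mean of value_key within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: sum(values) / len(values) for group, values in groups.items()}

# Aggregating: summarise over household members, then link back to each person.
means = household_means(people, "hid", "income")
linked = [{**row, "hh_mean_income": means[row["hid"]]} for row in people]

# Related cases matching: put the spouse's income on the person's own record.
by_pid = {row["pid"]: row for row in people}
for row in linked:
    spouse = by_pid.get(row["spouse_pid"])
    row["spouse_income"] = spouse["income"] if spouse else None
```

People without a sampled spouse simply receive a missing value for the spouse variable.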

LSS28 STATA file matching crib
_merge = indicator of cases present for:
  1 = Master file but not input file
  2 = Input file but not Master file
  3 = Master and input file
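The three codes can be reproduced with a small sketch that matches two files on a key and records where each case was found. This is a plain Python illustration that assumes the key is unique in both files; the data and the `merge_with_indicator` helper are hypothetical. What the slide calls the input file is the file Stata's documentation calls the "using" file.

```python
# Hypothetical master and using (input) files, linked on pid.
master = [{"pid": 1, "age": 30}, {"pid": 2, "age": 41}]
using = [{"pid": 2, "job": "nurse"}, {"pid": 3, "job": "clerk"}]

def merge_with_indicator(master_rows, using_rows, key):
    """One-to-one match on key, adding a _merge code of 1, 2 or 3 to each record."""
    master_by_key = {row[key]: row for row in master_rows}
    using_by_key = {row[key]: row for row in using_rows}
    merged = []
    for value in sorted(set(master_by_key) | set(using_by_key)):
        in_master = value in master_by_key
        in_using = value in using_by_key
        record = {**master_by_key.get(value, {}), **using_by_key.get(value, {})}
        if in_master and in_using:
            record["_merge"] = 3  # present in both files
        elif in_master:
            record["_merge"] = 1  # master file only
        else:
            record["_merge"] = 2  # using (input) file only
        merged.append(record)
    return merged

result = merge_with_indicator(master, using, "pid")
codes = [record["_merge"] for record in result]
```

Tabulating the indicator after every match is a quick check that the expected number of cases linked.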