Published by Ferdinand Nicholson. Modified over 8 years ago.
1
20.7.04: LSS1
Longitudinal Studies Seminars: Longitudinal Analyses Using STATA
Stirling University, 20.7.04
Data and Variable Management
Paul Lambert
2
Data Management for Longitudinal Data
1. The Nature of ‘Large and Complex’ Data
2. Data management & STATA: getting started
3. Longitudinal Data Types
4. Merging Datasets
3
The nature of ‘large and complex’ longitudinal resources: complicating the variable by case matrix

Cases  Variables
1      1   17   1.73   A   ....
2      1   18   1.85   B   ....
3      2   17   1.60   C   ....
4      2   18   1.69   A   ....
....
N
4
Large and complex = Complexity in:
- Multiple hierarchies of measurement
- Array of variables / operationalisations
- Relations between / subgroups of cases
- Multiple points of measurement
  - Balanced or unbalanced repeated contacts
  - Censored duration data
- Sample collection and weighting
5
i) Multiple hierarchies (levels) of measurement
- Common examples:
  - Both individuals and households
  - Schools and pupils
  - People and local districts and regions
- Solutions:
  - Separate VxC matrix for each level, eg BHPS
  - Merged VxC matrix at lowest level
6
Illustration: Hierarchical dataset

Cluster  Person  Person-level Vars
1        1       1  38  1  1
1        2       2  34  2  2
1        3       2   6  -  -
2        1       1  45  1  3
2        2       2  41  1  1
3        1       1  20  2  2
3        2       1  25  2  2
3        3       1  20  1  1
n1=3     n2=8
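A minimal Stata sketch of working with a merged person-level file like the illustration above. The data rows are the slide's values as best they can be read from the transcript; the variable names `v1` to `v4`, `n_persons` and `first_in_cluster` are placeholders, not names from the original materials.

```stata
clear
* Person-level rows, with the cluster identifier repeated on each row
* (v1-v4 are placeholder names; "-" on the slide is entered as missing)
input cluster person v1 v2 v3 v4
1 1 1 38 1 1
1 2 2 34 2 2
1 3 2  6 . .
2 1 1 45 1 3
2 2 2 41 1 1
3 1 1 20 2 2
3 2 1 25 2 2
3 3 1 20 1 1
end

* Higher-level summary attached to every person row: cluster size
bysort cluster: generate n_persons = _N

* Tag one row per cluster so cluster-level counts are not inflated
egen byte first_in_cluster = tag(cluster)

count if first_in_cluster   // number of clusters (n1)
count                       // number of persons (n2)
list, sepby(cluster)
```

The tagging step matters whenever a cluster-level quantity is summarised from a person-level file: without it, each cluster is counted once per member.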
7
ii) Array of variables
- Vast number of variable responses, eg 1K+
  - Recoding multiplies these up, eg dummies
  - Multiple response variables (‘all that apply’)
  - Categorisations / indexes (eg occupations)
- Implication:
  - Either separate files for separate variable groups
  - Or very long and difficult files…
8
iii) Relations between cases
- All respondents in a household
- Husbands and wives both sampled
- Fellow school pupils sampled
- Longitudinal: differing relations with others at different times
- Outcomes: Link information between related cases
9
iv) Multiple measurement points
- Longitudinal: information on same cases for multiple time points
  - Panel or cohort: several records via repeated contact for each individual
    - Problems of ‘unbalanced’ panels
  - Life history / retrospective:
    - Durations in spells: multistate / multiepisode, overlapping spells; time varying covariates
    - Left or right censoring of durations in spells
10
v) Sample collection / weighting
- Multistage cluster particularly popular
- Sample may have been clustered, stratified
- Longitudinal: uneven inclusion of cases over time
- Sample weights designed to solve, but:
  - Complex in application
  - Not suited to all applications
11
Data Management for Longitudinal Data
1. The Nature of ‘Large and Complex’ Data
2. Data management & STATA: getting started
3. Longitudinal Data Types
4. Merging Datasets
12
STATA data management examples: see datmanag_part1.do
Claim: For data management, STATA is powerful, but not always well designed
- Batch files / interactive syntax / programs
- Data entry / browsing
- Variable labels
- Computing / recoding
- Missing values
- Weighting data
- Survey estimators (svy)
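The topics listed above can be sketched in a short do-file. This is not the content of datmanag_part1.do, which is not reproduced here. It is an illustrative sketch using made-up data, hypothetical variable names (`pid`, `sex`, `age`, `income`, `wt`), assumed negative missing-value codes (-9, -8), and the syntax of recent Stata releases, which differs in places from the 2004 version.

```stata
clear
* Small made-up dataset; -9 and -8 stand in for survey missing-value codes
input pid sex age income wt
1 1 38 21000 1.2
2 2 34    -9 0.8
3 2  6    -8 1.0
4 1 45 30500 1.1
end

* Variable labels and value labels
label variable sex "Sex of respondent"
label define sexlbl 1 "Male" 2 "Female"
label values sex sexlbl

* Missing values: convert negative codes to extended missing values
mvdecode income, mv(-9=.a \ -8=.b)

* Computing / recoding
generate lninc = ln(income)
recode age (0/15 = 1 "Child") (16/64 = 2 "Working age") ///
           (65/max = 3 "65+"), generate(agegrp)

* Weighting data: probability weights on a single command
mean income [pweight = wt]

* Survey estimators: declare the design once, then use the svy prefix
svyset pid [pweight = wt]
svy: mean income
```

Saving these lines as a .do file and running them with `do` is the batch route; typing them one at a time in the command window is the interactive route.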
13
Data Management for Longitudinal Data
1. The Nature of ‘Large and Complex’ Data
2. Data management & STATA: getting started
3. Longitudinal Data Types
4. Merging Datasets
14
Typology of longitudinal data files
- 3 sets of contrasts:
  1. Repeated X-section / Panel / Cohort / Event History / Time Series
  2. Wide vs Long
  3. Discrete vs Continuous time
See datmanag_part2.do
15
Contrast 1, Type A: Repeated cross-section data

Survey  Person  Person-level Vars
1       1       1  38  1  1
1       2       2  34  2  2
1       3       2   6  -  -
2       4       1  45  1  3
2       5       2  41  1  1
3       6       1  20  2  2
3       7       1  25  2  2
3       8       1  20  1  1
N_s=3   N_c=8
16
C1 Type B: Panel dataset (Unbalanced)

Cases  Year  Variables
1      1     1  17  1  1
1      2     1  18  2  1
1      3     1  19  2  -
2      1     1  17  1  3
2      2     1  18  1  1
3      2     2  20  2  2
3      3     2  21  2  2
3      4     2  22  1  1
n1=3   n2=8
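One way to declare and inspect an unbalanced panel like this in Stata is sketched below. The rows follow the slide's illustration as read from the transcript. The variable names `v1` to `v4`, `n_waves` and `v3_lag` are placeholders, and `xtset` / `xtdescribe` are the commands of recent Stata releases.

```stata
clear
* Long-format panel: one row per case per year ("-" entered as missing)
input case year v1 v2 v3 v4
1 1 1 17 1 1
1 2 1 18 2 1
1 3 1 19 2 .
2 1 1 17 1 3
2 2 1 18 1 1
3 2 2 20 2 2
3 3 2 21 2 2
3 4 2 22 1 1
end

* Declare the panel structure; Stata reports whether it is balanced
xtset case year

* Pattern of participation across years
xtdescribe

* Number of records contributed by each case
bysort case (year): generate n_waves = _N

* Lagged value from the previous year; missing where there is no prior record
generate v3_lag = L.v3
```

Using the `L.` operator instead of `v3[_n-1]` keeps lags correct when a case has gaps between years.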
17
C1 Type C: Event history data analysis
- Alternative data sources:
  - Panel / cohort (more reliable)
  - Retrospective (cheaper, but recall errors)
- Aka: ‘Survival data analysis’; ‘Failure time analysis’; ‘hazards’; ‘risks’; ...
- Focus shifts to length of time in a ‘state’: analyses determinants of time in state
18
Key to event histories is ‘state space’
19
C1 Type D: Time series data
** Exact equivalence to panel data format
Examples:
- Unemployment rates by year in UK
- University entrance rates by year by country
Statistical summary of one particular concept, collected at repeated time points from one or more subjects
20
Contrast 2: ‘Wide’ versus ‘Long’ format
Relevant to all types of dataset:
- ‘Wide’ = 1 case per record (person), additional vars for time points:
    Person 1   Sex   YoB   Var1_92   Var1_93   Var1_94   …
    Person 2   …
- ‘Long’ = 1 case per time point within person (as panel data example)
STATA: ‘reshape’ command allows transfer between the two formats
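A small sketch of `reshape` moving between the two layouts. The column names mirror the slide's wide-format example in lower case; the data values are invented for illustration.

```stata
clear
* Wide format: one record per person, one var1_ column per year
input person sex yob var1_92 var1_93 var1_94
1 1 1960 10 12 15
2 2 1975  7  .  9
end

* Wide -> long: one record per person per year.
* "var1_" is the stub; the year suffix moves into the new variable year.
reshape long var1_, i(person) j(year)
list, sepby(person)

* Long -> wide: with no arguments, reshape reverses the previous reshape
reshape wide
list
```

Time-constant variables such as `sex` and `yob` are simply repeated on every row in the long layout.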
21
Contrast 3: Continuous vs Discrete time
Primarily in terms of event history datasets
- Continuous time (‘spell files’, ‘event oriented’)
  - One episode per case (record); time in the episode is a variable
- Discrete time
  - One record per time unit; type of event and event occurrence are variables
- Analyses: Most packages can handle either format comfortably
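The relationship between the two formats can be illustrated by expanding a spell file into a discrete-time (person-period) file. This is a sketch under simple assumptions: single-episode data, durations in whole time units, and hypothetical variable names (`id`, `duration`, `failed`, `t`, `event`).

```stata
clear
* Continuous-time spell file: one record per episode
* duration = time units spent in the state; failed = 1 if the episode
* ended in the event, 0 if it was right-censored
input id duration failed
1 3 1
2 2 0
3 4 1
end

* Discrete-time file: one record per time unit at risk
expand duration
bysort id: generate t = _n

* The event indicator is 1 only in the final time unit of a completed episode
bysort id: generate byte event = (t == _N) & (failed == 1)

list id t event, sepby(id)
```

The expanded file has 3 + 2 + 4 = 9 records; the censored case (id 2) contributes two records with `event` equal to 0 throughout.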
24
Data Management for Longitudinal Data
1. The Nature of ‘Large and Complex’ Data
2. Data management & STATA: getting started
3. Longitudinal Data Types
4. Merging Datasets
25
Matching files
- Complex data inevitably involves more than one related data file
- A vital data analysis skill!!
- Link data between files by connecting them according to key linking variable(s)
- Eg, ‘person identifier’ variable ‘pid’
- Eg: http://iserwww.essex.ac.uk/bhps/doc/
See datmanag_part3.do
26
Types of file matching
- Case-to-case matching
  - One-to-one link, eg two files with different sets of variables for same people
  - STATA: merge (adds variables for the same cases) or append (adds records)
- Table distribution
  - One-to-many link, eg one file has individuals, another has households, and match household info to the individuals
  - STATA: merge
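Both kinds of link can be sketched with small made-up files. The identifiers `pid` and `hid` and the other variable names are illustrative assumptions, and the `merge 1:1` / `merge m:1` syntax is that of recent Stata releases, not the 2004 version.

```stata
clear
* Household-level file (hypothetical contents)
input hid region
10 1
20 2
end
tempfile hh
save `hh'

clear
* Second person-level file holding a different set of variables
input pid income
1 21000
2 18000
3 30500
end
tempfile inc
save `inc'

clear
* Master person-level file
input pid hid age
1 10 38
2 10 34
3 20 45
end

* Case-to-case matching: one-to-one link on the person identifier
merge 1:1 pid using `inc'
drop _merge

* Table distribution: many persons share one household record
merge m:1 hid using `hh'
tabulate _merge
```

Stating the expected relationship (`1:1`, `m:1`) makes Stata stop with an error if the key is not unique where it should be, which catches many linkage mistakes early.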
27
Types of file matching ctd
- Aggregating
  - Summarise over multiple cases then link summaries back to cases
  - STATA: collapse
- Related cases matching
  - Link info from one related case to another case, eg info on spouse put on own case
  - STATA: merge or joinby
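A sketch of both operations on a made-up person file. The variable names (`pid`, `hid`, `spid` for the spouse's person identifier, `hh_size`, `hh_meanage`, `sp_age`) are assumptions for illustration. The related-cases step uses `merge`; `joinby` is the alternative when one case can link to several others.

```stata
clear
* Person file: spid holds the spouse's pid, missing if no spouse in sample
input pid hid spid age
1 10 2 38
2 10 1 34
3 20 . 45
end
tempfile persons
save `persons'

* Aggregating: summarise over household members
collapse (mean) hh_meanage = age (count) hh_size = age, by(hid)
tempfile hhsum
save `hhsum'

* Link the household summaries back to each person
use `persons', clear
merge m:1 hid using `hhsum', nogenerate

* Related cases: build a lookup file keyed on spid that carries each
* person's own age under the name sp_age
preserve
keep pid age
rename pid spid
rename age sp_age
tempfile spouse
save `spouse'
restore

* Attach the spouse's age to the respondent's own record
merge m:1 spid using `spouse', keep(master match) nogenerate
list pid hid spid age hh_size hh_meanage sp_age
```

The renaming trick is the core of related-case matching: the same file is read twice, once as "self" and once as "other", with the identifier renamed so the link variable lines up.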
28
STATA file matching crib:
_merge = indicator of cases present for:
  1 = Master file but not input (‘using’) file
  2 = Input (‘using’) file but not Master file
  3 = Master and input (‘using’) file
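The three codes can be seen in a small example where the two files only partly overlap. File contents and variable names are invented for illustration, and the `merge 1:1` syntax is that of recent Stata releases.

```stata
clear
* Input ("using") file: covers pid 2, 3 and 4
input pid income
2 18000
3 30500
4 12000
end
tempfile using_file
save `using_file'

clear
* Master file: covers pid 1, 2 and 3
input pid age
1 38
2 34
3 45
end

merge 1:1 pid using `using_file'

* _merge == 1: master only (pid 1)
* _merge == 2: using only  (pid 4)
* _merge == 3: both files  (pid 2 and 3)
tabulate _merge
list, sepby(_merge)

* A common next step: keep only the cases found in both files
keep if _merge == 3
drop _merge
```

Tabulating `_merge` before dropping anything is a cheap check that the link behaved as expected.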