

20.7.04: LSS1 Longitudinal Studies Seminars: Longitudinal Analyses Using STATA
Stirling University, 20.7.04
Data and Variable Management
Paul Lambert





2 Data Management for Longitudinal Data
   1. The Nature of ‘Large and Complex’ Data
   2. Data management & STATA – getting started
   3. Longitudinal Data Types
   4. Merging Datasets

3 The nature of ‘large and complex’ longitudinal resources: complicating the variable-by-case matrix

   Case | Variables
      1 | 1  17  1.73  A  ...
      2 | 1  18  1.85  B  ...
      3 | 2  17  1.60  C  ...
      4 | 2  18  1.69  A  ...
    ... | ...
      N |

4 Large and complex means complexity in:
- Multiple hierarchies of measurement
- Array of variables / operationalisations
- Relations between / subgroups of cases
- Multiple points of measurement
  – Balanced or unbalanced repeated contacts
  – Censored duration data
- Sample collection and weighting

5 i) Multiple hierarchies (levels) of measurement
- Common examples:
  – Both individuals and households
  – Schools and pupils
  – People and local districts and regions
- Solutions:
  – A separate variable-by-case (VxC) matrix for each level, e.g. BHPS (British Household Panel Survey)
  – A merged VxC matrix at the lowest level

6 Illustration: Hierarchical dataset

   Cluster  Person | Person-level vars
         1       1 | 1  38  1  1
         1       2 | 2  34  2  2
         1       3 | 2   6  -  -
         2       1 | 1  45  1  3
         2       2 | 2  41  1  1
         3       1 | 1  20  2  2
         3       2 | 1  25  2  2
         3       3 | 1  20  1  1
   n1 = 3 clusters, n2 = 8 persons
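A minimal Python sketch (a stand-in, not the seminar's Stata code) of why a two-level file needs a composite key. The rows are the illustration's values as read from the transcript, with None for the "-" entries; the slide does not name the variables, so they stay anonymous.

```python
from collections import Counter

# (cluster, person, v1, v2, v3, v4): rows of the illustration; None marks "-".
records = [
    (1, 1, 1, 38, 1, 1),
    (1, 2, 2, 34, 2, 2),
    (1, 3, 2, 6, None, None),
    (2, 1, 1, 45, 1, 3),
    (2, 2, 2, 41, 1, 1),
    (3, 1, 1, 20, 2, 2),
    (3, 2, 1, 25, 2, 2),
    (3, 3, 1, 20, 1, 1),
]

# Person numbers restart within each cluster, so only the pair
# (cluster, person) identifies a person uniquely.
person_numbers = {person for _, person, *_ in records}
person_keys = {(cluster, person) for cluster, person, *_ in records}

persons_per_cluster = Counter(cluster for cluster, *_ in records)
n1 = len(persons_per_cluster)  # higher-level units (clusters)
n2 = len(person_keys)          # lower-level units (persons)

print(n1, n2)                     # 3 8
print(len(person_numbers))        # 3
print(dict(persons_per_cluster))  # {1: 3, 2: 2, 3: 3}
```

Only three distinct person numbers exist, yet there are eight persons; matching on the person number alone would wrongly treat person 1 of cluster 1 and person 1 of cluster 2 as the same case.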

7 ii) Array of variables
- Vast number of variable responses, e.g. 1,000+
  – Recoding multiplies these further, e.g. dummy variables
  – Multiple-response variables (‘all that apply’)
  – Categorisations / indexes (e.g. occupations)
- Implication:
  – Either separate files for separate variable groups
  – Or very long and difficult files…
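To see how recoding multiplies variables, here is a small Python sketch (not from the seminar) that expands one categorical variable into one 0/1 dummy per category. The variable name `region` and its codes are hypothetical; in Stata, `tabulate` with the `generate()` option does something similar in spirit.

```python
def make_dummies(values, prefix):
    """Expand one categorical variable into one 0/1 indicator per observed category.
    A missing value (None) stays missing in every indicator rather than becoming 0."""
    categories = sorted({v for v in values if v is not None})
    return {
        f"{prefix}_{cat}": [None if v is None else int(v == cat) for v in values]
        for cat in categories
    }

# Hypothetical codes for one categorical variable: five respondents, one missing.
region = [1, 3, 2, 3, None]
dummies = make_dummies(region, "region")
for name, column in dummies.items():
    print(name, column)
# region_1 [1, 0, 0, 0, None]
# region_2 [0, 0, 1, 0, None]
# region_3 [0, 1, 0, 1, None]
```

One source variable with k categories becomes k new columns, which is how a file with 1,000+ responses quickly grows.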

8 iii) Relations between cases
- All respondents in a household
- Husbands and wives both sampled
- Fellow school pupils sampled
- Longitudinal: differing relations with others at different times
- Outcome: link information between related cases

9 iv) Multiple measurement points
- Longitudinal: information on the same cases at multiple time points
  – Panel or cohort: several records via repeated contact with each individual; problems of ‘unbalanced’ panels
  – Life history / retrospective: durations in spells (multistate / multi-episode, overlapping spells); time-varying covariates; left or right censoring of durations in spells
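A minimal Python sketch of right censoring in spell data, assuming each spell has a known start time and an end time that is missing (None) when the spell was still in progress at the last observation. The `Spell` class and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Spell:
    pid: int
    state: str
    start: int           # time unit (e.g. month) in which the spell began
    end: Optional[int]   # None if the spell was still in progress when observation stopped


def duration_and_event(spell: Spell, observed_until: int) -> Tuple[int, int]:
    """Return (duration, event): event = 1 if the end of the spell was observed,
    0 if the spell is right-censored at observed_until."""
    if spell.end is None:
        return observed_until - spell.start, 0
    return spell.end - spell.start, 1


# Hypothetical spells, observed up to month 12.
spells = [Spell(1, "unemployed", 3, 9), Spell(2, "unemployed", 5, None)]
for s in spells:
    print(s.pid, duration_and_event(s, observed_until=12))
# 1 (6, 1)
# 2 (7, 0)
```

The censored spell contributes 7 months of known survival in the state, not a completed duration. Left-censored spells (begun before observation started, start unknown) are not handled by this sketch.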

10 v) Sample collection / weighting
- Multistage cluster designs are particularly popular
- The sample may have been clustered and/or stratified
- Longitudinal: uneven inclusion of cases over time
- Sample weights are designed to solve this, but:
  – They are complex in application
  – They are not suited to all applications
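A small Python sketch (hypothetical data, not from the seminar) showing how a weight changes a point estimate: cases with larger weights stand for more of the population.

```python
def weighted_mean(values, weights):
    """Weighted mean over cases with a non-missing value (None = missing)."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical 0/1 outcome and design weights for five cases.
employed = [1, 0, 1, 1, 0]
weights = [0.5, 2.0, 1.0, 0.5, 2.0]

print(sum(employed) / len(employed))               # 0.6
print(round(weighted_mean(employed, weights), 3))  # 0.333
```

This only corrects the point estimate. Standard errors for clustered or stratified samples need design-based estimators, such as Stata's `svy` commands.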

11 Data Management for Longitudinal Data, section 2: Data management & STATA – getting started

12 STATA data management examples: see datmanag_part1.do
Claim: for data management, STATA is powerful, but not always well designed
- Batch files / interactive syntax / programs
- Data entry / browsing
- Variable labels
- Computing / recoding
- Missing values
- Weighting data
- Survey estimators (svy)
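One recoding trap worth knowing: in Stata a numeric missing value (`.`) compares as larger than any number, so a condition such as `age > 65` is true for missing ages unless they are excluded explicitly. The Python sketch below shows the safer pattern of turning missing-value codes into genuine missings first and propagating them through derived variables. It assumes, as in many survey releases, that negative codes flag missing responses; the actual codebook should be checked.

```python
# Assumption: negative codes (e.g. -9, -1) flag kinds of missing response,
# as in many survey releases; check the codebook of the actual file.
def recode_missing(values):
    """Replace negative missing-value codes with None."""
    return [None if v is not None and v < 0 else v for v in values]


def over_threshold(age, threshold=65):
    """1/0 indicator that keeps missing ages missing instead of guessing a value."""
    return None if age is None else int(age > threshold)


raw_age = [38, -9, 70, 66, -1]
age = recode_missing(raw_age)
older = [over_threshold(a) for a in age]
print(age)    # [38, None, 70, 66, None]
print(older)  # [0, None, 1, 1, None]
```

Without the recode step, -9 and -1 would be treated as real ages and silently coded 0.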

13 Data Management for Longitudinal Data, section 3: Longitudinal Data Types

14 Typology of longitudinal data files
- Three sets of contrasts:
  1. Repeated cross-section / Panel / Cohort / Event history / Time series
  2. Wide vs long
  3. Discrete vs continuous time
See datmanag_part2.do

15 Contrast 1, Type A: Repeated cross-section data

   Survey  Person | Person-level vars
        1       1 | 1  38  1  1
        1       2 | 2  34  2  2
        1       3 | 2   6  -  -
        2       4 | 1  45  1  3
        2       5 | 2  41  1  1
        3       6 | 1  20  2  2
        3       7 | 1  25  2  2
        3       8 | 1  20  1  1
   N_s = 3 surveys, N_c = 8 cases
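In a repeated cross-section each survey samples different people, so no case should appear in more than one survey. A minimal Python check using the (survey, person) pairs as read from the illustration:

```python
from collections import defaultdict

# (survey, person) pairs from the repeated cross-section illustration.
records = [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (3, 8)]

surveys_by_person = defaultdict(set)
for survey, person in records:
    surveys_by_person[person].add(survey)

N_s = len({survey for survey, _ in records})  # number of surveys
N_c = len(surveys_by_person)                  # number of distinct cases
repeated = [p for p, s in surveys_by_person.items() if len(s) > 1]

print(N_s, N_c, repeated)  # 3 8 []
```

An empty `repeated` list is what distinguishes this layout from a panel, where the same case identifier recurs across time points.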

16 C1 Type B: Panel dataset (unbalanced)

   Case  Year | Variables
      1     1 | 1  17  1  1
      1     2 | 1  18  2  1
      1     3 | 1  19  2  -
      2     1 | 1  17  1  3
      2     2 | 1  18  1  1
      3     2 | 2  20  2  2
      3     3 | 2  21  2  2
      3     4 | 2  22  1  1
   n1 = 3 cases, n2 = 8 records
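A Python sketch of how to detect that this panel is unbalanced: list, for each case, the years in which it was not observed. The (case, year) pairs are those of the illustration; Stata's `xtdescribe` reports similar participation patterns.

```python
from collections import defaultdict

# (case, year) pairs from the unbalanced panel illustration.
panel = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 2), (3, 3), (3, 4)]

years_by_case = defaultdict(set)
for case, year in panel:
    years_by_case[case].add(year)

all_years = sorted({year for _, year in panel})
missing_years = {
    case: sorted(set(all_years) - years) for case, years in years_by_case.items()
}
balanced = all(not gaps for gaps in missing_years.values())

print(all_years)      # [1, 2, 3, 4]
print(missing_years)  # {1: [4], 2: [3, 4], 3: [1]}
print(balanced)       # False
```

Case 1 drops out after year 3, case 2 after year 2, and case 3 enters late, so no case is present at every time point.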

17 C1 Type C: Event history data analysis
- Alternative data sources:
  – Panel / cohort (more reliable)
  – Retrospective (cheaper, but subject to recall errors)
- Also known as: ‘survival data analysis’, ‘failure time analysis’, ‘hazards’, ‘risks’, …
- The focus shifts to the length of time spent in a ‘state’: the analysis models the determinants of time in that state

18 Key to event histories is the ‘state space’

19 C1 Type D: Time series data
- Exact equivalence to the panel data format
- Examples:
  – Unemployment rates by year in the UK
  – University entrance rates by year by country
- A statistical summary of one particular concept, collected at repeated time points from one or more subjects
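Because time series share the panel layout (one record per subject per time point), panel-style operations carry over. The Python sketch below, with hypothetical rates for two subjects, computes the change since the previous year, looking up year - 1 rather than the previous record, so a gap yields a missing value. Stata's `L.` lag operator behaves this way once the data are declared with `tsset`.

```python
# Hypothetical rates for two subjects (e.g. countries), stored like panel data:
# one record per subject per time point.
series = [
    ("A", 2000, 30.0), ("A", 2001, 32.0), ("A", 2002, 35.0),
    ("B", 2000, 40.0), ("B", 2002, 41.0),
]

value = {(subject, year): rate for subject, year, rate in series}

# Change since the previous *year*, not the previous record: B has no 2001,
# so its 2002 change is missing rather than 41.0 - 40.0.
change = {
    (subject, year): (
        rate - value[(subject, year - 1)] if (subject, year - 1) in value else None
    )
    for subject, year, rate in series
}
print(change)
# {('A', 2000): None, ('A', 2001): 2.0, ('A', 2002): 3.0, ('B', 2000): None, ('B', 2002): None}
```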

20 Contrast 2: ‘Wide’ versus ‘long’ format
Relevant to all types of dataset:
- ‘Wide’ = one record per case (person), with additional variables for each time point:
      Person 1  Sex  YoB  Var1_92  Var1_93  Var1_94  …
      Person 2  …
- ‘Long’ = one record per time point within person (as in the panel data example)
STATA: the ‘reshape’ command converts between the two formats
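A minimal Python sketch of the wide-to-long conversion and back, roughly what Stata's `reshape long var1_, i(pid) j(year)` and `reshape wide var1_, i(pid) j(year)` do. The records and the lowercase `var1_92`-style names are hypothetical, following the slide's Var1_92 pattern; here the year suffix is kept as a string.

```python
def wide_to_long(rows, stub, suffixes):
    """One record per person -> one record per person per time point.
    Columns named stub + suffix become a single column `stub`; the suffix goes in
    'year'. All other columns (e.g. sex, yob) are copied onto every long record."""
    wide_cols = {stub + s for s in suffixes}
    long_rows = []
    for row in rows:
        constants = {k: v for k, v in row.items() if k not in wide_cols}
        for s in suffixes:
            long_rows.append({**constants, "year": s, stub: row.get(stub + s)})
    return long_rows


def long_to_wide(long_rows, stub, id_var="pid"):
    """Inverse of wide_to_long: one record per id_var, one column per time point."""
    wide = {}
    for r in long_rows:
        base = wide.setdefault(
            r[id_var], {k: v for k, v in r.items() if k not in ("year", stub)}
        )
        base[stub + r["year"]] = r[stub]
    return list(wide.values())


# Hypothetical wide records following the slide's Var1_92, Var1_93, ... naming.
people = [
    {"pid": 1, "sex": 1, "yob": 1970, "var1_92": 5, "var1_93": 6, "var1_94": 6},
    {"pid": 2, "sex": 2, "yob": 1965, "var1_92": 3, "var1_93": None, "var1_94": 4},
]
long_records = wide_to_long(people, "var1_", ["92", "93", "94"])
print(len(long_records))  # 6
print(long_records[0])    # {'pid': 1, 'sex': 1, 'yob': 1970, 'year': '92', 'var1_': 5}
print(long_to_wide(long_records, "var1_") == people)  # True
```

Time-constant variables such as sex and year of birth are repeated on every long record. Going back to wide, this sketch simply keeps the first record's values.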

21 Contrast 3: Continuous vs discrete time
Primarily relevant to event history datasets:
- Continuous time (‘spell files’, ‘event oriented’): one episode per case, with the time spent in the episode as a variable
- Discrete time: one record per time unit, with the type of event and whether it occurred as variables
- Analyses: most packages can handle either format comfortably
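A Python sketch of moving from a continuous-time spell file to a discrete-time file: each spell is expanded into one record per time unit at risk, and the event indicator is 1 only in the last period of a spell that ended in the event. The input layout (pid, duration, event) is a hypothetical simplification; in Stata, `expand` is commonly used for this step.

```python
def spells_to_person_periods(spells):
    """Expand spell records (pid, duration, event) into one record per time unit at risk.
    duration is a whole number of time units; event is 1 if the spell ended in the
    event of interest and 0 if it was censored. The discrete-time event indicator
    is 1 only in the last period of an uncensored spell."""
    rows = []
    for pid, duration, event in spells:
        for t in range(1, duration + 1):
            rows.append((pid, t, int(event == 1 and t == duration)))
    return rows


# Hypothetical spell file: person 1 has the event after 3 units,
# person 2 is censored after 2 units.
spells = [(1, 3, 1), (2, 2, 0)]
for row in spells_to_person_periods(spells):
    print(row)
# (1, 1, 0)
# (1, 2, 0)
# (1, 3, 1)
# (2, 1, 0)
# (2, 2, 0)
```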



24 Data Management for Longitudinal Data, section 4: Merging Datasets

25 Matching files
- Complex data inevitably involves more than one related data file
- A vital data analysis skill!
- Link data between files by connecting them according to key linking variable(s)
  – E.g. a ‘person identifier’ variable, ‘pid’
  – E.g. http://iserwww.essex.ac.uk/bhps/doc/
See datmanag_part3.do

26 Types of file matching
- Case-to-case matching
  – One-to-one link, e.g. two files with different sets of variables for the same people
  – STATA: merge (or append, when the files hold the same variables for different cases)
- Table distribution
  – One-to-many link, e.g. one file has individuals, another has households, and household information is matched to the individuals
  – STATA: merge
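A Python sketch of table distribution (one-to-many matching): each household record is copied onto every individual in that household, using the household identifier as the key. The files and the `hid` key name are hypothetical; `pid` follows the slides.

```python
# Hypothetical files: one record per household, one record per individual.
# 'hid' is an illustrative household identifier; 'pid' follows the slides.
households = {
    10: {"hid": 10, "hh_size": 3, "region": 2},
    11: {"hid": 11, "hh_size": 1, "region": 5},
}
individuals = [
    {"pid": 1, "hid": 10, "age": 38},
    {"pid": 2, "hid": 10, "age": 34},
    {"pid": 3, "hid": 10, "age": 6},
    {"pid": 4, "hid": 11, "age": 45},
    {"pid": 5, "hid": 12, "age": 29},  # no record for this household in the household file
]

# Table distribution: each household record is copied onto every member.
# Individuals without a matching household keep only their own variables.
merged = [{**households.get(person["hid"], {}), **person} for person in individuals]

print([(r["pid"], r.get("hh_size")) for r in merged])
# [(1, 3), (2, 3), (3, 3), (4, 1), (5, None)]
```

The result stays at the lowest level (one record per individual), which is the "merged VxC matrix at the lowest level" solution from the hierarchy slide.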

27 Types of file matching (continued)
- Aggregating
  – Summarise over multiple cases, then link the summaries back to cases
  – STATA: collapse
- Related-cases matching
  – Link information from one related case to another case, e.g. spouse's information put on own case
  – STATA: merge or joinby
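A Python sketch of both operations on one hypothetical person file. Aggregating: compute a household summary, then link it back to every member. In Stata, `collapse` replaces the data in memory with the summaries, which are then merged back. Related-cases matching: copy the spouse's age onto each person's own record. The `hid` and `spouse_pid` names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical person file; 'hid' and 'spouse_pid' are illustrative names.
people = [
    {"pid": 1, "hid": 10, "age": 38, "spouse_pid": 2},
    {"pid": 2, "hid": 10, "age": 34, "spouse_pid": 1},
    {"pid": 3, "hid": 10, "age": 6, "spouse_pid": None},
    {"pid": 4, "hid": 11, "age": 45, "spouse_pid": None},
]

# 1) Aggregating: summarise within each household ...
ages_by_hh = defaultdict(list)
for p in people:
    ages_by_hh[p["hid"]].append(p["age"])
hh_mean_age = {hid: sum(ages) / len(ages) for hid, ages in ages_by_hh.items()}

# ... then link the summary back onto every member's record.
for p in people:
    p["hh_mean_age"] = hh_mean_age[p["hid"]]

# 2) Related-cases matching: put the spouse's age on each person's own record.
age_by_pid = {p["pid"]: p["age"] for p in people}
for p in people:
    p["spouse_age"] = age_by_pid.get(p["spouse_pid"])

print(hh_mean_age)                                    # {10: 26.0, 11: 45.0}
print([(p["pid"], p["spouse_age"]) for p in people])  # [(1, 34), (2, 38), (3, None), (4, None)]
```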

28 STATA file matching crib: _merge indicates which files each case is present in:
   1 = master file but not input file
   2 = input file but not master file
   3 = both master and input files
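To make the crib concrete, here is a Python sketch of a one-to-one match that records the same three codes. It assumes the key is unique in both files and, as Stata's merge does by default, keeps the master's value where both files contain the same variable. The files and the `income` variable are hypothetical.

```python
def merge_1to1(master, using, key):
    """One-to-one match on `key` (assumed unique in both files), adding a
    _merge code as on the crib: 1 = master only, 2 = input (using) only, 3 = both.
    Where both files hold the same variable, the master's value is kept."""
    using_by_key = {row[key]: row for row in using}
    master_keys = {row[key] for row in master}
    out = []
    for row in master:
        other = using_by_key.get(row[key])
        if other is None:
            out.append({**row, "_merge": 1})
        else:
            out.append({**other, **row, "_merge": 3})
    for row in using:
        if row[key] not in master_keys:
            out.append({**row, "_merge": 2})
    return out


master = [{"pid": 1, "age": 38}, {"pid": 2, "age": 34}, {"pid": 3, "age": 6}]
using = [{"pid": 2, "income": 900}, {"pid": 3, "income": 0}, {"pid": 4, "income": 1200}]
result = merge_1to1(master, using, "pid")
print([(r["pid"], r["_merge"]) for r in result])  # [(1, 1), (2, 3), (3, 3), (4, 2)]
```

Tabulating `_merge` after every match is a cheap check that the linking variable behaved as expected.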




