Analysing Households with the SARs Jo Wathan SARs support team University of Manchester.

1 Analysing Households with the SARs Jo Wathan SARs support team University of Manchester

2 In this session What are the SARs? What would you use them for? How do you work with them? –For household level analysis –For hierarchical analysis Hands-on session

3 Census Microdata background Census outputs have historically been aggregate tables – safe but inflexible –Well suited to analyses at small geographical detail Microdata permits more flexibility –Longitudinal Study links data from 1971; good for studying process but has to be held securely http://www.celsius.lshtm.ac.uk/ –Demand for a cross-sectional dataset that can be used on own desktop Samples of Anonymised Records first available from 1991 Census –2% individual file (SAR areas) –1% household file (Region)

4 General Features of the SARs Microdata –Can produce your own tables, recode and group data –Can use models –Full individual information for all census topics –Need to be analysed using a statistics package Very large samples –Good for looking at small subpopulations Can be used alongside other census data 2 time points

5 The SARs family 2001
File | Sample type | Geography | Availability
Individual licensed | 3% sample of individuals | UK: GOR (+ Wales, Scot, NI, Inner/Outer London) | EUL, CCSR
Small area microdata | 5% sample of individuals | UK: LA (or constituency in NI) | EUL, CCSR
Household licensed | 1% hierarchical file | None: England & Wales only | Special licence, UKDA
Individual CAMS | Same sample as Individual licensed SAR | LA (GB) or Constituency (NI); IMD info for SOA | In house at ONS
Household CAMS | 1% hierarchical file | All of UK | In house at ONS

6 Individual licensed file
Geography: GOR (also Inner/Outer London, Scotland, Wales and NI)
Age: Grouped: 8 bands for ages 16-74
Ethnicity: All 16 categories in v2 (England & Wales), 16 cats of COB
Employment: SOC minor, 40 cats NSSEC, 17 cats of Industry
Notes: Slight variation in the sampling fraction for each country: 3.125% in England and Wales; 3.246% in Scotland; 3.139% in Northern Ireland

7 Small area microdata file
Geography: LA (Parliamentary Constituency in NI) – 3 LAs merged due to size
Age: All ages banded: 11 bands
Ethnicity: 13 cats (England & Wales), 5 cats of COB
Employment: NSSEC 8 cats
Notes: Most recent file – published 2006.

8 The Licence All users need to be licensed Academics complete the licence as part of the Census Registration System process Non-academic users sign the licence as part of the data registration process Cannot pass the data to an unlicensed user Cannot attempt to identify an individual

9 Access Arrangements Data distributed by CCSR Academics, no charge –Register for the data under Census Registration System –Access the data online from CCSR website Non-academics –Not for profit £500 per file –Business users £1000 per file –10 users per application, incl. software –Download End User Licence from web

10 Special licence – Household SAR
Geog: None – England & Wales only
Age: 2 year bands (e.g. 0-1, 2-3...)
Ethnicity: 16 cats
Employment: SOC Minor, 96 cats (ISCO), 17 cats (SIC92), 40 cats NSSEC
Notes: Download access provided through UKDA & UKDA charges apply (free for not for profit). Requires a full paper application. Data supported by SARs team at CCSR. Users must agree to a much higher level of data stewardship than for EUL files.

11 What are the CAMS? Contain data which was seen as too disclosive to release outside ONS Use limited to research questions which cannot be satisfied with another data source ONS vet applications Data accessed at a Virtual Microdata Laboratory at ONS – data cannot be removed Results vetted by ONS prior to release Users must get OK from ONS before publishing/presenting results Further information and appropriate forms at http://www.statistics.gov.uk/census2001/sar_cams.asp Contact sars@ons.gsi.gov.uk for more details

12 Content of CAMS files Files contain much more detail; e.g. –Individual year of age (topcoded at 95) –Full coding on country of birth –SOC Unit Group –Local authority geography –Index of Deprivation for SOAs –Index of Deprivation for migrants' last address –Full household matrix

13 CAMS Good practice Use the licensed SARs... –to exhaust the potential of other datasets –to write your syntax files Check the disclosure guidelines before writing your application Avoid complex tables –small cell counts aren't reliable –unique cells will usually be suppressed Do use models

14 Using SARs to understand households
File | Household level analysis | Can create new household variables? | Look at intra-household characteristics
Individual licensed | Yes – select HRP | V. limited | No
Small area microdata | Yes – select HRP | V. limited | No
Household licensed | Yes – select any representative or change to hhd file | Yes | Yes
Individual CAMS | Yes – select HRP | V. limited | No
Household CAMS | Yes – select any representative or change to hhd file | Yes | Yes

15 Using the SARs 1991-2001 Changes –Principles –Defining a population base –Ethnicity Coverage –ONC & Imputation –Difference between 1991/2001 Good practice issues –Documentation –Data stewardship –Dealing with sample data –Reporting

16 Comparisons between 1991 and 2001 Population base changed –Imputation (no imputed values in 1991 SARs) –Students – enumerated at term-time address –Residents only (choice in 1991) Variable continuity –Variable names have been changed where the variable is not exactly the same –Some variables (e.g. age, LLI) are easy to compare by grouping 1991 values –Some variables are harder to compare as the question has changed (eg qualifications)
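
Grouping single-year 1991 values so that they line up with banded 2001 values could be sketched as follows in Python. This is a minimal illustration only: the band boundaries and labels are hypothetical and are not the actual SARs age bands, which should be taken from the codebook.

```python
from bisect import bisect_right

# Hypothetical band boundaries (lower age of each band), NOT the real SARs bands.
BAND_STARTS = [0, 16, 25, 45, 65]
BAND_LABELS = ["0-15", "16-24", "25-44", "45-64", "65+"]

def age_band(age):
    """Return the label of the band that a single year of age falls into."""
    return BAND_LABELS[bisect_right(BAND_STARTS, age) - 1]

ages_1991 = [3, 15, 16, 44, 65, 90]          # made-up single-year ages
banded = [age_band(a) for a in ages_1991]    # comparable with a banded variable
print(banded)
```

The same lookup works for any variable where one census holds finer categories than the other, provided the coarser bands are exact unions of the finer ones.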

17 Ethnicity 91/01 Different questions asked in 1991 and 2001 No agreed and perfect correspondence Simpson and Akinwale use LS to show how 1991 maps on to 2001 www.statistics.gov.uk/events/ls_census2001/agenda.asp

18 Define your population base You need to define the population base –In 1991 we had an issue with visitors being double counted (filter using residsta) –In 2001 students who are living away from home are double counted (filter using stulawy in Ind licenced or popbase in other files) –2001 Household file contains dummy form households with no usual residents, e.g. holiday homes (filter using popbase) –Note popbase categories vary across files
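
A minimal Python sketch of the filtering step, using made-up records. The variable name popbase comes from the slide; the category codes below are hypothetical placeholders, because the real categories vary across files and must be checked in the documentation.

```python
# Assumed codes for illustration only; the real popbase coding differs by file.
USUAL_RESIDENT = 1    # assumed: counted once, at usual address
STUDENT_AWAY = 2      # assumed: student living away, would be double counted
DUMMY_HOUSEHOLD = 3   # assumed: dummy form household with no usual residents

records = [
    {"pid": 1, "popbase": USUAL_RESIDENT},
    {"pid": 2, "popbase": STUDENT_AWAY},
    {"pid": 3, "popbase": USUAL_RESIDENT},
    {"pid": 4, "popbase": DUMMY_HOUSEHOLD},
]

def usual_residents(rows):
    """Keep only records in the usual-resident population base."""
    return [r for r in rows if r["popbase"] == USUAL_RESIDENT]

base = usual_residents(records)
print(len(records), "records before filtering,", len(base), "after")
```

Applying the filter once, at the top of a syntax file, keeps every later table on the same population base.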

19 Census coverage Major effort to improve coverage in 2001 One Number Census Use of large Census Coverage Survey to correct census results, 300K households –Design independent of census; –Used matched census and CCS data to estimate total population in each area, –adjusted all results for census non-response using imputation of households and individuals –Results in final database for UK adjusted for non-response

20 Census coverage Coverage before imputation: –94% households returned forms, with another 4% estimated to be in households identified by enumerators. Response rate lowest for –Young people in their early 20s (men aged 20-24 resp. rate of 87%) –Inner London (resp. rate of 78%) Once imputed cases are included estimated to be 100% coverage

21 Non-response 1991 SARs selected from 10% sample –Did not include imputed households –96% coverage 2001 SARs selected from 100% ONC database –Imputed individuals/hholds are identified using oncperim variable –Imputed items are flagged using z variables (zvar=1 if imputed) – available in the larger *impflag* version of the data

22 Percentage ONC imputed, 2001 SARs
Ethnic group | Not ONC imputed | ONC imputed
White | 94.8 | 5.2
Mixed | 91.5 | 8.5
Asian | 84.6 | 15.4
Black | 76.5 | 13.5
Chinese/Other | 85.6 | 14.4
All | 93.8 | 6.2

23 Percentage with ethnicity variable imputed, 2001 SARs
Ethnic group | Not imputed (zeth*=0) | Imputed (zeth*=1)
White | 97.5 | 2.5
Mixed | 88.3 | 11.7
Asian | 94.8 | 5.2
Black | 92.6 | 7.4
Chinese/Other | 89.0 | 11.0
All | 97.1 | 2.9

24 PRAMMing PRAMMing is perturbation designed to deal with very unusual cases, e.g. widowed 16-year-olds Avoids additional broad-banding Perturbation is constrained to –preserve univariate distributions –preserve multivariate distributions on control variables –prevent strange results (like 5-year-old widows) Affects 15 variables –Primary economic activity – 1% cases

25 General advice PRAMMed cases are flagged as imputed in z var Imputation is better than not imputing unless you have evidence to the contrary –Known exception is ethnicity (Simpson and Akinwale) If unsure about impact of PRAMMing and imputation –Do a sensitivity test –use the z var to exclude cases with imputed variables and then repeat your analysis –Use ONCPERIM to exclude imputed individuals and repeat your analysis
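
The sensitivity test described above can be sketched in Python with toy data. The names here are partly assumptions: oncperim is the variable named on the slides, but its coding (1 = imputed person) is assumed, and zemployed is a hypothetical item-level flag following the z variable convention (1 = value imputed).

```python
# Toy records; 'employed' stands in for whatever outcome is being analysed.
records = [
    {"employed": 1, "zemployed": 0, "oncperim": 0},
    {"employed": 1, "zemployed": 0, "oncperim": 0},
    {"employed": 0, "zemployed": 1, "oncperim": 0},  # item imputed
    {"employed": 0, "zemployed": 0, "oncperim": 1},  # whole person imputed
    {"employed": 1, "zemployed": 0, "oncperim": 0},
    {"employed": 0, "zemployed": 0, "oncperim": 0},
]

def share_employed(rows):
    """Proportion of rows with employed == 1."""
    return sum(r["employed"] for r in rows) / len(rows)

# Run the same analysis three times and compare the answers.
all_cases = share_employed(records)
no_item_imputation = share_employed([r for r in records if r["zemployed"] == 0])
no_onc_persons = share_employed([r for r in records if r["oncperim"] == 0])

print(all_cases, no_item_imputation, no_onc_persons)
```

If the three estimates are close, imputation is unlikely to be driving the result; a large gap is a reason to look more carefully.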

26 Get to know the data Use the documentation SARs User Guide –Use Census schedules to check questions –Check univariate frequencies –Do exploratory analyses –Contact sars-helpdesk@man.ac.uk if you can't find the information you need in the online documentation Contact sars-helpdesk@man.ac.uk if you think there is a problem with the data

29 SARs as a LARGE dataset A few million cases can cause trouble! Use Nesstar to do initial data exploration Extract a subset using Nesstar or take a subset from the downloaded file For serious analysis, use a syntax (or .do) file to record your commands; this makes re-running easier –Create a single syntax file which starts with the original data –Use file naming conventions that will enable you to trace versions –Keep a record of work done

30 SARs as sample data Geographically stratified sample –approximates to simple random sample –no clustering in Individual file –Household file – clustering within households –Although large sample you may have small sample sizes when using sub-groups –use standard errors and confidence intervals
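
As a worked illustration of why standard errors matter for small sub-groups, the Python sketch below computes a proportion, its standard error and an approximate 95% confidence interval. It assumes simple random sampling, which the slide says the Individual file approximates; it does not adjust for clustering within households in the Household file. The counts are made up.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Proportion, standard error and normal-approximation CI under SRS."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# Hypothetical sub-group: 50 of 400 sampled people have the characteristic.
p, se, (low, high) = proportion_ci(50, 400)
print(f"p={p:.3f} se={se:.4f} 95% CI=({low:.3f}, {high:.3f})")
```

With only 400 cases the interval is several percentage points wide, even though the full file holds a very large sample.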

31 Reporting Census data is Crown copyright Data should be cited (reference on web site) Let us know when you publish Contact ONS before presenting or publishing results based on the CAMS

32 User support www.ccsr.ac.uk/sars –Resources and links added as we go Seminar invitations welcome! Regional workshop invites welcome! SARs Helpdesk –sars-helpdesk@man.ac.uk –(0161) 275 4735 Join email and newsletter lists

33 Questions …before we talk about using the SARs for hierarchical analysis?

34 Using hierarchical microdata Units of analysis Flat files vs. hierarchical files Using household hierarchy –Different aims –Examples –How to achieve

35 Types of units of analysis
Individual
Family: A group of people consisting of a married or cohabiting couple with or without child(ren), or a lone parent with child(ren). It also includes a married or cohabiting couple with their grandchild(ren) or a lone grandparent with his or her grandchild(ren) where there are no children in the intervening generation in the household.
Household: A household is defined as one person living alone, or a group of people (not necessarily related) living at the same address with common housekeeping - that is, sharing either a living room or sitting room or at least one meal a day.
Local authority district (SAM/CAMS)
Others?
Definitions from 2001 Definitions Volume, National Statistics (2004)

36 Choice of unit matters HOUSEHOLD LEVEL: 1 observation per household –What proportion of households contain only 1 person? 29.2% –What is the mean household size? 2.34 INDIVIDUAL LEVEL: 1 observation per person –What proportion of individuals live alone? 12.5% –What is the average household size for individuals in the sample? 3.05 Source: QLFS 2005 Spring Quarter
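
The gap between the household-level and individual-level answers can be reproduced with a tiny made-up example in Python. The data are hypothetical (four households of sizes 1, 1, 2 and 4) and are not the QLFS figures quoted above; hnum is used as the household identifier, as elsewhere in this session.

```python
from collections import Counter

# One entry per person, giving that person's household number.
person_hnum = [1, 2, 3, 3, 4, 4, 4, 4]
size = Counter(person_hnum)            # hnum -> number of residents

# Household level: one observation per household.
hh_sizes = list(size.values())
hh_share_single = sum(s == 1 for s in hh_sizes) / len(hh_sizes)
hh_mean_size = sum(hh_sizes) / len(hh_sizes)

# Individual level: one observation per person, so big households count more.
ind_sizes = [size[h] for h in person_hnum]
ind_share_alone = sum(s == 1 for s in ind_sizes) / len(ind_sizes)
ind_mean_size = sum(ind_sizes) / len(ind_sizes)

print(hh_share_single, hh_mean_size, ind_share_alone, ind_mean_size)
```

Half the households are single-person, but only a quarter of individuals live alone, because each large household contributes one record per resident at individual level.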

37 Non-hierarchical files Individual SAR/CAMS and Small Area Microdata, 1991 Individual SAR Can be used to analyse household characteristics if and only if those characteristics –can be represented by those of HRP or… –are already stored in the data Need also to select only HRP to avoid large households being over represented

38 Example: The relationship between occupancy and social grade using the SAM The SAM contains 2 derived occupancy variables as well as the HRP's social grade Limit analyses to the Household Reference Person to avoid over-representation of large households (select if reltohr=1) Tabulate the already present variables against each other Easier access, UK wide with geography (without CAMS) and larger n

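39 Results 2001 Small Area Microdata
Occupancy Rating of Hhd by Social Grade of Hhd Reference Person (column %)
Occupancy rating | No emp record | A&B | C1 | C2 | D | E | Total
2+ rms > req'd | 46.3 | 63.1 | 50.3 | 45.3 | 37 | 39.3 | 48.2
1 rm > req'd | 28.2 | 20.2 | 24.8 | 28.1 | 28.8 | 27.7 | 25.7
n(rms) = req'd | 20 | 11.9 | 17.8 | 19.2 | 24.1 | 22.8 | 18.7
n(rms) < req'd | 5.5 | 4.8 | 7.1 | 7.3 | 10.2 | 10.3 | 7.4
Total | 100 | 100 | 100 | 100 | 100 | 100 | 100
N= | 140,554 | 251,341 | 298,250 | 177,626 | 217,245 | 136,800 | 1,221,816
Filter: ( Relationship to HRP = Household reference person )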
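
A table of this kind (HRP records only, column percentages within each social grade) could be built along the following lines in Python. This is a sketch on toy records: reltohr = 1 for the HRP follows the earlier slide, while the field names grade and occ and their category labels are hypothetical.

```python
from collections import Counter

records = [
    {"reltohr": 1, "grade": "AB", "occ": "above"},
    {"reltohr": 2, "grade": "AB", "occ": "above"},   # not HRP: dropped
    {"reltohr": 1, "grade": "AB", "occ": "above"},
    {"reltohr": 1, "grade": "AB", "occ": "below"},
    {"reltohr": 1, "grade": "AB", "occ": "above"},
    {"reltohr": 1, "grade": "E", "occ": "above"},
    {"reltohr": 3, "grade": "E", "occ": "below"},    # not HRP: dropped
    {"reltohr": 1, "grade": "E", "occ": "below"},
]

# Keep one record per household by selecting the HRP.
hrps = [r for r in records if r["reltohr"] == 1]

# Cell counts and column totals, then column percentages.
cell = Counter((r["grade"], r["occ"]) for r in hrps)
col_total = Counter(r["grade"] for r in hrps)
col_pct = {(g, o): 100 * n / col_total[g] for (g, o), n in cell.items()}

for (g, o), pct in sorted(col_pct.items()):
    print(g, o, round(pct, 1))
```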

40 But more flexible than tables… Can limit to owner occupiers in England and Wales…

41 What sort of household variables are on the individual files? e.g. EUL Individual file Region Household Resources –Accommodation type, tenure, lowest floor of accommodation, Furnished, No. rooms –Sole use of bath/shower/toilet, full/part central heating, self contained –Cars Household membership –No. of residents, number who are: carers, 65+, employed adults, LT ill, poor health –No. families –Students living away Household indicators –Education, employment, health/disability, housing –Social grade of HRP –Multiple ethnicity in hhd Density –No. residents per room, occupancy rating

42 cf. Hierarchical files Household SL file, Household CAM, 1991 Household file Contains individuals within households, so considerably more flexible Can be used to create new household variables based on information about the household and/or information about all the individuals within the household Can be used to describe intra-household relationships

43 The hierarchy of the household SAR
Individuals grouped into household groups; family units identified within households
Household 1: North West, Social rented
  Person 1: HRP, Family 1, Female, 28, No quals, No LTILL
  Person 2: Son of HRP, Family 1, Male, 12, N/A, No LTILL
Household 2: Wales, Owner occupier
  Person 1: HRP, Family 1, Male, 34, Degree, No LTILL
  Person 2: Spouse of HRP, Family 1, Female, 30, Degree, P/T Employee, No LTILL
  Person 3: Parent of HRP, Family 2, Female, 72, No quals, Econ Inactive, LTILL

44 What does it look like?

45 Looking at the data For the 20 cases in the previous screenshot: How many households? How many individuals in the largest household? What kind of family lives in hnum 41? Thinking of the census definition of family unit, did any household have more than one family unit?

46 What sort of analysis? Describing the household better Describing an individual in relation to other members of the household Describing partnerships

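47 Household composition & position
No. generations in hhd | Position within generations | 2001: W | 2001: BC | 2001: In | 2001: Pa | 1991: W | 1991: BC | 1991: I | 1991: Pa
1 gen | snk <36 | 3.1 | 7.6 | 2.8 | 1.7 | 2.4 | 6.2 | 1.3 | 0.6
1 gen | snk 36+ | 6.3 | 11.7 | 2.7 | 1.3 | 4.1 | 6.2 | 1.1 | 0.8
1 gen | cpnok <36 | 8.8 | 2.9 | 7.0 | 6.8 | 9.5 | 5.0 | 4.3 | 3.5
1 gen | cpnok 36+ | 17.2 | 5.7 | 6.2 | 3.4 | 14.3 | 6.5 | 4.4 | 1.5
2 gen | upper 2g | 52.0 | 58.1 | 60.0 | 63.3 | 55.2 | 59.2 | 58.8 | 65.2
2 gen | lower 2g | 8.4 | 8.7 | 11.4 | 12.6 | 11.6 | 12.1 | 13.7 | 14.1
3 gen | upper 3g | 0.6 | 1.0 | 1.9 | 2.3 | 0.7 | 2.3 | 3.8 | 3.9
3 gen | mid 3g | 1.1 | 1.8 | 6.2 | 7.0 | 2.0 | 2.3 | 12.1 | 10.2
3 gen | lower 3g | 0.1 |  | 0.6 | 0.9 | 0.2 | 0.1 | 0.4 | 0.2
 | unrel | 2.4 | 2.2 | 1.3 | 0.7 | 0.1 |  |  | 0.0
Total (100%) |  | 126,086 | 1,734 | 2,745 | 1,638 | 119,319 | 1,452 | 2,118 | 955
Household SAR 91/01: Female residents 16-59. Excludes F/T students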

48 Mixed couples – SL-HSAR

49 ...and UK born

50 Principles of working with hierarchical data Can create variables which represent a summary across a household –Min, max, average, sum, count May need to prepare the data first Can also work within families within households Need a unique identifier(s) to work this way
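
A minimal Python sketch of a household-level summary (one output row per household), assuming person records that carry a household identifier. The names hnum and ageh are the ones used later in this session; the ages are toy values taken from the two example households shown earlier.

```python
from collections import defaultdict
from statistics import mean

persons = [
    {"hnum": 1, "ageh": 28},
    {"hnum": 1, "ageh": 12},
    {"hnum": 2, "ageh": 34},
    {"hnum": 2, "ageh": 30},
    {"hnum": 2, "ageh": 72},
]

# Group ages by the unique household identifier.
ages_by_household = defaultdict(list)
for p in persons:
    ages_by_household[p["hnum"]].append(p["ageh"])

# One summary row per household: count, min, max, sum and average.
household_file = [
    {"hnum": h, "n_persons": len(a), "age_min": min(a), "age_max": max(a),
     "age_sum": sum(a), "age_mean": mean(a)}
    for h, a in sorted(ages_by_household.items())
]

for row in household_file:
    print(row)
```

Grouping within families rather than households only needs a compound key, for example household number plus a family unit number.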

51 ... in SPSS AGGREGATE will create a new file at household (or family...) level MATCH FILES will allow you to link household (or family...) level and individual files AGGREGATE with MODE=ADDVARIABLES allows you to do it all in one

52 Example 1: Add oldest person in hhd var to all individuals in the household
Only possible in recent versions of SPSS
Within each household (indicated by hnum), compute the maximum value of age:
    * Sort by the break unit first.
    SORT CASES BY hnum.
    * Add the household maximum of AGEH to every person record.
    AGGREGATE
      /OUTFILE=* MODE=ADDVARIABLES
      /BREAK=hnum
      /AGEH_max = MAX(AGEH).
/OUTFILE=* MODE=ADDVARIABLES: add the new variable to the current person-level file
/BREAK=hnum: break by the household ID variable
/AGEH_max = MAX(AGEH): defines the new variable
The AGGREGATE command produces summary variables across higher level units – MUST SORT BY UNIT FIRST

53 Which gets us... For each Value of hnum Take the max value of ageh To create new variable
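
For readers who want to check the logic outside SPSS, the same result can be sketched in Python: take the maximum age within each hnum and write it back to every person in that household. The records are toy data, and the code is an illustration of the idea rather than a translation of the AGGREGATE command's internals.

```python
persons = [
    {"hnum": 1, "ageh": 28},
    {"hnum": 1, "ageh": 12},
    {"hnum": 2, "ageh": 34},
    {"hnum": 2, "ageh": 30},
    {"hnum": 2, "ageh": 72},
]

# Step 1: maximum age for each value of hnum.
max_age = {}
for p in persons:
    h = p["hnum"]
    max_age[h] = max(max_age.get(h, p["ageh"]), p["ageh"])

# Step 2: distribute the household maximum to every person record.
for p in persons:
    p["AGEH_max"] = max_age[p["hnum"]]

for p in persons:
    print(p)
```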

54 Example 2: Oldest male in the household
Same principle as before but ensure that female ages are excluded (set them to system missing first)
    * mageh = ageh for males only (otherwise system missing).
    DO IF (sex = 1).
    RECODE AGEH (ELSE=Copy) INTO mageh.
    END IF.
    VARIABLE LABELS mageh 'male age'.
    EXECUTE.
    * For each value of hnum, take the max of mageh and give it to everyone with that hnum.
    AGGREGATE
      /OUTFILE=* MODE=ADDVARIABLES
      /BREAK=hnum
      /maxmage 'maximum male age in hhd' = MAX(mageh).

55 Which gets us... First compute age for males only Aggregate command takes maximum value of mageh within each value of hnum and distributes across whole household
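
The same two steps in a Python sketch, on toy records: ignore female ages, take the maximum male age per household, and distribute it to all members. The coding sex = 1 for males follows the SPSS example above; household 3 is a made-up all-female household, included to show that the result is missing (None) when a household has no males.

```python
MALE = 1  # coding taken from the SPSS example (sex = 1)

persons = [
    {"hnum": 1, "sex": 2, "ageh": 28},
    {"hnum": 1, "sex": 1, "ageh": 12},
    {"hnum": 2, "sex": 1, "ageh": 34},
    {"hnum": 2, "sex": 2, "ageh": 30},
    {"hnum": 2, "sex": 2, "ageh": 72},
    {"hnum": 3, "sex": 2, "ageh": 45},
]

# Step 1: maximum age among males only, per household.
max_male_age = {}
for p in persons:
    if p["sex"] == MALE:
        h = p["hnum"]
        max_male_age[h] = max(max_male_age.get(h, p["ageh"]), p["ageh"])

# Step 2: give every household member the household value (None if no males).
for p in persons:
    p["maxmage"] = max_male_age.get(p["hnum"])

print([p["maxmage"] for p in persons])
```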

56 Can extend this principle... To create a variable showing characteristics of HRP/Household Head –Create a new variable for HoH/HRP which copies the relevant characteristic –Take maximum value of new variable across household To create a variable showing characteristics of Family head –Create a variable for the family head/FRP containing the value –Aggregate over household number AND family unit
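
Both extensions can be sketched in Python on toy records. The field names are partly assumptions: reltohr = 1 for the HRP follows the earlier example, while fnum (family unit number), is_frp (family reference person flag) and qual are hypothetical names used only for illustration.

```python
persons = [
    {"hnum": 2, "fnum": 1, "reltohr": 1, "is_frp": True,  "qual": "Degree",   "ageh": 34},
    {"hnum": 2, "fnum": 1, "reltohr": 2, "is_frp": False, "qual": "Degree",   "ageh": 30},
    {"hnum": 2, "fnum": 2, "reltohr": 3, "is_frp": True,  "qual": "No quals", "ageh": 72},
    {"hnum": 1, "fnum": 1, "reltohr": 1, "is_frp": True,  "qual": "No quals", "ageh": 28},
    {"hnum": 1, "fnum": 1, "reltohr": 4, "is_frp": False, "qual": None,       "ageh": 12},
]

# Household level: the HRP's characteristic, keyed by household number.
hrp_qual = {p["hnum"]: p["qual"] for p in persons if p["reltohr"] == 1}

# Family level: the family head's value, keyed by household AND family unit.
frp_age = {(p["hnum"], p["fnum"]): p["ageh"] for p in persons if p["is_frp"]}

# Distribute both values to every person record.
for p in persons:
    p["hrpqual"] = hrp_qual.get(p["hnum"])
    p["frpage"] = frp_age.get((p["hnum"], p["fnum"]))

for p in persons:
    print(p["hnum"], p["fnum"], p["hrpqual"], p["frpage"])
```

The compound key matters: family unit numbers restart within each household, so family number alone does not identify a family.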

