Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, 5-7 October 2015, Helsinki. Beata Nowok, Administrative Data Research Centre – Scotland.

1 Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, 5-7 October 2015, Helsinki. Beata Nowok, Administrative Data Research Centre – Scotland. synthpop: an R package for generating synthetic microdata

2 What is synthpop?
- A software tool for producing synthetic versions of sensitive microdata
Administrative Data Research Centre - Scotland | Beata Nowok | 5-7 October 2015

3 Observed (input)

| Sex | Age | Education | Marital status | Income | Life satisfaction |
|---|---|---|---|---|---|
| FEMALE | 57 | VOCATIONAL/GRAMMAR | MARRIED | 800 | PLEASED |
| MALE | 41 | SECONDARY | UNMARRIED | 1500 | MIXED |
| FEMALE | 18 | VOCATIONAL/GRAMMAR | UNMARRIED | NA | PLEASED |
| FEMALE | 78 | PRIMARY/NO EDUCATION | WIDOWED | 900 | MIXED |
| FEMALE | 54 | VOCATIONAL/GRAMMAR | MARRIED | 1500 | MOSTLY SATISFIED |
| MALE | 20 | SECONDARY | UNMARRIED | -8 | PLEASED |
| FEMALE | 39 | SECONDARY | MARRIED | 2000 | MOSTLY SATISFIED |
| MALE | 39 | SECONDARY | MARRIED | 1197 | MIXED |
| FEMALE | 38 | VOCATIONAL/GRAMMAR | MARRIED | NA | MOSTLY DISSATISFIED |
| FEMALE | 73 | VOCATIONAL/GRAMMAR | WIDOWED | 1700 | PLEASED |
| FEMALE | 54 | SECONDARY | WIDOWED | 2000 | MOSTLY SATISFIED |
| MALE | 30 | VOCATIONAL/GRAMMAR | UNMARRIED | 900 | MOSTLY SATISFIED |
| MALE | 68 | SECONDARY | MARRIED | -8 | DELIGHTED |
| MALE | 61 | PRIMARY/NO EDUCATION | MARRIED | -8 | MIXED |

Synthetic (output)

| Sex | Age | Education | Marital status | Income | Life satisfaction |
|---|---|---|---|---|---|
| MALE | 81 | PRIMARY/NO EDUCATION | MARRIED | 2100 | PLEASED |
| MALE | 54 | VOCATIONAL/GRAMMAR | MARRIED | 1700 | PLEASED |
| FEMALE | 32 | VOCATIONAL/GRAMMAR | DIVORCED | 870 | MIXED |
| FEMALE | 98 | PRIMARY/NO EDUCATION | MARRIED | 800 | MOSTLY DISSATISFIED |
| FEMALE | 50 | PRIMARY/NO EDUCATION | MARRIED | NA | MOSTLY SATISFIED |
| FEMALE | 37 | VOCATIONAL/GRAMMAR | MARRIED | 158 | PLEASED |
| MALE | 28 | VOCATIONAL/GRAMMAR | NA | 1500 | MOSTLY SATISFIED |
| FEMALE | 62 | PRIMARY/NO EDUCATION | MARRIED | 830 | MOSTLY SATISFIED |
| MALE | 78 | PRIMARY/NO EDUCATION | MARRIED | NA | PLEASED |
| FEMALE | 29 | SECONDARY | MARRIED | 580 | MOSTLY SATISFIED |
| MALE | 59 | PRIMARY/NO EDUCATION | MARRIED | 1300 | MOSTLY SATISFIED |
| MALE | 41 | SECONDARY | UNMARRIED | 1500 | MIXED |
| MALE | 18 | SECONDARY | UNMARRIED | -8 | PLEASED |
| FEMALE | 73 | PRIMARY/NO EDUCATION | WIDOWED | 1350 | MOSTLY SATISFIED |

Data that look (structurally) like original data but contain artificial units only

4 Generating synthetic data: method
Sequentially replacing original data values with synthetic values generated from conditional probability distributions: $Y_j \sim (Y_0, Y_1, \ldots, Y_{j-1})$
[Diagram: models are fit to the observed data; synthetic values are drawn from them]
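The sequential idea can be illustrated with a toy base-R sketch. It does not use synthpop, and the data, variable names and linear model are all assumptions made for illustration: the first variable is drawn by resampling, and the second is drawn from a model fitted to the observed data but evaluated at the already synthesised values.

```r
# Toy sequential synthesis in base R (illustrative only; not synthpop code).
set.seed(1)

# Hypothetical "observed" data: age, and income that depends on age.
n <- 500
observed <- data.frame(age = round(runif(n, 18, 90)))
observed$income <- 500 + 15 * observed$age + rnorm(n, sd = 200)

# Step 1: the first variable has no predictors, so draw it by
# sampling with replacement from its observed values.
synthetic <- data.frame(age = sample(observed$age, n, replace = TRUE))

# Step 2: fit the conditional model income | age on the OBSERVED data ...
fit <- lm(income ~ age, data = observed)

# ... then draw synthetic income given the SYNTHETIC age values,
# adding residual noise so that values are draws, not fitted means.
synthetic$income <- predict(fit, newdata = synthetic) +
  rnorm(n, mean = 0, sd = summary(fit)$sigma)

head(synthetic)
```

With more variables, step 2 would simply be repeated, each time conditioning on all variables synthesised so far.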

5 http://cran.r-project.org/package=synthpop Generating synthetic versions of sensitive microdata for statistical disclosure control

8 Generating synthetic data: synthpop
[Diagram: observed → syn() → synthetic]

9 Generating synthetic data: synthpop
- Synthesis can be run with default parameters (CART – Classification and Regression Trees): syn(data)
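A minimal sketch of a default run, assuming the synthpop package is installed and using its bundled SD2011 example survey. The choice of variables and the seed are assumptions made here for illustration and reproducibility.

```r
library(synthpop)

# SD2011 is an example data set shipped with synthpop; this subset of
# variables is picked for illustration only.
vars <- c("sex", "age", "edu", "marital", "income", "ls")
ods <- SD2011[, vars]

# Default synthesis; the seed is fixed only to make the run reproducible.
sds <- syn(ods, seed = 2015)

# The synthetic data frame is stored in the `syn` component of the result.
head(sds$syn)
```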

12 syn() & common data problems
- Missing-data codes: cont.na
  - categorical variables: additional factor level(s)
  - continuous variables: specified by cont.na and modelled separately
- Semi-continuous variables: semicont
- Restricted values (interrelationships between variables): rules & rvalues
- Linear constraints: denom
- Non-negativity / non-normality: method set to 'lognorm', 'sqrtnorm' or 'cubertnorm'
- Deterministic relations: method set to "~I(…)"
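A hedged sketch of how two of these options might be passed to syn(), again using the bundled SD2011 data. It assumes that income uses -8 as a missing-data code (as in the example table) and that SINGLE is an existing level of marital in SD2011; the age threshold in the rule is chosen purely for illustration.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "income", "ls")]

sds <- syn(
  ods,
  # Treat NA and -8 in income as missing-data codes, modelled separately
  # from the continuous part of the variable.
  cont.na = list(income = c(NA, -8)),
  # Restricted values: records satisfying the rule get the fixed value.
  # "SINGLE" is assumed to be an existing level of marital in SD2011.
  rules   = list(marital = "age < 18"),
  rvalues = list(marital = "SINGLE"),
  seed = 2015,
  print.flag = FALSE
)

# Check the restriction in the synthetic data.
table(sds$syn$marital[sds$syn$age < 18], useNA = "ifany")
```

Note that age precedes marital in the column order, so age has already been synthesised when the rule is evaluated.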

13 syn()

14 Overview of synthpop functions
[Diagram: observed → syn() → synthetic. Functions shown: read.obs(), write.syn(), sdc(), compare.synds(), summary.synds(), compare.fit.synds(), glm.synds(), summary.fit.synds(), utility.synds(). Diagram labels: descriptive, models, data structure]
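As a rough illustration of how the modelling functions fit together, the sketch below synthesises a small subset of SD2011, fits a logistic regression to the synthetic data with glm.synds(), and compares the estimates with those from the observed data. The choice of outcome (smoke) and predictors is an assumption made for the example.

```r
library(synthpop)

# Illustrative subset of the bundled SD2011 data; smoke is a YES/NO factor.
ods <- SD2011[, c("sex", "age", "edu", "smoke")]
sds <- syn(ods, seed = 2015, print.flag = FALSE)

# Descriptive summary of the synthetic data (dispatches to summary.synds).
summary(sds)

# Fit a logistic regression to the synthetic data.
fit <- glm.synds(smoke ~ sex + age + edu, family = "binomial", data = sds)
summary(fit)  # dispatches to summary.fit.synds

# Compare synthetic-data estimates with those from the observed data
# (dispatches to compare.fit.synds).
compare(fit, ods)
```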

15 compare()
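A short sketch of a descriptive comparison, assuming the bundled SD2011 data and an illustrative choice of variables. Called on a synthetic data object, compare() contrasts the distributions of the selected variables in the synthetic and observed data.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "income", "ls")]
sds <- syn(ods, seed = 2015, print.flag = FALSE)

# Compare observed and synthetic distributions for two chosen variables.
compare(sds, ods, vars = c("sex", "income"))
```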

18 utility.synds()

19 sdc() & statistical disclosure control
- Data labelling: label
- Removing replicated uniques: rm.replicated.uniques
- Bottom- and top-coding: recode.vars, bottom.top.coding, recode.exclude
- At synthesis stage: smoothing, minbucket
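The options above might be combined as in the following sketch. It assumes the bundled SD2011 data; the label text and the top-coding threshold of 5000 are hypothetical values chosen for illustration, and only smoothing is shown for the synthesis stage.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "income", "ls")]

# Synthesis stage: smooth synthetic income values; NA and -8 are kept
# as missing-data codes rather than smoothed.
sds <- syn(ods,
           cont.na   = list(income = c(NA, -8)),
           smoothing = list(income = "density"),
           seed = 2015, print.flag = FALSE)

# Post-synthesis disclosure control.
sds_sdc <- sdc(sds, ods,
               label = "FAKE_DATA",                    # hypothetical label text
               rm.replicated.uniques = TRUE,           # drop replicated unique records
               recode.vars = "income",
               bottom.top.coding = list(c(NA, 5000)),  # top-code only; 5000 is illustrative
               recode.exclude = list(c(NA, -8)))       # leave missing-data codes alone

summary(sds_sdc$syn$income)
```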

20 sdc()

21 Conclusions
- The synthpop package for R: facilitating generation, evaluation and analysis of synthetic data

