Data Analysis to NTA Sang-Hyop Lee 41st Summer Seminar June 8, 2010.



Data Sets for Statistical Analysis
► Cross section vs. time series
► Cross-section time series: useful for analysis of aggregate variables and for cohort analysis
► Panel (longitudinal)
  - Repeated cross-section design: most common
  - Rotating panel design (Côte d'Ivoire 1985 data)
  - Supplemental cross-section design (Kenya & Tanzania 1982/83 data, Malaysia and Indonesia LS)
► Cross section with retrospective information
► Micro vs. macro

Quality of Survey Data
► Requires individual or household micro survey data sets.
► Zvi Griliches, “A good survey data set has the properties of...”
  - Extent (richness): it has the variables of interest at a certain level of detail.
  - Reliability: the variables are measured without error.
  - Validity: the data set is representative.

Data Problem (An Example)
► FIES (64,433 households with 233,225 individuals)
  - Measured for urban areas only (Valid?)
  - No single-person households (Valid?)
  - No information on income from family-owned businesses (Rich?)
  - Measured for up to 8 household members (Reliable? Valid?)

Extent (Richness): Missing/Change of Variables
► Missing variables
  - Not measured, or measured only for a certain group
  - Labor portion of self-employment income
► Change of variables over time
  - Institutional/policy change
  - New consumption items, new jobs, etc.
► Change of survey instrument/collapsing

Reliability: Measurement Error
► Response/reporting error
  - Respondents do not know what is required
  - Incentive to understate/overstate
  - Recall bias: related to the period covered by the survey
  - Using wrong/different reporting units
  - Heaping
  - Outliers
► Coding error, top coding, missing observations
► Overestimate/underestimate
  - Parents do not report/register their children until the children have a name
  - Detect by checking the survival rate at single ages
► Discrepancy between the aggregate and the sum of individual values
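
The slide lists heaping without naming a diagnostic. One widely used check for age heaping is Whipple's index, which compares the count reported at ages ending in 0 or 5 with one fifth of the count at ages 23 to 62. The sketch below is in Python rather than the Stata used elsewhere in these slides, and the function name and toy populations are assumptions made for illustration only.

```python
def whipple_index(pop_by_age):
    """Whipple's index from a mapping {single year of age: count}.

    100 means no preference for ages ending in 0 or 5;
    500 means every reported age in 23-62 ends in 0 or 5.
    """
    heaped = sum(pop_by_age.get(age, 0) for age in range(25, 61, 5))
    total = sum(pop_by_age.get(age, 0) for age in range(23, 63))
    if total == 0:
        raise ValueError("no population reported at ages 23-62")
    return 100.0 * 5.0 * heaped / total


# Hypothetical data: a flat age distribution with no heaping.
uniform = {age: 1000 for age in range(0, 91)}

# Hypothetical data: 200 people from each neighbouring age report
# the nearest age ending in 0 or 5 instead of their true age.
heaped = dict(uniform)
for age in range(25, 61, 5):
    heaped[age] += 400
    heaped[age - 1] -= 200
    heaped[age + 1] -= 200

index_uniform = whipple_index(uniform)
index_heaped = whipple_index(heaped)
```

An index well above 100 suggests that reported ages should be examined before single-age profiles are built from them.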

Validity: Censoring
► Selection based on characteristics
► Censoring due to the time of survey
  - Duration of unemployment (left and right censoring)
  - Completed years of schooling
► Attrition (panel data)

Categorical/Qualitative Variables
► Converting categorical variables to single continuous variables
  - Grouped by age (population, public education consumption)
  - Income category (FPL)
► Inconsistency over time
► Categorical to continuous, and vice versa
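
As a rough illustration of converting age-grouped data to single years of age, the Python sketch below spreads each group total evenly across the ages in the group. The uniform split is an assumption chosen for simplicity and is not a method stated in the slides; the function name and the numbers are hypothetical.

```python
def split_groups_uniform(groups):
    """Spread each age-group total evenly over its single years of age.

    groups: list of (low_age, high_age, total) with inclusive bounds.
    Returns a mapping {single year of age: value}.
    """
    single = {}
    for low, high, total in groups:
        width = high - low + 1
        for age in range(low, high + 1):
            single[age] = total / width
    return single


# Hypothetical population counts reported in five-year age groups.
grouped_population = [(0, 4, 500.0), (5, 9, 450.0)]
single_year = split_groups_uniform(grouped_population)
```

The group totals are preserved by construction, but the result has artificial steps at group boundaries, so a smoother interpolation may be preferable when the profile is used at single ages.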

Units, Real vs. Nominal
► Be careful about the reporting unit
  - Measurement units
  - Reporting period units (reference period, seasonal fluctuation, recall bias)
► Nominal vs. real
  - Aggregation across items
  - Quality change (e.g., computers)
  - Where inflation is a substantial problem
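
The nominal-to-real conversion can be sketched as dividing each nominal value by a price index relative to a base year. This is a minimal Python illustration; the function name, the choice of a single overall price index, and all figures are assumptions, and it does not address the aggregation or quality-change issues listed on the slide.

```python
def to_real(nominal_by_year, price_index_by_year, base_year):
    """Express nominal values in base-year prices."""
    base = price_index_by_year[base_year]
    return {
        year: value * base / price_index_by_year[year]
        for year, value in nominal_by_year.items()
    }


# Hypothetical nominal values and price index (base year 2008 = 100).
nominal = {2008: 100.0, 2009: 110.0, 2010: 126.0}
price_index = {2008: 100.0, 2009: 110.0, 2010: 120.0}

real = to_real(nominal, price_index, base_year=2008)
```

In this toy series the nominal value rises by 26 percent between 2008 and 2010, but only 5 percent of that is a real increase.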

Solution for Missing Variables
► Missing is not usually zero!!
► Ignore it: random non-response
► Give up: find another source of data
► Impute: “missing does not mean zero”.
  - Based on their characteristics or the mean value
  - Based on the values of a peer group
  - Modified zero-order regression (y on x)
    - Create a dummy variable z that flags observations where x is missing
    - Replace the missing values of x with 0 (x’)
    - Regress y on x’ and z, rather than y on x
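
The three steps of the modified zero-order regression can be sketched as follows. This is a minimal pure-Python illustration (the slides use Stata); the helper names `ols` and `missing_indicator_fit`, the use of `None` to mark missing values, and the toy data are all assumptions.

```python
def ols(rows, y):
    """Least-squares coefficients of y on the columns of rows.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    No intercept is added automatically.
    """
    k = len(rows[0])
    a = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for idx in range(k):
            if idx != col:
                factor = a[idx][col] / a[col][col]
                a[idx] = [v - factor * p for v, p in zip(a[idx], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def missing_indicator_fit(x, y):
    """Regress y on x' and z, where z flags missing x and x' sets them to 0."""
    z = [1.0 if xi is None else 0.0 for xi in x]          # step 1: dummy z
    x_filled = [0.0 if xi is None else float(xi) for xi in x]  # step 2: x'
    rows = [[1.0, xf, zi] for xf, zi in zip(x_filled, z)]  # constant, x', z
    return ols(rows, y)                                    # step 3: regress


# Hypothetical data: y = 1 + 2x where x is observed; x is missing twice.
x = [1, 2, 3, 4, None, None]
y = [3.0, 5.0, 7.0, 9.0, 6.0, 8.0]
intercept, slope, missing_shift = missing_indicator_fit(x, y)
```

With a single regressor, the dummy absorbs the mean of y among the observations with missing x, so the slope is estimated from the complete cases while all observations stay in the sample.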

Individuals, Households, Regions
► Some data sets are at the individual level, but much data is gathered from households
  - What is a household (Hh)? How is the household head (Hhh) defined?
► Regional characteristics are often quite important
  - PSU, clustering
► Or the data could be at the regional level

Headship (Thailand, 1996)

Measuring Consumption
► Durables.
► Underestimation: e.g., British FES
  - Using an aggregate control mitigates the problem.
► Home-produced items: both income and consumption.
► Allocation across individuals is difficult.
► Estimating some profiles, such as health expenditure, is also difficult, partly due to the various sources of financing.
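
The idea of using an aggregate control can be sketched as rescaling a survey-based per capita age profile so that, multiplied by population and summed over ages, it matches a macro total. This Python sketch assumes a single proportional adjustment factor applied at every age; the function name and all numbers are hypothetical.

```python
def adjust_to_control(profile, population, control_total):
    """Scale a per capita age profile to match an aggregate control.

    profile:       {age: per capita value from the survey}
    population:    {age: number of people}
    control_total: aggregate value the profile should reproduce
    Returns (adjusted profile, adjustment factor).
    """
    implied = sum(profile[age] * population[age] for age in profile)
    if implied == 0:
        raise ValueError("survey profile implies a zero aggregate")
    factor = control_total / implied
    return {age: value * factor for age, value in profile.items()}, factor


# Hypothetical two-age example in which the survey understates consumption.
survey_profile = {0: 10.0, 1: 20.0}
population = {0: 100, 1: 50}
adjusted, factor = adjust_to_control(survey_profile, population, 2400.0)
```

A proportional factor preserves the age shape of the survey profile, so it corrects the level of underreporting but not any differences in underreporting by age.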

Measuring Income
► “All of the difficulties of measuring consumption apply with greater force to the measurement of income” (Deaton, p. 29).
  - Need detailed information on “transactions” (inflow and outflow): an enormous task
  - Incentive to understate
  - Some surveys did not attempt to collect information on asset income

Data Cleaning and Variable Manipulation
► Case by case
► Find out what data sets are available and choose the best one
► Detect outliers and examine them carefully
► A serious examination is required when inflation matters, to check whether the actual estimation process generates a variable
► Make variables consistent
► Convert categorical variables to continuous variables, and vice versa
► Data merge, data reshape, construct variables…

Weighting and Clustering
► Weights should be used in summaries of variables, direct tabulation, regression, and smoothing.
► In the Stata examples, the command on the left uses an unspecified weight [w=wgt]; Stata treats it as the command on the right, using that command's default weight type.
► Frequency weights: “fw” indicates replicated data. The weight tells us how many observations each observation really represents.
    . tab edu [w=wgt]  →  tab edu [fw=wgt]
► Analytic weights: “aw” are inversely proportional to the variance of an observation. They are appropriate when you are dealing with data containing averages.
    . su edu [w=wgt]  →  su edu [aw=wgt]
    . reg wage edu [w=wgt]  →  reg wage edu [aw=wgt]
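
To make the "replicated data" reading of frequency weights concrete, the Python sketch below checks that a frequency-weighted tabulation and mean match the same statistics computed on data in which each observation is physically repeated. The variable names echo the Stata examples above, but the helper functions and the values are hypothetical.

```python
from collections import Counter


def weighted_tab(values, weights):
    """Frequency table in which each observation counts `weight` times."""
    table = Counter()
    for value, weight in zip(values, weights):
        table[value] += weight
    return table


def weighted_mean(values, weights):
    """Mean in which each observation counts `weight` times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def expand(values, weights):
    """Repeat each observation `weight` times (weights must be integers)."""
    return [v for v, w in zip(values, weights) for _ in range(w)]


# Hypothetical years of schooling and integer frequency weights.
edu = [6, 9, 12]
wgt = [2, 3, 1]

expanded = expand(edu, wgt)
table_weighted = weighted_tab(edu, wgt)
table_expanded = Counter(expanded)
mean_weighted = weighted_mean(edu, wgt)
mean_expanded = sum(expanded) / len(expanded)
```

Using the weights directly gives the same answer as expanding the data, without the memory cost of holding the expanded file.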

Weighting and Clustering (cont’d)
► Probability weights: “pw” are the sample weights, i.e., the inverse of the probability that the observation was sampled. Estimation with pw reports robust standard errors, so inference is not affected by heteroskedasticity.
    . reg wage edu [pw=wgt]  →  reg wage edu [(a)w=wgt], robust
    . reg wage edu [pw=wgt], cluster(hhid)  →  reg wage edu [(a)w=wgt], cluster(hhid)
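
The definition of a probability weight as the inverse of the sampling probability can be illustrated with a small worked example. The Python sketch below uses a hypothetical population of 10 urban and 20 rural residents in which urban residents are oversampled; the function names and all figures are assumptions, and the sketch covers only the point estimate, not robust or clustered standard errors.

```python
def pweights(probabilities):
    """Probability weights: the inverse of each sampling probability."""
    return [1.0 / p for p in probabilities]


def weighted_mean(values, weights):
    """Weighted mean, used here as the estimate of the population mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: 5 of 10 urban residents (income 100 each, p = 0.5)
# and 2 of 20 rural residents (income 50 each, p = 0.1).
income = [100.0] * 5 + [50.0] * 2
prob = [0.5] * 5 + [0.1] * 2

wgt = pweights(prob)
mean_unweighted = sum(income) / len(income)
mean_weighted = weighted_mean(income, wgt)

# True mean in the hypothetical population of 30 people.
mean_population = (10 * 100.0 + 20 * 50.0) / 30
```

The unweighted sample mean overstates income because urban residents are overrepresented, while the weighted mean recovers the population value and the weights sum to the population size.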

Smoothing (by Ivan on Thursday)
► Shows the pattern more clearly by reducing sampling variance
► Types of smoothing
  - “lowess” smoothing (Stata) does not allow the incorporation of weights. Expanding the data is computationally burdensome.
  - Friedman’s super smoothing (R) does.

Rule of Smoothing
► Basic components should be smoothed, but not aggregations. For example, the private and public health consumption profiles should each be smoothed, but their sum should not.
► The objective is to reduce sampling variance without eliminating what may be “real” features of the data.
  - Avoid too much smoothing (e.g., health consumption at old ages). Use the right bandwidth (span).
  - Public health spending may increase dramatically when individuals reach an age threshold, e.g., 65. This kind of feature should not be smoothed away.
  - Because health consumption by newborns is unusually high, we tend not to smooth health consumption at age 0. This can be done by adding the estimated unsmoothed health consumption of newborns to the smoothed age profile of private health consumption for the other ages.
  - Only adults (usually ages 15 and older) receive income, pay income taxes, and make familial transfer outflows. Thus, when we smooth these age profiles, we begin smoothing at the adult ages, excluding the younger age groups who do not earn income.
  - However, a problem arises when the smoothed values for the first age groups turn out negative. This can be solved by replacing the negative values with the unsmoothed values for those age groups.
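
The rules above can be sketched in code. The Python example below uses a simple weighted moving average as a stand-in smoother; it is not lowess or Friedman's super smoother, and the function names, window width, and toy profiles are assumptions made for illustration.

```python
def weighted_moving_average(values, weights, half_window=1):
    """Centred moving average in which each age counts by its weight."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        w = weights[lo:hi]
        smoothed.append(sum(v * wi for v, wi in zip(values[lo:hi], w)) / sum(w))
    return smoothed


def smooth_profile(values, weights, start_age=0, half_window=1):
    """Smooth ages >= start_age; younger ages keep their unsmoothed values."""
    head = list(values[:start_age])
    raw_tail = values[start_age:]
    tail = weighted_moving_average(raw_tail, weights[start_age:], half_window)
    # Fall back to the unsmoothed value wherever smoothing gives a negative.
    # A moving average of non-negative inputs never does, but smoothers
    # based on local regression can, so the guard is kept.
    tail = [raw if s < 0 else s for raw, s in zip(raw_tail, tail)]
    return head + tail


# Hypothetical per capita profiles for ages 0-5 and sample counts by age.
private_health = [900.0, 100.0, 140.0, 120.0, 160.0, 130.0]
public_health = [400.0, 60.0, 50.0, 70.0, 40.0, 60.0]
sample_size = [30, 40, 35, 45, 50, 40]

# Smooth each component from age 1, leaving newborns (age 0) unsmoothed.
private_s = smooth_profile(private_health, sample_size, start_age=1)
public_s = smooth_profile(public_health, sample_size, start_age=1)

# The aggregate is the sum of smoothed components; it is not smoothed again.
total_health = [p + q for p, q in zip(private_s, public_s)]

# An income-type profile: smoothing starts at the first earning age (here 2).
labor_income = [0.0, 0.0, 10.0, 30.0, 50.0, 45.0]
labor_s = smooth_profile(labor_income, sample_size, start_age=2)
```

Starting the smoother at `start_age` keeps the spike at age 0 and the zeros at non-earning ages from leaking into neighbouring ages.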

The End