Srinivasulu Rajendran Centre for the Study of Regional Development (CSRD) Jawaharlal Nehru University (JNU) New Delhi India



Objective of the session
To understand data file management, quality checking of a dataset, and the handling of missing values using software packages.

1. What procedures should one follow before proceeding with statistical analysis in a software package?
2. How do we check the quality of the data?
3. How do we organize the dataset in a software package?

Data sources
International Food Policy Research Institute (IFPRI)
Bangladesh Bureau of Statistics – Household Income and Expenditure Surveys (HIES) – 2004/2005
Bangladesh Demographic and Health Survey (BDHS)

IFPRI Dataset
Chronic Poverty Study (resurvey of 3 studies):
1. Micronutrients Gender/Agricultural Technology ( ) – 5 Thanas
2. Food for Education/Cash for Education – 2000 (10 Thanas) & 2003 (8 Thanas)
3. Microfinance (1994) – 5 Thanas
Institutions involved: IFPRI, Chronic Poverty Research Center, Data Analysis and Technical Assistance

In the resurvey, all thanas from the 1994, & 2003 rounds were resurveyed

Micronutrients Gender/Agricultural Technology
Hereafter referred to as the MCG study, also known as Agricultural Technology or Ag Tech.
“A census of households was conducted in villages where the NGO had introduced the agricultural technology and comparable villages where NGO was operating, but where the new technologies had not yet been introduced”.

Two major types of households were selected from the census:
1. NGO member households adopting the agricultural technology
2. NGO member households likely to adopt, in villages where the technology was not yet introduced

330 Households
1304 HHs in the resurvey for AgriTech

AgriTech introduced: “A” type villages
  110 “A” HHs: NGO member adopter HHs
  55 “C1” HHs: non-adopter non-NGO members & NGO members UNLIKELY to adopt

AgriTech not introduced: “B” type villages
  110 “B” HHs: NGO members LIKELY to adopt
  55 “C2” HHs: non-NGO members not likely to adopt & NGO members unlikely to adopt
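As a quick consistency check, the four household groups listed above should add up to the 330 households. A minimal Python sketch (the dictionary keys are only labels chosen for this illustration):

```python
# Household counts per group, as given on the slide.
groups = {
    "A": 110,   # NGO member adopters, "A" type villages
    "C1": 55,   # non-adopters / unlikely adopters, "A" type villages
    "B": 110,   # NGO members likely to adopt, "B" type villages
    "C2": 55,   # not likely / unlikely adopters, "B" type villages
}

total = sum(groups.values())
per_village_type = {
    "A type villages": groups["A"] + groups["C1"],
    "B type villages": groups["B"] + groups["C2"],
}

print(total)             # → 330
print(per_village_type)  # 165 households in each village type
```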

What procedures should one follow before proceeding with statistical analysis in a software package? (SPSS)

1. Identify the data file format and convert the file into the format of the relevant software (for SPSS, *.sav).
2. Make sure that ALL variables and observations have been converted into the SPSS format.
3. Identify the characteristics of the variables needed for the analysis.
4. Keep the file name short.
5. It is better to have no spaces in the file name.
6. Keep the data files organized in one place, in a single folder.
7. Whenever we work on the data, append the new commands to the previous programme file.
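Steps 2 and 5 can be checked mechanically. The sketch below is not SPSS syntax; it is a small Python illustration that assumes both the original and the converted file have been exported to CSV text. The function names `shape_of` and `safe_file_name`, the variable names, and the data are all hypothetical.

```python
import csv
import io
import re

def shape_of(csv_text):
    """Return (number of observations, number of variables) of a CSV export."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return len(data), len(header)

def safe_file_name(name):
    """Replace runs of whitespace in a file name with underscores (step 5)."""
    return re.sub(r"\s+", "_", name.strip())

# Hypothetical exports of the same file before and after conversion.
original = "hhid,thana,income\n1,A,1200\n2,B,900\n3,A,1500\n"
converted = "hhid,thana,income\n1,A,1200\n2,B,900\n"

print(shape_of(original))                         # → (3, 3)
print(shape_of(converted))                        # → (2, 3)
print(shape_of(original) == shape_of(converted))  # → False: one observation was lost
print(safe_file_name("ag tech resurvey.sav"))     # → ag_tech_resurvey.sav
```

If the two shapes differ, the conversion dropped variables or observations and should be repeated before any analysis.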

How do we check the quality of the data?
A few things need to be checked before we proceed with any statistical analysis:
1. Missing values
2. Wrong coding
3. Outliers
4. Number of digits in the variables (especially for variables measured in value terms)
5. Unique id numbers for the observations
6. Appropriate variable types, i.e. string, numeric, etc.
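Several of these checks (missing values, wrong codes, unique ids, variable types) can be sketched in a few lines. The following Python illustration uses made-up records and an assumed codebook for `sex`; none of the names or values come from the IFPRI data.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
records = [
    {"hhid": 1, "sex": 1, "income": 1200.0},
    {"hhid": 2, "sex": 2, "income": None},
    {"hhid": 2, "sex": 3, "income": 900.0},   # repeated id and invalid sex code
    {"hhid": 4, "sex": 1, "income": "1500"},  # string where a number is expected
]
VALID_SEX_CODES = {1, 2}  # assumed codebook: 1 = male, 2 = female

# 1. Missing values: count of missing entries per variable.
missing = Counter(k for r in records for k, v in r.items() if v is None)

# 2. Wrong coding: ids of records whose code is not in the codebook.
bad_codes = [r["hhid"] for r in records if r["sex"] not in VALID_SEX_CODES]

# 5. Unique ids: ids that occur more than once.
id_counts = Counter(r["hhid"] for r in records)
duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

# 6. Variable type: non-missing incomes that are not numeric.
wrong_type = [r["hhid"] for r in records
              if r["income"] is not None
              and not isinstance(r["income"], (int, float))]

print(dict(missing))   # → {'income': 1}
print(bad_codes)       # → [2]
print(duplicate_ids)   # → [2]
print(wrong_type)      # → [4]
```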

SPSS has some good routines for detecting outliers
There is always the FREQUENCIES routine, of course.
The PLOTS command can do scatterplots of 2 variables.
The EXAMINE procedure includes an option for printing out the cases with the 5 lowest and 5 highest values.
The REGRESSION command can print out scatterplots (particularly good is *ZRESID by *ZPRED, which is a plot of the standardized residuals by the standardized predicted values).
In addition, the regression procedure will produce output on CASEWISE DIAGNOSTICS, which indicates which cases are extreme outliers.
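To make the idea of listing the 5 lowest and 5 highest cases concrete, here is a minimal Python sketch of that kind of listing. It is not SPSS output; the function name `extreme_cases` and the income values are hypothetical.

```python
def extreme_cases(values, n=5):
    """Return (lowest, highest) as lists of (case_number, value).

    Case numbers are 1-based; `highest` is ordered from the largest value down.
    """
    cases = sorted(enumerate(values, start=1), key=lambda c: c[1])
    return cases[:n], cases[-n:][::-1]

# Hypothetical incomes; case 7 looks like a data-entry error.
income = [1200, 900, 1500, 1100, 950, 1300, 150000, 1000, 1250, 1050, 1400, 980]

lowest, highest = extreme_cases(income)
print(lowest)   # → [(2, 900), (5, 950), (12, 980), (8, 1000), (10, 1050)]
print(highest)  # → [(7, 150000), (3, 1500), (11, 1400), (6, 1300), (9, 1250)]
```

Scanning the two short lists is usually enough to spot a value such as case 7 that is out of line with the rest.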

Detecting the problem
Scatterplots and frequencies can reveal atypical cases.
One can also look for cases with very large residuals.
Suspicious correlations sometimes indicate the presence of outliers.
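Looking for very large residuals can be sketched as follows: fit a simple linear regression, divide each residual by the standard error of the estimate, and flag cases whose standardized residual is large in absolute value. This is a Python illustration on made-up data; the cut-off of 2 is an assumption chosen for this small example, not a rule from the slides.

```python
import math

def standardized_residuals(x, y):
    """Residuals of a simple OLS fit of y on x, divided by sqrt(SSE / (n - 2))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / s for e in residuals]

# Hypothetical data: y = 2x exactly, except for one miskeyed value.
x = list(range(1, 11))
y = [2 * xi for xi in x]
y[4] = 30  # case 5 should have been 10

z = standardized_residuals(x, y)
flagged = [case for case, zi in enumerate(z, start=1) if abs(zi) > 2]
print(flagged)  # → [5]
```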

The difference between STATA & SPSS
Probably the most critical difference between SPSS and STATA is that STATA includes additional routines (e.g. rreg, qreg) for addressing the problem of outliers, which we will discuss in future classes.