SJTU CMGPD 2012 Methodological Lecture Day 3 Position and Status Variables.

Variables for position
The basic and analytic files include a variety of indicator variables for whether a male holds a position
These are based on the statuses recorded in the registers
  – A file with hanyu pinyin for the raw occupations has been released as DS 6
  – Occupations with the original Chinese characters are released as a PDF
It turned out to be difficult to include Chinese characters in the released data

Variables for position
In the original data, entries included the official positions held by males.
Coders assigned a numeric code to each new position and entered the code into the dataset.
  – Codes started again for each new dataset
Coders transcribed the original Chinese into a codebook
Use DATASET and POSITION_CODE to look up the original Chinese in the appendix to the Analytic release codebook
DS 6 allows merging of the hanyu pinyin for each code, if you want to create your own position variables from the originals.

Position variables
We have provided a variety of flag variables identifying different kinds of position
A separate file specifies the hanyu pinyin and Chinese characters for each combination of dataset and numeric position code.
This file provides flag and other variables describing characteristics of positions.
These flags are merged back into the main file to provide variables for analysis.

Created Position Variables
HAS_POSITION
  – Any salaried official position or purchased title
  – Doesn’t include miding, piding, etc. Those were statuses, not salaried official positions
ESTIMATED_INCOME
  – Imputed income based on stipends associated with the position(s) held by an individual
RANK
  – Bureaucratic rank, based on specification of pin in the position

Position variables
BI_TIE_SHI, ZHI_SHI_REN, and flags for specific positions
JUAN, DING_DAI etc. for presence of modifiers
EXAMINATION for any examination-related title
NO_STATUS indicates that no status at all was recorded for a male, even though we would have expected one.

Name variables
HAS_SURNAME
DIMINUTIVE_NAME
RUSTIC_NAME
NON_HAN_NAME
NUMBER_NAME

Creating New Variables
DS 6 contains pinyin for positions
DATASET and POSITION_CODE are the basis of a merge back to the data files
POSITION_PINYIN is the ‘raw’ position, as transcribed by the coders
POSITION_CORE is a stripped down version that includes modifiers
Chinese characters are in an appendix to the Analytic File codebook
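The merge on DATASET and POSITION_CODE might look like the following sketch. The toy values, the pinyin labels, and the tempfile standing in for DS 6 are invented for illustration. In practice the `using` file would be the released DS 6 file, whose filename is not given here.

```stata
* Toy stand-in for the DS 6 lookup file (all values invented for illustration)
clear
input DATASET POSITION_CODE str20 POSITION_PINYIN
1 1 "ling cui"
1 2 "mu jiang"
2 1 "bi tie shi"
end
tempfile ds6
save `ds6'

* Toy stand-in for a main data file: one row per person-register observation
clear
input PERSON_ID DATASET POSITION_CODE
101 1 2
102 1 1
103 2 1
104 2 1
end

* Many observations can share one (DATASET, POSITION_CODE) pair, so merge m:1
merge m:1 DATASET POSITION_CODE using `ds6', keep(master match)
list PERSON_ID DATASET POSITION_CODE POSITION_PINYIN _merge
```

The `keep(master match)` option retains every observation from the main file, including any without a matching code, and drops lookup rows that no observation uses. Because codes restart in each dataset, merging on POSITION_CODE alone would attach the wrong labels.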

Creating new variables
Stata lets you search strings for particular values and return an indicator if a string is found.
Can use this for occupations of special interest
For example,
  – generate artisan = strpos(POSITION_PINYIN, "jiang") > 0
  – generate juanna = strpos(POSITION_PINYIN, "juan na") > 0
(strpos() is the current name for the older index() function.)
Can code positions manually using Chinese characters in the appendix of the Analytic File codebook
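A substring search flags every position whose pinyin contains the letters, wherever they fall. The sketch below uses invented pinyin strings to show the difference between a plain substring match and a stricter match that requires "jiang" to be the final syllable. Whether such strings occur in the released data is an assumption.

```stata
* Toy position strings (invented for illustration)
clear
input str20 POSITION_PINYIN
"mu jiang"
"tie jiang"
"jiang jun"
"juan na"
end

* strpos() returns the position of the first match, or 0 if there is none
generate artisan = strpos(POSITION_PINYIN, "jiang") > 0

* Stricter variant: "jiang" must be the last syllable of the string
generate artisan_strict = regexm(POSITION_PINYIN, "(^| )jiang$")

list
```

Here "jiang jun" is flagged by the substring match but not by the stricter one, so it is worth tabulating the matched strings before relying on a new flag.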

Studying attainment
We have mainly used event-history analysis
  – Determinants of chances of attaining position by next register
  – Allows for consideration of time-varying characteristics
      – Characteristics of kin
An alternative would be to look at determinants of attaining a position by a specific age, with one observation per person
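The one-observation-per-person alternative could be set up roughly as follows. This is a sketch on invented data. The cutoff age of 30 and the new variable names `held_by_cutoff` and `attained_by_cutoff` are assumptions chosen for illustration.

```stata
* Toy person-register records (invented for illustration)
clear
input PERSON_ID AGE_IN_SUI HAS_POSITION
1 20 0
1 26 0
1 32 1
2 22 0
2 28 1
2 34 1
3 24 0
3 30 0
end

* Flag records where a position is held at or before the cutoff age
local cutoff 30
generate byte held_by_cutoff = HAS_POSITION == 1 & AGE_IN_SUI <= `cutoff'

* Carry the flag to every record of the person, then keep one record each
bysort PERSON_ID: egen byte attained_by_cutoff = max(held_by_cutoff)
bysort PERSON_ID (AGE_IN_SUI): keep if _n == 1
list PERSON_ID attained_by_cutoff
```

In real data, individuals who leave observation before the cutoff age would need separate handling, since for them a zero only means that no position was recorded while they were observed.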

Creating variables to identify attainment of position by next register
    * At risk: males (SEX == 2) who are present, have a next register (NEXT_3), and hold no position yet
    generate at_risk_position = SEX == 2 & PRESENT & NEXT_3 & HAS_POSITION == 0
    * Outcome: holds a position in the person's next record (== 1 so that a missing value does not count as true)
    bysort PERSON_ID (YEAR): generate next_position = at_risk_position & HAS_POSITION[_n+1] == 1
    * Totals at each age, and their ratio
    bysort AGE_IN_SUI: egen total_at_risk_position = total(at_risk_position)
    bysort AGE_IN_SUI: egen total_next_position = total(next_position)
    generate p_next_position = total_next_position/total_at_risk_position
    * Plot one point per age
    bysort AGE_IN_SUI: generate first_in_age = _n == 1
    twoway line p_next_position AGE_IN_SUI ///
        if AGE_IN_SUI >= 1 & AGE_IN_SUI <= 80 & first_in_age, ///
        ytitle("Proportion attaining position by next register") scheme(s1mono)
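The same age-specific proportions can also be obtained with `collapse`, since the mean of a 0/1 outcome among at-risk records equals the total attaining divided by the total at risk. The sketch below uses invented records. The name `n_at_risk` is an assumption introduced for illustration.

```stata
* Toy records with the two flags already created (values invented)
clear
input AGE_IN_SUI at_risk_position next_position
20 1 0
20 1 1
20 0 0
21 1 0
21 1 0
21 1 0
21 1 1
end

* One row per age: proportion attaining a position and number at risk
collapse (mean) p_next_position = next_position ///
         (sum)  n_at_risk = at_risk_position    ///
         if at_risk_position == 1, by(AGE_IN_SUI)
list
```

`collapse` replaces the data in memory, so wrapping it in `preserve` and `restore` keeps the individual-level file available. Keeping the number at risk alongside the proportion helps identify ages where the estimate rests on few observations.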

bysort
bysort groups the records in the dataset according to the values of the specified variables.
Each set of records defined by a unique value of the specified variables is treated as a distinct block of records when the command is executed.
If a variable is in parentheses, the data is sorted on that variable, but not divided according to the unique values of that variable.
[ ] allows access to values from other observations in the same block.
  – [1] says to draw the value of a variable from the first record in the block
  – [_N] from the last record
  – [_n+1] from the next record, and so forth
_n refers to the location of the current record within the block
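A short sketch on invented data illustrates these subscripts. The variable names `first_year`, `last_year`, and `prev_position` are made up for the example.

```stata
* Toy records, deliberately out of order (values invented)
clear
input PERSON_ID YEAR HAS_POSITION
1 1801 0
1 1795 0
1 1798 1
2 1798 0
2 1795 0
end

* Blocks are defined by PERSON_ID; (YEAR) only orders records inside each block
bysort PERSON_ID (YEAR): generate first_year = YEAR[1]
bysort PERSON_ID (YEAR): generate last_year  = YEAR[_N]
bysort PERSON_ID (YEAR): generate prev_position = HAS_POSITION[_n-1]

list, sepby(PERSON_ID)
```

A subscript that points outside the block returns a missing value, so `prev_position` is missing on each person's first record.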

Create a variable with the record number within x:
  – bysort x (y): generate a = _n
Create a flag identifying the first record within x:
  – bysort x (y): generate b = _n == 1
Create a flag identifying the last record within x:
  – bysort x (y): generate c = _N == _n
Create a variable with the total number of records with that unique value of x:
  – bysort x (y): generate d = _N
Create a variable with the y from the next record within x:
  – bysort x (y): generate e = y[_n+1]
[Table: example data with columns x and y]

Results
[Table with columns x, y, a, b, c, d, e]
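A small worked example shows what the five commands produce. The values of x and y are invented for illustration.

```stata
* Toy data: three records with x == 1 and two with x == 2
clear
input x y
1 10
1 20
1 30
2 40
2 50
end

bysort x (y): generate a = _n          // record number within x
bysort x (y): generate b = _n == 1     // first record within x
bysort x (y): generate c = _N == _n    // last record within x
bysort x (y): generate d = _N          // number of records within x
bysort x (y): generate e = y[_n+1]     // y from the next record within x

list, sepby(x)
```

Within x == 1, a runs from 1 to 3 and d is 3 on every record. On the last record of each block c is 1 and e is missing, because there is no next record to draw from.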