SJTU CMGPD 2012 Methodological Lecture Day 3 Position and Status Variables
Variables for position
  The basic and analytic files include a variety of indicator variables for whether a male holds a position
  These are based on the statuses recorded in the registers
    – File with hanyu pinyin for raw occupations has been released as DS 6
    – Occupations with original Chinese characters are released as PDF
    – It turned out to be difficult to include Chinese characters in the released data
Variables for position
  In the original data, entries included the official positions held by males.
  Coders assigned a numeric code to each new position, and entered the code into the dataset.
    – Codes started again for each new dataset
  Coders transcribed the original Chinese into a codebook
  Can use DATASET and POSITION_CODE to look up the original Chinese in the appendix to the Analytic release codebook
  DS 6 allows merging of the hanyu pinyin for each code, if you want to create your own position variables from the originals.
Position variables
  We have provided a variety of flag variables identifying different kinds of position
  We have a separate file that, for each combination of dataset and numeric position code, specifies the hanyu pinyin and Chinese characters.
  This file provides flag and other variables describing characteristics of positions.
  These flags are merged back into the main file to provide variables for analysis.
Created Position Variables
  HAS_POSITION
    – Any salaried official position or purchased title
    – Doesn’t include miding, piding, etc. Those were statuses, not salaried official positions
  ESTIMATED_INCOME
    – Imputed income based on stipends associated with the position(s) held by an individual
  RANK
    – Bureaucratic rank, based on specification of pin in the position
Position variables
  BI_TIE_SHI, ZHI_SHI_REN, and flags for specific positions
  JUAN, DING_DAI etc. for presence of modifiers
  EXAMINATION for any examination-related title
  NO_STATUS indicates that no status at all was recorded for a male, even though we would have expected one.
Name variables
  HAS_SURNAME
  DIMINUTIVE_NAME
  RUSTIC_NAME
  NON_HAN_NAME
  NUMBER_NAME
Creating New Variables
  DS 6 contains pinyin for positions
  DATASET and POSITION_CODE are the basis of a merge back to the data files
  POSITION_PINYIN is the ‘raw’ position, as transcribed by the coders
  POSITION_CORE is a stripped down version that includes modifiers
  Chinese characters are in an appendix to the Analytic File codebook
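The merge described above can be sketched with toy data. This is a minimal illustration, not the released files: the two small datasets below are made-up stand-ins for DS 6 and a main data file, and it assumes that the lookup has exactly one row per combination of DATASET and POSITION_CODE and that records without a position carry a missing code.

```stata
* Toy stand-in for the DS 6 lookup (hypothetical values):
* one row per DATASET x POSITION_CODE.
clear
input DATASET POSITION_CODE str20 POSITION_PINYIN
1 1 "mu jiang"
1 2 "bi tie shi"
2 1 "juan na jian sheng"
end
tempfile lookup
save `lookup'

* Toy stand-in for a main data file (hypothetical values):
* one row per person per register.
clear
input PERSON_ID YEAR DATASET POSITION_CODE
101 1801 1 1
101 1804 1 2
202 1801 2 1
303 1801 2 .
end

* Many person-records point at one lookup row, so this is an m:1 merge.
* keep(master match) retains records whose code has no lookup entry.
merge m:1 DATASET POSITION_CODE using `lookup', keep(master match)

* _merge == 3: pinyin attached; _merge == 1: no matching code in the lookup.
tabulate _merge
sort PERSON_ID YEAR
list PERSON_ID YEAR DATASET POSITION_CODE POSITION_PINYIN _merge, sepby(PERSON_ID)
```

Because codes started again for each new dataset, merging on POSITION_CODE alone would attach the wrong pinyin; both keys are needed.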
Creating new variables
  STATA lets you search strings for particular values, and return an indicator if a string is found.
  Can use this for occupations of special interest
  For example,
    – generate artisan = index(POSITION_PINYIN,"jiang") > 0
    – generate juanna = index(POSITION_PINYIN,"juan na") > 0
  Can code positions manually using Chinese characters in the appendix of the Analytic File codebook
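A small self-contained sketch of the string search, using made-up pinyin values rather than actual DS 6 entries. It uses strpos(), the current name of the function that older Stata versions call index(); both return the position of the first match, or 0 if there is none.

```stata
* Hypothetical pinyin strings, for illustration only.
clear
input str30 POSITION_PINYIN
"mu jiang"
"juan na jian sheng"
"bi tie shi"
end

* strpos() returns 0 when the substring is absent, so "> 0" gives a 0/1 flag.
generate artisan = strpos(POSITION_PINYIN, "jiang") > 0
generate juanna  = strpos(POSITION_PINYIN, "juan na") > 0

list POSITION_PINYIN artisan juanna
```

Substring matching flags every position whose pinyin contains the search text, so it is worth tabulating the matched POSITION_PINYIN values and checking them against the Chinese characters in the codebook appendix before using the flag.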
Studying attainment
  We have mainly used event-history analysis
    – Determinants of chances of attaining a position by the next register
    – Allows for consideration of time-varying characteristics, such as characteristics of kin
  An alternative would be to look at determinants of attaining a position by a specific age, with one observation per person
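The one-observation-per-person alternative might be set up roughly as follows. This is only a sketch on made-up records: the cutoff age of 35 sui, the variable names position_by_cutoff, max_age, and observed_to_cutoff, and the way censoring is flagged are all illustrative assumptions, not part of the released files.

```stata
* Hypothetical person-register records.
clear
input PERSON_ID AGE_IN_SUI HAS_POSITION
1 25 0
1 28 0
1 31 1
2 24 0
2 27 0
3 30 1
3 33 1
3 42 1
end

* Illustrative cutoff age (an assumption, not from the lecture).
local cutoff 35

* 1 if the person is ever recorded with a position at or before the cutoff.
bysort PERSON_ID: egen position_by_cutoff = max(HAS_POSITION == 1 & AGE_IN_SUI <= `cutoff')

* People last seen before the cutoff are censored: a 0 for them means
* "not yet observed with a position", not "never attained one".
bysort PERSON_ID: egen max_age = max(AGE_IN_SUI)
generate observed_to_cutoff = max_age >= `cutoff'

* Reduce to one record per person.
bysort PERSON_ID (AGE_IN_SUI): keep if _n == 1
list PERSON_ID position_by_cutoff max_age observed_to_cutoff
```

How to treat people who leave observation before the cutoff is a design choice; the flag above only marks them so that they can be dropped or handled separately.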
Creating variables to identify attainment of position by next register
  * At risk: meets the sample restrictions and does not yet hold a position
  generate at_risk_position = SEX == 2 & PRESENT & NEXT_3 & HAS_POSITION == 0
  * Among those at risk, flag whether a position is held in the person's next record
  bysort PERSON_ID (YEAR): generate next_position = at_risk_position & HAS_POSITION[_n+1]
  * Totals and proportion attaining a position, by age
  bysort AGE_IN_SUI: egen total_at_risk_position = total(at_risk_position)
  bysort AGE_IN_SUI: egen total_next_position = total(next_position)
  generate p_next_position = total_next_position/total_at_risk_position
  * Plot one point per age
  bysort AGE_IN_SUI: generate first_in_age = _n == 1
  twoway line p_next_position AGE_IN_SUI ///
      if AGE_IN_SUI >= 1 & AGE_IN_SUI <= 80 & first_in_age, ///
      ytitle("Proportion attaining position by next register") scheme(s1mono)
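One detail of the [_n+1] lookup is worth seeing on toy data: for a person's last record there is no next record, so the subscript returns missing, and Stata treats missing as true in a logical expression. In the code above the at-risk restrictions presumably exclude such records (that reading of NEXT_3 is an assumption); the sketch below uses made-up records and hypothetical variable names (unguarded, guarded) to show what happens without such a restriction.

```stata
* Hypothetical records: person 2 is never observed with a position.
clear
input PERSON_ID YEAR HAS_POSITION
1 1801 0
1 1804 0
1 1807 1
2 1801 0
2 1804 0
end

* Unguarded: on each person's last record HAS_POSITION[_n+1] is missing,
* and missing counts as true, so person 2's last record is wrongly flagged.
bysort PERSON_ID (YEAR): generate unguarded = HAS_POSITION == 0 & HAS_POSITION[_n+1]

* Guarded: require an actual next record and an explicit value of 1.
bysort PERSON_ID (YEAR): generate guarded = HAS_POSITION == 0 & _n < _N & HAS_POSITION[_n+1] == 1

list PERSON_ID YEAR HAS_POSITION unguarded guarded, sepby(PERSON_ID)
```

Comparing explicitly with `== 1` is a cheap safeguard whenever the at-risk definition might let a person's final record through.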
bysort
  bysort groups the records in the dataset according to the values of the specified variables.
  Each set of records defined by a unique value of the specified variables is treated as a distinct block of records when the command is executed.
  If a variable is in parentheses, the data are sorted on that variable, but not divided according to the unique values of that variable.
  [ ] allows access to values from other observations in the same block.
    – [1] says to draw the value of a variable from the first record in the block, [_N] from the last record, [_n+1] from the next record, and so forth
    – _n refers to the location of the current record within the block
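A short sketch of these subscripts on made-up data. The variable names first_year, last_year, and prev_year are illustrative only.

```stata
* Hypothetical records, deliberately entered out of order.
clear
input PERSON_ID YEAR
1 1807
1 1801
1 1804
2 1810
2 1804
end

* Blocks are defined by PERSON_ID; within each block records are sorted by YEAR.
bysort PERSON_ID (YEAR): generate first_year = YEAR[1]
bysort PERSON_ID (YEAR): generate last_year  = YEAR[_N]
* A subscript outside the block (here, before the first record) returns missing.
bysort PERSON_ID (YEAR): generate prev_year  = YEAR[_n-1]

list, sepby(PERSON_ID)
```

Putting YEAR in parentheses matters: without it, the order of records within each person is not guaranteed, and [1], [_N], and [_n-1] would refer to arbitrary records.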
Create a variable with the record number within x:
  – bysort x (y): generate a = _n
Create a flag identifying the first record within x:
  – bysort x (y): generate b = _n == 1
Create a flag identifying the last record within x:
  – bysort x (y): generate c = _N == _n
Create a variable with the total number of records with that unique value of x:
  – bysort x (y): generate d = _N
Create a variable with the y from the next record within x:
  – bysort x (y): generate e = y[_n+1]
Results
  Table with columns x, y, a, b, c, d, e (values not preserved in this transcript)
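A results table of this kind can be reproduced with a small toy dataset. The x and y values below are made up for illustration; the five generate commands are the ones defining a through e.

```stata
* Hypothetical input data: two blocks of x, with y increasing within each.
clear
input x y
1 10
1 20
1 30
2 40
2 50
end

bysort x (y): generate a = _n         // record number within x
bysort x (y): generate b = _n == 1    // flag: first record within x
bysort x (y): generate c = _N == _n   // flag: last record within x
bysort x (y): generate d = _N         // number of records within x
bysort x (y): generate e = y[_n+1]    // y from the next record within x

list x y a b c d e, sepby(x)
```

In the listing, e is missing on the last record of each block, because there is no next record within that value of x.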