Slide 1: Key Variables: Social Science Measurement and Functional Form
Presentation to: 'Interpreting results from statistical modelling - A seminar for Scottish Government Social Researchers', Edinburgh, 1 April 2009
Dr Paul Lambert and Professor Vernon Gayle, University of Stirling

Slide 2: Key Variables: Social Science Measurement and Functional Form
1) Working with variables
   - 'Beta's in Society' and 'Demystifying Coefficients'
2) Key Variables and social science measurement
   - Harmonisation and standardisation
   - An example: occupations
3) Functional Form

Slide 3: 'Beta's in Society' and 'Demystifying Coefficients'
- Dorling, D., & Simpson, S. (Eds.). (1999). Statistics in Society: The Arithmetic of Politics. London: Arnold.
- Irvine, J., Miles, I., & Evans, J. (Eds.). (1979). Demystifying Social Statistics. London: Pluto Press.
Famous works on the critical interpretation of social statistics tend to have a univariate / bivariate focus:
  - Measuring unemployment; averaging income; bivariate significance tests; correlation vs. causation
But social survey analysts usually argue that complex multivariate analyses are more appropriate:
- Critical interpretation of joint relative effects
- Attention to effects of 'key variables' in multivariate analysis

Slide 4:
"A program like SPSS.. has two main components: the statistical routines,.. and the data management facilities. Perhaps surprisingly, it was the latter that really revolutionised quantitative social research" [Procter, 2001: 253]
"Socio-economic processes require comprehensive approaches as they are very complex ('everything depends on everything else'). The data and computing power needed to disentangle the multiple mechanisms at work have only just become available." [Crouchley and Fligelstone 2004]

Slide 5: Large scale survey data: 2 technological themes
We're data rich (but analyst poor)
- Plenty of variables (a thousand is common)
- Plenty of cases
We work overwhelmingly through individual analysts' micro-computing
- Impact of mainstream software
- Pressure for simple / accessible / popular analytical techniques (whatever happened to loglinear models?)
- Propensity for simple 'data management'
- Specialist development of very complex analytical packages for very simple sets of variables

Slide 6: Survey research: access, manipulate & analyse patterns in variables ('variable by case matrix')

Slide 7: Critical role of syntactical records in working with data & variables
- Reproducible (for self)
- Replicable (for all)
- Paper trail for whole lifecycle
- Cf. Dale 2006; Freese 2007
In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata).
Syntax examples:

Slide 8: Stata syntax example ('do file')

Slide 9: Some comments on survey analysis software for analysing variables
- Data management and data analysis must be seen as integrated processes
- Stata is the most effective software, as it combines advanced data management and data analysis functionality and makes good documentation easy
- For an extended example of using Stata, concentrating on variable operationalisations and standardisations:
  - Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise. Stirling: University of Stirling, Technical paper of the Data Management through e-Social Science research Node.
  - E.g. "do [...]"
  - E.g. "use [...] clear"

Slide 10: Working with variables and understanding 'variable constructions'
Meaning?
- Coding frames; re-coding decisions; metric transformations and functional forms; relative effects in multivariate models
- Data collection and data analysis
- Cf. processes by which survey measures are defined and subsequently interpreted by research analysts

Slide 11: β's: where's the action?
If we have lots of variables, lots of cases, yet often quite simple techniques and software, the action is primarily in the variable constructions.
The example of social mobility research: see Lambert et al. (2007)
i. How we choose between alternative measures
ii. How much data management we try (or bother with)
Plus other issues in how we analyse & interpret the coefficients from the models we use (..elsewhere today..)

Slide 12: i) Choosing measures
See (2) below:
- A sensible starting point is with 'key variables'
- Approaches to standardisation / harmonisation
- {Lack of} awareness of existing resources
See (3) below:
- Influence of functional form

Slide 13: ii) Data management - e.g. recoding data
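The slide itself only names the task, so the following is a minimal sketch of what a documented recode might look like, written in Python rather than the SPSS/Stata syntax the talk refers to. The variable names (`class7`, `class3`), the coding frame and the labels are all hypothetical, chosen for illustration.

```python
# Hypothetical coding frame: collapse a detailed 7-category class measure
# into 3 broader categories. Codes and labels are illustrative only.
RECODE_MAP = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3}
LABELS = {1: "Advantaged", 2: "Intermediate", 3: "Routine"}


def recode(value, mapping=RECODE_MAP, missing=None):
    """Return the collapsed category, or `missing` for codes outside the frame."""
    return mapping.get(value, missing)


# A tiny 'variable by case' dataset; -9 stands in for a missing-value code.
cases = [
    {"id": 1, "class7": 2},
    {"id": 2, "class7": 5},
    {"id": 3, "class7": -9},
]

for case in cases:
    # Keep the source variable and add the derived one alongside it,
    # so the recoding decision stays visible and reversible.
    case["class3"] = recode(case["class7"])

for case in cases:
    label = LABELS.get(case["class3"], "missing")
    print(case["id"], case["class7"], case["class3"], label)
```

Holding the mapping in one named object, instead of scattering it through the analysis, is one way to keep the paper trail that the talk asks for.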

Slide 14: ii) Data management - e.g. missing data / case selection
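As an illustration of this step, the sketch below converts survey missing-value codes to an explicit missing marker and then selects complete adult cases. The missing codes (-9, -8, -1), the variables and the age cut-off are assumptions made for the example, not values taken from the talk.

```python
# Hypothetical missing-value codes; real surveys document their own.
MISSING_CODES = {-9, -8, -1}


def clean(value):
    """Map survey missing-value codes to None; leave valid values unchanged."""
    return None if value in MISSING_CODES else value


raw = [
    {"id": 1, "age": 34, "income": 21000},
    {"id": 2, "age": -9, "income": 18000},   # age missing
    {"id": 3, "age": 52, "income": -8},      # income missing
    {"id": 4, "age": 17, "income": 9000},    # outside the age range
]

# Step 1: clean every substantive variable (the id is left untouched).
cleaned = [
    {key: (val if key == "id" else clean(val)) for key, val in row.items()}
    for row in raw
]

# Step 2: case selection - complete cases aged 18 or over.
selected = [
    row for row in cleaned
    if row["age"] is not None and row["income"] is not None and row["age"] >= 18
]

# Step 3: record how many cases the selection removed.
n_dropped = len(raw) - len(selected)
print(f"kept {len(selected)} of {len(raw)} cases; dropped {n_dropped}")
```

Reporting the number of dropped cases alongside the selection rule makes the consequences of the decision easy to review later.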

Slide 15: ii) Data management - e.g. linking data
Linking via 'ojbsoc00': c1-5 = original data / c6 = derived from data / c7 = derived from
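A minimal sketch of this kind of link, in Python: each case in a micro-data file is matched to an occupational index file on the occupational unit group held in `ojbsoc00`. The index file contents and scores are invented, and `link` is a hypothetical helper, not a GEODE or DAMES tool.

```python
# Hypothetical index file: one row per occupational unit group (SOC-2000).
# The scale scores are made up for illustration.
occ_index = {
    8122: {"camsis": 30.0},
    3312: {"camsis": 50.0},
    2123: {"camsis": 65.0},
}

# Micro-data: one row per respondent, with the occupation code as link key.
micro = [
    {"pid": 1, "ojbsoc00": 8122},
    {"pid": 2, "ojbsoc00": 2123},
    {"pid": 3, "ojbsoc00": 9999},  # code with no entry in the index file
]


def link(records, index, key):
    """Many-to-one link: copy index values onto each record, flagging non-matches."""
    linked = []
    for rec in records:
        match = index.get(rec[key])
        out = dict(rec)  # copy, so the source data are preserved unchanged
        out["camsis"] = match["camsis"] if match else None
        out["_matched"] = match is not None
        linked.append(out)
    return linked


linked = link(micro, occ_index, "ojbsoc00")
unmatched = [rec["pid"] for rec in linked if not rec["_matched"]]
print("unmatched cases:", unmatched)
```

Keeping an explicit match flag, much as statistical packages do after a merge, makes it possible to check how many cases failed to link before any analysis begins.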

Slide 16: Aspects of data management
Manipulating data
- Recoding categories / 'operationalising' variables
Linking data
- Linking related data (e.g. longitudinal studies)
- Combining / enhancing data (e.g. linking micro- and macro-data)
Secure access to data
- Linking data with different levels of access permission
- Detailed access to micro-data cf. access restrictions
Harmonisation standards
- Approaches to linking 'concepts' and 'measures' ('indicators')
- Recommendations on particular 'variable constructions'
Cleaning data
- 'Missing values'; implausible responses; extreme values

Slide 17: 'The significance of data management for social survey research'
The data manipulations described above are a major component of the social survey research workload
- Pre-release manipulations performed by distributors / archivists
  - Coding measures into standard categories
  - Dealing with missing records
- Post-release manipulations performed by researchers
  - Re-coding measures into simple categories
We do have existing tools, facilities and expert experience to help us, but we don't make a good job of using them efficiently or consistently.
So the 'significance' of DM is about how much better research might be if we did things more effectively.

Slide 18: Data Management through e-Social Science (DAMES)
Supporting operations on data widely performed by social science researchers
- Matching data files together
- 'Cleaning' data
- Operationalising variables
- Specialist data resources (occupations; education; ethnicity)
Why is e-Social Science relevant?
- Dealing with distributed, heterogeneous datasets
- Generic data requirements / provisions
- Lack of previous systematic standards (e.g. metadata; security; citation procedures; resources to review/obtain suitable data)

Slide 19: Working with variables - further issues
Re-inventing the wheel
- In survey data analysis, somebody else has already struggled through the variable constructions you are working on right now
- Increasing attention to documentation and replicability [cf. Dale 2006; Freese 2007]
Guidance and support
- In the UK, use
- Most guidance concerns collecting & harmonising data
- Less is directed to analytically exploiting measures

Slide 20: Key Variables: Social Science Measurement and Functional Form
1) Working with variables
   - 'Beta's in Society' and 'Demystifying Coefficients'
2) Key Variables and social science measurement
   - Harmonisation and standardisation
   - An example: occupations
3) Functional Form

Slide 21: Key variables and social science measurement
Defining 'key variables'
- Commonly used concepts with numerous previous examples
- Methodological research on best practice / best measurement [cf. Stacey 1969; Burgess 1986]
ONS harmonisation 'primary standards'

Slide 22: Key variables: concepts and measures
Variable       Concept                                       Something useful
Occupation     Class; stratification; unemployment
Education      Credentials; Ability; Merit                   www.equalsoc.org/8 ; [Schneider 2008]
Ethnic group   Ethnicity; race; religion; national origins   [Bosveld et al 2006]
Age            Age; life course stage; cohort                [Abbott 2006]
Gender         Gender; household / family context
Income         Income; wealth; poverty                       [SN 3909]

Slide 23: Key variables - Standardisation
Much attention to key variables involves proposing optimum / standard measures
- UK: ONS Harmonisation
- EU: Eurostat standards
- Studies of 'criterion' and 'construct' validity
Standardisation impacts other analyses
- Affects available data
- Affects popular interpretations of data

Slide 24: Key variables - Harmonisation (across countries; across time periods)
"a method for equating conceptually similar but operationally different variables.." [Harkness et al 2003, p352]
- Input harmonisation [esp. Harkness et al 2003]: 'harmonising measurement instruments' [H-Z and Wolf 2003, p394]
  - Unlikely / impossible in longer-term longitudinal studies
  - Common in small cross-national and short-term longitudinal studies
- Output harmonisation ('ex-post harmonisation'): 'harmonising measurement products' [H-Z and Wolf 2003, p394]

Slide 25: More on harmonisation [esp. HZ and Wolf 2003, p393ff]
Numerous practical resources to help with input and output harmonisation
- [e.g. ONS; UN / EU / NSIs; LIS project; IPUMS]
- [Cross-national e.g.: HZ & Wolf 2003; Jowell et al. 2007]
Room for more work in justifying / understanding interpretations after harmonisation

Slide 26: Equivalence
"the degree to which survey measures or questions are able to assess identical phenomena across two or more cultures" [Harkness et al 2003, p351]
- Measurement equivalence involves the same instruments and equality of measures (e.g. income in pounds)
- Functional equivalence involves different instruments, but addresses the same concepts (e.g. inflation-adjusted income)
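The inflation-adjusted income example can be sketched as follows. Nominal pounds from two years are measurement-equivalent (same instrument, same units), while deflating them to a common base year aims at functional equivalence. The price index values and the `real_income` helper are hypothetical, and real work would use an official price series.

```python
# Hypothetical price index (base year = 100); the figures are made up.
PRICE_INDEX = {1991: 100.0, 2001: 125.0}


def real_income(nominal, year, base_year=1991):
    """Express a nominal income from `year` in `base_year` prices."""
    return nominal * PRICE_INDEX[base_year] / PRICE_INDEX[year]


# The same instrument (income in pounds) in two different years:
nominal_1991 = 20000
nominal_2001 = 25000

# After adjustment the two incomes represent the same purchasing power,
# even though the nominal figures differ.
income_1991 = real_income(nominal_1991, 1991)
income_2001 = real_income(nominal_2001, 2001)
print(income_1991, income_2001)
```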

Slide 27:
"Equivalence is the only meaningful criterion if data is to be compared from one context to another. However, equivalence of measures does not necessarily mean that the measurement instruments used in different countries are all the same. Instead it is essential that they measure the same dimension. Thus, functional equivalence is more precisely what is required" [HZ and Wolf 2003, p389]

Slide 28: Harmonisation & equivalence combined
'Universality' or 'specificity' in variable constructions
- Universality: collect harmonised measures, analyse standardised schemes
- Specificity: collect localised measures, analyse functionally equivalent schemes
- Most prescriptions aim for universality
- But specificity is theoretically better
- Specificity is more easily obtained than is often realised
- Especially for well-known 'key variables'

Slide 29: Working with key variables - speculation
a) Data manipulation skills and inertia
- I would speculate that around 80% of applications using key variables don't consult literature and evaluate alternative measures, but choose the first convenient and/or accessible variable in the dataset
  - Data supply decisions ('what is on the archive version') are critical
- Much of the explanation lies with lack of confidence in data manipulation / linking data
- Too many under-used resources - cf.

Slide 30: Working with key variables - speculation
b) Endogeneity and key variables
'everything depends on everything else' [Crouchley and Fligelstone 2004]
- We know a lot about simple properties of key variables
  - Key variables often change the main effects of other variables
  - Simple decisions about contrast categories can influence interpretations
  - Interaction terms are often significant and influential
- We have only scratched the surface of understanding key variables in multivariate context and interpretation
  - Key variables are often endogenous (because they are 'key'!)
  - Work on standards / techniques for multi-process systems and/or comparing structural breaks involving key variables is attractive

Slide 31: An example: Occupations
In the social sciences, occupation is seen as one of the most important things to know about a person
- Direct indicator of economic circumstances
- Proxy indicator of 'social class' or 'stratification'
Projects at Stirling:
- GEODE: how social scientists use data on occupations
- DAMES: extending GEODE resources

Slide 32: Stage 1 - Collecting Occupational Data (and making a mess)
Example 1: BHPS
Occ description          Employment status    SOC-2000   EMPST
Miner (coal)             Employee             8122       7
Police officer (Serg.)   Supervisor           3312       6
Electrical engineer      Employee             2123       7
Retail dealer (cars)     Self-employed w/e    1234       2
Example 2: European Social Survey, parent's data
Occ description             SOC-2000   EMPST
Miner                       ?8122      ?6/7
Police officer              ?3312      ?6/7
Engineer                    ??
Self employed businessman   ??         ?1/2

Slide 34: Occupations: we agree on what we should do
Preserve two levels of data
- Source data: occupational unit groups, employment status
- Social classifications and other outputs
Use transparent (published) methods [i.e. OIRs]
- for classifying index units
- for translating index units into social classifications
For instance:
- Bechhofer, F. 'Occupations' in Stacey, M. (ed.) Comparability in Social Research. London: Heinemann.
- Jacoby, A. 'The Measurement of Social Class'. Proceedings from the Social Research Association seminar on "Measuring Employment Status and Social Class". London: Social Research Association.
- Lambert, P.S. 'Handling Occupational Information'. Building Research Capacity 4:
- Rose, D. and Pevalin, D.J. 'A Researcher's Guide to the National Statistics Socio-economic Classification'. London: Sage.

Slide 35: ...in practice we don't keep to this
Inconsistent preservation of source data
- Alternative OUG schemes: SOC-90; SOC-2000; ISCO; SOC-90 (my special version)
- Inconsistencies in other index factors: 'employment status'; supervisory status; number of employees
- Individual or household; current job or career
Inconsistent exploitation of occupational information
- Numerous alternative occupational information files (time; country; format)
- Substantive choices over social classifications
- Inconsistent translations to social classifications: 'by file or by fiat'
- Dynamic updates to occupational information resources
- Strict security constraints on users' micro-social survey data
- Low uptake of existing occupational information resources

Slide 36: GEODE provides services to help social scientists deal with occupational information resources
1) Disseminate, and access other, Occupational Information Resources
2) Link together their (secure) micro-data with OIRs
[Diagram: the external user's micro-social data (id, oug, sex) are linked to the aggregate occupational information index file (oug, CS-M, CS-F, EGP) to give the user's output micro-social data (id, oug, CS)]

Slide 37: Occupational information resources: small electronic files about OUGs
                                        Index units        # distinct files (average size kb)   Updates?
CAMSIS                                  Local OUG*(e.s.)   200 (100)                            y
CAMSIS value labels                     Local OUG          50 (50)                              n
ISEI tools, home.fsw.vu.nl/~ganzeboom   Int. OUG           20 (50)                              y
E-Sec matrices                          Int. OUG*(e.s.)    20 (200)                             n
Hakim gender seg codes (Hakim 1998)     Local OUG          2 (paper)                            n

Slide 38: For example: ISCO-88 Skill levels classification

Slide 39: and: UK 1980 CAMSIS scales and CAMCON classes

Slide 40: Existing resources on occupations
Popular websites:
Emerging resource:
Some papers:
- Chan, T. W., & Goldthorpe, J. H. (2007). Class and Status: The Conceptual Distinction and its Empirical Relevance. American Sociological Review, 72,
- Rose, D., & Harrison, E. (2007). The European Socio-economic Classification: A New Social Class Scheme for Comparative European Research. European Societies, 9(3),
- Lambert, P. S., Tan, K. L. L., Gayle, V., Prandy, K., & Bergman, M. M. (2008). The importance of specificity in occupation-based social classifications. International Journal of Sociology and Social Policy, 28(5/6),

Slide 41: Using data on occupations - further speculation
Growing interest in longitudinal analysis and use of longitudinal summary data on occupations
- Intuitive measures (e.g. ever in Class I)
  - Lampard, R. (2007). Is Social Mobility an Echo of Educational Mobility? Sociological Research Online, 12(5).
- Empirical career trajectories / sequences
  - Halpin, B., & Chan, T. W. (1998). Class Careers as Sequences. European Sociological Review, 14(2),
Growing cross-national comparisons
- Ganzeboom, H. B. G. (2005). On the Cost of Being Crude: A Comparison of Detailed and Coarse Occupational Coding. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp ). Mannheim: ZUMA, Nachrichten Spezial.
Treatment of the non-working populations
- Seldom adequate to treat non-working as a category
- 'Selection modelling' approaches expanding

Slide 42: Occupations as key variables
- Extensive debate about occupation-based social classifications
- Document your procedures... as you may be asked to do something different
- When choosing between occupation-based measures:
  - They all measure, mostly, the same things
  - Don't assume concepts measure measures
- Lambert, P. S., & Bihagen, E. (2007). Concepts and Measures: Empirical evidence on the interpretation of ESeC and other occupation-based social classifications. Paper presented at the ISA RC28 conference, Montreal ( August),

Slide 43: Key Variables: Social Science Measurement and Functional Form
1) Working with variables
   - 'Beta's in Society' and 'Demystifying Coefficients'
2) Key Variables and social science measurement
   - Harmonisation and standardisation
   - An example: occupations
3) Functional Form

Slide 44: 'Functional form'
The way in which measures are arithmetically incorporated in analysis
a) Level of measurement (nominal, ordinal, interval, ratio)
b) Alternative models and link functions
c) Other variables and interaction effects

Slide 45: a) Levels of measurement and the desire to categorise
- Categories are easier to envisage / communicate
- Much harmonisation work ≡ locating into categories
  - Appearance of measurement equivalence
  - But functional equivalence is seldom achieved
- Metrics are better for functional equivalence
  - E.g. standardised income
- How to deal with categorisations?
  - The qualitative foundation of quantity [Prandy 2002a]
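One common reading of 'standardised income' is a z-score computed within each context (country, year or survey), so that values are compared by relative position, not raw units. The sketch below follows that reading. The two income lists are invented, and `standardise` is a hypothetical helper built on Python's standard `statistics` module.

```python
from statistics import mean, pstdev


def standardise(values):
    """Return z-scores: (value - mean) / standard deviation, within one context."""
    centre = mean(values)
    spread = pstdev(values)  # population SD; a sample SD is an equally valid choice
    return [(v - centre) / spread for v in values]


# Two hypothetical contexts measuring income in different units and scales.
incomes_a = [10000, 20000, 30000]   # e.g. annual income in pounds
incomes_b = [100, 200, 300]         # e.g. weekly income in another currency

z_a = standardise(incomes_a)
z_b = standardise(incomes_b)

# The raw measures differ, but relative positions within each context agree.
for a, b in zip(z_a, z_b):
    print(round(a, 3), round(b, 3))
```

A metric like this can be compared across contexts without first agreeing on common category boundaries, which is the argument the slide makes for metrics over categories.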

Slide 46: Example: categorisation and the scandalous use of collapsed EGP/NS-SEC...!
- Ignores heterogeneity within occupations
- Defines and hinges on arbitrary boundaries
- Creates artefactual gender differences

Slide 47: The scaling alternative
Many concepts can be reasonably regarded as metric
- cf. simplified / dichotomised categorisations
Comparability / standardisation is easier with scales
Complex / multi-process systems are easier with scales
- Structural Equation Models
- Interaction effects
Growing availability / use of distance score techniques
- Stereotyped ordered logit ['slogit' in Stata]
- Correspondence Analysis
- Latent variable models
...But scaling seems to be seen by some as a wicked, positivistic activity..!

Slide 48: Practical suggestions on the level of measurement
It's rare not to have a few alternative measures of the same concepts at different levels of measurement
Good practice would be to:
- Try alternative measures and see what difference they make
- Consider treatment of missing values in relation to measurement instrument choice
- Engage as much as possible with other studies

Slide 49: b) Alternative models and link functions
- The functional form of the outcome variable(s) is of greatest importance (influences which model is used)
- 'Link functions' perform the maths to allow for alternative functional forms of the outcome variable
- See [Talk 1] for popular alternative models
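For readers who want the notation, a link function g connects the expected value of the outcome to the linear predictor. The block below gives the standard generalised linear model formulation with three common links. The symbols are the usual textbook ones and are not taken from the slides.

```latex
% Generalised linear model: the link g maps the expected outcome
% onto the linear predictor.
g(\mu_i) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki},
\qquad \mu_i = E[\, y_i \mid x_i \,]

% Common choices of g, by functional form of the outcome:
\begin{aligned}
\text{identity (metric outcome, linear regression):} \quad & g(\mu) = \mu \\
\text{logit (binary outcome):} \quad & g(\mu) = \log\frac{\mu}{1-\mu} \\
\text{log (count outcome, e.g. Poisson):} \quad & g(\mu) = \log \mu
\end{aligned}
```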

Slide 50: Practical observations on link functions
Social scientists are unduly conservative in choosing between alternative models
[We tend to favour binary or metric outcomes and single process systems]
i. Substantively, this isn't ideal
ii. Pragmatically, it's no longer necessary

Slide 51: Substantive risks (of conservative model choice)
Attenuated findings
- Concentrate on certain category contrasts
- Ignore or exacerbate extremes of distribution
Mis-specification
- Ignore / mis-measure relevant β's
- Ignore / over-emphasise other contextual patterns
Endogeneity
- Ignoring multiprocess system may bias results (e.g. selection bias)

Slide 52: Pragmatics of model choice
- General rapid expansion in model functionality in statistical packages
- Stata stands out for its wide range of data management and data analysis functionality
  - E.g. 'statsby'; 'est table'; 'outreg2'; 'estout' facilitate testing and comparing related models with different combinations of variables

Slide 53: c) Other variables and interaction effects
A very important influence on one RHS coefficient is what else is in the RHS and what it is interacted with
Some brief comments on:
- Offsets (constraints)
- Interactions
- Logit models' fixed variance

Slide 54: A comment on 'offsets'
- For comparisons between regressions, it is sometimes suitable to force the coefficients of some variables (e.g. controls) to have a certain fixed value
- The example below (predicting income) uses 'cnsreg' in Stata, e.g.:
    * Fit the baseline model and store its coefficient vector
    regress lninc fem age femage
    matrix define mod1m=e(b)
    * Take the first coefficient (fem) from the baseline model
    scalar fem_coef=mod1m[1,1]
    * Hold fem at that value while adding mcamsis
    constraint def 1 fem=fem_coef
    cnsreg lninc fem age femage mcamsis, constraints(1)

Slide 55: Advice on Interaction Effects
Start with main effects: get a good idea how they work
Be careful how you fit interaction effects
- Often appealing substantively
- In practice not always significant (especially higher order)
- Hard to interpret higher order interactions
- Over-fit: check for replication (e.g. in other datasets)
- Always wise to formally test interactions (cf. armchair critics)
- Best to construct your own interaction variable(s) and maybe fit them as a single X (especially complicated categorical interactions)
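The last piece of advice, constructing your own interaction variables, can be sketched as below. The names `fem`, `age` and `femage` follow the Stata example in the talk. Centring age at 40 follows the later GHS example, and the class labels and the combined `fem_class` variable are assumptions added for illustration.

```python
AGE_CENTRE = 40  # centring makes the main effect of fem refer to age 40

cases = [
    {"fem": 1, "age": 45, "cls": "Routine"},
    {"fem": 0, "age": 45, "cls": "Routine"},
    {"fem": 1, "age": 30, "cls": "Advantaged"},
]

for case in cases:
    # Metric interaction: product of the dummy and centred age.
    case["age_c"] = case["age"] - AGE_CENTRE
    case["femage"] = case["fem"] * case["age_c"]

    # Categorical interaction built as one combined variable (a 'single X'),
    # so that every gender-by-class cell is an explicit, labelled category.
    sex_label = "F" if case["fem"] == 1 else "M"
    case["fem_class"] = f"{sex_label}:{case['cls']}"

for case in cases:
    print(case["fem"], case["age_c"], case["femage"], case["fem_class"])
```

Building the terms by hand keeps the coding of each cell visible, which helps when interpreting or re-parameterising the fitted coefficients.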

Slide 56: The fixed variance in logit: linear cf. categorical outcomes
GHS Data
- OLS: Y = age left education (years)
- Logit: Y = Graduate / Non Graduate
X Vars
- Female
- 4-category social Class (Advantaged; Lower Supervisory; Semi-routine; Routine)
- Age (centred at 40)

Slide 57: Regression Estimates
[Table: models A to E in columns; rows Female, Age (40), Supervisory, Semi-Routine, Routine, Constant; cell values not preserved]

Slide 58: Linear Regression Models
- 1 unit change in X leading to a β change in Y
- The β is consistent: minor insignificant random variation (survey data)
- As long as the X vars are uncorrelated (a classical regression assumption)

Slide 59: Estimates (logit scale)
[Table: models A to E in columns; rows Female, Age (40), Supervisory, Semi-Routine, Routine, Constant; cell values not preserved]
Parameterization ??

Slide 60: Logit Model
- Estimates are on a logit scale
- The β estimates how a shift from X1 = 0 to X1 = 1 changes the log odds of y = 1
- Even when the X vars are uncorrelated, including additional variables can lead to changes in β estimates
- The β estimates the effect given all other X vars in the model
- Fixed variance in the logit model (π² / 3)
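The point about fixed variance can be written out using the standard latent-variable formulation of the logit model. This is textbook notation and not something shown on the slides, and the rescaling expression is a commonly used approximation, not an exact result.

```latex
% Latent-variable form: the residual follows a standard logistic
% distribution, so its variance is fixed and is not estimated.
y^{*} = \beta_0 + \beta_1 x_1 + \varepsilon, \qquad
y = \begin{cases} 1 & \text{if } y^{*} > 0 \\ 0 & \text{otherwise} \end{cases},
\qquad \operatorname{Var}(\varepsilon) = \frac{\pi^{2}}{3}

% Suppose the fuller model also contains x_2, uncorrelated with x_1.
% Leaving x_2 out pushes its contribution into the residual, but the
% residual variance is still fixed at pi^2/3, so the coefficient on
% x_1 is rescaled towards zero (approximately):
\beta_1^{(\text{without } x_2)} \;\approx\;
\beta_1^{(\text{with } x_2)}
\sqrt{\frac{\pi^{2}/3}{\pi^{2}/3 + \beta_2^{2}\operatorname{Var}(x_2)}}
```

In linear regression the residual variance is estimated freely, so adding an uncorrelated regressor leaves the other coefficients essentially unchanged. In the logit model the scale is fixed, which is why β estimates shift between models A to E even without correlation among the X vars.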

Slide 61: Summary - Social science measurement and functional form
We argue that the route to better critical understanding of variable effects combines complex analysis with many mundane, prosaic tasks in checking data
- ANALYSIS: coefficient effects in multivariate models; multi-process models; understanding interactions; etc.
- DATA MANAGEMENT: re-coding data; linking data; missing data mechanisms; reviewing literature
Seldom central to previous methodological reviews
Cf.

Slide 62: References
- Abbott, A. (2006). Mobility: What? When? How? In S. L. Morgan, D. B. Grusky & G. S. Fields (Eds.), Mobility and Inequality. Stanford University Press.
- Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.
- Burgess, R. G. (Ed.). (1986). Key Variables in Social Investigation. London: Routledge.
- Crouchley, R., & Fligelstone, R. (2004). The Potential for High End Computing in the Social Sciences. Lancaster: Centre for Applied Statistics, Lancaster University, and
- Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2),
- Dorling, D., & Simpson, S. (Eds.). (1999). Statistics in Society: The Arithmetic of Politics. London: Arnold.
- Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2),
- Harkness, J., van de Vijver, F. J. R., & Mohler, P. P. (Eds.). (2003). Cross-Cultural Survey Methods. New York: Wiley.
- Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.). (2003). Advances in Cross-national Comparison: A European Working Book for Demographic and Socio-economic Variables. Berlin: Kluwer Academic / Plenum Publishers.
- Irvine, J., Miles, I., & Evans, J. (Eds.). (1979). Demystifying Social Statistics. London: Pluto Press.
- Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring Attitudes Cross-Nationally. London: Sage.
- Lambert, P. S., Prandy, K., & Bottero, W. (2007). By Slow Degrees: Two Centuries of Social Reproduction and Mobility in Britain. Sociological Research Online, 12(1).
- Prandy, K. (2002). Measuring quantities: the qualitative foundation of quantity. Building Research Capacity, 2, 3-4.
- Procter, M. (2001). Analysing Survey Data. In G. N. Gilbert (Ed.), Researching Social Life, Second Edition (pp ). London: Sage.
- Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES.
- Stacey, M. (Ed.). (1969). Comparability in Social Research. London: Heinemann.