University of Warwick, Department of Sociology, 2014/15 SO 201: SSAASS (Surveys and Statistics) (Richard Lampard) Clustering and Scaling (Week 19)

Distances...
Some quantitative techniques derive and/or use distances between variables, or distances between categories within variables, as the basis for the construction of maps or the division of items into sets of similar items. These include multidimensional scaling, correspondence analysis, and cluster analysis.
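
One example of a distance between categories is the chi-square distance that correspondence analysis uses to compare categories. It compares each category's distribution (its 'profile') across another variable. The Python sketch below assumes a small invented 3 x 3 table. The function name `chi_square_distance` is our own and is not taken from any package.

```python
from math import sqrt


def chi_square_distance(table, i, k):
    """Chi-square distance between the row profiles of rows i and k.

    Each row's counts are turned into proportions (its 'profile'); squared
    differences between the two profiles are weighted by the inverse of
    each column's share of the whole table.
    """
    n = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col_shares = [sum(row[j] for row in table) / n for j in range(n_cols)]
    profile_i = [x / sum(table[i]) for x in table[i]]
    profile_k = [x / sum(table[k]) for x in table[k]]
    return sqrt(sum((a - b) ** 2 / w
                    for a, b, w in zip(profile_i, profile_k, col_shares)))


# Invented counts, for illustration only (rows 0 and 2 have identical profiles)
table = [
    [30, 10, 10],
    [10, 30, 10],
    [30, 10, 10],
]
print(chi_square_distance(table, 0, 1))  # rows 0 and 1 differ
print(chi_square_distance(table, 0, 2))  # identical profiles -> 0.0
```

Rows 0 and 2 have different totals in many real tables. They would still be at distance zero if their proportions matched, because only the profiles are compared.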

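Multidimensional scaling (MDS)
MDS is applied to a set of distances between all pairs of categories within a set of categories. It locates the categories on a map whose inter-point distances reflect those distances as closely as possible. See Coxon (1982); Kruskal and Wish (1978).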
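
MDS comes in several variants, including non-metric ones. As a minimal sketch, the simplest variant can be written in a few lines of Python with NumPy. This is classical (Torgerson) metric MDS: it double-centres the squared distances and uses the leading eigenvectors as map coordinates. The function name and the toy distance matrix are our own assumptions. The matrix gives the corners of a 3 x 4 rectangle, so an exact two-dimensional map exists.

```python
import numpy as np


def classical_mds(distances, n_dims=2):
    """Classical (Torgerson) metric MDS.

    Returns an (n_items x n_dims) array of map coordinates whose Euclidean
    distances approximate the input distances.
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring
    # The largest eigenvalues of B (and their eigenvectors) give the map axes
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:n_dims]
    # Clip tiny negative eigenvalues caused by rounding error
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Invented distances between four categories: the corners of a 3 x 4
# rectangle, so an exact two-dimensional map exists
dist = [
    [0, 3, 4, 5],
    [3, 0, 5, 4],
    [4, 5, 0, 3],
    [5, 4, 3, 0],
]
coords = classical_mds(dist)
print(np.round(coords, 3))
```

An MDS map is only defined up to rotation and reflection. The returned coordinates therefore need not reproduce the rectangle's original orientation, but they do reproduce the distances between the points.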

Cluster analysis
In cluster analysis, distances between items (cases and/or variables) are generated from the raw data, and then used to generate a categorisation of the items. See Everitt (1993; see also later editions).
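
One common family of clustering methods is agglomerative hierarchical clustering. The pure-Python sketch below shows one variant, average linkage, applied to a precomputed distance matrix. The function name and the five-item distance matrix are invented for illustration and are not taken from Everitt or from any package.

```python
from itertools import combinations


def agglomerative_clusters(dist, n_clusters):
    """Average-linkage agglomerative clustering on a distance matrix.

    Starts with every item in its own cluster and repeatedly merges the two
    clusters with the smallest average between-cluster distance until
    n_clusters clusters remain.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_distance(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of current clusters (p < q)
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: avg_distance(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return sorted(sorted(c) for c in clusters)


# Invented distances between five items, for illustration only
dist = [
    [0, 1, 6, 7, 8],
    [1, 0, 5, 6, 7],
    [6, 5, 0, 2, 6],
    [7, 6, 2, 0, 5],
    [8, 7, 6, 5, 0],
]
print(agglomerative_clusters(dist, 2))
```

Using the minimum (or maximum) between-cluster distance in place of the average would give single (or complete) linkage. Which choice suits a given application is a substantive decision.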

Classifying women’s occupations
Dale et al. (1985: see handout) used cluster analysis to develop an ‘alternative’ set of categories for women’s occupations.

The Cambridge Scale
The Cambridge Social Stratification scale was originally derived by applying multidimensional scaling to occupation-based cross-tabulations matching the occupations of individuals and their ‘associates’. It subsequently moved towards using correspondence analysis (Prandy 1990; Prandy and Bottero 1998: 2.6, see handout).

‘Marriage and the Social Order’
Prandy and Bottero (1998: handout) applied correspondence analysis to occupation-based cross-tabulations to locate occupations on a number of (highly correlated) occupational scales.

Correspondence analysis
Correspondence analysis in effect partitions the association in a cross-tabulation (more precisely, its chi-square statistic) into components reflecting a number of underlying dimensions (see Greenacre 2007). In particular, the difference between the distributions of values for any two categories is split into components reflecting these different underlying dimensions.
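
The partition can be illustrated numerically. In the usual matrix formulation, correspondence analysis takes a singular value decomposition of the standardised residuals from the independence model. n times each squared singular value is the part of the chi-square statistic attributed to one dimension, and these parts sum to the full statistic. The NumPy sketch below is minimal and uses an invented table. `correspondence_analysis` is our own name, not a library function.

```python
import numpy as np


def correspondence_analysis(table):
    """Partition a cross-tabulation's chi-square statistic by dimension.

    Returns the chi-square statistic and, for each dimension, the part of it
    accounted for (n times the squared singular value of the matrix of
    standardised residuals from the independence model).
    """
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    p = obs / n
    r = p.sum(axis=1)  # row proportions (masses)
    c = p.sum(axis=0)  # column proportions (masses)
    independence = np.outer(r, c)
    std_residuals = (p - independence) / np.sqrt(independence)
    singular_values = np.linalg.svd(std_residuals, compute_uv=False)
    chi_square = n * (std_residuals ** 2).sum()
    per_dimension = n * singular_values ** 2
    return chi_square, per_dimension


# Invented counts, for illustration only
table = [
    [30, 10, 10],
    [10, 30, 10],
    [10, 10, 30],
]
chi_square, parts = correspondence_analysis(table)
print(round(chi_square, 2))             # the usual Pearson chi-square
print(np.round(parts / chi_square, 3))  # share of chi-square per dimension
```

The last entry of `parts` is zero up to rounding error, because an I x J table has at most min(I, J) - 1 non-trivial dimensions.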

Association models
More recently, the Cambridge scale and international equivalents have tended to use ‘association models’, a form of statistical model that echoes aspects of correspondence analysis. See Goodman, L.A. 1986. ‘Some useful extensions to the usual correspondence analysis approach and the usual loglinear approach in the analysis of contingency tables (with comments)’. Int. Statist. Rev. 54: 243-309. See also: http://www.camsis.stir.ac.uk/
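
The echo is easiest to see with the two models written side by side. The notation here is ours, not the slide's.

Goodman's RC(1) association model works with $F_{ij}$, the expected frequency in row $i$ and column $j$. It adds to the log-linear independence model a single product of a row score $\mu_i$ and a column score $\nu_j$, weighted by an association parameter $\phi$.

The first dimension of correspondence analysis instead corrects the independence model multiplicatively, using row and column scores $x_i$ and $y_j$. Here $n_{ij}$ is the observed count, $r_i$ and $c_j$ are the row and column proportions, and $\sigma_1$ is the first singular value:

$$\log F_{ij} = \lambda + \lambda^{R}_{i} + \lambda^{C}_{j} + \phi\,\mu_i\,\nu_j \qquad \text{(RC(1) association model)}$$

$$\frac{n_{ij}}{n} \approx r_i\,c_j\,\bigl(1 + \sigma_1\,x_i\,y_j\bigr) \qquad \text{(first dimension of correspondence analysis)}$$

In both models, one set of row scores and one set of column scores summarise the departure from independence.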

Evaluating the NS-SEC
In Rose and Pevalin (2003), various chapters (by Mills and Evans [see extract in handout], Coxon and Fisher, and Fisher) applied cluster analysis, multidimensional scaling, and association models to the relationship between employment relations measures and occupational categories.

More references…
Cluster analysis: Hair, J.F. Jr. and Black, W.C. 2000. ‘Cluster Analysis’. In L. Grimm and P. R. Yarnold (eds) Reading and Understanding More Multivariate Statistics. Washington, DC: APA Press.
Multidimensional scaling: Stalans, L.J. 1995. ‘Multidimensional Scaling’. In L. Grimm and P. R. Yarnold (eds) Reading and Understanding Multivariate Statistics. Washington, DC: APA Press.
Correspondence analysis: Phillips, D. 1995. ‘Correspondence Analysis’. Social Research Update 7. (http://sru.soc.surrey.ac.uk/SRU7.html)

Row and column scores in correspondence analysis
These are chosen so that each successive dimension explains as much of the cross-tabulation’s chi-square statistic as possible. Each dimension contributes a contingency hierarchy (see next slide), and the scores make this hierarchy as small a chi-square ‘distance’ as possible from the residuals of the independence model applied to the original cross-tabulation. These residuals are the differences between the observed values and the expected values used in calculating the chi-square statistic. For later dimensions, the target is whatever the earlier hierarchies leave unexplained.
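
Below is a sketch of how a first contingency hierarchy could be computed. It rests on two assumptions. First, the scores come from the first dimension of a singular value decomposition of the standardised residuals. Second, the scores are scaled symmetrically: each set of standard coordinates is multiplied by the square root of the first singular value. Each cell then equals column score x row score x column proportion x row proportion x n. Other scalings only move a constant factor between the row and column scores, so the hierarchy itself is unchanged. The function name and the table are invented.

```python
import numpy as np


def first_contingency_hierarchy(table):
    """First-dimension CA approximation to the residuals of a cross-tabulation."""
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    r = obs.sum(axis=1) / n  # row proportions
    c = obs.sum(axis=0) / n  # column proportions
    expected = n * np.outer(r, c)  # independence model
    residuals = obs - expected
    # Standardised residuals; their SVD yields the CA dimensions
    s = residuals / (n * np.sqrt(np.outer(r, c)))
    u, sv, vt = np.linalg.svd(s)
    # Symmetric scaling (an assumption): standard coordinates x sqrt(sv[0])
    row_scores = np.sqrt(sv[0]) * u[:, 0] / np.sqrt(r)
    col_scores = np.sqrt(sv[0]) * vt[0] / np.sqrt(c)
    # Cell (i, j): row score x column score x row prop x column prop x n
    hierarchy = n * np.outer(r * row_scores, c * col_scores)
    return {
        "residuals": residuals,
        "hierarchy": hierarchy,
        "row_scores": row_scores,
        "col_scores": col_scores,
        "chi_square_total": (residuals ** 2 / expected).sum(),
        "chi_square_first": (hierarchy ** 2 / expected).sum(),
    }


# Invented 3 x 4 table, for illustration only
table = [
    [40, 15, 10, 5],
    [20, 25, 20, 15],
    [5, 10, 20, 35],
]
result = first_contingency_hierarchy(table)
share = result["chi_square_first"] / result["chi_square_total"]
print(np.round(result["hierarchy"], 2))
print(f"first dimension: {share:.1%} of chi-square")
```

Like the residuals it approximates, the hierarchy's rows and columns each sum to zero. Its share of the chi-square statistic cannot exceed the total.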

Table 2/5: First contingency hierarchy (from Lampard 1992: 30; residuals in brackets)
                   1               2               3               4               5               ROW SCORE  ROW PROPORTION
Row 1              35.66 (39.0)    1.29 (1.4)      -9.32 (-20.0)   -11.15 (-8.1)   -16.48 (-12.3)  -0.93      0.20
Row 2              13.89 (12.1)    0.50 (0.6)      -3.63 (3.3)     -4.34 (-7.5)    -6.42 (-8.4)    -0.26      0.28
Row 3              -1.96 (-5.5)    -0.07 (-1.1)    0.51 (13.7)     0.61 (-3.1)     0.91 (-4.0)     0.05       0.21
Row 4              -18.79 (-20.9)  -0.68           4.91            5.88 (10.9)     8.68 (5.2)      0.68       0.15
Row 5              -28.74 (-24.6)  -1.04 (-2.3)    7.51 (-0.3)     8.98 (7.8)      13.28 (19.5)    0.98       0.16
COLUMN SCORE       -0.96           -0.04           0.27            0.48            1.09
COLUMN PROPORTION  0.25            0.23                                            0.10
Calculation of one of the entries (column score × row score × column proportion × row proportion × n): 35.66 ≈ -0.96 × -0.93 × 0.25 × 0.20 × 774 (n = 774; the scores and proportions shown are rounded).
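
The worked calculation can be checked directly. In this sketch, `hierarchy_entry` is a hypothetical helper name. Multiplying the rounded figures as printed gives about 34.55 rather than 35.66. The gap presumably reflects rounding of the scores and proportions to two decimal places.

```python
def hierarchy_entry(col_score, row_score, col_prop, row_prop, n):
    """One cell of a contingency hierarchy: column score x row score x
    column proportion x row proportion x n."""
    return col_score * row_score * col_prop * row_prop * n


# Top-left cell of Table 2/5, using the rounded figures as printed
entry = hierarchy_entry(-0.96, -0.93, 0.25, 0.20, 774)
print(round(entry, 2))  # 34.55 with these rounded inputs
```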

So what’s left?
Note that most of the biggest discrepancies between the residuals and the contingency hierarchy are in the third row and/or third column; these are consequently the focus of the second contingency hierarchy. However, the first contingency hierarchy accounts for 131.6 of the original chi-square statistic of 153.2 (i.e. 85.9%), leaving only 21.6 for the subsequent contingency hierarchies.