Register variation: correlation, clusters and factors

Register variation: correlation, clusters and factors Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Think about and discuss
Think about how language works. Is it more surprising to find that some linguistic features are related or that they are unrelated?

Where to start?

Relationships in corpora
[Figure labels: +, -, ~]

Co-variance
SD (standard deviation) = measure of variation of a single variable in a corpus.
Co-variance = measure of co-variation of two variables.

Co-variance (cont.)
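The co-variance described on the slides above can be sketched in a few lines of Python. This is a minimal illustration, assuming the sample formula (division by n - 1). The frequency lists are hypothetical and are not taken from the book.

```python
from statistics import mean

def covariance(x, y):
    """Sample co-variance of two equally long lists (assumes division by n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = mean(x), mean(y)
    # Sum of the products of the paired deviations from each mean.
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical relative frequencies (per 1,000 words) in five texts.
pronouns = [10, 20, 30, 40, 50]
verbs = [12, 18, 33, 41, 46]

cov_pv = covariance(pronouns, verbs)       # co-variation of two variables
var_p = covariance(pronouns, pronouns)     # a variable with itself: the variance (SD squared)
print(cov_pv, var_p)
```

The co-variance of a variable with itself is its variance, which is why SD and co-variance appear side by side on the slide.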

Correlation
Correlation = standardised co-variance.
0 no effect; ± 0.1 small effect; ± 0.3 medium effect; ± 0.5 large effect
p = 0.033
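"Standardised co-variance" can be read as the co-variance divided by the product of the two standard deviations. The sketch below shows that reading for Pearson's r. The data are hypothetical and the function name is an illustrative choice.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson's r as the co-variance divided by the product of the two SDs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical relative frequencies (per 1,000 words) in five texts.
pronouns = [10, 20, 30, 40, 50]
verbs = [12, 18, 33, 41, 46]

r = pearson_r(pronouns, verbs)
r_reversed = pearson_r(pronouns, [-v for v in pronouns])  # perfect negative relationship
print(round(r, 3), round(r_reversed, 3))
```

Because of the standardisation, r always falls between -1 and +1 regardless of the units of the two variables.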

Correlation as effect size and significance test
0 no effect; ± 0.1 small effect; ± 0.3 medium effect; ± 0.5 large effect
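The benchmarks on this slide can be turned into a small lookup. This is a sketch only: it assumes the benchmark values act as lower cut-off points, and the label "negligible" for values below 0.1 is an addition that is not on the slide.

```python
def effect_size_label(r):
    """Label a correlation using the slide's benchmarks as lower cut-offs (an assumption)."""
    size = abs(r)  # the sign gives the direction, not the size, of the effect
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label for values between 0 and 0.1

for value in (-0.029, 0.35, -0.6):
    print(value, effect_size_label(value))
```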

500 vs. 5000
NON-SIGNIFICANT: r = -.029; p = .52; 95% CI [-.116, .059]
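A confidence interval of this kind can be approximated with the Fisher z-transformation. The sketch below assumes that the reported r comes from a sample of 500. The slide does not state n, so this value is inferred from the slide title and should be treated as an assumption. The function name is illustrative.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation."""
    z = atanh(r)                      # transform r to the z scale
    se = 1 / sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)  # back-transform to the r scale

low, high = correlation_ci(-0.029, 500)    # n = 500 is an assumption
significant = not (low <= 0 <= high)       # interval containing 0: non-significant

low_5k, high_5k = correlation_ci(-0.029, 5000)
print(round(low, 3), round(high, 3), significant)
print(round(low_5k, 3), round(high_5k, 3))
```

Under the same assumptions, the same r with n = 5000 gives an interval that excludes zero, which illustrates why statistical significance depends on sample size while the effect size stays the same.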

Pearson’s and Spearman’s correlations
Pearson’s correlation: r
Spearman’s correlation: rs, rho, ρ
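One way to see the difference between the two coefficients is that Spearman's correlation is Pearson's correlation computed on ranks. The sketch below assumes tied values receive the average of their ranks. The data are hypothetical and deliberately skewed.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the run of tied values
        shared = (i + j) / 2 + 1        # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data with one extreme value in x.
x = [1, 2, 3, 4, 100]
y = [2, 4, 5, 9, 10]
print(round(pearson_r(x, y), 3), round(spearman_rs(x, y), 3))
```

In this example the relationship is perfectly monotonic, so Spearman's coefficient is 1, while Pearson's r is pulled down by the extreme value.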

Visualizing correlation

Hierarchical agglomerative cluster analysis
Hierarchical tree plot (or dendrogram)

Cluster analysis: Distance
[Figure labels: A, B]
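The transcript does not preserve which distance measure the slide illustrates. As a sketch, two common ways of measuring the distance between data points A and B are shown below. The choice of Euclidean and Manhattan distance, and the coordinates, are assumptions.

```python
from math import sqrt

def euclidean(a, b):
    """Straight-line distance between two points."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def manhattan(a, b):
    """Distance travelled along the axes ("city block" distance)."""
    return sum(abs(p - q) for p, q in zip(a, b))

# Hypothetical positions of two data points on two variables.
A = (1.0, 2.0)
B = (4.0, 6.0)
print(euclidean(A, B), manhattan(A, B))
```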

Cluster analysis: Linking
Q: Which of the data points inside a small cluster should be taken as representing the position of the whole cluster?
1. The closest point to the neighbouring cluster with which we want to merge our original cluster (so-called SLINK method).
2. The furthest point from the neighbouring cluster with which we want to merge our original cluster (so-called CLINK method).
3. None: the mutual distances of all data points are considered by taking their mean value (average linkage method).
4. None: the mutual distances of all data points are considered by calculating the sum of squared distances (Ward’s method).
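The first three linking options can be sketched as a small agglomerative procedure. This is a simplified illustration, assuming Euclidean distance and hypothetical two-dimensional points. Ward's method is left out for brevity, and the function names are illustrative.

```python
from itertools import combinations
from math import dist

def cluster_distance(a, b, method="single"):
    """Distance between two clusters (lists of points) under a linking rule."""
    pairwise = [dist(p, q) for p in a for q in b]
    if method == "single":      # SLINK: closest pair of points
        return min(pairwise)
    if method == "complete":    # CLINK: furthest pair of points
        return max(pairwise)
    if method == "average":     # average linkage: mean of all pairwise distances
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linking method: {method}")

def agglomerate(points, method="single"):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[p] for p in points]    # start with every point as its own cluster
    merges = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        d = cluster_distance(clusters[i], clusters[j], method)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical data points (e.g. texts described by two variables).
points = [(0, 0), (0, 1), (5, 0), (5, 1.5)]
single = agglomerate(points, "single")
complete = agglomerate(points, "complete")
print(single[-1][2], complete[-1][2])
```

The merge history is the information a dendrogram displays: which clusters were joined and at what distance. The last merge happens at a larger distance under CLINK than under SLINK because the furthest, not the closest, pair of points is used.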

passives, …, nouns, verbs, pronouns, past tense, modality, negation

67 Biber variables

Data
Individual text design

Factor analysis
A complex mathematical procedure that reduces a large number of linguistic variables to a small number of factors, each combining multiple linguistic variables. This is done by considering the correlations between variables: those that correlate, whether positively or negatively, are treated as components of the same factor because they are connected.
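The idea of grouping correlated variables into one factor can be illustrated with a much simpler stand-in: extracting the first principal component of the correlation matrix by power iteration. This is not the full factor analysis procedure (there is no rotation and only one factor is extracted). The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def first_factor_loadings(data, iterations=100):
    """Loadings on the first principal component of the correlation matrix."""
    names = list(data)
    corr = [[pearson_r(data[a], data[b]) for b in names] for a in names]
    v = [1.0 / (i + 1) for i in range(len(names))]   # arbitrary starting vector
    for _ in range(iterations):                       # power iteration
        w = [sum(c * x for c, x in zip(row, v)) for row in corr]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        vi * sum(c * x for c, x in zip(row, v)) for vi, row in zip(v, corr)
    )
    return {name: vi * sqrt(eigenvalue) for name, vi in zip(names, v)}

# Hypothetical frequencies (per 1,000 words) of four features in six texts.
data = {
    "pronouns": [120, 110, 95, 40, 30, 25],
    "verbs": [150, 160, 140, 100, 90, 95],
    "nouns": [180, 190, 210, 300, 320, 310],
    "passives": [5, 6, 8, 18, 20, 22],
}
loadings = first_factor_loadings(data)
for name, value in loadings.items():
    print(name, round(value, 2))
```

In this toy data set, pronouns and verbs load with one sign and nouns and passives with the opposite sign, so all four variables belong to the same factor, with positive and negative components. The overall sign of the loadings is arbitrary.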

Factors
[Figure labels: Factor 1, Factor 2]

Factors (cont.)

Factors > Dimensions
[Figure labels: INVOLVED, INFORMATIONAL]

Placement of registers
[Figure labels: INVOLVED, INFORMATIONAL]
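One common way to place registers along a dimension is to standardise each feature, add the z-scores of the positively loading features, subtract those of the negatively loading features, and average the result per register. The sketch below assumes this scoring scheme. The feature assignment, register labels and frequencies are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardise values to mean 0 and SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical texts: register label and feature frequencies per 1,000 words.
registers = ["conversation", "conversation", "academic", "academic"]
features = {
    "pronouns": [120, 110, 30, 25],
    "nouns": [180, 190, 320, 310],
}
positive = ["pronouns"]   # assumed to mark the INVOLVED pole
negative = ["nouns"]      # assumed to mark the INFORMATIONAL pole

z = {name: z_scores(values) for name, values in features.items()}

# Dimension score per text: positive features minus negative features.
text_scores = [
    sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
    for i in range(len(registers))
]

# Mean dimension score per register gives its position on the dimension.
register_means = {
    reg: mean(s for s, r in zip(text_scores, registers) if r == reg)
    for reg in sorted(set(registers))
}
print(register_means)
```

With these made-up numbers, the conversation texts receive a positive mean score (towards INVOLVED) and the academic texts a negative one (towards INFORMATIONAL).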

Things to remember
Correlations are used to investigate the relationship between two variables at a time.
Pearson’s correlation is suitable for scale variables, while Spearman’s correlation assumes ordinal variables (ranks). Spearman’s correlation can also be used with scale variables if the mean, as the measure of central tendency, does not represent the data well (extremely skewed distributions).
Hierarchical agglomerative cluster analysis is used for the classification of words, texts, registers, etc. The result of this analysis is a tree plot (dendrogram).
The most complex of the three types of analysis discussed in this chapter is multidimensional analysis (MD). MD analyses a large number of linguistic variables and reduces them to a small number of factors, which are interpreted as dimensions of variation. Different registers can be placed along these dimensions.