The Data Goldrush. Andy Wedel (University of Arizona) and Bodo Winter (UC Merced).



The data revolution
Griffiths, T. L. (2015). Manifesto for a new (computational) cognitive revolution. Cognition, 135.

“Data is merely the raw material of knowledge”

The “data science” rebranding

Jeff Leek The “data science” rebranding

Structure of this course
Day 1: (1) general statistical issues; (2) cross-cultural correlational studies
Day 2: Andy’s day! (phonetics & phonology)
Day 3: Bodo’s day! (semantics & gesture)
Day 4: data-driven language contact research

The course website: bodowinter.com/goldrush

My one slide on inferential stats
Inference from Sample to Population
Assume the null hypothesis; reject the null hypothesis if p < 0.05

My one slide on inferential stats, cont’d
Type I error = erroneously rejecting the null hypothesis (demo)
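The demo itself is not preserved in this transcript. A minimal sketch of what such a Type I error demo could look like is given below, assuming two samples drawn from the same normal population and compared with a t-test. The sample size, number of simulations, and seed are illustrative choices, not values from the slides.

```r
# Type I error demo (illustrative sketch, not the original class demo).
# Both samples come from the same population, so the null hypothesis is true
# and every "significant" result is a false positive.
set.seed(42)         # arbitrary seed, for reproducibility only
n_sims <- 2000       # number of simulated experiments (assumed value)
n_per_group <- 20    # observations per sample (assumed value)

p_values <- replicate(n_sims, {
  sample1 <- rnorm(n_per_group)
  sample2 <- rnorm(n_per_group)
  t.test(sample1, sample2)$p.value
})

# Proportion of experiments that reject the null at p < 0.05
type1_rate <- mean(p_values < 0.05)
type1_rate
```

With independent observations, the rejection rate should land near the nominal 0.05 level.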

My one slide on regression
RT ~ Noise

My one slide on regression
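The regression slide uses the model formula RT ~ Noise. As a hedged illustration, the sketch below fits that formula with `lm()` to simulated data. The intercept, slope, noise range, and error spread are made-up values chosen only so the example runs; they are not data from the course.

```r
# Simple linear regression of reaction time on noise level (simulated data).
set.seed(1)                                  # arbitrary seed
n <- 100
Noise <- runif(n, min = 40, max = 90)        # hypothetical noise levels
RT <- 300 + 2 * Noise + rnorm(n, sd = 30)    # hypothetical: RT rises by 2 per unit of noise
dat <- data.frame(RT, Noise)

fit <- lm(RT ~ Noise, data = dat)

coef(fit)                   # estimated intercept and slope
summary(fit)$coefficients   # estimates, standard errors, t values, p values
```

The slope estimate is the predicted change in RT for a one-unit change in Noise; its p-value tests the null hypothesis that the slope is zero.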

Correlations

Examples of linguistic correlations

                 Object-Verb   Verb-Object
Postpositions        472            42
Prepositions          14           456

Dryer, M. S. (2013). Relationship between the order of object and verb and the order of adposition and noun. WALS.
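Reading the counts on this slide as a 2×2 table (postpositions: 472 Object-Verb, 42 Verb-Object; prepositions: 14 Object-Verb, 456 Verb-Object), the association can be checked with a chi-square test. This is only a sketch: the test treats every language as an independent data point, which is exactly the assumption questioned under "Lack of independence" later in this talk.

```r
# Word-order counts entered as a 2x2 contingency table.
counts <- matrix(c(472,  42,
                    14, 456),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(adposition = c("Postpositions", "Prepositions"),
                                 order = c("Object-Verb", "Verb-Object")))

# Chi-square test of independence (assumes independent observations)
test <- chisq.test(counts)
test$p.value

# Row proportions: share of OV vs. VO languages within each adposition type
prop.table(counts, margin = 1)
```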

Examples of linguistic correlations
sex and gender ~ sound change
Eckert, P. (1989). The whole woman: Sex and gender differences in variation. Language Variation and Change, 1.
Labov, W. (1990). The interaction of sex and social class in the course of linguistic change. Language Variation and Change, 2.

Examples of linguistic correlations
future tense marking ~ economic behavior
Chen, M. K. (2012). The effect of language on economic behavior: Evidence from savings rates, health behaviors, and retirement assets. American Economic Review, 103.

Examples of linguistic correlations
ejectives (kʼ, tʼ, qʼ) ~ geography
Everett, C. (2013). Evidence for direct geographic influences on linguistic sounds: The case of ejectives. PLoS ONE, 8(6).

Examples of linguistic correlations
climate ~ tonal languages
Everett, C., Blasi, D. E., & Roberts, S. G. (2015). Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots. Proceedings of the National Academy of Sciences, 112.

Two types of correlations
… between linguistic features
… between linguistic & non-linguistic features
Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1.

Examples of correlations
Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1.

Three statistical problems
Multiple comparisons
Correlation is not causation
Lack of independence


Three statistical problems: Correlation is not causation
Roberts, S., & Winters, J. (2013). Linguistic diversity and traffic accidents: Lessons from statistical studies of cultural traits. PLoS ONE, 8(8), e70902.

Three statistical problems: Correlation is not causation
Christian Bentz
Bentz, C., & Winter, B. (2013). Languages with more second language learners tend to lose nominal case. Language Dynamics and Change, 3(1), 1–27.

Three statistical problems: Correlation is not causation
[Diagram: three candidate relationships. L2 learning and morphology; L2 speakers and morphology; L2 speakers and morphology linked through a confound?]

Three statistical problems: Correlation is not causation
L2 speakers ~ morphology
4. shared history
5. chance

Three statistical problems: Correlation is not causation

Three statistical problems: Correlation is not causation
Steps to support causality:
The data must be strong.
The data must be consistent.
The data must be coherent.
The data must be specific.
The causal effect must be plausible.

Three statistical problems: Correlation is not causation
[Diagram: Data, Hypothesis]
Correlations increase the probability of compatible models; they decrease the probability of incompatible ones.
Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1.

Three statistical problems: Correlation is not causation
Experiments do not provide foolproof access to causality

Three statistical problems: Correlation is not causation

Three statistical problems
Multiple comparisons
Correlation is not causation
Lack of independence


Three statistical problems: Multiple comparisons
Bennett, C. M., Baird, A. A., Miller, M. B., & Wolford, G. L. (2011). Neural correlates of interspecies perspective taking in the post-mortem Atlantic salmon: An argument for proper multiple comparisons correction. Journal of Serendipitous and Unexpected Results, 1, 1–5.

Three statistical problems: Multiple comparisons
Example of doing 100 tests:
If α = 0.05 is taken as the significance level and all null hypotheses are true, the expected number of incorrect rejections of the null hypothesis across 100 tests is 100 × 0.05 = 5.
The probability of at least one statistical result being significant (for independent tests) is: 1 − (1 − 0.05)^100 ≈ 0.994
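The arithmetic for the 100-test example can be checked directly. The sketch below computes the expected number of false positives and the probability of at least one false positive, then confirms the latter by simulation. It assumes the tests are independent and that every null hypothesis is true, in which case p-values are uniformly distributed; the number of simulations and the seed are arbitrary.

```r
alpha <- 0.05
n_tests <- 100

# Expected number of incorrect rejections when all nulls are true
expected_false_positives <- alpha * n_tests

# Probability of at least one significant result among independent tests
fwer <- 1 - (1 - alpha)^n_tests

# Simulation check: under the null, p-values are uniform on [0, 1]
set.seed(123)     # arbitrary seed
n_sims <- 2000    # assumed number of simulated "studies"
any_sig <- replicate(n_sims, any(runif(n_tests) < alpha))

expected_false_positives
fwer
mean(any_sig)
```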

Three statistical problems: Multiple comparisons
What to do?
(1) Correct for multiple comparisons (e.g., Bonferroni correction)
(2) Avoid multiplicity at the design stage (Bender & Lange, 2001)
Bender, R., & Lange, S. (2001). Adjusting for multiple testing - when and how? Journal of Clinical Epidemiology, 54(4).
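As a small illustration of option (1), base R's `p.adjust()` applies the Bonferroni correction by multiplying each p-value by the number of tests and capping the result at 1. The p-values below are simulated under the null hypothesis and are purely illustrative.

```r
set.seed(7)                  # arbitrary seed
n_tests <- 100
p_raw <- runif(n_tests)      # simulated p-values; every null hypothesis is true

# Bonferroni: each p-value is multiplied by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

sum(p_raw < 0.05)    # uncorrected "significant" results (all false positives here)
sum(p_bonf < 0.05)   # results that survive the correction
```

Comparing against 0.05 after adjustment is equivalent to testing each raw p-value against 0.05 / 100.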

Three statistical problems
Multiple comparisons
Correlation is not causation
Lack of independence


Three statistical problems: Lack of independence


Three statistical problems: Lack of independence
A simple Type I error simulation with two samples (Sample 1, Sample 2)

Three statistical problems: Lack of independence
(Source code on class GitHub repo)
unique_items <- rnorm(nitems)
unique_subs <- rnorm(nsub)
resp <- unique_items[items] + unique_subs[subjects] + rnorm(nsub*nitems)
1,000 simulations
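The full simulation lives in the class repository and is not reproduced in this transcript. The sketch below is one way the three lines on the slide could be embedded in a runnable Type I error simulation. The design is an assumption made for illustration: 10 subjects crossed with 10 items, half of the items arbitrarily labelled condition "A" and half "B", with no true condition effect. It compares a t-test over all data points, which ignores the repeated items, with a t-test over item means.

```r
set.seed(2024)    # arbitrary seed
nsub <- 10        # assumed number of subjects
nitems <- 10      # assumed number of items
n_sims <- 1000    # 1,000 simulations, as on the slide

# Fully crossed design: every subject responds to every item
subjects <- rep(seq_len(nsub), each = nitems)
items <- rep(seq_len(nitems), times = nsub)

# Hypothetical condition labels: a property of the item, with no real effect
item_condition <- factor(rep(c("A", "B"), each = nitems / 2))
condition <- item_condition[items]

p_values <- replicate(n_sims, {
  unique_items <- rnorm(nitems)   # per-item offsets
  unique_subs <- rnorm(nsub)      # per-subject offsets
  resp <- unique_items[items] + unique_subs[subjects] + rnorm(nsub * nitems)
  item_means <- as.numeric(tapply(resp, items, mean))
  c(naive = t.test(resp ~ condition)$p.value,              # treats 100 points as independent
    by_item = t.test(item_means ~ item_condition)$p.value) # one value per item
})

# Type I error rate of each analysis at the 0.05 level
rates <- rowMeans(p_values < 0.05)
rates
```

Because responses to the same item share an offset, the naive test rejects far more often than 5% of the time, while the by-item test stays close to the nominal level.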


Lack of independence: Two sources of non-independence
Language genealogy (Dryer, 1989, 1991, 1992; Cysouw, 2010; Jaeger et al., 2011)
Language areas (Dryer, 1989, 2000; Maslova, 2000; Bickel, 2008; Cysouw, 2010; Jaeger et al., 2011)

Bickel, B. (2008). A refined sampling procedure for genealogical control. STUF-Language Typology and Universals, 61(3), 221–233.
Cysouw, M. (2010). Dealing with diversity: Towards an explanation of NP-internal word order frequencies. Linguistic Typology, 14, 253–286.
Dryer, M. (1989). Large linguistic areas and language sampling. Studies in Language, 13(2), 257–292.
Dryer, M. (1991). SVO languages and the OV:VO typology. Journal of Linguistics, 27(2), 443–482.
Dryer, M. (1992). The Greenbergian word order correlations. Language, 68(1), 81–138.
Dryer, M. (2000). Counting genera vs. counting languages. Linguistic Typology, 4, 334–350.
Jaeger, T. F., Graff, P., Croft, W., & Pontillo, D. (2011). Mixed effect models for genetic and areal dependencies in linguistic typology. Linguistic Typology, 15(2).
Maslova, E. (2000). A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3), 307–333.

Genealogical relationships
Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: Controlling for cultural evolution. PLOS ONE.

Contact relationships
Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: Controlling for cultural evolution. PLOS ONE.


Contact relationships
Mark Liberman

Lack of independence: Two sources of non-independence
Sean Roberts, James Winters
Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: Controlling for cultural evolution. PLOS ONE.

Lack of independence: Two sources of non-independence
Controlling for area and family within a mixed effects modeling framework:
lmer(case ~ L2prop + (1+L2prop|family) + (1+L2prop|area))
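To show how the model formula on this slide could be run, here is a hedged sketch using the lme4 package on simulated data. Everything about the data is an assumption made for illustration: the numbers of languages, families, and areas, the effect sizes, and the treatment of `case` as a continuous outcome. None of it reproduces the actual analysis or data set.

```r
library(lme4)   # provides lmer(); assumed to be installed

set.seed(99)    # arbitrary seed
n_lang <- 400   # hypothetical number of languages
n_family <- 20  # hypothetical number of language families
n_area <- 8     # hypothetical number of linguistic areas

fam_id <- sample(seq_len(n_family), n_lang, replace = TRUE)
area_id <- sample(seq_len(n_area), n_lang, replace = TRUE)
L2prop <- runif(n_lang)   # proportion of L2 speakers (simulated)

# Family- and area-specific intercepts and slopes (made-up standard deviations)
fam_int <- rnorm(n_family)
fam_slope <- rnorm(n_family, sd = 0.5)
area_int <- rnorm(n_area)
area_slope <- rnorm(n_area, sd = 0.5)

# Simulated outcome with a negative overall effect of L2prop
case <- 5 + fam_int[fam_id] + area_int[area_id] +
  (-3 + fam_slope[fam_id] + area_slope[area_id]) * L2prop +
  rnorm(n_lang)

langs <- data.frame(case, L2prop,
                    family = factor(fam_id), area = factor(area_id))

# Random intercepts and random L2prop slopes for both family and area
fit <- lmer(case ~ L2prop + (1 + L2prop | family) + (1 + L2prop | area),
            data = langs)

fixef(fit)   # fixed-effect estimates, adjusted for family and area
```

The random-effects terms let each family and each area have its own baseline and its own L2prop slope, so related or neighbouring languages are no longer treated as independent observations.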

Three statistical problems
Multiple comparisons
Correlation is not causation
Lack of independence