Published by Shauna Francis. Modified over 9 years ago.
1
The Data Goldrush
Andy Wedel (University of Arizona), Bodo Winter (UC Merced)
2
The data revolution Griffiths, T. L. (2015). Manifesto for a new (computational) cognitive revolution. Cognition, 135, 21-23.
3
“Data is merely the raw material of knowledge”
http://www.nytimes.com/2009/08/06/technology/06stats.html?_r=1&
4
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram The “data science” rebranding
5
Jeff Leek The “data science” rebranding http://simplystatistics.org/2013/12/12/the-key-word-in-data-science-is-not-data-it-is-science/
6
Structure of this course
Day 1: (1) general statistical issues, (2) cross-cultural correlational studies
Day 2: Andy’s day! (phonetics & phonology)
Day 3: Bodo’s day! (semantics & gesture)
Day 4: data-driven language contact research
7
The course website bodowinter.com/goldrush
9
My one slide on inferential stats
A sample is used to draw inferences about a population: assume the null hypothesis, then reject it if p < 0.05.
Figure source: https://www.coursera.org/course/datascitoolbox
10
My one slide on inferential stats, cont’d
Type I error = erroneously rejecting the null hypothesis (demo)
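A Type I error rate can be illustrated by simulation. The following is a minimal sketch in R, not the demo used in the course: it repeatedly draws two groups from the same population, so the null hypothesis is true by construction, and counts how often a t-test still comes out significant. The number of simulations, the group size, and the variable names are assumptions chosen for illustration.

```r
# Sketch: estimate the Type I error rate of a two-sample t-test under a true null.
# n_sims and n_per_group are assumed values, not taken from the slides.
set.seed(42)
n_sims <- 1000
n_per_group <- 20
alpha <- 0.05

p_values <- replicate(n_sims, {
  # Both groups come from the same normal population,
  # so every significant result is a false positive.
  group_a <- rnorm(n_per_group)
  group_b <- rnorm(n_per_group)
  t.test(group_a, group_b)$p.value
})

# Proportion of simulated experiments that reject the null hypothesis
type1_rate <- mean(p_values < alpha)
type1_rate
```

With a valid test, the proportion of rejections should land close to the chosen significance level.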
11
My one slide on regression RT ~ Noise
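The formula `RT ~ Noise` reads as "reaction time modeled as a function of noise". A minimal sketch of fitting such a model in R with `lm()` follows. The data are simulated, and the intercept, slope, units, and sample size are hypothetical values picked only so the example runs.

```r
# Sketch: simple linear regression of reaction time on noise level.
# All numbers below are invented for illustration.
set.seed(1)
n <- 100
noise <- runif(n, min = 0, max = 10)             # hypothetical noise level
rt <- 500 + 20 * noise + rnorm(n, sd = 50)       # hypothetical RTs in ms

dat <- data.frame(RT = rt, Noise = noise)

# Fit RT as a linear function of Noise
fit <- lm(RT ~ Noise, data = dat)

# Intercept = predicted RT at zero noise; slope = change in RT per unit of noise
coef(fit)
```

`summary(fit)` would additionally report standard errors and p-values for both coefficients.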
12
My one slide on regression
13
Correlations
http://www.skepticalraptor.com/skepticalraptorblog.php/correlation-does-not-imply-causation-except-when-it-does/
14
Examples of linguistic correlations

                 Object-Verb   Verb-Object
Postpositions        472            42
Prepositions          14           456

Dryer, M. S. (2013). Relationship between the order of object and verb and the order of adposition and noun. WALS, http://wals.info/chapter/95
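The counts from Dryer (2013) can be entered as a 2 × 2 table and inspected in R. This is a sketch rather than an analysis from the course; the object names are hypothetical. Note that `chisq.test()` treats every language as an independent data point, which is exactly the assumption questioned under "Lack of independence" later in the deck.

```r
# Sketch: the adposition / word-order counts as a contingency table.
# matrix() fills column by column: first the Object-Verb column, then Verb-Object.
word_order <- matrix(
  c(472, 14, 42, 456),
  nrow = 2,
  dimnames = list(
    Adposition = c("Postpositions", "Prepositions"),
    Order = c("Object-Verb", "Verb-Object")
  )
)

# Share of postpositions vs. prepositions within each word-order type
col_props <- prop.table(word_order, margin = 2)
col_props

# Naive test of association (assumes independent languages)
test_result <- chisq.test(word_order)
test_result
```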
15
Examples of linguistic correlations ~
16
Sex and gender ~ sound change
Eckert, P. (1989). The whole woman: Sex and gender differences in variation. Language Variation and Change, 1, 245-67.
Labov, W. (1990). The interaction of sex and social class in the course of linguistic change. Language Variation and Change, 2, 205-254.
17
Examples of linguistic correlations
Future tense marking ~ economic behavior
Chen, M. K. (2013). The effect of language on economic behavior: Evidence from savings rates, health behaviors, and retirement assets. American Economic Review, 103, 690-731.
18
Examples of linguistic correlations
Ejectives (kʼ tʼ qʼ) ~ geography
Everett, C. (2013). Evidence for direct geographic influences on linguistic sounds: the case of ejectives. PloS one, 8(6), e65275.
19
Examples of linguistic correlations
Climate ~ tonal languages
Everett, C., Blasi, D. E., & Roberts, S. G. (2015). Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots. Proceedings of the National Academy of Sciences, 112, 1322-1327.
20
Two types of correlations
… between linguistic features
… between linguistic & non-linguistic features
Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1, 221-241.
21
Examples of correlations Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1, 221-241.
22
Three statistical problems
Multiple comparisons
Correlation is not causation
Lack of independence
24
Three statistical problems Correlation is not causation Roberts, S., & Winters, J. (2013). Linguistic diversity and traffic accidents: Lessons from statistical studies of cultural traits. PLoS one, 8(8), e70902.
25
Three statistical problems Correlation is not causation Christian Bentz Bentz, C., & Winter, B. (2013). Languages with more second language learners tend to lose nominal case. Language Dynamics & Change, 3:1, 1-27.
26
Three statistical problems: Correlation is not causation
[Diagram: possible relationships behind the correlation]
1. L2 learning and morphology
2. L2 speakers and morphology
3. L2 speakers and morphology, linked through a confound?
27
Three statistical problems: Correlation is not causation
L2 speakers and morphology
4. shared history
5. chance
28
Three statistical problems: Correlation is not causation
http://www.skepticalraptor.com/skepticalraptorblog.php/correlation-does-not-imply-causation-except-when-it-does/
29
Three statistical problems: Correlation is not causation
Steps to support causality:
The data must be strong.
The data must be consistent.
The data must be coherent.
The data must be specific.
The causal effect must be plausible.
http://www.skepticalraptor.com/skepticalraptorblog.php/correlation-does-not-imply-causation-except-when-it-does/
30
Three statistical problems: Correlation is not causation
Data and hypothesis: correlations increase the probability of compatible models and decrease the probability of incompatible ones.
Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1, 221-241.
31
Three statistical problems Correlation is not causation Experiments do not provide fool-proof access to causality
32
Three statistical problems Correlation is not causation http://blogs.discovermagazine.com/neuroskeptic/2015/05/24/fmri-of-the-amygdala-all-in-vein/
33
Three statistical problems
Multiple comparisons
Correlation is not causation
Lack of independence
35
Three statistical problems Multiple comparisons https://www.sciencenews.org/article/trawling-brain Bennett, C. M., Baird, A. A., Miller, M. B., & Wolford, G. L. (2011). Neural correlates of interspecies perspective taking in the post-mortem atlantic salmon: an argument for proper multiple comparisons correction. Journal of Serendipitous and Unexpected Results, 1, 1-5.
36
https://xkcd.com/882/
39
Three statistical problems: Multiple comparisons
Example of doing 100 tests:
If α = 0.05 is taken as the significance level and all null hypotheses are true, the expected number of incorrect rejections of the null hypothesis across 100 tests is 5.
If the tests are independent, the probability of at least one statistical result being significant is 1 − (1 − 0.05)^100 ≈ 0.994.
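The arithmetic for 100 tests can be checked directly. A minimal sketch in R, assuming the tests are independent and every null hypothesis is true (neither assumption is spelled out on the slide); the variable names are hypothetical.

```r
# Sketch: expected false positives and familywise error rate for many tests.
alpha <- 0.05      # significance level per test
n_tests <- 100     # number of tests

# Expected number of incorrect rejections when all nulls are true
expected_false_positives <- alpha * n_tests

# Probability of at least one incorrect rejection, assuming independent tests:
# 1 minus the probability that none of the tests rejects
familywise_error <- 1 - (1 - alpha)^n_tests

expected_false_positives
familywise_error
```

Changing `n_tests` shows how quickly the chance of at least one false positive grows with the number of tests.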
40
Three statistical problems: Multiple comparisons
What to do?
(1) Correct for multiple comparisons (e.g., Bonferroni correction)
(2) Avoid multiplicity at the design stage (Bender & Lange, 2001)
Bender, R., & Lange, S. (2001). Adjusting for multiple testing - when and how? Journal of Clinical Epidemiology, 54(4), 343-349.
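Option (1) can be sketched with base R's `p.adjust()`. The example below is an illustration under assumed settings, not material from the course: it generates p-values from 100 t-tests where the null hypothesis is true, then compares how many fall below 0.05 before and after a Bonferroni correction.

```r
# Sketch: Bonferroni correction with p.adjust().
# The number of tests and the group size are assumed values.
set.seed(7)
n_tests <- 100

# p-values from tests on pure noise (no true effects)
p_raw <- replicate(n_tests, t.test(rnorm(20), rnorm(20))$p.value)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

n_sig_raw <- sum(p_raw < 0.05)     # uncorrected "significant" results
n_sig_bonf <- sum(p_bonf < 0.05)   # results surviving the correction
c(uncorrected = n_sig_raw, bonferroni = n_sig_bonf)
```

`p.adjust()` also offers less conservative methods such as `"holm"`, which can be swapped in through the `method` argument.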
41
Three statistical problems
Multiple comparisons
Correlation is not causation
Lack of independence
43
Three statistical problems Lack of independence
47
Three statistical problems: Lack of independence
A simple Type I error simulation comparing Sample 1 and Sample 2:
48
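Three statistical problems: Lack of independence
1,000 simulations (source code on class GitHub repo):
    nsub <- 10; nitems <- 10                # assumed sizes, not given on the slide
    subjects <- rep(1:nsub, each = nitems)  # subject index for each response
    items <- rep(1:nitems, times = nsub)    # item index for each response
    unique_items <- rnorm(nitems)           # one random offset per item
    unique_subs <- rnorm(nsub)              # one random offset per subject
    resp <- unique_items[items] + unique_subs[subjects] + rnorm(nsub*nitems)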
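The response-generating lines above can be wrapped in a full simulation. What follows is a sketch under stated assumptions, not the code from the class repository: subjects are split into two groups with no true difference, each subject responds to every item, and a t-test is run once on all responses (ignoring that they are clustered by subject) and once on one mean per subject. The group assignment, the sample sizes, and the function name `simulate_once` are all hypothetical.

```r
# Sketch: Type I error inflation when repeated measures are treated as independent.
set.seed(123)
n_sims <- 1000
nsub <- 10      # assumed number of subjects (half per group)
nitems <- 10    # assumed number of items per subject

simulate_once <- function() {
  subjects <- rep(1:nsub, each = nitems)
  items <- rep(1:nitems, times = nsub)
  group <- ifelse(subjects <= nsub / 2, "A", "B")   # no true group effect

  unique_items <- rnorm(nitems)
  unique_subs <- rnorm(nsub)
  resp <- unique_items[items] + unique_subs[subjects] + rnorm(nsub * nitems)

  # Naive analysis: every response counted as an independent data point
  p_naive <- t.test(resp ~ group)$p.value

  # Aggregated analysis: one mean per subject, so data points are independent
  sub_means <- as.vector(tapply(resp, subjects, mean))
  sub_group <- rep(c("A", "B"), each = nsub / 2)
  p_agg <- t.test(sub_means ~ sub_group)$p.value

  c(naive = p_naive, aggregated = p_agg)
}

p_vals <- replicate(n_sims, simulate_once())

# Proportion of simulations with p < 0.05 for each analysis
type1_rates <- rowMeans(p_vals < 0.05)
type1_rates
```

Because there is no true group difference, every significant result is a false positive. The naive analysis rejects far more often than the nominal 5%, while the by-subject analysis stays near it.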
49
Three statistical problems Lack of independence
52
Two sources of non-independence (Lack of independence)
Language genealogy (Dryer, 1989, 1991, 1992; Cysouw, 2010; Jaeger et al., 2011)
Language areas (Dryer, 1989, 2000; Maslova, 2000; Bickel, 2008; Cysouw, 2010; Jaeger et al., 2011)
Bickel, B. (2008). A refined sampling procedure for genealogical control. STUF-Language Typology and Universals, 61(3), 221–233.
Cysouw, M. (2010). Dealing with diversity: Towards an explanation of NP-internal word order frequencies. Linguistic Typology, 14, 253–286.
Dryer, M. (1989). Large linguistic areas and language sampling. Studies in Language, 13(2), 257–292.
Dryer, M. (1991). SVO languages and the OV: VO typology. Journal of Linguistics, 27(2), 443–482.
Dryer, M. (1992). The Greenbergian word order correlations. Language, 68(1), 81–138.
Dryer, M. (2000). Counting genera vs. counting languages. Linguistic Typology, 4, 334–350.
Jaeger, T. F., Graff, P., Croft, W., & Pontillo, D. (2011). Mixed effect models for genetic and areal dependencies in linguistic typology. Linguistic Typology, 15(2), 281–320.
Maslova, E. (2000). A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3), 307–333.
53
Genealogical relationships
Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: controlling for cultural evolution. PLOS ONE.
54
Contact relationships
Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: controlling for cultural evolution. PLOS ONE.
56
Mark Liberman http://languagelog.ldc.upenn.edu/nll/?p=3764 Contact relationships
57
Two sources of non-independence (Lack of independence)
Sean Roberts, James Winters
Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: controlling for cultural evolution. PLOS ONE.
58
Two sources of non-independence (Lack of independence)
Controlling for area and family within a mixed effects modeling framework:
    lmer(case ~ L2prop + (1+L2prop|family) + (1+L2prop|area))
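The model formula above can be tried out on simulated data. The sketch below assumes the `lme4` package is installed and treats `case` as a numeric outcome. The data frame, the number of families and areas, and all effect sizes are invented for illustration and do not come from any typological database or from the original study.

```r
# Sketch: mixed model with random intercepts and slopes for family and area.
library(lme4)  # assumption: the lme4 package is installed

set.seed(2015)
n_lang <- 300   # hypothetical number of languages
n_fam <- 20     # hypothetical number of families
n_area <- 6     # hypothetical number of areas

dat <- data.frame(
  family = factor(rep_len(1:n_fam, n_lang)),
  area   = factor(sample(1:n_area, n_lang, replace = TRUE)),
  L2prop = runif(n_lang)   # proportion of L2 speakers
)

# Family- and area-specific deviations in intercept and in the L2prop slope
fam_int    <- rnorm(n_fam)
fam_slope  <- rnorm(n_fam, sd = 0.5)
area_int   <- rnorm(n_area, sd = 0.5)
area_slope <- rnorm(n_area, sd = 0.5)
fam_id  <- as.integer(dat$family)
area_id <- as.integer(dat$area)

# Simulated outcome with an invented overall slope of -2 for L2prop
dat$case <- (5 + fam_int[fam_id] + area_int[area_id]) +
  (-2 + fam_slope[fam_id] + area_slope[area_id]) * dat$L2prop +
  rnorm(n_lang)

fit <- lmer(case ~ L2prop + (1 + L2prop | family) + (1 + L2prop | area),
            data = dat)

# Overall (fixed) effect of L2prop after accounting for family and area
fixef(fit)
```

The random slopes let the effect of `L2prop` differ by family and by area, so languages from one large family or area do not dominate the estimate. With few grouping levels, such models may report singular fits or convergence warnings.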
59
Three statistical problems
Multiple comparisons
Correlation is not causation
Lack of independence