The Data Goldrush. Andy Wedel (University of Arizona) and Bodo Winter (UC Merced).


1 The Data Goldrush. Andy Wedel (University of Arizona), Bodo Winter (UC Merced)

2 The data revolution Griffiths, T. L. (2015). Manifesto for a new (computational) cognitive revolution. Cognition, 135, 21-23.

3 http://www.nytimes.com/2009/08/06/technology/06stats.html?_r=1& “Data is merely the raw material of knowledge”

4 http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram The “data science” rebranding

5 Jeff Leek The “data science” rebranding http://simplystatistics.org/2013/12/12/the-key-word-in-data-science-is-not-data-it-is-science/

6 Structure of this course. Day 1: (1) general statistical issues, (2) cross-cultural correlational studies. Day 2: Andy’s day! (phonetics & phonology). Day 3: Bodo’s day! (semantics & gesture). Day 4: data-driven language contact research.

7 The course website bodowinter.com/goldrush

8

9 My one slide on inferential stats: Sample, Population, inference; assume null hypothesis; reject null hypothesis (p < 0.05). https://www.coursera.org/course/datascitoolbox

10 My one slide on inferential stats, cont’d: Type I error = erroneously rejecting a true null hypothesis (demo)
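The Type I error demo itself is not preserved in the transcript. As a minimal sketch of what such a demo might look like, the R code below repeatedly draws two samples from the same population (so the null hypothesis is true by construction) and counts how often a t-test rejects at p < 0.05. The seed, sample size, and number of simulations are hypothetical choices, not values from the slides.

```r
# Sketch: estimate the Type I error rate of a two-sample t-test
# when the null hypothesis is true (both groups share one population).
set.seed(42)         # arbitrary seed, only for reproducibility
n_sims <- 1000       # hypothetical number of simulated experiments
n_per_group <- 20    # hypothetical sample size per group

p_values <- replicate(n_sims, {
  group_a <- rnorm(n_per_group)   # both groups are drawn from N(0, 1)
  group_b <- rnorm(n_per_group)
  t.test(group_a, group_b)$p.value
})

# Proportion of false rejections at alpha = 0.05; expected to be near 0.05
type1_rate <- mean(p_values < 0.05)
print(type1_rate)
```

The proportion printed at the end should hover around the nominal 5% level, which is what "erroneously rejecting the null hypothesis" amounts to in the long run.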

11 My one slide on regression RT ~ Noise

12 My one slide on regression
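The model formula `RT ~ Noise` from the regression slides can be fitted with `lm()`. The sketch below uses simulated data, since the slides' own data are not part of the transcript; the intercept (500), slope (20), and residual spread are invented purely for illustration.

```r
# Sketch: simple linear regression of reaction time on noise level.
# All numbers below are hypothetical, chosen only to produce example data.
set.seed(1)
n <- 100
Noise <- runif(n, min = 0, max = 10)          # hypothetical noise levels
RT <- 500 + 20 * Noise + rnorm(n, sd = 50)    # hypothetical reaction times

fit <- lm(RT ~ Noise)   # same formula as on the slide
print(coef(fit))        # estimated intercept and slope
```

The slope estimate describes how much the predicted RT changes per unit of Noise; `summary(fit)` would additionally report standard errors and p-values.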

13 Correlations http://www.skepticalraptor.com/skepticalraptorblog.php/correlation-does-not-imply-causation-except-when-it-does/ ~

14 Examples of linguistic correlations
                Object-Verb   Verb-Object
Postpositions           472            42
Prepositions             14           456
Dryer, M. S. (2013). Relationship between the order of object and verb and the order of adposition and noun. WALS, http://wals.info/chapter/95
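To make the strength of this association concrete, the counts can be entered as a 2×2 table in R. This is a sketch that assumes the run-together digits on the slide split as 472/42 (postpositions) and 14/456 (prepositions). The chi-squared test shown treats every language as an independent data point, which is exactly the assumption the "Lack of independence" slides call into question, so its p-value should be read as a naive baseline only.

```r
# Sketch: the word-order counts as a contingency table.
# matrix() fills column by column: first Object-Verb, then Verb-Object.
counts <- matrix(
  c(472, 14,    # Object-Verb: postpositions, prepositions
    42, 456),   # Verb-Object: postpositions, prepositions
  nrow = 2,
  dimnames = list(
    adposition = c("Postpositions", "Prepositions"),
    order = c("Object-Verb", "Verb-Object")
  )
)

# Share of each adposition type within each verb-object order
print(round(prop.table(counts, margin = 2), 3))

# Naive test of association (assumes independent languages)
naive_test <- chisq.test(counts)
print(naive_test$p.value)
```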

15 Examples of linguistic correlations ~

16 ~ sound change Eckert, P. (1989). The whole woman: Sex and gender differences in variation. Language Variation and Change, 1, 245-67. Labov, W. (1990). The interaction of sex and social class in the course of linguistic change. Language Variation and Change, 2, 205-254.

17 Examples of linguistic correlations ~ Chen, M. K. (2013). The effect of language on economic behavior: Evidence from savings rates, health behaviors, and retirement assets. American Economic Review, 103, 690-731. future tense marking

18 Examples of linguistic correlations ~ Everett, C. (2013). Evidence for direct geographic influences on linguistic sounds: the case of ejectives. PloS one, 8(6), e65275. kʼ tʼ qʼ

19 ~ Everett, C., Blasi, D. E., & Roberts, S. G. (2015). Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots. Proceedings of the National Academy of Sciences, 112, 1322-1327. Examples of linguistic correlations

20 Two types of correlations … between linguistic features … between linguistic & non-linguistic features Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1, 221-241.

21 Examples of correlations Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1, 221-241.

22 Three statistical problems Multiple comparisons Correlation is not causation Lack of independence

23 Three statistical problems Multiple comparisons Correlation is not causation Lack of independence

24 Three statistical problems Correlation is not causation Roberts, S., & Winters, J. (2013). Linguistic diversity and traffic accidents: Lessons from statistical studies of cultural traits. PLoS one, 8(8), e70902.

25 Three statistical problems Correlation is not causation Christian Bentz Bentz, C., & Winter, B. (2013). Languages with more second language learners tend to lose nominal case. Language Dynamics & Change, 3:1, 1-27.

26 Three statistical problems Correlation is not causation 1. L2 learning, morphology 2. L2 speakers, morphology 3. L2 speakers, morphology, confound?

27 Three statistical problems Correlation is not causation L2 speakers, morphology 4. shared history 5. chance

28 Three statistical problems Correlation is not causation http://www.skepticalraptor.com/skepticalraptorblog.php/correlation-does-not-imply-causation-except-when-it-does/

29 Three statistical problems Correlation is not causation Steps to support causality: The data must be strong. The data must be consistent. The data must be coherent. The data must be specific. The causal effect must be plausible. http://www.skepticalraptor.com/skepticalraptorblog.php/correlation-does-not-imply-causation-except-when-it-does/

30 Three statistical problems Correlation is not causation Data, Hypothesis: correlations increase the probability of compatible models and decrease the probability of incompatible ones. Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1, 221-241.

31 Three statistical problems Correlation is not causation Experiments do not provide fool-proof access to causality

32 Three statistical problems Correlation is not causation http://blogs.discovermagazine.com/neuroskeptic/2015/05/24/fmri-of-the-amygdala-all-in-vein/

33 Three statistical problems Multiple comparisons Correlation is not causation Lack of independence

34 Three statistical problems Multiple comparisons Correlation is not causation Lack of independence

35 Three statistical problems Multiple comparisons https://www.sciencenews.org/article/trawling-brain Bennett, C. M., Baird, A. A., Miller, M. B., & Wolford, G. L. (2011). Neural correlates of interspecies perspective taking in the post-mortem Atlantic salmon: An argument for proper multiple comparisons correction. Journal of Serendipitous and Unexpected Results, 1, 1-5.

36 https://xkcd.com/882/

37

38

39 Three statistical problems Multiple comparisons Example of doing 100 tests: if α = 0.05 is taken as the significance level, then for 100 tests the expected number of incorrect rejections of the null hypothesis is 5 (assuming all null hypotheses are true). The probability of at least one statistical result being significant, assuming independent tests, is 1 − (1 − 0.05)^100 ≈ 0.994.
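The arithmetic behind this slide can be checked in a few lines of R. The sketch assumes that all 100 null hypotheses are true and that the tests are independent of one another; under those assumptions the chance of at least one false positive is one minus the chance of none.

```r
# Sketch: expected false positives and familywise error rate for 100 tests.
alpha <- 0.05     # significance level from the slide
n_tests <- 100    # number of tests from the slide

# Expected number of incorrect rejections if every null hypothesis is true
expected_false_rejections <- alpha * n_tests

# Probability of at least one significant result, assuming independent tests
p_at_least_one <- 1 - (1 - alpha)^n_tests

print(expected_false_rejections)
print(round(p_at_least_one, 3))
```

With 100 tests, a spurious "significant" finding is close to guaranteed, which is the point of the salmon and jelly bean examples on the preceding slides.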

40 Three statistical problems Multiple comparisons What to do? (1) Correct for multiple comparisons (e.g., Bonferroni correction). (2) Avoid multiplicity at the design stage (Bender & Lange, 2001). Bender, R., & Lange, S. (2001). Adjusting for multiple testing--when and how? Journal of Clinical Epidemiology, 54(4), 343-349.
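Option (1) can be sketched with base R's `p.adjust()`, which implements the Bonferroni correction by multiplying each p-value by the number of tests and capping the result at 1. The 100 p-values below come from simulated null data (a hypothetical setup, not data from the course), so any "significant" raw result is a false positive.

```r
# Sketch: Bonferroni correction on p-values from 100 null comparisons.
set.seed(7)       # arbitrary seed for reproducibility
n_tests <- 100

# Each test compares two samples drawn from the same population
p_raw <- replicate(n_tests, t.test(rnorm(20), rnorm(20))$p.value)

# Bonferroni: multiply each p-value by the number of tests, cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

cat("Significant before correction:", sum(p_raw < 0.05), "\n")
cat("Significant after correction: ", sum(p_bonf < 0.05), "\n")
```

Bonferroni is conservative; `p.adjust()` also offers less strict methods such as `"holm"`, which could be swapped in without changing the rest of the sketch.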

41 Three statistical problems Multiple comparisons Correlation is not causation Lack of independence

42 Three statistical problems Multiple comparisons Correlation is not causation Lack of independence

43 Three statistical problems Lack of independence

44 Three statistical problems Lack of independence

45 Three statistical problems Lack of independence

46 Three statistical problems Lack of independence

47 Three statistical problems Lack of independence Sample 1 Sample 2 A simple Type I error simulation:

48 Three statistical problems Lack of independence (Source code on class GitHub repo) 1,000 simulations
unique_items <- rnorm(nitems)
unique_subs <- rnorm(nsub)
resp <- unique_items[items] + unique_subs[subjects] + rnorm(nsub*nitems)
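The three lines on this slide generate responses that share item and subject effects, but the surrounding simulation is only available in the class repository. The sketch below is one possible completion, not the authors' code: it assumes a hypothetical design with 20 subjects, 10 items, and a condition that varies between items with no true effect. It then compares a naive t-test over all observations with a t-test over item means.

```r
# Sketch: Type I error inflation when repeated measures are ignored.
# nsub, nitems, and the between-items design are hypothetical assumptions.
set.seed(123)
nsub <- 20
nitems <- 10
n_sims <- 1000                                # as on the slide
subjects <- rep(1:nsub, each = nitems)        # subject index per observation
items <- rep(1:nitems, times = nsub)          # item index per observation
item_condition <- rep(c("A", "B"), length.out = nitems)
condition <- item_condition[items]            # condition has no true effect

simulate_once <- function() {
  unique_items <- rnorm(nitems)               # item-specific offsets
  unique_subs <- rnorm(nsub)                  # subject-specific offsets
  resp <- unique_items[items] + unique_subs[subjects] + rnorm(nsub * nitems)
  # Naive analysis: treats all nsub * nitems observations as independent
  naive_p <- t.test(resp ~ condition)$p.value
  # Aggregated analysis: one mean per item, so n matches the true unit
  item_means <- tapply(resp, items, mean)
  by_item_p <- t.test(item_means ~ item_condition)$p.value
  c(naive = naive_p, by_item = by_item_p)
}

p_values <- replicate(n_sims, simulate_once())
type1 <- rowMeans(p_values < 0.05)            # false positive rate per method
print(type1)
```

Under these assumptions the naive test rejects far more often than 5% of the time, while the by-item test stays near the nominal level, because observations of the same item are not independent pieces of evidence.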

49 Three statistical problems Lack of independence

50 Three statistical problems Lack of independence

51 Three statistical problems Lack of independence

52 Two sources of non-independence Lack of independence
Language genealogy (Dryer, 1989, 1991, 1992; Cysouw, 2010; Jaeger et al., 2011)
Language areas (Dryer, 1989, 2000; Maslova, 2000; Bickel, 2008; Cysouw, 2010; Jaeger et al., 2011)
Bickel, B. (2008). A refined sampling procedure for genealogical control. STUF-Language Typology and Universals, 61(3), 221–233.
Cysouw, M. (2010). Dealing with diversity: Towards an explanation of NP-internal word order frequencies. Linguistic Typology, 14, 253–286.
Dryer, M. (1989). Large linguistic areas and language sampling. Studies in Language, 13(2), 257–292.
Dryer, M. (1991). SVO languages and the OV: VO typology. Journal of Linguistics, 27(2), 443–482.
Dryer, M. (1992). The Greenbergian word order correlations. Language, 68(1), 81–138.
Dryer, M. (2000). Counting genera vs. counting languages. Linguistic Typology, 4, 334–350.
Jaeger, T. F., Graff, P., Croft, W., & Pontillo, D. (2011). Mixed effect models for genetic and areal dependencies in linguistic typology. Linguistic Typology, 15(2), 281–320.
Maslova, E. (2000). A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3), 307–333.

53 Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: Controlling for cultural evolution. PLOS ONE. Genealogical relationships

54 Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: Controlling for cultural evolution. PLOS ONE. Contact relationships

55 Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: Controlling for cultural evolution. PLOS ONE. Contact relationships

56 Mark Liberman http://languagelog.ldc.upenn.edu/nll/?p=3764 Contact relationships

57 Two sources of non-independence Lack of independence Sean Roberts, James Winters. Roberts, S., Winters, J., & Chen, K. (upcoming). Future tense and economic decisions: Controlling for cultural evolution. PLOS ONE.

58 Two sources of non-independence Lack of independence Controlling for area and family within a mixed effects modeling framework:
lmer(case ~ L2prop + (1+L2prop|family) + (1+L2prop|area))
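The `lmer()` call on this slide omits its data, so it cannot be run as shown. The sketch below fits the same model formula to simulated stand-in data. It assumes the `lme4` package is installed; the data frame name `typology`, the number of languages, families, and areas, and all effect sizes are hypothetical, and treating `case` as a continuous response is likewise an assumption made only for illustration.

```r
# Sketch: random intercepts and slopes for language family and area.
library(lme4)   # assumes the lme4 package is installed

set.seed(2015)
n_lang <- 300   # hypothetical number of languages

# Simulated stand-in data; every value here is invented for illustration
typology <- data.frame(
  family = factor(sample(paste0("family", 1:20), n_lang, replace = TRUE)),
  area   = factor(sample(paste0("area", 1:6), n_lang, replace = TRUE)),
  L2prop = runif(n_lang)   # proportion of L2 speakers, between 0 and 1
)

# Draw one intercept and one slope deviation per family and per area
fam_id  <- as.integer(typology$family)
area_id <- as.integer(typology$area)
fam_int    <- rnorm(nlevels(typology$family))[fam_id]
fam_slope  <- rnorm(nlevels(typology$family), sd = 0.5)[fam_id]
area_int   <- rnorm(nlevels(typology$area))[area_id]
area_slope <- rnorm(nlevels(typology$area), sd = 0.5)[area_id]

# Hypothetical response: more L2 speakers, less case marking on average
typology$case <- 5 + fam_int + area_int +
  (-2 + fam_slope + area_slope) * typology$L2prop + rnorm(n_lang)

# Same formula as on the slide, with the data argument supplied
fit <- lmer(case ~ L2prop + (1 + L2prop | family) + (1 + L2prop | area),
            data = typology)
print(fixef(fit))   # fixed-effect estimates for intercept and L2prop
```

With only a handful of areas, the random-slope variances are estimated imprecisely, and `lmer()` may print singular-fit or convergence messages on data like these; those are warnings about the fit, not errors in the call.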

59 Three statistical problems Multiple comparisons Correlation is not causation Lack of independence

60

