Michael C. Frank Stanford University


1 The role of data sharing in studying language learning: WordBank and childes-db
Michael C. Frank, Stanford University

2 The original “big data” for child language
CHILDES, created in 1984: open data in a shared research environment, with open tools and resources.

3 An explosion of language
18 mo: “happy-b”
19 mo: “blue ball”
23 mo: “spike doggy no food eat dirt” (“Spike the dog doesn’t eat food, he eats dirt”)
26 mo: “dada move own body, my need lilbit more space”

4 Frank, Goodman, & Tenenbaum (2008), NIPS
“doggie” → doggie
Also: McMurray et al. (2012), Psychological Review; Fazly et al. (2010), Cognitive Science
Understanding this process will require quantitative theories of how learning operates over children’s perceptual input and within their cognitive limitations. In much of my previous work I’ve attempted to develop probabilistic models that instantiate these quantitative theories.

5 Evaluating learning models
Input predictors → Learning Model (a scientific hypothesis about learning) → Outcomes
Current evaluation data: a handful of binary experimental results, or 20 minutes of annotated video. Theory building requires more data!

6 Small scale experiments may not be replicable
Many folks here are likely aware of the recent study by the Open Science Collaboration, which reported results from 100 independent replications of high-profile studies published in 2008. Disappointingly, fewer than half of these replications (by a variety of criteria) produced the same result as the original. I am proud that four of these replications were contributed by students in the first iteration of my graduate lab class. The Open Science Collaboration paper is important, but I’ve come to believe that it supports a narrower interpretation than some have given it. In the remainder of this talk, I want to lay out some of the ways that my students, my collaborators, and I have been thinking about this kind of cumulative enterprise. Open Science Collaboration (2015)

7 Even “simple” analyses may not be reproducible
Sampled 35 articles at Cognition; 13 were reproducible from the posted data. The other 22 were not. Hardwicke et al. (2018), Royal Society Open Science

8 The MacArthur-Bates Communicative Development Inventory (CDI)
Adapted into many languages, including Spanish, French, Polish, Slovak, and Japanese.

9 Wordbank: an open repository for developmental vocabulary data
Frank et al. (2016), Journal of Child Language

10 Cross-linguistic generalizations about early language
The first words are very consistent across languages
Across children, variability is a constant in early language
There is a noun bias in early language, but verbs vary by language
The growth of grammar is linked to vocabulary growth

11 https://langcog.github.io/wordbank-book

12 A framework for evaluating learning models
Input predictors → Learning Model → Outcomes

13 Predicting when words are learned
Braginsky et al. (in press), Open Mind

14 Predictors
(Mapped across languages through hand-checked translation equivalents)
Form: number of phonemes
Meaning: concreteness (Brysbaert, Warriner, & Kuperman, 2013); arousal & valence (Warriner, Kuperman, & Brysbaert, 2013); babiness (Perry, Perlman, & Lupyan, 2015)
Input: need corpora of child-directed language… Introducing…

15 Sanchez*, Meylan* et al. (2016), Behavior Research Methods

16 Predictors of production
A “first pass” predictive model – a baseline for future modeling work
[Plot: coefficient estimates for each predictor]
Braginsky et al. (in press), Open Mind
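The “first pass” model regresses a word-level acquisition outcome on the predictor set, and the plotted coefficient estimates summarize each predictor’s contribution. A minimal sketch of that idea, with invented toy data (the actual analyses in Braginsky et al. use CDI-based outcomes and richer models):

```python
import numpy as np

# Toy data: one row per word, columns = standardized predictors
# (frequency, number of phonemes, concreteness, babiness). Values invented.
X = np.array([
    [ 1.2, -0.5,  0.8,  1.0],   # "ball"
    [ 0.3,  0.1,  1.1,  0.7],   # "dog"
    [-0.9,  1.3, -0.6, -1.2],   # "government"
])
# Outcome: proportion of children producing the word at a given age (toy).
y = np.array([0.9, 0.8, 0.05])

# Add an intercept column and fit ordinary least squares.
X1 = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)

# coefs[1:] play the role of the "coefficient estimates" on the slide:
# a positive coefficient means the predictor is associated with
# greater production at a given age.
```

With real data one would fit per-language models (or a mixed model) over hundreds of words; the toy version only shows the regression framing.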

17 Consistency across languages
[Plot: average correlation (r) between predictors across languages, with a random baseline]
Braginsky et al. (in press), Open Mind
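One way to read the random baseline: correlate the coefficient vectors between pairs of languages, then compare the observed average r to the average after shuffling coefficients within each language. A toy sketch with invented coefficient values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented coefficient estimates for the same predictors in three "languages".
coefs = {
    "english": np.array([0.9, -0.4, 0.5, 0.3]),
    "spanish": np.array([0.8, -0.3, 0.6, 0.2]),
    "japanese": np.array([0.7, -0.5, 0.4, 0.4]),
}

def mean_pairwise_r(vectors):
    """Average Pearson correlation over all pairs of languages."""
    vs = list(vectors)
    rs = [np.corrcoef(vs[i], vs[j])[0, 1]
          for i in range(len(vs)) for j in range(i + 1, len(vs))]
    return float(np.mean(rs))

observed = mean_pairwise_r(coefs.values())

# Random baseline: shuffle which coefficient goes with which predictor,
# independently per language, and recompute the average correlation.
baseline = np.mean([
    mean_pairwise_r([rng.permutation(v) for v in coefs.values()])
    for _ in range(1000)
])
```

When languages agree on which predictors matter, `observed` sits well above the shuffled baseline, which hovers near zero.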

18 Lexical networks
Edges derived from semantic or phonological features (can be word embeddings as well)
Fourtassi, Bian, & Frank (2018; under review)
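A hedged sketch of constructing such a network: connect two words whenever their feature vectors are similar enough. The “embeddings” and the threshold below are invented for illustration; real analyses would derive features from semantic norms, phonological overlap, or trained word embeddings.

```python
import numpy as np

# Toy word "embeddings" (invented); one row per word.
words = ["dog", "cat", "ball", "milk"]
emb = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.2, 0.9],
])

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Add an edge whenever cosine similarity exceeds an (arbitrary) threshold.
THRESHOLD = 0.9
edges = {
    (words[i], words[j])
    for i in range(len(words)) for j in range(i + 1, len(words))
    if cosine(emb[i], emb[j]) > THRESHOLD
}
```

Here only the semantically close pair ("dog", "cat") crosses the threshold; varying the threshold trades network density against edge reliability.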

19 Adding network predictors
Mixed effects regression, with language as a random effect
PAC = preferential acquisition (growth based on the full network)
10-language sample
Fourtassi, Bian, & Frank (2018; under review)
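Under preferential acquisition, a word’s score is its connectivity in the full (end-state) network: better-connected words are predicted to be learned earlier. A toy sketch of computing that score (the network below is invented; in the actual analysis the score enters the mixed effects regression as a word-level predictor):

```python
# Full lexical network as an adjacency list (toy edges).
full_network = {
    "dog": {"cat", "ball", "bone"},
    "cat": {"dog", "milk"},
    "ball": {"dog"},
    "bone": {"dog"},
    "milk": {"cat"},
}

# PAC score: each word's degree in the full network. Under preferential
# acquisition, a higher score predicts earlier acquisition.
pac = {word: len(neighbors) for word, neighbors in full_network.items()}
```

In this toy network "dog" is the hub, so preferential acquisition would predict it is learned before the peripheral words.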

20 A framework for evaluating learning models
Input predictors → Learning Model → Outcomes
An invitation to develop new models!

21 Klaus W. Jacobs Foundation

