Some thoughts on modelling phonetic effects in corpora.

Paul Carter, University of Sheffield and University of Leeds (and once of the University of York)

We’ve seen how individual participants may have individual (random) effects on reaction times, etc.: someone might be quick, someone else might not have got much sleep, someone else might be tested on a day like today.

In phonetic data, speakers have individual effects too: for example, they have different-sized vocal tracts.

A brief sketch of three papers I’ve been involved in: with John Local at York, with Leendert Plug at Leeds, and with Emma Moore at Sheffield.

Paul Carter & John Local (2007) F2 variation in Newcastle and Leeds English liquid systems. Journal of the International Phonetic Association 37(2).
Laboratory experimental work: control, balance, independent predictors.
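In a balanced, fully crossed laboratory design, the predictors are independent by construction. The Python sketch below checks this for two two-level factors. It assumes -1/+1 effect coding, and the factor names and token counts are hypothetical, not taken from the paper. The correlation between the factors comes out at zero.

```python
from itertools import product
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical effect coding: variety (Leeds = -1, Newcastle = +1)
# and liquid ([l] = -1, [ɹ] = +1); 3 tokens per cell (also hypothetical).
levels = [-1, 1]
reps = 3
design = [cell for cell in product(levels, levels) for _ in range(reps)]
variety = [v for v, _ in design]
liquid = [l for _, l in design]

print(pearson_r(variety, liquid))  # 0.0: the predictors are orthogonal
```

If tokens go missing unevenly from the cells, as tends to happen in corpora, the same predictors stop being orthogonal.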

[l] and [ɹ] in two non-rhotic varieties of British English. We know about ‘clear’ and ‘dark’ [l]: is there also a ‘clear’ and ‘dark’ [ɹ]? F2 is used as the acoustic correlate of the clear/dark distinction.

Monosyllabic and disyllabic words in word lists: initial [l] versus initial [ɹ] (lead vs reed); initial [l] versus final [l] (lead vs deal); medial [l] versus medial [ɹ] (believe vs bereave, belly vs berry).

Original paper: repeated measures ANOVA. Normalisation: separate ANOVAs for each gender, with Hz transformed into ERB-rate. Between-subjects factor: variety; within-subjects factors: position, liquid, etc.
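The slide does not say which Hz-to-ERB-rate formula was used. A common choice, assumed here, is the Glasberg & Moore (1990) ERB-number scale, 21.4 log10(0.00437 f + 1). This Python sketch implements that formula and its inverse. The F2 values are hypothetical.

```python
import math

def hz_to_erb_rate(f_hz):
    """Frequency in Hz -> ERB-rate, assuming the Glasberg & Moore (1990) formula."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10 ** (erb / 21.4) - 1.0) / 0.00437

# Hypothetical F2 values (Hz) for a darker and a clearer liquid
for f2 in (1000.0, 1500.0):
    print(f2, round(hz_to_erb_rate(f2), 2))
```

The scale is compressive, so a given difference in Hz counts for less at higher frequencies. This is closer to auditory frequency resolution than raw Hz.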

Replication with linear mixed-effects models, allowing random intercepts and slopes: this avoids over-generalising by gender and allows better normalisation.
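The difference between random intercepts and random slopes can be written out as a model. This is a sketch under illustrative assumptions, not the specification used in the replication: a single predictor x (e.g. liquid, coded 0/1), tokens indexed by i and speakers by j.

```latex
\[
F2_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,x_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \Sigma),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

The terms do different jobs:
- The random intercept u_{0j} gives each speaker their own baseline F2, which can absorb differences such as vocal tract size.
- The random slope u_{1j} lets the size of the liquid effect vary from speaker to speaker.

Because both are estimated per speaker, speaker differences need not be handled only by splitting the data by gender.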

[Figure: initial liquids]

Initial liquids: Carter & Local found a large liquid x variety interaction for each gender, and also a small main effect of variety for female speakers only. Now I can be even more sure of the liquid x variety interaction; it also seems that there are main effects of variety, gender and liquid (with no variety x gender interaction).

[Figure: laterals]

Laterals: Carter & Local found main effects of variety and position, and also a variety x position interaction. Now I can be even more sure of the variety x position interaction, but the main effects turn out not to be significant (for Leeds female speakers, the post hoc tests in Carter & Local hinted at this). This eases a small theoretical puzzle: why do dark laterals get even darker in syllable rimes?

[Figure: medial liquids (ws)]

[Figure: medial liquids (sw)]

Medial liquids: Carter & Local found main effects of liquid and prosodic structure, with liquid x variety and liquid x prosodic structure interactions. Now I can be sure of the liquid main effect, but there is a main effect of variety (and possibly also of gender) rather than of prosodic structure. The interactions are liquid x variety, liquid x prosodic structure and, additionally, liquid x variety x prosodic structure. (This doesn’t solve a theoretical problem of syllabification.)

That’s laboratory work; corpus work is different…

Leendert Plug & Paul Carter (2014) Timing and tempo in spontaneous phonological error repair. Journal of Phonetics 45.
When speakers make errors and then self-repair, what predicts how quickly they start the repair and how fast they produce it?

We started with control variables, then tested expanded models with likelihood ratio tests. This is perhaps not so well supported theoretically (it could be called fishing), but we were fishing, in the sense that we wanted to discover which of several similar predictors worked best.
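A likelihood ratio test compares a model with an expanded version of it. Twice the difference in log-likelihoods is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. This stdlib-only Python sketch assumes the log-likelihoods are already available from the fitted models. For fixed-effect comparisons of mixed models, those models should be fitted by maximum likelihood rather than REML. The values in the example are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of x^(k-1/2) / (1*3*...*(2k-1))
    term, total = math.sqrt(x), 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

def likelihood_ratio_test(loglik_simple, loglik_expanded, extra_params):
    """Compare nested models; returns (LR statistic, p-value)."""
    stat = 2.0 * (loglik_expanded - loglik_simple)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical: control-variables model vs. the same model plus one candidate predictor
stat, p = likelihood_ratio_test(-1204.3, -1200.1, extra_params=1)
print(f"LR = {stat:.2f}, p = {p:.4f}")
```

Running the same test once for each of several similar candidate predictors is the "fishing" described above. The resulting p-values are best treated as a guide to which predictor fits better, not as confirmatory evidence.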

In similar work, collinearity meant that similar predictor variables had significant main effects in opposite directions: clearly a spurious result.
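One quick diagnostic for this kind of collinearity is the variance inflation factor (VIF). With only two predictors it reduces to 1 / (1 - r²), where r is their correlation. This stdlib Python sketch uses hypothetical values for two similar tempo measures; the names are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(xs, ys):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    if abs(r) >= 1.0:
        return math.inf
    return 1.0 / (1.0 - r * r)

# Hypothetical: two similar tempo measures for the same eight repairs
tempo_a = [4.1, 5.0, 5.6, 6.2, 6.9, 7.3, 8.0, 8.8]
tempo_b = [4.3, 4.9, 5.8, 6.0, 7.1, 7.2, 8.3, 8.6]
print(round(pearson_r(tempo_a, tempo_b), 3), round(vif_two_predictors(tempo_a, tempo_b), 1))
```

A large VIF means the data say little about each coefficient separately. That is how estimates of opposite sign can appear for two measures that really carry the same information. Common rules of thumb flag VIFs above about 5 or 10.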

Our solution was to use conditional inference regression trees to show the structure in the data, and conditional variable importance based on random forests to see which predictors mattered most:

    library(party)
    # Template formula: DV and IV1 … IVn stand for the real column names
    myformula <- DV ~ IV1 + IV2 + … + IVn
    # Conditional inference tree: shows the structure in the data
    mytree <- ctree(myformula, data = mydata)
    # Random forest of conditional inference trees
    myforest <- cforest(myformula, data = mydata)
    # Conditional permutation importance of each predictor
    myvarimp <- varimp(myforest, conditional = TRUE)

See the references to Strobl et al. in Plug & Carter (2014) and Moore & Carter (2015).
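The idea behind permutation variable importance can be sketched without the party package. The Python below is a simplified, hypothetical illustration, not what varimp(..., conditional = TRUE) computes:
- It uses the plain (marginal) permutation scheme.
- A fixed prediction function stands in for a fitted forest.

Strobl et al.'s conditional version instead permutes a predictor only within groups defined by the predictors correlated with it. That stops correlated predictors from inheriting each other's importance.

```python
import random

def mse(preds, ys):
    """Mean squared error."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def permutation_importance(predict, rows, ys, col, n_repeats=5, seed=1):
    """Mean increase in MSE when column `col` is randomly permuted.

    `predict` maps one row (a tuple of predictor values) to a prediction.
    Marginal (unconditional) scheme only: not party's conditional one.
    """
    rng = random.Random(seed)
    baseline = mse([predict(r) for r in rows], ys)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
        increases.append(mse([predict(r) for r in permuted], ys) - baseline)
    return sum(increases) / n_repeats

# Hypothetical data: the outcome depends on predictor 0 only
rows = [(float(i), float(i % 3)) for i in range(10)]
ys = [2.0 * x0 for x0, _ in rows]

def model(row):
    """Stands in for a fitted model (hypothetical)."""
    return 2.0 * row[0]

for col in (0, 1):
    print(col, permutation_importance(model, rows, ys, col))
```

Predictor 1 gets zero importance because the predictions never use it. Permuting predictor 0 destroys its relationship with the outcome, so the error rises.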

This lost the random-effects structure (perhaps that didn’t matter), but the method is more robust to missing data and able to cope with correlated variables. It provided some support for our LME models, and it helped us decide which of several similar variables should be a predictor in a model. (We could have used data reduction methods, e.g. principal components analysis, but part of the point was to identify the best predictors.)
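Principal components analysis, the data-reduction alternative mentioned above, replaces correlated predictors with uncorrelated composites. For just two standardised predictors it has a closed form: the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|. This stdlib Python sketch uses hypothetical input values.

```python
import math

def standardise(xs):
    """z-scores using the sample standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return [(x - m) / sd for x in xs]

def pca_two(xs, ys):
    """PCA of two predictors on the correlation scale.

    Returns (share of total variance on PC1, PC1 scores).
    """
    zx, zy = standardise(xs), standardise(ys)
    r = sum(a * b for a, b in zip(zx, zy)) / (len(zx) - 1)  # Pearson r
    sign = 1.0 if r >= 0 else -1.0
    share_pc1 = (1.0 + abs(r)) / 2.0      # largest eigenvalue / 2
    # PC1 direction is (1, sign) / sqrt(2)
    scores = [(a + sign * b) / math.sqrt(2.0) for a, b in zip(zx, zy)]
    return share_pc1, scores

# Hypothetical pair of strongly related measures
share, pc1 = pca_two([1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.2, 3.8, 5.1])
print(round(share, 3))
```

This shows the trade-off the slide mentions. PC1 keeps most of the shared variance, but it is a blend of both measures, so it cannot say which original predictor is the better one.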

Emma Moore & Paul Carter (2015) Dialect contact and distinctiveness: the social meaning of language variation in an island community. Journal of Sociolinguistics 19(1).
TRAP and BATH vowels in Isles of Scilly archive data (we can’t get any more!): Scilly speakers, some educated only on the islands and some educated partly on the mainland, compared with mainland Cornwall speakers and (sort-of) RP speakers.

This hides some issues, e.g. imbalance: N(TRAP) = 2469, N(BATH) = 345.

Again, we wanted to allow for other influences on the formants, e.g.: the duration of the vowel (is the target achieved?); the manner of articulation of the following consonant (nasals can muck things up); the voicing of the following consonant (which also has a massive effect on duration).

Problem: despite 2814 observations, there is not enough data. This is the nature of corpora:
- we can’t predict what will appear;
- many things will be vanishingly rare;
- many things will be absent;
- the more potential predictors, the better the chance of missing cells;
- frequency effects are not typically incorporated in laboratory experimental design.
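You can check directly how missing cells multiply as predictors are added: count the observations in every combination of predictor levels. In this Python sketch the toy tokens and level names are hypothetical, not the Scilly data.

```python
from collections import Counter
from itertools import product

def cell_counts(tokens, predictors):
    """Count tokens in every combination of predictor levels,
    including combinations that never occur (count 0)."""
    levels = [sorted({t[p] for t in tokens}) for p in predictors]
    observed = Counter(tuple(t[p] for p in predictors) for t in tokens)
    return {cell: observed.get(cell, 0) for cell in product(*levels)}

# Hypothetical tokens: lexical set plus two properties of the following consonant
tokens = [
    {"set": "TRAP", "manner": "stop", "voicing": "voiceless"},
    {"set": "TRAP", "manner": "stop", "voicing": "voiced"},
    {"set": "TRAP", "manner": "nasal", "voicing": "voiced"},
    {"set": "TRAP", "manner": "fricative", "voicing": "voiceless"},
    {"set": "BATH", "manner": "fricative", "voicing": "voiceless"},
    {"set": "BATH", "manner": "nasal", "voicing": "voiced"},
]

for preds in (["set"], ["set", "manner"], ["set", "manner", "voicing"]):
    counts = cell_counts(tokens, preds)
    empty = sum(1 for c in counts.values() if c == 0)
    print(len(preds), "predictors:", len(counts), "cells,", empty, "empty")
```

The number of cells is the product of the numbers of levels, so it grows much faster than the data. Some empty cells, such as a voiceless nasal, are structurally impossible rather than merely unsampled. That is itself a form of dependence between predictors.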

So we used conditional inference regression trees and conditional variable importance from random forests of trees in tandem with mixed-effects modelling. We used mixed-effects models where we could build models that met the assumptions of the technique, and variable importance from random forests where we knew there was collinearity, etc. (e.g. between the manner of articulation of the following consonant and lexical set).
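For two categorical predictors, such as the manner of the following consonant and lexical set, a common measure of association is Cramér's V. It is defined as sqrt(χ² / (n (k - 1))), where k is the smaller of the number of rows and the number of columns. This stdlib Python sketch uses a hypothetical contingency table.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Assumes no row or column total is zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("empty row or column: drop it before computing V")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical counts: rows = lexical set (TRAP, BATH),
# columns = manner of following consonant (stop, nasal, fricative)
table = [[50, 30, 20],
         [2, 8, 40]]
print(round(cramers_v(table), 3))
```

A V near 1 means the two predictors carry largely overlapping information. That is the situation in which variable importance from random forests was preferred to entering both predictors in a mixed-effects model.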

So, mixed-effects models with a full random-effects structure won’t always work in corpora, and when they seem to work they may be hard to interpret. There are potential statistical solutions involving data reduction techniques. There are also alternatives that attenuate precisely the problems corpora pose: e.g. conditional variable importance can cope with missing cells, imbalance and highly correlated predictors.

Here’s the snow again