The statistical analysis of acoustic correlates of speech rhythm.

Slides:



Advertisements
Similar presentations
Biomedical Statistics Testing for Normality and Symmetry Teacher:Jang-Zern Tsai ( 蔡章仁 ) Student: 邱瑋國.
Advertisements

Lecture (11,12) Parameter Estimation of PDF and Fitting a Distribution Function.
Chapter 9 Hypothesis Testing Understandable Statistics Ninth Edition
CHAPTER 21 Inferential Statistical Analysis. Understanding probability The idea of probability is central to inferential statistics. It means the chance.
Sampling: Final and Initial Sample Size Determination
Languages’ rhythm and language acquisition Franck Ramus Laboratoire de Sciences Cognitives et Psycholinguistique, Paris Jacques Mehler, Marina Nespor,
Automatic identification of vocalic intervals in speech signal Jesus Garcia Antonio Galves Flaviane Fernandes Janaisa Viscardi Ulrike Gut Phonological.
Sonority as a Basis for Rhythmic Class Discrimination Antonio Galves, USP. Jesus Garcia, USP. Denise Duarte, USP and UFGo. Charlotte Galves, UNICAMP.
Chapter Seventeen HYPOTHESIS TESTING
Elementary hypothesis testing
Sample size computations Petter Mostad
Elementary hypothesis testing Purpose of hypothesis testing Type of hypotheses Type of errors Critical regions Significant levels Hypothesis vs intervals.
Today Today: Chapter 10 Sections from Chapter 10: Recommended Questions: 10.1, 10.2, 10-8, 10-10, 10.17,
Topic 2: Statistical Concepts and Market Returns
Chapter 11: Inference for Distributions
Inferences About Process Quality
Chapter 9 Hypothesis Testing.
BCOR 1020 Business Statistics
Chapter 9 Hypothesis Testing II. Chapter Outline  Introduction  Hypothesis Testing with Sample Means (Large Samples)  Hypothesis Testing with Sample.
5-3 Inference on the Means of Two Populations, Variances Unknown
Hypothesis Testing: Two Sample Test for Means and Proportions
Inference about Population Parameters: Hypothesis Testing
Chapter 9 Hypothesis Testing II. Chapter Outline  Introduction  Hypothesis Testing with Sample Means (Large Samples)  Hypothesis Testing with Sample.
Chapter 15 Nonparametric Statistics
INFERENTIAL STATISTICS – Samples are only estimates of the population – Sample statistics will be slightly off from the true values of its population’s.
Chapter Ten Introduction to Hypothesis Testing. Copyright © Houghton Mifflin Company. All rights reserved.Chapter New Statistical Notation The.
Overview of Statistical Hypothesis Testing: The z-Test
Overview Definition Hypothesis
Testing Hypotheses about a Population Proportion Lecture 29 Sections 9.1 – 9.3 Tue, Oct 23, 2007.
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University ECON 4550 Econometrics Memorial University of Newfoundland.
© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
More About Significance Tests
Two Sample Tests Nutan S. Mishra Department of Mathematics and Statistics University of South Alabama.
Chapter 9 Hypothesis Testing II: two samples Test of significance for sample means (large samples) The difference between “statistical significance” and.
● Final exam Wednesday, 6/10, 11:30-2:30. ● Bring your own blue books ● Closed book. Calculators and 2-page cheat sheet allowed. No cell phone/computer.
Essential Statistics Chapter 131 Introduction to Inference.
CHAPTER 14 Introduction to Inference BPS - 5TH ED.CHAPTER 14 1.
1 SMU EMIS 7364 NTU TO-570-N Inferences About Process Quality Updated: 2/3/04 Statistical Quality Control Dr. Jerrell T. Stracener, SAE Fellow.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Inference about Two Means: Independent Samples 11.3.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Statistics - methodology for collecting, analyzing, interpreting and drawing conclusions from collected data Anastasia Kadina GM presentation 6/15/2015.
BPS - 5th Ed. Chapter 141 Introduction to Inference.
Introduction to the Practice of Statistics Fifth Edition Chapter 6: Introduction to Inference Copyright © 2005 by W. H. Freeman and Company David S. Moore.
1 Chapter 8 Introduction to Hypothesis Testing. 2 Name of the game… Hypothesis testing Statistical method that uses sample data to evaluate a hypothesis.
BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.
BPS - 3rd Ed. Chapter 141 Tests of significance: the basics.
AP Statistics Section 11.1 B More on Significance Tests.
Business Statistics for Managerial Decision Farideh Dehkordi-Vakil.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 12 More About Regression 12.1 Inference for.
Applied Quantitative Analysis and Practices LECTURE#14 By Dr. Osman Sadiq Paracha.
Statistical Data Analysis 2011/2012 M. de Gunst Lecture 2.
Statistical Inference Statistical inference is concerned with the use of sample data to make inferences about unknown population parameters. For example,
Understanding Basic Statistics Fourth Edition By Brase and Brase Prepared by: Lynn Smith Gloucester County College Chapter Nine Hypothesis Testing.
Chapter 20 Statistical Considerations Lecture Slides The McGraw-Hill Companies © 2012.
1 Probability and Statistics Confidence Intervals.
Testing a Single Mean Module 16. Tests of Significance Confidence intervals are used to estimate a population parameter. Tests of Significance or Hypothesis.
Essential Statistics Chapter 171 Two-Sample Problems.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 12 More About Regression 12.1 Inference for.
Chapter 9 Estimation using a single sample. What is statistics? -is the science which deals with 1.Collection of data 2.Presentation of data 3.Analysis.
Chapter 9 Hypothesis Testing Understanding Basic Statistics Fifth Edition By Brase and Brase Prepared by Jon Booze.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
Class Six Turn In: Chapter 15: 30, 32, 38, 44, 48, 50 Chapter 17: 28, 38, 44 For Class Seven: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 Read.
17th International Conference on Infant Studies Baltimore, Maryland, March 2010 Language Discrimination by Infants: Discriminating Within the Native.
Chapter Nine Hypothesis Testing.
Lecture Nine - Twelve Tests of Significance.
Analysis and Interpretation of Experimental Findings
Basic Practice of Statistics - 3rd Edition Two-Sample Problems
CHAPTER 12 More About Regression
Introductory Statistics
Presentation transcript:

The statistical analysis of acoustic correlates of speech rhythm

1. Introduction " Data description: two corpora " “20 sentences”: 20 sentences spoken three times by two female native speakers of BP and EP ( segmented by Flaviane R. Fernandes and Janaisa M. Viscardi) " “RNM”: 20 sentences of each of : English, Polish, Dutch, French, Spanish, Catalan, Italian, Japanese 5 sentences uttered by each of 4 female speakers

Purposes " Apply the RNM approach to the enlarged data set. " Present alternative descriptive statistical measures " Analize the effect of dropping the last vocalic interval of each sentence. " Introduce a probalility model for duration, which allows for improved descriptions and hypothesis testing " Use this model to give statistical support to the rhythmic class hypothesis

The RNM statistics For each sentence of the corpus the following are computed:  C,  V= standard deviation for vocalic and consonantal intervals %V= proportion of time spent on vocalic intervals Values are averaged for each speaker

%V,  C and  V for the ten languages

%V vs.  C for the ten languages

%V and  C for individual speakers

 V vs.  C for the ten languages

3. Alternative analysis " 3.1 Dropping the last vocalic interval " The last vocalic interval is an important source of variability. " It was obserded that in BP and EP there is a stretching in final vocalic intervals. " New data set: omitting the last vocalic interval for each language, and also the subsequent consonantal interval, if one exists.

The data without the last vocalic interval

Location of BP and EP speakers in the %V vs.  C Plane – complete sentences

The effect of the last vocalic interval in BP and EP- individual values

3.2 Robust statistics “Robust”= insensive to extreme values Simplest robust measure of location: replace the mean by the median. To find the median of a set of numbers: sort them and pick the one in the middle Simplest robust measure of dispersion: replace the standard deviation by the median absolute deviation(MAD).

Robust statistics

4. A probability model for duration " Former analysis is descriptive Finding a parametric family of probability distributions that fits the data closely would have two advantages: " May yield a better description of data " Allows us to make inference, i. e., to extend results from the “sample” (the data set) to the “population” ( the set of all potential setences)

4.1 Histograms show similar asymmetrical shapes

Several distributions tried: Log-normal, Weibull, exponential, Gamma The Gamma was the best fit :quantile-quantile (QQplot)- Given a data set and a theoretical, plot the quantiles of the latter vs those of the empirical distribution

Gamma distribution It has two parameters:  and  controlling shape and size, respectively: small  : high asymmetry;  = 1 gives the the exponential distribution; Large  approximates the normal. The parameters are related to the mean  and standard deviation  by     

Estimated Gamma parameters vocalic intervals

Estimated Gamma parameters consonantal intervals

Mean values for vocalic and consonantal intervals

5. Hypothesis testing In view of the close relationship between  and , we may use one of the two, or one function of both, to represent relevant features. Based on the RNM results, we choose the model standard deviation : â=ÒÑ 1/2

Standard deviation of Gamma for the ten languages – complete sentences

Rhythmic class hypothesis " We represent the rhythmic class hypothesis by the following statistical model:  1.The syllabic languages ( Italian, Spanish, French, Catalan, BP) have the same standard deviation, say,  1.  2.The accentual ones (Polish, Dutch, English, EP) share another, say,  2   1,  2 and the standard deviation for Japanese   are different.

Results To test the model, we first tested (1) and (2) by means of the Likelihood Ratio Test, which yielded a p-value of 0.91, which means that the equality of  's within rhythmic classes is highly compatible with the data ( a small p-value indicates rejection). Then we tested the null hypothesis that some of  ,  ,   are equal, which was rejected with a p-value of , thus giving statistical evidence that the three are different.

Acknowledgments We want to thank Franck Ramus, Marina Nespor and Jacques Mehler, who generously made their unpublished data avalaible to us. We  also thank  Janaisa Viscardi and Flaviane Fernandes for the segmentation of the acoustic data

The “20 sentences” corpus The following sentences of the corpus 20 sentences were considered in the statistical analysis. The choice was based on the quality of acoustic signal and to avoid dubious cases of labeling. 1. A moderniza₤₧o foi satisfatória. 5. A falta de moderniza₤₧o ₫ catastrófica. 6. O trabalho da pesquisadora foi publicado. 8. O governador aceitou a moderniza₤₧o. 9. A falta de autoridade foi alarmante. 11. A catalogadora compreendeu o trabalho da pesquisadora. 12. A professora discutiu a gramaticalidade. 15. A procura da gramaticalidade ₫ o nosso objetivo. 16. A pesquisadora perdeu autoridade. 18. A autoridade cabe ao governador. 20. A gramaticalidade das frases foi conseguida.

Grants supporting the research FAPESP grant n. 98/ (Projeto Temático Rhythmic patterns, parameter setting and language change ) PRONEX grant / (Núcleo de Excel ₨ ncia Critical phenomena in probability and stochastic processes) CNPq grant / (Probabilistic tools for pattern identification applied to linguistics)

Related papers and references Abercrombie, D. (1967). Elements of general phonetics. Chicago: Aldine. Grabe, E. and Low, E., L. (2000) Acoustic correlates in rhythmic class. Paper presented at the 7 th conference on laboratory phonology, Nijmegen. Lloyd, J. (1940) Speech signal in telephony. London. Mehler, J., Jusczyk, P., Dehane-Lambertz, G., Bertoncini, N. And Amiel-Tison, C. (1988) A percursor of language acquisition in young infants. Cognition 29: Nazzi, T., Bertoncini, N. and Mehler, J. ( 1998) Language discrimination by newborns towads an understanding of the role of the rhythm. Journal of experimental psychology: human perception and perfomance 24 (3): Nespor, M. (1990) On the rhythm parameter in phonology. Logical issues in language acquisition, Iggy Roca, Ramus, F. And Mehler, J. ( 1999). Language acquisition with suprasegmental cues: a study based on speecch resynthesis. JASA 105: Ramus, F., Nespor, M. and Mehler (1999) Correlates of linguistic rhythm in speech. Cognition 73: Frota, S. and Vigário, M.(2001) On the correlates of rhythm distinctions: the European/ Brazilian Portuguese case. To be published in Probus.

Appendix: the meaning of a p-value Consider the situation of testing a statistical hypothesis: To fix ideas, suppose that we have samples from two populations, and we want to test the hypothesis that both have the same (unknown) mean. Of course, even if the hypothesis is true, the two sample means will be different, due to sampling variability. To test the hypothesis, we compute a number T from our data (the so-called “test statistics”) which measures the discrepancy between the data and the hypothesis. In our example T will depend on the differences between the sample means. If T is very large, we have a statistical evidence against the hypothesis. What is a rational definition of “large”?

Suppose our data yields T=3.5; and that we compute the probability p that, if the hypothesis is true, we obtain a value of T greater than 3.5. This the so- called “p-value” of the test. If, say, p= 0.002, this means that, if the means are equal, we would be observing an exceptionally large value ( since a larger one is observed only with probability 0.2%); Thus we would have grounds to reject the hypothesis.