MA in English Linguistics Experimental design and statistics II Sean Wallis Survey of English Usage University College London


1 MA in English Linguistics Experimental design and statistics II Sean Wallis Survey of English Usage University College London s.wallis@ucl.ac.uk

2 Outline Plotting data with Excel™ The idea of a confidence interval Binomial → Normal → Wilson Interval types –1 observation –The difference between 2 observations From intervals to significance tests

3 Plotting graphs with Excel™ Microsoft Excel is a very useful tool for –collecting data together in one place –performing calculations –plotting graphs Key concepts of spreadsheet programs: –worksheet - a page of cells (rows x columns); you can use a part of a page for any table –cell - a single item of data, a number or text string, referred to by a letter (column) and a number (row), e.g. A15; each cell can contain: –a string: e.g. ‘Speakers –a number: 0, 23, -15.2, 3.14159265 –a formula: =A15, =$A15+23, =SQRT($A$15), =SUM(A15:C15)

4 Plotting graphs with Excel™ Importing data into Excel: –Manually, by typing –Exporting data from ICECUP Manipulating data in Excel to make it useful: –Copy, paste: columns, rows, portions of tables –Creating and copying functions –Formatting cells Creating and editing graphs: –Several different types (bar chart, line chart, scatter, etc) –Can plot confidence intervals as well as points You can download a useful spreadsheet for performing statistical tests: –www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls

5 Recap: the idea of probability A way of expressing chance: 0 = cannot happen, 1 = must happen. Used in (at least) three ways last week: P = true probability (rate) in the population; p = observed probability in the sample; α = probability of p being different from P –sometimes called probability of error, $p_e$ –found in confidence intervals and significance tests

6 The idea of a confidence interval All observations are imprecise –Randomness is a fact of life –Our abilities are finite: to measure accurately or reliably classify into types We need to express caution in citing numbers Example (from Levin 2013): –77.27% of uses of think in 1920s data have a literal (‘cogitate’) meaning

7 The idea of a confidence interval All observations are imprecise –Randomness is a fact of life –Our abilities are finite: to measure accurately or reliably classify into types We need to express caution in citing numbers Example (from Levin 2013): –77.27% of uses of think in 1920s data have a literal (‘cogitate’) meaning Really? Not 77.28, or 77.26?

8 The idea of a confidence interval All observations are imprecise –Randomness is a fact of life –Our abilities are finite: to measure accurately or reliably classify into types We need to express caution in citing numbers Example (from Levin 2013): –77% of uses of think in 1920s data have a literal (‘cogitate’) meaning

9 The idea of a confidence interval All observations are imprecise –Randomness is a fact of life –Our abilities are finite: to measure accurately or reliably classify into types We need to express caution in citing numbers Example (from Levin 2013): –77% of uses of think in 1920s data have a literal (‘cogitate’) meaning Sounds defensible. But how confident can we be in this number?

10 The idea of a confidence interval All observations are imprecise –Randomness is a fact of life –Our abilities are finite: to measure accurately or reliably classify into types We need to express caution in citing numbers Example (from Levin 2013): –77% (66-86%*) of uses of think in 1920s data have a literal (‘cogitate’) meaning

11 The idea of a confidence interval All observations are imprecise –Randomness is a fact of life –Our abilities are finite: to measure accurately or reliably classify into types We need to express caution in citing numbers Example (from Levin 2013): –77% (66-86%*) of uses of think in 1920s data have a literal (‘cogitate’) meaning Finally we have a credible range of values - needs a footnote* to explain how it was calculated.

12 Binomial → Normal → Wilson Binomial distribution –Expected pattern of observations found when repeating an experiment for a given P (here, P = 0.5) –Based on combinatorial mathematics [Figure: Binomial distribution for P = 0.5; frequency F plotted against observed p]

13 Binomial → Normal → Wilson Binomial distribution –Expected pattern of observations found when repeating an experiment for a given P (here, P = 0.5) –Based on combinatorial mathematics –Other values of P have different expected distribution patterns [Figure: Binomial distributions for P = 0.5, 0.3, 0.1 and 0.05; frequency F plotted against observed p]
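The Binomial pattern described on these slides can be sketched in a few lines of Python. This is only an illustration: the function name `binomial_pmf` and the sample size n = 10 are assumptions, not taken from the slides. It computes the expected distribution for several values of P and reports the most likely observation in each case.

```python
from math import comb

def binomial_pmf(r, n, P):
    """Chance of seeing exactly r hits in n trials when the true rate is P."""
    return comb(n, r) * P**r * (1 - P)**(n - r)

n = 10  # hypothetical sample size, chosen only for illustration
for P in (0.5, 0.3, 0.1, 0.05):
    # Expected distribution over all possible outcomes r = 0..n
    dist = [binomial_pmf(r, n, P) for r in range(n + 1)]
    most_likely = max(range(n + 1), key=lambda r: dist[r])
    print(f"P = {P}: most likely p = {most_likely / n}, sum = {sum(dist):.3f}")
```

As P moves away from 0.5 the distribution becomes skewed and bunches up against 0, which is one reason the symmetric Normal approximation works less well near the extremes.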

14 Binomial → Normal → Wilson Binomial distribution –Expected pattern of observations found when repeating an experiment for a given P (here, P = 0.5) –Based on combinatorial mathematics Binomial → Normal –Simplifies the Binomial distribution (tricky to calculate) to two variables: mean P –P is the most likely value standard deviation S –S is a measure of spread [Figure: Normal distribution with mean P and standard deviation S; frequency F plotted against observed p]

15 Binomial → Normal → Wilson Binomial distribution Binomial → Normal –Simplifies the Binomial distribution (tricky to calculate) to two variables: mean P, standard deviation S Normal → Wilson –The Normal distribution predicts observations p given a population value P –We want to do the opposite: predict the true population value P from an observation p –We need a different interval, the Wilson score interval [Figure: Normal distribution about P; frequency F plotted against observed p]

16 Binomial → Normal Any Normal distribution can be defined by only two variables and the Normal function z: –population mean P –standard deviation $S = \sqrt{P(1 - P)/n}$ –With more data in the experiment, S will be smaller [Figure: Normal distribution about P with the distance z.S marked; frequency F plotted against observed p]

17 Binomial → Normal Any Normal distribution can be defined by only two variables and the Normal function z: –population mean P –standard deviation $S = \sqrt{P(1 - P)/n}$ –95% of the curve is within ~2 standard deviations of the expected mean –the correct figure is 1.95996, the critical value of z for an error level of 0.05 [Figure: Normal distribution about P; the central 95% lies within z.S of P, with 2.5% in each tail]

18 Binomial → Normal Any Normal distribution can be defined by only two variables and the Normal function z: –population mean P –standard deviation $S = \sqrt{P(1 - P)/n}$ –95% of the curve is within ~2 standard deviations of the expected mean –The ‘tail areas’ –For a 95% interval, total 5% [Figure: Normal distribution about P; the central 95% lies within z.S of P, with 2.5% in each tail]
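As a rough check on the numbers quoted on these slides, the sketch below computes the standard deviation S for a few sample sizes and the two-tailed critical value of z. The function names and the sample sizes are illustrative assumptions; `statistics.NormalDist` is part of the Python standard library (3.8 and later).

```python
from math import sqrt
from statistics import NormalDist

def standard_deviation(P, n):
    """S = sqrt(P(1 - P) / n) for a population rate P and sample size n."""
    return sqrt(P * (1 - P) / n)

def critical_z(alpha=0.05):
    """Two-tailed critical value of z: alpha / 2 is left in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

z = critical_z(0.05)
print(f"critical z for an error level of 0.05: {z:.5f}")

P = 0.5
for n in (10, 40, 160):  # hypothetical sample sizes
    S = standard_deviation(P, n)
    # The 95% Normal interval about P is P +/- z.S
    print(f"n = {n}: S = {S:.4f}, interval = ({P - z * S:.3f}, {P + z * S:.3f})")
```

Quadrupling n halves S, so the interval narrows as more data is collected.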

19 The single-sample z test... Is an observation p more than z standard deviations from the expected (population) mean P? If yes, p is significantly different from P [Figure: Normal distribution about P with an observation p lying beyond z.S, in the 2.5% tail]
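The question this slide asks can be written as a short function. This is a minimal sketch under the Normal approximation, with a hypothetical function name and made-up example values.

```python
from math import sqrt
from statistics import NormalDist

def single_sample_z_test(p, P, n, alpha=0.05):
    """Return (significant, z_score) for an observed p against an expected P."""
    S = sqrt(P * (1 - P) / n)                      # standard deviation about P
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_score = (p - P) / S                          # distance from P in units of S
    return abs(z_score) > z_crit, z_score

# Hypothetical example: expected P = 0.5, observed p = 0.7 from n = 100 cases
significant, z_score = single_sample_z_test(0.7, 0.5, 100)
print(f"z = {z_score:.2f}, significantly different from P: {significant}")
```

Note that the standard deviation is calculated from the expected value P, not from the observation p.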

20 ...gives us a “confidence interval” The interval about p is called the Wilson score interval (w–, w+) This interval reflects the Normal interval about P: If P is at the upper limit of p, p is at the lower limit of P (Wallis, 2013) [Figure: Normal distribution about P with a 2.5% tail; observation p shown with its Wilson bounds w– and w+]

21 ...gives us a “confidence interval” The Wilson score interval (w–, w+) has a difficult formula to remember: $p' = \frac{p + z^2/2n}{1 + z^2/n}$, $s' = \frac{z\sqrt{p(1 - p)/n + z^2/4n^2}}{1 + z^2/n}$, $(w^-, w^+) = (p' - s', p' + s')$ [Figure: Normal distribution about P with a 2.5% tail; observation p shown with its Wilson bounds w– and w+]

22 ...gives us a “confidence interval” The Wilson score interval (w–, w+) has a difficult formula to remember: $p' = \frac{p + z^2/2n}{1 + z^2/n}$, $s' = \frac{z\sqrt{p(1 - p)/n + z^2/4n^2}}{1 + z^2/n}$, $(w^-, w^+) = (p' - s', p' + s')$ You do not need to know this formula! You can use the 2x2 spreadsheet! –www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls [Figure: Normal distribution about P with a 2.5% tail; observation p shown with its Wilson bounds w– and w+]
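For readers who prefer code to a spreadsheet, the Wilson score interval can be sketched as below. The function name is an assumption, and so are the counts: the slides do not give the sample size behind the 77.27% figure, so 51 out of 66 is a hypothetical pair that happens to give the same proportion.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(p, n, alpha=0.05):
    """Return (w_minus, w_plus), the Wilson score interval about an observed p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    denominator = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denominator    # p': the adjusted centre
    spread = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator  # s'
    return centre - spread, centre + spread

# Hypothetical counts (not from the slides): 51 hits out of 66 cases
f, n = 51, 66
w_minus, w_plus = wilson_interval(f / n, n)
print(f"p = {f / n:.2%}, 95% Wilson interval = ({w_minus:.2%}, {w_plus:.2%})")
```

The interval is not symmetric about p: the centre p' is pulled towards 0.5, so the bound nearer 0.5 lies further from p. Unlike a Normal interval placed about p, the Wilson interval also never extends below 0 or above 1.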

23 An example: uses of think Magnus Levin (2013) examined uses of think in the TIME corpus in three time periods –This is the graph we created in Excel –http://corplingstats.wordpress.com/2012/04/03/plotting-confidence-intervals/

24 An example: uses of think Magnus Levin (2013) examined uses of think in the TIME corpus in three time periods –This is the graph we created in Excel –Not an alternation study Categories are not “choices” –The graph plots the probability of reading different uses of the word think (given the writer used the word) –http://corplingstats.wordpress.com/2012/04/03/plotting-confidence-intervals/

25 An example: uses of think Magnus Levin (2013) examined uses of think in the TIME corpus in three time periods –This is the graph we created in Excel –Has Wilson score intervals for each point –http://corplingstats.wordpress.com/2012/04/03/plotting-confidence-intervals/

26 An example: uses of think Magnus Levin (2013) examined uses of think in the TIME corpus in three time periods –This is the graph we created in Excel –Has Wilson score intervals for each point –It is easy to spot where intervals overlap A quick test for significant difference –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/

27 An example: uses of think Magnus Levin (2013) examined uses of think in the TIME corpus in three time periods –Wilson score intervals for each point –It is easy to spot where intervals overlap A quick test for significant difference –No overlap = significant –Overlaps point = ns –Otherwise test fully –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/

28 A quick test for significant difference No overlap = significant Overlaps point = ns Otherwise test fully –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/ [Figure: two observations p1 and p2 with Wilson intervals (w1–, w1+) and (w2–, w2+)]

29 A quick test for significant difference No overlap = significant Overlaps point = ns Otherwise test fully –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/ [Figure: two observations p1 and p2 (observed probabilities) with Wilson intervals; w1– and w2– are the lower bounds, w1+ and w2+ the upper bounds]
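The three-way rule on this slide can be sketched as a small function. The name `quick_test` and the example values are hypothetical; each interval is passed as a (lower, upper) pair that has already been calculated.

```python
def quick_test(p1, interval1, p2, interval2):
    """Apply the overlap rule to two observations and their intervals."""
    lower1, upper1 = interval1
    lower2, upper2 = interval2
    # No overlap at all: the difference is significant
    if upper1 < lower2 or upper2 < lower1:
        return "significant"
    # One interval contains the other observed point: not significant
    if lower1 <= p2 <= upper1 or lower2 <= p1 <= upper2:
        return "ns"
    # Intervals overlap, but neither reaches the other point
    return "test fully"

# Hypothetical observations and intervals
print(quick_test(0.20, (0.10, 0.30), 0.60, (0.50, 0.70)))
print(quick_test(0.40, (0.30, 0.50), 0.45, (0.35, 0.55)))
print(quick_test(0.40, (0.30, 0.50), 0.55, (0.45, 0.65)))
```

The third case is the one where the eye cannot decide and a full test is needed.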

30 Test 1: Newcombe’s test This test is used when data is drawn from different populations (different years, groups, text categories) –We calculate a new Newcombe-Wilson interval (W–, W+): $W^- = -\sqrt{(p_1 - w_1^-)^2 + (w_2^+ - p_2)^2}$, $W^+ = \sqrt{(w_1^+ - p_1)^2 + (p_2 - w_2^-)^2}$ (Newcombe, 1998) –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/ [Figure: two observations p1 and p2 with Wilson intervals (w1–, w1+) and (w2–, w2+)]

31 Test 1: Newcombe’s test This test is used when data is drawn from different populations (different years, groups, text categories) –We calculate a new Newcombe-Wilson interval (W–, W+): $W^- = -\sqrt{(p_1 - w_1^-)^2 + (w_2^+ - p_2)^2}$, $W^+ = \sqrt{(w_1^+ - p_1)^2 + (p_2 - w_2^-)^2}$ –We then test whether $W^- < (p_2 - p_1) < W^+$; if the difference falls outside this interval, it is significant (Newcombe, 1998) –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/ [Figure: two observations p1 and p2 with Wilson intervals (w1–, w1+) and (w2–, w2+)]

32 Test 1: Newcombe’s test This test is used when data is drawn from different populations (different years, groups, text categories) –We calculate a new Newcombe-Wilson interval (W–, W+): $W^- = -\sqrt{(p_1 - w_1^-)^2 + (w_2^+ - p_2)^2}$, $W^+ = \sqrt{(w_1^+ - p_1)^2 + (p_2 - w_2^-)^2}$ –We then test whether $W^- < (p_2 - p_1) < W^+$; if the difference falls outside this interval, it is significant –$(p_2 - p_1) < 0$ = fall (Newcombe, 1998) –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/ [Figure: two observations p1 and p2 with Wilson intervals (w1–, w1+) and (w2–, w2+)]

33 Test 1: Newcombe’s test This test is used when data is drawn from different populations (different years, groups, text categories) –We calculate a new Newcombe-Wilson interval (W–, W+): $W^- = -\sqrt{(p_1 - w_1^-)^2 + (w_2^+ - p_2)^2}$, $W^+ = \sqrt{(w_1^+ - p_1)^2 + (p_2 - w_2^-)^2}$ –We then test whether $W^- < (p_2 - p_1) < W^+$; if the difference falls outside this interval, it is significant –We only need to check the inner interval (Newcombe, 1998) –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/ [Figure: two observations p1 and p2 with Wilson intervals (w1–, w1+) and (w2–, w2+)]
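The Newcombe-Wilson comparison can be sketched as follows. This follows the formulas as given on the slide, with the difference taken as p2 – p1. The function names and both pairs of counts are hypothetical and are not Levin's data.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(p, n, z):
    """Wilson score interval (w_minus, w_plus) about an observed p."""
    denominator = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denominator
    spread = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    return centre - spread, centre + spread

def newcombe_wilson_test(f1, n1, f2, n2, alpha=0.05):
    """Return (d, (W_minus, W_plus), significant) for two independent samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = f1 / n1, f2 / n2
    w1_minus, w1_plus = wilson_interval(p1, n1, z)
    w2_minus, w2_plus = wilson_interval(p2, n2, z)
    # Zero-centred interval built from the inner and outer Wilson widths
    W_minus = -sqrt((p1 - w1_minus)**2 + (w2_plus - p2)**2)
    W_plus = sqrt((w1_plus - p1)**2 + (p2 - w2_minus)**2)
    d = p2 - p1
    # Significant when the observed difference lies outside (W_minus, W_plus)
    significant = not (W_minus < d < W_plus)
    return d, (W_minus, W_plus), significant

# Hypothetical counts: 51 of 66 in the first sample, 59 of 100 in the second
d, (W_minus, W_plus), significant = newcombe_wilson_test(51, 66, 59, 100)
print(f"d = {d:.3f}, interval = ({W_minus:.3f}, {W_plus:.3f}), significant: {significant}")
```

When d is negative (a fall), only W– matters, and it is built from the lower bound of p1 and the upper bound of p2, which are the two bounds facing each other on the graph. That is the sense in which only the inner interval needs checking.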

34 Test 2: 2 x 2 chi-square This test is used when data is drawn from the same population of speakers (e.g. grammar -> grammar) –We put the data into a 2 x 2 table www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/ (Wallis, 2013)

35 Test 2: 2 x 2 chi-square This test is used when data is drawn from the same population of speakers (e.g. grammar -> grammar) –We put the data into a 2 x 2 table www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls –The test uses the formula $\chi^2 = \sum \frac{(o - e)^2}{e}$ where e = r x c / n (Wallis, 2013) –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
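The chi-square formula above can be sketched without a spreadsheet. The table values below are made up for illustration, and the function name is an assumption. For a 2 x 2 table (one degree of freedom) the critical value of chi-square at the 0.05 error level equals the square of the critical z, about 3.841.

```python
from statistics import NormalDist

def chi_square_2x2(table):
    """Sum (o - e)^2 / e over a 2 x 2 table, where e = row x column / n.

    Assumes no row or column total is zero.
    """
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Critical value for 1 degree of freedom at an error level of 0.05
critical = NormalDist().inv_cdf(0.975) ** 2

table = [[51, 15], [59, 41]]  # hypothetical frequencies
chi2 = chi_square_2x2(table)
print(f"chi-square = {chi2:.3f}, critical = {critical:.3f}, significant: {chi2 > critical}")
```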

36 Expressing change Percentage difference is a very common idea: –“X has grown by 50%” or “Y has fallen by 10%” –We can calculate percentage difference by $d^\% = d / p_1$ where $d = p_2 - p_1$ –We can put Wilson confidence intervals on $d^\%$ BUT percentage difference can be very misleading –It depends heavily on the starting point $p_1$ (might be 0) –What does it mean to say something has increased by 100%, or that it has decreased by 100%? It is better to simply say that –“the rate of ‘cogitate’ uses of think fell from 77% to 59%” –http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
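A short sketch shows why percentage difference depends so heavily on the starting point. The function name is hypothetical; the 77% and 59% figures are the ones quoted on the slide, and the second pair is made up to show the same absolute fall from a lower start.

```python
def percentage_difference(p1, p2):
    """Return d / p1 where d = p2 - p1, or None when p1 is zero."""
    if p1 == 0:
        return None  # undefined: any rise from zero is an infinite percentage
    return (p2 - p1) / p1

# The fall from 77% to 59% quoted on the slide
print(f"{percentage_difference(0.77, 0.59):.1%}")

# The same absolute fall of 18 points from a hypothetical lower start
print(f"{percentage_difference(0.20, 0.02):.1%}")

# A starting point of zero has no percentage difference at all
print(percentage_difference(0.0, 0.18))
```

The same 18-point fall reads as a modest percentage change in the first case and a very large one in the second, which is why quoting both rates directly is safer.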

37 Summary We analyse results to help us report them –Graphs are extremely useful! You can include graphs and tables in your essays –If a result is not significant, say so and move on… Don’t say it is “nearly significant” or “indicative” –An error level of 0.05 (or 95% correct) is OK Some people use 0.01 (99%) but this is not really better Wilson confidence intervals tell us –Where the true value is likely to be –Which differences between observations are likely to be significant If intervals partially overlap, perform a more precise test

38 Summary Always say which test you used, e.g. –“We compared ‘cogitate’ uses of think with other uses, between the 1920s and 1960s periods, and this was significant according to χ² at the 0.05 error level.” Tell your reader that you have plotted (e.g.) “95% Wilson confidence intervals” in a footnote to the graph. For advice on deciding which test to use, see –http://corplingstats.wordpress.com/2012/04/11/choosing-right-test/ The tests you will need in one spreadsheet: –www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls

39 References Levin, M. 2013. The progressive in modern American English. In Aarts, B., J. Close, G. Leech and S.A. Wallis (eds). The Verb Phrase in English: Investigating recent language change with corpora. Cambridge: CUP. Newcombe, R.G. 1998. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine 17: 873-890. Wallis, S.A. 2013. z-squared: The origin and application of χ². Journal of Quantitative Linguistics 20: 350-378. Wilson, E.B. 1927. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209-212. Assorted statistical tests: –www.ucl.ac.uk/english-usage/staff/sean/resources/2x2chisq.xls

