The Golden Age of Speech and Language Science Mark Liberman University of Pennsylvania

Slides:

Advertisements

Similar presentations

Critical Reading Strategies: Overview of Research Process

Advertisements

Qualitative methods - conversation analysis

1 The Effect of Pitch Span on the Alignment of Intonational Peaks and Plateaux Rachael-Anne Knight University of Cambridge.

The Scientific Method.

Results ISI Variance in STP Corpus ISI Variance in BU Corpus * p

Niebuhr, D‘Imperio, Gili Fivela, Cangemi 1 Are there “Shapers” and “Aligners” ? Individual differences in signalling pitch accent category.

Prosodic Signalling of (Un)Expected Information in South Swedish Gilbert Ambrazaitis Linguistics and Phonetics Centre for Languages and Literature.

Quantitative vs. Qualitative Research Method Issues Marian Ford Erin Gonzales November 2, 2010.

B121 Chapter 7 Investigative Methods. Quantitative data & Qualitative data Quantitative data It describes measurable or countable features of whatever.

Designing a Continuum of Learning to Assess Mathematical Practice NCSM April, 2011.

Development of coarticulatory patterns in spontaneous speech Melinda Fricke Keith Johnson University of California, Berkeley.

Chapter 3 Producing Data 1. During most of this semester we go about statistics as if we already have data to work with. This is okay, but a little misleading.

Context in Multilingual Tone and Pitch Accent Recognition Gina-Anne Levow University of Chicago September 7, 2005.

1 WELL-BEING AND ADJUSTMENT OF SPONSORED AGING IMMIGRANTS Shireen Surood, PhD Supervisor, Research & Evaluation Information & Evaluation Services Addiction.

©2007 Prentice Hall Organizational Behavior: An Introduction to Your Life in Organizations Chapter 19 OB is for Life.

DATA ANALYSIS IN QUALITATIVE RESEARCH Review - DATA COLLECTION METHODS= before I analyze, I must collect 1. usually data is collected by the use of two,

Acoustic and Linguistic Characterization of Spontaneous Speech Masanobu Nakamura, Koji Iwano, and Sadaoki Furui Department of Computer Science Tokyo Institute.

Business and Management Research

The Future of Computational Linguistics: or, What Would Antonio Zampolli Do? Mark Liberman University of Pennsylvania

McEnery, T., Xiao, R. and Y.Tono Corpus-based language studies. Routledge. Unit A 2. Representativeness, balance and sampling (pp13-21)

STANDARDIZATION OF SPEECH CORPUS Li Ai-jun, Yin Zhi-gang Phonetics Laboratory, Institute of Linguistics, Chinese Academy of Social Sciences.

Sociological Research Methods and Techniques

Chapter 1 Introduction to the statistics. Chapter One What is Statistics? ONE Understand why we study statistics. TWO Explain what is meant by descriptive.

BPS - 3rd Ed. Chapter 211 Inference for Regression.

 How to Sound like a Native English Speaker Joey Nevarez CELOP.

The Process of Science Science is the quest to understand nature.

Chapter 11: Qualitative and Mixed-Method Research Design

Copyright 2007, Toshiba Corporation. How (not) to Select Your Voice Corpus: Random Selection vs. Phonologically Balanced Tanya Lambert, Norbert Braunschweiler,

The Golden Age of Speech and Language Science Mark Liberman University of Pennsylvania

Montclair State University 10/12/2015. Sociological Inquiry Families do not exist or evolve in isolation Rather, they react to and have an influence on.

Linguistics Introduction.

Variables, sampling, and sample size. Overview  Variables  Types of variables  Sampling  Types of samples  Why specific sampling methods are used.

Chapter 1 Introduction to Statistics. Statistical Methods Were developed to serve a purpose Were developed to serve a purpose The purpose for each statistical.

Hypothesis & Research Questions Understanding Differences between qualitative and quantitative approaches.

Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents Elisabeth Chorianopoulou MSc in Speech and Language Processing.

Planning an Applied Research Project Chapter 3 – Conducting a Literature Review © 2014 by John Wiley & Sons, Inc. All rights reserved.

1 ©2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Into the Golden Age Mark Liberman University of Pennsylvania

Scientific Methods and Terminology. Scientific methods are The most reliable means to ensure that experiments produce reliable information in response.

Week 2 The lecture for this week is designed to provide students with a general overview of 1) quantitative/qualitative research strategies and 2) 21st.

Tone, Accent and Quantity October 19, 2015 Thanks to Chilin Shih for making some of these lecture materials available.

Performance Comparison of Speaker and Emotion Recognition

Levels of Linguistic Analysis

Paper III Qualitative research methodology.  Qualitative research is designed to reveal a specific target audience’s range of behavior and the perceptions.

Lecture №4 METHODS OF RESEARCH. Method (Greek. methodos) - way of knowledge, the study of natural phenomena and social life. It is also a set of methods.

Phone-Level Pronunciation Scoring and Assessment for Interactive Language Learning Speech Communication, 2000 Authors: S. M. Witt, S. J. Young Presenter:

Helpful hints for planning your Wednesday investigation.

Evaluation and Assessment of Instructional Design Module #4 Designing Effective Instructional Design Research Tools Part 2: Data-Collection Techniques.

Pitch Tracking + Prosody January 19, 2012 Homework! For Tuesday: introductory course project report Background information on your consultant and the.

Welcome to All S. Course Code: EL 120 Course Name English Phonetics and Linguistics Lecture 1 Introducing the Course (p.2-8) Unit 1: Introducing Phonetics.

Sociology. Sociology is a science because it uses the same techniques as other sciences Explaining social phenomena is what sociological theory is all.

INTRODUCTION TO APPLIED LINGUISTICS

Abstracts. What is an abstract?  a self-contained, short, and powerful statement that describes a larger work  Components vary according to discipline.

BPS - 5th Ed. Chapter 231 Inference for Regression.

Audio Books for Phonetics Research CatCod2008 Jiahong Yuan and Mark Liberman University of Pennsylvania Dec. 4, 2008.

Abstract  An abstract is a concise summary of a larger project (a thesis, research report, performance, service project, etc.) that concisely describes.

Daniel R. Harris Center for Clinical and Translational Sciences

Advanced Higher Modern Languages

Introduction to Corpus Linguistics

CORPUS LINGUISTICS Corpus linguistics is the study of language as expressed in samples (corpora) or "real world" text. An approach to derive at a set of.

Studying Intonation Julia Hirschberg CS /21/2018.

Introduction to science

Business and Management Research

Audio Books for Phonetics Research

Understanding Variation of VOT in spontaneous speech

Quantitative vs. Qualitative Research Method Issues

Practicing Science Table of Contents Math in Science Graphs Brainpop-

Detecting evolutionary forces in language change (2017)

Levels of Linguistic Analysis

Presentation transcript:

The Golden Age of Speech and Language Science Mark Liberman University of Pennsylvania

Abstract: From the perspective of a linguist, today's vast archives of digital text and speech, along with new analysis techniques and inexpensive computation, look like a wonderful new scientific instrument, a modern equivalent of the 17th-century invention of the telescope and microscope. We can now observe linguistic patterns in space, time, and cultural context, on a scale three to six orders of magnitude greater than in the past, and simultaneously in much greater detail than before. Scientific use of these new instruments remains mainly potential, but the next decade is likely to be a new "golden age" of research. This talk will discuss some of the barriers to be overcome, present some successful examples, and speculate about future directions. 10/4/2010CUHK: The Golden Age2

According to the National Academy of Sciences: We see that the computer has opened up to linguists a host of challenges, partial insights, and potentialities. We believe these can be aptly compared with the challenges, problems, and insights of particle physics. Certainly, language is second to no phenomenon in importance. And the tools of computational linguistics are considerably less costly than the multibillion-volt accelerators of particle physics. The new linguistics presents an attractive as well as an extremely important challenge. There is every reason to believe that facing up to this challenge will ultimately lead to important contributions in many fields. Language and Machines: Computers in Translation and Linguistics Report by the Automatic Language Processing Advisory Committee (ALPAC), National Academy of Sciences 10/4/20103CUHK: The Golden Age

That’s what they all say... Progress in any science depends on a combination of improved observation, measurement, and techniques. The cheap computing of the past two decades means there has been a tremendous increase in the availability of economic data and huge strides in econometric techniques. As a result, economics stands at the verge of a golden age of discovery. -Diane Coyle, “Economics on the Verge of a Golden Age”, The Chronicle of Higher Education, March 12, /4/2010CUHK: The Golden Age4

Two wrinkles (1)ALPAC ’s main recommendation was to de-fund Machine Translation research.... wait, what? (2) And, the ALPAC report came out in 1966 (!) so 44 years later, where’s the QCD of linguistics? 10/4/2010CUHK: The Golden Age5

The plan vs. the reality ALPAC ‘s idea: 1. computers → new language science 2. language science → language engineering What actually happened: 1. computers → new language engineering 2. engineering → new language science (???) 10/4/20106CUHK: The Golden Age

Parenthesis: What is linguistics? The term linguistics is ambiguous: (1) “rational inquiry into questions of speech and language” (2) “the institutions of academic linguistics” (3) “what people identified as linguists do” In sense (1), most linguistics today is not done by linguists, but rather by computer scientists, electrical engineers, psychologists, neurologists, anthropologists, biologists, lexicographers, classicists, etc. Still,... 10/4/2010CUHK: The Golden Age7

Speech and language technology Despite the “funding winter” that followed ALPAC (and Pierce 1969), there has been an extraordinary flowering of technologies for dealing with digital speech and text. As more and more of the world’s intellectual and social life is mediated by digital networks, we can expect this to continue. 10/4/2010CUHK: The Golden Age8

The science of speech Plenty of computer use – minicomputers in the 1960s – micro- and super-computers in the 1980s – ubiquitous laptops today Applications: – replaced tape splicing – replaced sound spectrograph – easier pitch tracking, formant tracking – more convenient statistical analysis – and so on BUT… 10/4/20109CUHK: The Golden Age

No phonetic quantum mechanics Great speech science by smart people But surprisingly little change – in style and scale of research – in scientific questions about speech – in the rate of progress compared to (the first golden age of phonetics) …at least on the acoustic analysis side. Peterson & Barney 1951 – data is still relevant – many contemporary publications are similar in style and scale 10/4/201010CUHK: The Golden Age

The science of language Despite ALPAC’s 1966 predictions, computers have not had a large impact on “the new linguistics” (in the sense of what “linguists” do). This has begun to change over the past decade, at least in some sub-disciplines. But for the most part, the effects are still in the future tense. 10/4/2010CUHK: The Golden Age11

What went wrong? There are still many unmet challenges, – not enough new insights, – and the scientific potentialities of 1966 are still mostly potential. Is this just cultural conservatism? No. (1966-era) Computers were not enough: we also need – adequate accessible digital data – tools for large-scale automated analysis – applicable research paradigms Now: we have (at least) two out of three... 10/4/201012CUHK: The Golden Age

Why 2010 is like 1610 … at least a little … Telescope: invented 1608 Galileo 1609, Kepler 1611, Newton 1668 Microscope: invented 1590 Hooke 1665, Leeuvenhoek 1674 Instruments that opened new worlds to view 10/4/2010CUHK: The Golden Age13

Why 2010 is like 1610 Science needs theory -- but “Sometimes you can observe a lot just by watching” -Yogi Berra 10/4/201014CUHK: The Golden Age

Breakfast experiments Our “telescope and microscope” are – Easily available collections of speech and text – Computer algorithms for analyzing speech and text aligning speech and text collecting, displaying, analyzing statistics When we point these new instruments in almost any direction, we see interesting new things This is so easy and fast that we can often do an “experiment” on a laptop over breakfast. 10/4/2010CUHK: The Golden Age15

10/4/2010CUHK: The Golden Age16 These quick looks are not a substitute for serious research. But they illustrate the power of our new tools, and allow us to explore interesting new directions quickly.

Five One-Hour Explorations Do Japanese speakers show more gender polarization in pitch than American speakers? Do American women talk more (and faster) than men? How does word duration vary with phrase position? How does local speaking rate vary in the course of a conversation? How does disfluency vary with sex and age? 10/4/2010CUHK: The Golden Age17

One-hour exploration #1 Gender polarization in conversational speech Question: are Japanese men and women more polarized (more different) in pitch than Americans or Europeans? Method: – Pitch-track published telephone conversations – LDC “Call Home” publications for Japanese, U.S. English, German Collected , published about 100 conversations per language – Compare quantiles of pooled values Answer: yes, apparently so. 10/4/2010CUHK: The Golden Age18

Data from CallHome M/F conversations; about 1M F0 values per category. 10/4/201019CUHK: The Golden Age

As usual, more questions: Other cultures and languages Effects of speaker’s age Effects of relationship between speakers, nature of discussion Formal vs. conversational speech Effects of social class, region 10/4/2010CUHK: The Golden Age20

One-hour exploration #2a Sex differences in conversational word counts Question: Do women talk more than men? Method: Count words in “Fisher” transcripts – Conversational telephone speech Collected by LDC in ,850 ten-minute conversations – 2,368 between two women – 1,910 one woman, one man – 1,572 between two men Answer: No. 10/4/2010CUHK: The Golden Age21

10/4/2010CUHK: The Golden Age22

10/4/2010CUHK: The Golden Age23

One-hour experiment #2b Sex differences in conversational speaking rates Question: Do women talk faster than men? Method: Words and speaking times in Fisher 2003 Answer: No. 10/4/2010CUHK: The Golden Age24

(11,700 conversational sides; mean speaking rate=173 wpm, sd=27) (Male mean 174.3, female 172.6: difference 1.7, effect size d=0.06) 10/4/201025CUHK: The Golden Age

One-Hour Experiment #3 Phrasal modulation of speaking rate – “final lengthening” a well-established effect – first observed by Abbé J.-P. Rousselot ~1870 – other phrase-position effects are less clear What is a “phrase”? – A syntactic unit? – A unit of information structure? – A unit of speech production? Method: word duration by position in “pause group” (stretch of speech without internal silence >100 msec) Data: Switchboard corpus Result: Amazingly regular (average) pattern 10/4/2010CUHK: The Golden Age26

Data from Switchboard; phrases defined by silent pauses (Yuan, Liberman & Cieri, ICSLP 2006) 10/4/201027CUHK: The Golden Age

10/4/201028CUHK: The Golden Age

One-hour experiment #4 How does speaking rate reflect the ebb and flow of a conversation? Method: word- or syllable-count in moving window over time-aligned transcripts Result: suggestive pictures 10/4/2010CUHK: The Golden Age29

10/4/201030CUHK: The Golden Age

One-hour experiment #5 How does disfluency vary with sex and age? Method: count “filled pauses” in transcripts of U.S. English conversations by demographic categories of speakers Result: systematic but unexpected interaction 10/4/2010CUHK: The Golden Age31

10/4/201032CUHK: The Golden Age

10/4/201033CUHK: The Golden Age

Serious speech science Transcribed speech is available in very large quantities By applying – forced alignment – pronunciation modeling – automated measurements – multilevel regression we see a new universe of speech data, on a scale 4-5 orders of magnitude greater than the laboratory recordings of the past. And interesting patterns are everywhere! 10/4/201034CUHK: The Golden Age

Interdisciplinary opportunities These techniques will have rich applications in other fields – Clinical diagnosis and evaluation – Educational assessment – Social science survey methods – Studies of performance style –... and so on... Wherever speech and language are relevant! 10/4/2010CUHK: The Golden Age35

10/4/2010CUHK: The Golden Age36 The early years of the twenty-first century have seen a heroic age for intellectual life. Ideas have poured across the world and new minds have joined the professionalized academics and authors in grappling with the heritage of humanity. […] No field of study is poised to benefit more than those of us who study the ancient Greco-Roman world and especially the texts in Greek and Latin to which philologists for more than two thousand years have dedicated their lives. […] The terms eWissenschaft and ePhilology, like their counterparts eScience and eResearch, point towards those elements that distinguish the practices of intellectual life in this emergent digital environment from print-based practices. Terms such as eWissenschaft and ePhilology do not define those differences but assert that those differences are qualitative. We cannot simply extrapolate from past practice to anticipate the future. -- Gregory Crane et al., “Cyberinfrastructure for Classical Philology”, Digital Humanities Quarterly, Winter 2009 Even in classical scholarship!

An historic opportunity: Take an interesting problem, and add – a little linguistics and phonetics – a little psychology – a little signal processing – a little statistics and machine learning – a little computer science – your curiosity and initiative And the future is yours! 10/4/2010CUHK: The Golden Age37

10/4/2010CUHK: The Golden Age38 Thank you!

Serious example #1: Automating sociolinguistic measurements W. Labov, S. Ash, and C. Boberg, Atlas of North American English: Phonetics, Phonology and Sound Change, Mouton 2005 K. Evanini, S. Isard, and M. Liberman, “Automatic formant extraction for sociolinguistic analysis of large corpora”, Interspeech 2009 K. Evanini, The permeability of dialect boundaries: A case study of the region surrounding Erie, Pennsylvania, PhD diss /4/2010CUHK: The Golden Age39

10/4/2010CUHK: The Golden Age40 The ANAE corpus consists of ca. 30-minute long dialectological interviews conducted over the telephone with speakers from across the United States and Canada … at least two speakers were selected randomly from every city in North America with more than 50,000 inhabitants, and only speakers who had lived their entire lives in that city were chosen. […] A total of 439 speakers were selected for detailed acoustic analysis by the ANAE authors. For these speakers, annotators examined all tokens with primary stress, and provided hand measurements for the first two formants at a single point in time. For the purposes of comparing automatic formant prediction methods with the F1 and F2 values provided by the human ANAE annotators, it is necessary to determine the point in time at which the manual F1 and F2 measurements are taken. This information is not contained in the log files that were included in the published version of ANAE, but is available in earlier versions of the log files obtained from the ANAE authors. These two sources of information were merged to produce a database of F1 and F2 measurements with time stamps for a total of 111,810 tokens from 384 speakers (formant data from [the remaining] speakers had to be excluded because the original log files with time stamps were not available).

10/4/201041CUHK: The Golden Age

10/4/201042CUHK: The Golden Age

10/4/201043CUHK: The Golden Age

Evanini, Isard & Liberman, “Automatic formant extraction for sociolinguistic analysis of large corpora”, Interspeech /4/201044CUHK: The Golden Age

Serious example #2: Exploring patterns of pitch “Declination” is the tendency for voice pitch to fall, on average, over the course of a phrase. Studied quantitatively for more than 50 years. What happens when we point our new instruments in this direction? F 0 Declination in English and Mandarin Broadcast News Speech”, InterSpeech Jiahong Yuan & Mark Liberman, “F 0 Declination in English and Mandarin Broadcast News Speech”, InterSpeech /4/2010CUHK: The Golden Age45

CUHK: The Golden Age46 Introduction (1) Declination refers to the downward trend of F 0 over the course of an utterance (Cohen, Collier and 't Hart, 1982). Possible physiological causes for F 0 declination: – Downtrend of subglottal pressure (Lieberman, 1967) – Tracheal pull (Maeda, 1976) – Activity of laryngeal muscles (Ohala, 1990). Debates about F 0 declination: – Is it (only) a by-product of the physics and physiology of talking? Or is it (also) a linguistically-controlled feature? (Ladd 1984, Strik and Boves 1995). – Is it (only) the result of local events and fixed-rate processes? Or does it require phrase-scale pre-planning? 10/4/2010

CUHK: The Golden Age47 Introduction (2) Findings and theories about F 0 declination: – The slope of the declination depends on the sentence type: it is steepest for terminal declaratives and least steep for interrogatives (Thorsen, 1980). – The initial F 0 peak increased with sentence length (Cooper and Sorenscn, 1981). – Declination could be explained by (local) downstep plus (local) final lowering effects (Liberman and Pierrehumbert, 1984). – The F 0 topline, i.e., the line connecting F 0 peaks, was steeper in short sentences than in long ones (Swerts et al., 1996). – Final lowering is grammaticalized and independent of declination (Arvaniti, 2009). 10/4/2010

CUHK: The Golden Age48 Introduction (3) Besides declination, the F 0 contour of a sentence is affected by many linguistic and situational factors. Most previous studies of declination used controlled experiments to minimize the effects of the other factors. In this study, we investigate F 0 declination in large broadcast news speech corpora. We hypothesize that in large natural speech corpora, the other factors will balance and cancel each other, and the “true” declination effect will be revealed. In any case, this source of evidence allows us to examine the debates about declination from another angle. 10/4/2010

CUHK: The Golden Age49 Introduction (4) F 0 declination is also found in Mandarin Chinese (Xu, 1999): – Declination on high tone sequences (Shih 2003) – Linear baseline declination (Yuan 2004) – We will compare the declination effect in English and in Mandarin Chinese. 10/4/2010

CUHK: The Golden Age50 Data Three broadcast news speech corpora were used: – 1997 English Broadcast News Speech (LDC98S71) – 1997 Mandarin Broadcast News Speech (LDC98S73) – 1997 Spanish Broadcast News Speech (LDC98S74) The “utterances” (between-pause units in the transcripts) were aligned with the transcripts using the PPL Forced Aligner: – Units containing a pause longer than 50 ms were excluded. – Units from unknown speakers were also excluded. Utterances 1-4 seconds long were selected for this study: – 5,652 in English; 8,383 in Mandarin; 3,904 in Spanish (All data and scripts will be published in association with the paper) 10/4/2010

CUHK: The Golden Age51 Methods (1) F 0 contours were extracted using esps/get_f0 with a 10 ms frame rate. The contours were linearly interpolated to be continuous over the unvoiced segments, and smoothed by passing them through a Butterworth low-pass filter with normalized cutoff frequency at 0.1. F 0 values were converted to semitones. The base frequency used for calculating semitones was speaker dependent, defined as the 5 th percentile of all F 0 values for that speaker. 10/4/2010

CUHK: The Golden Age52 Method Two methods were applied to measure F 0 declination. – A linear regression line was fitted to each contour using the least- squares method. – The convex-hull algorithm (Mermelstein, 1975) was used to identify local F 0 valleys and peaks. The F 0 peaks were connected to form the topline and the F 0 valleys were connected to form the baseline. 10/4/2010

CUHK: The Golden Age53 Results: Length and Slope There is a strong correlation between utterance length and declination slope: The shorter the utterance, the steeper the slope of the regression line is. 10/4/2010

And also in Spanish! Our Spanish data were very close to English: CUHK: The Golden Age5410/4/2010

CUHK: The Golden Age55 Results: Length and Slope, Excluding Edges The time-slope relationship still holds after excluding the initial and final 500 ms. That is, shorter utterances have steeper declination within the middle region, from which likely “edge effects” are excluded. 10/4/2010

CUHK: The Golden Age56 Results: Topline and Baseline Both the topline and baseline show declination, and the topline has final lowering in both languages. The baseline of Mandarin Chinese is close to a straight line. In English the topline and baseline are very similar: initial rising, middle declination, and final lowering. 10/4/2010

And also in Spanish! Again, Spanish is very much like English (in this measure, on this data): CUHK: The Golden Age5710/4/2010

CUHK: The Golden Age58 Results: Pitch Range, Number of Extrema In this sample, Mandarin Chinese has wider pitch range than English. Mandarin Chinese also has more F 0 peaks and valleys than English, i.e. more F 0 fluctuations 10/4/2010

CUHK: The Golden Age59 Conclusions and discussion (1) We investigated F 0 declination in large broadcast news speech corpora in English and Mandarin Chinese (… and Spanish). We applied two methods, linear regression and convex hull. Regression was used to measure overall declination slope, and the convex hull was used to extract F 0 peaks and valleys for representing the topline and baseline patterns. There was a strong relation between declination slope and utterance length: the shorter the utterance, the steeper the declination. This relationship still holds when the first and last 0.5 sec. are excluded. This result suggests that speakers may control declination slope (directly or indirectly) in a way that requires phrase-level planning. 10/4/2010

CUHK: The Golden Age60 Conclusions and discussion (2) Both the topline and baseline show declination, and the topline has final lowering -- in all three languages. In Mandarin Chinese, the baseline is close to a straight line, which is different from the topline. In English and Spanish, the baseline and topline are similar, both consisting of three parts: initial rising, middle declination, and final lowering. This cross-linguistic difference suggests that topline and baseline declinations are partly independent phenomena, and that they are not entirely automatic by-products of some physiological process, but are linguistically controlled at least in part. Finally, our results showed that Mandarin Chinese has wider pitch range and more F 0 fluctuations than English. This is probably due to the effect of lexical tone, but may also represent a difference in broadcast styles. 10/4/2010

More on Sources – 1997 English Broadcast News Speech (LDC98S71): ABC World News Tonight, CNN Headline News, CNN Early Prime, CNN Prime News, CNN The World Today, PRI The World, CSPAN Public Policy, CSPAN Washington Journal – 1997 Mandarin Broadcast News Speech (LDC98S73): CC-TV, KAZN-AM, VOA – 1997 Spanish Broadcast News Speech (LDC98S74): ECO, Univision, VOA CUHK: The Golden Age6110/4/2010

Serious example #3 Noah Constant, Christopher Davis, Christopher Potts, and Florian Schwarz, “The pragmatics of expressive content: Evidence from large corpora” Sprache und Datenverarbeitung, We use large collections of online product reviews, in Chinese, English, German, and Japanese, to study the use conditions of expressives (swears, antihonorifics, intensives). The distributional evidence provides quantitative support for a pragmatic theory of these items that is based in speaker and hearer expectations. 10/4/2010CUHK: The Golden Age62

… and more … Christopher Potts and Florian Schwarz, “Affective 'this‘”, Linguistic Issues in Language Technology, Lakoff (1974) argues that affective demonstratives in English are markers of solidarity, with exclamative overtones deriving from their close association with evaluative predication. Focusing on this, we seek to inform these claims using quantitative corpus evidence. Our experiments suggest that affectivity is not limited to specific uses of this, but rather that it arises in a wide range of linguistic and discourse contexts. We also briefly extend our methodology to demonstrative that and to German diese- (‘this’). 10/4/2010CUHK: The Golden Age63