A Question of Questions: Prosodic Cues to Question Form and Function
Julia Hirschberg (joint work with Jennifer Venditti and Jackson Liscombe)

Questioning in Dialogue
A fundamental activity in conversation: eliciting information, eliciting action
But how to define a question?
Bolinger '57: "fundamentally an attitude… an utterance that 'craves' a verbal or other semiotic … response"
Ginzburg & Sag '00: "the semantic object associated with the attitude of wondering and the speech act of questioning"
How to identify a question as such? How to represent its semantics? The intention of the questioner?

Distinguishing Question Form and Function
Questions may take many syntactic forms: Is it a question? What is a question? It's a question, isn't it? Is it a question or an answer? Right? It's a question?
Questions may serve many pragmatic functions: clarification-seeking? information-seeking? confirmation-seeking?
Possible indicators: syntactic cues, context, intonation

Questions in Spoken Dialogue Systems
Goals:
Examine question form and function: how are they related? What features characterize them?
Identify form and function automatically in an Intelligent Tutoring domain

Previous Studies
Integrating a prosodic tree model with a word-based language model yields the best accuracy in detecting questions/question form (Shriberg et al. '98: English)
Some corpus-based (MapTask) studies have examined tune/accent types with respect to question function (Kowtko '96: Glaswegian English; Grice et al. '95: German, Italian, Bulgarian)
Studies of different types (functions) of clarification questions (Rodríguez & Schlangen '94: German; Edlund et al. '95: Swedish)
Our goal: a comprehensive quantitative analysis of question form and function in English which will permit question form/function identification

Domain: Intelligent Tutoring Systems
ITSs must be able to recognize both the form and function of student questions
Students ask human tutors many questions, and more questions → better learning
Different question FORMs seek different information, e.g. polar questions seek a yes-no answer while wh-questions seek different information
Different question FUNCTIONs also often require different types of answers

Wh-questions, e.g.
Information-seeking (S has just submitted an essay to the tutor):
S: Ok, what do you think about that?
T: Uh, well that uh you have uh there are too many parameters here which uh need definition...
Clarification-seeking:
T: So if there is if the only force on an object in earth's gravity then what is its motion called?
S: What was the motion called?
T: Yes, what's the name for this motion?

Yes-no questions, e.g.
Information-seeking → tutor provides additional information
Clarification → clarification subdialogue
Successful ITSs must be able to recognize the presence of a question in a student turn, and its form and function

Question Corpus
Human-human tutoring dialogs collected by Litman et al. '04 for development of ITSpoke, a speech-enabled ITS designed to teach physics, based on Why2-Atlas (Kurt VanLehn, U. Pitt; Art Graesser, U. Memphis)
Corpus includes 1030 student questions; 'question' defined à la Bolinger '57 as "an utterance that craves a response"
25.2 Qs/hour; 13.3% of total student speaking time
This study: a subset of 643 tokens

[pr01_sess00_prob58]

Question Detection
Examples of student questions:
what symbol are you talking about
do i have to rewrite this again
am i ok with that
so it'd be one meter per second squared

Coding question type
Form coding based on surface syntax:
Declarative question (dQ): It's a vector? A vector?
Yes-no question (ynQ): Is it a vector?
Wh-question (whQ): What is a vector?
Tag question (ynTAG): It's a vector, isn't it?
Alternative question (altQ): Is it a vector or a scalar?
Particle (part): Huh?
Function coding derived from Stenström '84:
Confirmation-seeking check question (chk)
Clarification-seeking question (clar)
Information-seeking question (info)
Other (oth)

Form/Function Distribution
By form: dQ 53.5%, ynQ 25.7%, whQ 10.6%, ynTAG 7.2%, altQ 1.9%, part 1.2% (N=8)
By function: chk 55.5%, clar 35.1%, info 7.9%, oth 1.4%

Falling (L-L%) F0 contours
Proportion of each form produced with a falling contour: dQ 2.0% (N=7), ynQ 6.7%, whQ 42.6%, ynTAG 4.3% (N=2), altQ 66.7% (N=8), part 0%
Proportion of each function produced with a falling contour: chk 1.7%, clar 11.5%, info 45.1%, oth 22.2%

F0 measures of non-falling questions
Quantitative analysis of F0 height in the 573 non-falling tokens with sufficient data for analysis
Examined the question nucleus (nucF0) and tail (btF0) only
Speaker-normalized (z-score) F0 of:
1. the nuclear accent (nucF0)
2. the rightmost edge of the question (btF0)
3. the difference between 1 and 2 (riserange)
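
As a rough illustration of the normalization step, here is a minimal Python sketch (not the authors' code; the token field names and the sign of riserange are assumptions) that z-scores each speaker's F0 values and derives the rise range as the difference between the boundary and nuclear values.

```python
# Minimal sketch: speaker-wise z-score normalization of F0 measures and a
# rise-range difference. Field names (speaker, nucF0, btF0) are hypothetical.
from statistics import mean, stdev

def normalize_f0(tokens):
    """tokens: list of dicts with 'speaker', 'nucF0', 'btF0' (Hz)."""
    by_speaker = {}
    for t in tokens:
        by_speaker.setdefault(t["speaker"], []).append(t)
    for toks in by_speaker.values():
        for key in ("nucF0", "btF0"):
            vals = [t[key] for t in toks]
            m, s = mean(vals), stdev(vals)          # per-speaker statistics
            for t in toks:
                t[key + "_z"] = (t[key] - m) / s
    for t in tokens:
        # Assumed direction: F0 at the question's right edge minus nuclear F0.
        t["riserange"] = t["btF0_z"] - t["nucF0_z"]
    return tokens
```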

Question Form and F0
DeclQs and YNQs are both thought to rise (H* H-H% vs. L* H-H%?): are there F0 height differences between them?
2-way ANOVA on form x function, effect of FORM: nucF0: F(5)=19.34, p=0; btF0: F(5)=10.71, p=0; riserange: F(5)=3.6, p<.01
Planned comparisons (Tukey, alpha=.01) show no difference between declarative Qs and yes-no Qs
Main effect of form caused by yes-no tags (low F0) and particles (high F0)
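
A minimal sketch of this kind of two-way ANOVA, assuming the measures live in a pandas DataFrame `df` with columns `form`, `function`, `nucF0`, `btF0`, and `riserange` (hypothetical names); it uses statsmodels rather than whatever package the original analysis used.

```python
# Minimal sketch of a 2-way ANOVA on form x function for one F0 measure.
# Assumes a DataFrame with columns 'form', 'function', and the measure itself.
import statsmodels.api as sm
from statsmodels.formula.api import ols

def two_way_anova(df, measure="nucF0"):
    model = ols(f"{measure} ~ C(form) * C(function)", data=df).fit()
    # F and p for each main effect and the interaction
    return sm.stats.anova_lm(model, typ=2)

# Example: print(two_way_anova(df, "nucF0"))
```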

Normalized means at nucF0 and btF0 (chart by function: chk, clar, info)

Question Function and F0
Question dialog acts are thought to correlate with F0: does question FUNCTION affect F0?
2-way ANOVA on form x function, effect of FUNCTION: nucF0: F(3)=16.6, p=0; btF0: F(3)=8.56, p<.001; riserange: F(3)=3.94, p<.01
Main effect; planned comparisons show: clarQ > chkQ (nucF0 & btF0); infoQ > clarQ/chkQ (nucF0)
No interactions for any measure

Clarification types and F0
Clark '96 levels of coordination: sources of communication problems
1. Channel: problem hearing whether the tutor actually said something or not (Huh?, Hm?)
2. Perception: problem hearing what the tutor said ('G' as in God?, Did you say a word or a letter?), including reprise/echo questions (A what?)
3. Understanding: problem with reference resolution (This up here?, What did I imply or what does the statement imply?) or with general understanding (Is that the same thing or is that different?, What do you mean?)
4. Intention: problem determining what the tutor intended by his utterance (You want an exact number?, Uh are you asking me another characteristic of freefall?)
+ Non-interlocutor-related (NIR): problem understanding the task (Am I supposed to speak this or type it?), or clarification of the examination question (Should I assume both vehicles are going at the same speed?)

Effects of Clarification Type
One-way ANOVA combining levels 1 & 2 into a single acoustic/perceptual category: nucF0: F(3)=5.41, p=.001; btF0: F(3)=6.6, p<.001; riserange: F(3)=2.59, p=.05
Main effect for clarification type
Ranking for each measure (higher to lower F0): acoust/percept > understanding > NIR > intention
Planned comparisons (Tukey, alpha=.01): the only significant comparison was acoust/percept > intention
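
For the planned comparisons, a minimal statsmodels sketch (assuming a DataFrame `clar` with a clarification-type column `ctype` and the normalized measures; the column names are placeholders):

```python
# Minimal sketch of Tukey pairwise comparisons across clarification types.
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def tukey_by_clar_type(clar, measure="nucF0", alpha=0.01):
    result = pairwise_tukeyhsd(endog=clar[measure],
                               groups=clar["ctype"],
                               alpha=alpha)
    return result.summary()
```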

Can Prosody Distinguish Question Form? Question Function?
Only a few question forms are prosodically distinct in our study; lexico-syntactic information can help
Question function is more successfully differentiated prosodically, where there is less reliable lexico-syntactic information
Can we use prosodic information together with lexico-syntactic information to identify question form and function automatically?

Detecting Student Questions
Syntax: wh-words, subject/auxiliary inversion
Prosody: phrase-final rising intonation (Pierrehumbert & Hirschberg '90); duration and pausing (Shriberg et al. '98)
Lexico-pragmatics: personal pronouns, utterance-initial pronouns (Geluykens 1987; Beun 1990)
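
To make the combination of cues concrete, a toy rule-based detector might look like the sketch below; the word lists, the final-pitch-slope feature, and the decision rule are illustrative assumptions, not the classifier used in this work.

```python
# Toy sketch combining syntactic, prosodic, and lexico-pragmatic cues.
# Word lists, thresholds, and the decision rule are illustrative assumptions.
WH_WORDS = {"what", "who", "whom", "where", "when", "why", "how", "which"}
AUXILIARIES = {"is", "are", "was", "were", "am", "do", "does", "did",
               "can", "could", "will", "would", "should"}

def looks_like_question(words, final_pitch_slope):
    """words: lowercased tokens of the turn;
    final_pitch_slope: F0 slope over the last 200 ms (e.g. Hz/s)."""
    syntactic = bool(words) and (words[0] in WH_WORDS or words[0] in AUXILIARIES)
    rising = final_pitch_slope > 0                    # phrase-final rise
    pronouns = bool({"i", "you"} & set(words))        # personal pronouns
    # Question syntax alone, or a final rise backed by a weaker lexical cue.
    return syntactic or (rising and pronouns)
```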

Corpus
141 ITSpoke dialogues; 5 hours of student speech; student turns average 2.5 seconds
1,030 questions; 25 questions per hour
70% of turns consist entirely of the question; 89% of questions are turn-final

Question Form Distribution in ITSpoke
Form: example (distribution)
yes/no: Is that right? (24%)
wh-: What do you mean? (10%)
yes/no tag: It will stay the same, right? (7%)
alternative: Force or something? (3%)
particle: Huh? (2%)
declarative: The weight? (54%)

Question-Bearing Turns Contain one or more questions N = 918

Features Extracted
Prosodic: pitch, loudness, pausing, speaking rate, calculated over the entire turn and over the last 200 ms
Syntactic: unigram and bigram part-of-speech tags
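
A minimal sketch of this kind of turn-level and turn-final prosodic feature extraction, assuming librosa for pitch tracking and RMS energy (the original features were extracted with other tools, and the specific statistics here are assumptions):

```python
# Minimal sketch: pitch and loudness statistics over the whole turn and over
# its last 200 ms. Uses librosa; feature choices are illustrative.
import numpy as np
import librosa

def prosodic_features(wav_path):
    y, sr = librosa.load(wav_path, sr=None)

    def stats(segment):
        f0, _, _ = librosa.pyin(segment, fmin=75, fmax=500, sr=sr)
        f0 = f0[~np.isnan(f0)]                         # keep voiced frames only
        rms = librosa.feature.rms(y=segment)[0]
        return {
            "pitch_max": float(f0.max()) if f0.size else 0.0,
            "pitch_mean": float(f0.mean()) if f0.size else 0.0,
            "loudness_mean": float(rms.mean()),
        }

    turn, tail = stats(y), stats(y[-int(0.2 * sr):])   # whole turn vs. last 200 ms
    return ({f"turn_{k}": v for k, v in turn.items()} |
            {f"tail_{k}": v for k, v in tail.items()})
```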

Feature Extraction
Lexical: word unigrams and bigrams from hand-labeled transcriptions
Student- and task-dependent: pre-test score, gender, correctness, previous tutor dialogue act
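
A minimal sketch of the lexical and part-of-speech n-gram features, assuming NLTK's default tokenizer and tagger; the binary bag-of-n-grams encoding is an assumption, not necessarily the representation used in the study.

```python
# Minimal sketch: binary word and POS unigram/bigram features for one turn.
# Assumes NLTK with the 'punkt' and 'averaged_perceptron_tagger' data installed.
import nltk

def ngram_features(transcript):
    words = nltk.word_tokenize(transcript.lower())
    pos = [tag for _, tag in nltk.pos_tag(words)]
    feats = {}
    for seq, prefix in ((words, "w"), (pos, "pos")):
        for item in seq:                                  # unigrams
            feats[f"{prefix}1={item}"] = 1
        for a, b in zip(seq, seq[1:]):                    # bigrams
            feats[f"{prefix}2={a}_{b}"] = 1
    return feats
```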

Machine Learning Experiments
Task: question-bearing vs. non-question-bearing turns
Down-sampled to a 50/50 distribution; experimented by feature type
AdaBoosted C4.5 decision trees; 5-fold cross-validation
Best results with all features: accuracy = 79.7%; precision = recall = F-measure = 0.8
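
A minimal sketch of this setup with scikit-learn; since scikit-learn provides CART rather than C4.5 trees, this is only an approximation of the AdaBoosted C4.5 configuration, and `X`, `y` are assumed to be the down-sampled feature matrix and question-bearing labels.

```python
# Minimal sketch: AdaBoosted decision trees with 5-fold cross-validation.
# CART (scikit-learn) stands in for C4.5; hyperparameters are illustrative.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y):
    clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=5),
                             n_estimators=100)
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    return scores.mean()
```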

Accuracy by Feature Type
prosody (pausing and speaking rate): 52.6%
student- and task-dependent: 56.1%
prosody (loudness): 61.8%
syntactic: 65.3%
lexical: 67.2%
prosody (last 200 ms): 70.3%
prosody (pitch): 72.6%
prosody (all): 74.5%

Feature Type Discussion
Which features were most informative? Pitch slope of the last 200 ms and of the entire turn; maximum and mean pitch of the turn
Which features were most often used in learning? Pre-test score; slope of the last 200 ms; maximum pitch of the entire turn; cumulative pause duration

Other Observations
Syntactic features were informative: personal pronoun + verb, wh-pronoun, interjection
Lexical features were informative: yes, right, what, I, you

Conclusions
Most questions in our tutoring corpus are declarative in form, so more than syntax is needed to identify them as questions
Prosodic features are very important
Detecting question-bearing turns is possible; detecting question function is needed

Question Forms in ITSpoke
Form: distribution (example)
declarative: 54% (The weight?)
yes/no: 24% (Is that right?)
wh-: 10% (What do you mean?)
yes/no tag: 7% (It will stay the same, right?)
alternative: 3% (Force or something?)
particle: 2% (Huh?)