Poster by Greg Nicholas, adapted from the paper by Greg Nicholas, Mihai Rotaru, and Diane Litman.

Why predict emotions?
Affective computing is a promising direction for improving spoken dialogue systems. It comprises two tasks: emotion detection (prediction) and emotion handling. To detect emotion, we train a classifier on features extracted from user turns. Feature types include amplitude, pitch, lexical, and duration features. Here we concentrate on pitch features to detect uncertainty (baseline: 77.79%).

Feature granularity levels
- Turn level: previous work mostly uses features computed over the entire turn. This is efficient, but offers only a coarse approximation of the pitch contour.
- Word level: [1] uses pitch features computed at the word level, which offers a better approximation of the pitch contour (e.g., it captures the big pitch changes in uttering the word "great"). [1] showed that this word-level method works better than the turn-level one; [2] used sub-turn features at the breath-group level, but not at the word level.
(The original poster illustrated the two approximations of the pitch contour with a figure; a feature-extraction sketch follows the techniques below.)

The problem: classifying the overall turn emotion
Turn-level classification is simple: the labeling granularity equals the turn, so each turn yields one feature set and one prediction. For the example student turn "The force of the truck", we extract a single turn-level feature set and predict one overall label: Uncertain.
Word-level classification is more complicated:
- Label granularity mismatch: the label is at the turn level, but the features are at the word level.
- Variable number of feature sets per turn: the same example turn yields five word-level feature sets ("the", "force", "of", "the", "truck"), yet still needs a single overall prediction.

Techniques to solve this problem

Technique 1: Word-level emotion model (WLEM)
- Train: a word-level model, giving every word its turn's emotion label.
- Predict: an emotion label for each word.
- Combine: majority voting over the word predictions. In the example, three of the five words are predicted Non-uncertain, so the overall turn prediction is Non-uncertain (3/5).
- Issues: the turn-to-word labeling assumption may not hold for every word, and majority voting is a very simple combination scheme. (See the WLEM sketch below.)

Technique 2: Predefined subset of sub-turn units (PSSU)
- Combine: concatenate the features of three words (the first, middle, and last: "the", "of", "truck") into one conglomerate feature set.
- Train and predict: a turn-level model with the turn's emotion label. In the example, the overall turn prediction is Non-uncertain.
- Issue: details from the discarded words might be lost. (See the PSSU sketch below.)
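To make the granularity contrast concrete, here is a minimal sketch. It assumes an F0 (pitch) contour has already been extracted and that word boundaries are known; the statistics used (mean, max, min, range) and all data are illustrative stand-ins, not the exact feature set of [1].

```python
# Illustrative sketch of turn- vs. word-level pitch features.
# The contour, word boundaries, and statistics are placeholders.
import numpy as np

def pitch_features(f0):
    """Simple pitch statistics over one stretch of voiced frames."""
    f0 = f0[f0 > 0]  # drop unvoiced frames (F0 = 0)
    return {"mean": f0.mean(), "max": f0.max(),
            "min": f0.min(), "range": f0.max() - f0.min()}

# Hypothetical frame-level F0 contour for "The force of the truck"
f0_contour = np.abs(np.random.default_rng(0).normal(200, 30, 250))
# Hypothetical word boundaries as (start_frame, end_frame) pairs
word_spans = [(0, 40), (40, 110), (110, 140), (140, 180), (180, 250)]

turn_feats = pitch_features(f0_contour)          # one feature set per turn
word_feats = [pitch_features(f0_contour[s:e])    # one feature set per word
              for s, e in word_spans]
print(turn_feats)
print(len(word_feats), "word-level feature sets")
```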
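A minimal sketch of Technique 1 (WLEM): every word inherits its turn's label for training, and per-word predictions are combined by majority vote. The toy data, feature dimensionality, and the choice of scikit-learn's LogisticRegression are assumptions for illustration, not the classifier used in the poster.

```python
# WLEM sketch: train at the word level with the turn's label,
# predict per word, combine by majority voting.
from collections import Counter
from sklearn.linear_model import LogisticRegression

def train_wlem(turns, labels):
    """turns: list of turns, each a list of word-level feature vectors."""
    X = [w for turn in turns for w in turn]        # flatten turns to words
    y = [lab for turn, lab in zip(turns, labels)   # copy the turn label
         for _ in turn]                            # onto each of its words
    return LogisticRegression().fit(X, y)

def predict_wlem(model, turn):
    word_preds = model.predict(turn)               # one label per word
    return Counter(word_preds).most_common(1)[0][0]  # majority vote

# Toy example: 2-dim word features, labels U(ncertain)/N(on-uncertain)
train_turns = [[[1.0, 0.2], [0.9, 0.3]],
               [[0.1, 0.8], [0.2, 0.9], [0.1, 0.7]]]
model = train_wlem(train_turns, ["U", "N"])
print(predict_wlem(model, [[1.0, 0.25], [0.15, 0.85], [0.9, 0.2]]))
```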
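And a matching sketch of Technique 2 (PSSU): the feature vectors of a predefined subset of words (first, middle, last) are concatenated into one fixed-length vector, which sidesteps the variable-length problem and lets an ordinary turn-level classifier be trained. Again, data and classifier are illustrative assumptions.

```python
# PSSU sketch: concatenate first/middle/last word features into one
# fixed-length "conglomerate" vector, then train a turn-level model.
from sklearn.linear_model import LogisticRegression

def pssu_vector(turn):
    """turn: non-empty list of word-level feature vectors."""
    first, middle, last = turn[0], turn[len(turn) // 2], turn[-1]
    return first + middle + last      # list concatenation, not addition

turns = [[[1.0, 0.2], [0.9, 0.3], [0.8, 0.1]],
         [[0.1, 0.8], [0.2, 0.9], [0.1, 0.7], [0.3, 0.6]]]
labels = ["U", "N"]
X = [pssu_vector(t) for t in turns]   # one fixed-length row per turn
model = LogisticRegression().fit(X, labels)
print(model.predict([pssu_vector([[0.9, 0.2], [0.1, 0.8], [0.8, 0.3]])]))
```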
Corpus
ITSPOKE dialogues. Domain: qualitative physics tutoring. Backend: WHY2-Atlas, with Sphinx2 speech recognition and Cepstral text-to-speech.

Corpus comparison with the previous study [1]:

                       Previous [1]              Current
# of turns             220                       9854
# of words             511                       27548
words/turn             2.32                      2.80
Emotion classes        Emotional/Non-emotional   Uncertain/Non-uncertain
Class distribution     129/91 (E/nE)             2189/7665 (U/nU)
Baseline               58.64%                    77.79%

(Each baseline is the majority-class rate: 129/220 = 58.64% and 7665/9854 = 77.79%.)

Experimental results
Overall prediction accuracy:

Turn-level     Word-level (WLEM)   Word-level (PSSU)
81.97 (0.09)   82.53 (0.07)        84.11 (0.05)

WLEM improves only slightly on the turn-level accuracy (+0.56%), while PSSU shows a much larger improvement (+2.14%).

Comparison of recall and precision for predicting uncertain turns:
- Turn-level: medium recall and medium precision.
- WLEM: best recall but lowest precision; it tends to over-generalize.
- PSSU: good recall and best precision, with much less over-generalization. By this metric too, PSSU is the overall best choice.

References
[1] M. Rotaru and D. Litman, "Using Word-level Pitch Features to Better Predict Student Emotions during Spoken Tutoring Dialogues," Proceedings of Interspeech, 2005.
[2] J. Liscombe, J. Hirschberg, and J. J. Venditti, "Detecting Certainness in Spoken Tutorial Dialogues," Proceedings of Interspeech, 2005.

Future work
Many alterations could further improve these techniques:
- Annotate each individual word for certainty instead of whole turns.
- Include the other feature types listed above: lexical, amplitude, etc.
- Try prediction in a human-human dialogue context.
- Use better combination techniques, e.g. confidence weighting (a sketch follows below).
- Make more selective PSSU word choices than the middle word of the turn, e.g. the longest word in the turn, or ensuring the chosen word has domain-specific content.
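As a purely illustrative sketch of the confidence-weighting idea above (not part of the published work): instead of counting one hard vote per word, sum each word's predicted class probabilities and take the argmax. It assumes the word-level model exposes class probabilities, as scikit-learn classifiers do via predict_proba.

```python
# Confidence-weighted combination sketch: soft votes instead of hard ones.
import numpy as np

def combine_confidence_weighted(model, turn):
    """turn: list of word-level feature vectors; model must support
    predict_proba (e.g. sklearn's LogisticRegression)."""
    probs = model.predict_proba(turn)   # shape: (n_words, n_classes)
    summed = probs.sum(axis=0)          # accumulate per-class confidence
    return model.classes_[np.argmax(summed)]
```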