Hello, Who is Calling? Can Words Reveal the Social Nature of Conversations?


OUTLINE
- Abstract
- Introduction
- Corpus
- Automatic Speech Recognition
- Nature of Telephone Conversations
- Supervised Classification of Types of Social Relationships
- Conclusions

Abstract  Aims to infer the social nature of conversation from their content automatically  This motivation is stem from the need to understand how social disengagement affects cognitive decline or depression among older adult.

Abstract  First step: Learned a binary classifier to filter out business related conversation.

Abstract  For classifying different type, we investigated feature related to:  Language use (entropy)  Hand-craft dictionary (LIWC)  Latent Dirichlet models (LDA, unsupervised)

Introduction  This paper focus on understanding the social interaction of an individual over a period of time.

Introduction  Telephone conversation present several advantages for analysis  Solely to an audio channel, without recourse to gestures or facial expression.  Simplify both collection and analysis.  Transcribe using automatic speech recognition (ASR) systems.

Introduction  our first task was to classify social and non-social conversations and reverse listing was useful to a certain extent.

Introduction  This study focus on using the resulting classifier as a tool to probe the nature of telephone conversations as well as to test whether the scores obtained from it can serve as a proxy for degree of social familiarity.

Corpus  Corpus consists of lane-line telephone conversation  10 volunteers, 79years or older  Approximately 12 month  Native English speaker form USA

Corpus  Corpus includes meta-data such as  Call direction (incoming or outgoing)  Time of call  Time of duration  DTMF/caller ID (if available)

Corpus  For each subject, twenty telephone numbers were corresponding to  Top ten most frequent call  Top ten longest call

Corpus  Subject were asked to identify relationship at these calls as  Immediate family  Near relatives  Close friends  Casual friends  Strangers  Business

Corpus  Of the 8,558 available conversations, 2,728 were identified as residential conversations and 1,095 were identified as business conversations using reverse listings from multiple sources  phone directory lookup  exit interviews  internet lookup.

Automatic Speech Recognition  The acoustic models were trained on about 2000 hours of telephone speech from Switchboard and Fisher corpora.

Automatic Speech Recognition  Decoding is performed in three stages  speaker-independent models  vocal-tract normalized models  speaker-adapted models

Nature of Telephone Conversations  Classification Experiments  Can the Scores of the Binary Classifier Differentiate Types of Social Relationship?  How Informative are Openings and Closings in Differentiating Telephone Conversations?  Why are Short Conversations difficult to Classify?  Can Openings Help Predict Relative Lengths of Conversations?

Classification Experiments  We assume that the majority of our training set reflect the true nature of the conversations.  and expect to employ the classifier subsequently for correcting the errors arising when personal conversations occur on business lines.

Classification Experiments  We learned a baseline SVM classifier using a balanced training set.  Found that RBF kernel is most effective, accuracy achieve 87.5% on verification data.

Can the Scores of the Binary Classifier Differentiate Types of Social Relationship?
- We computed SVM score statistics for all conversations with family and friends.
- For comparison, we also computed the statistics for all conversations automatically tagged as residential, as well as for all conversations in the data.

How Informative are Openings and Closings in Differentiating Telephone Conversations?
- The structure of openings facilitates establishing the identity of the conversants and the purpose of the call.
- Closings in personal conversations are likely to include a pre-closing signal that allows either party to mention any unmentioned mentionables before the conversation ends.

Why are Short Conversations Difficult to Classify?
- The results from the previous section appear to contradict the finding that shorter conversations suffer from poor classification, since a 30-word window can give very good performance.
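The windowed analysis amounts to truncating each transcript to its opening words. A trivial sketch (the 30-word default follows the window size reported to work well; the helper name is our own):

```python
# Extract a fixed-size opening window from a transcript (illustrative helper).
def opening_window(transcript, n_words=30):
    """Return the first n_words words of a transcript as a single string."""
    return " ".join(transcript.split()[:n_words])

print(opening_window("hello who is calling oh hi mary how are you", 5))
# Very short calls return unchanged, so their "opening" may include the closing too.
```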

Why are Short Conversations Difficult to Classify?
- These observations suggest that long and short conversations are inherently different in nature, at least in their openings.

Can Openings Help Predict Relative Lengths of Conversations?
- It is natural to ask whether openings can predict the relative lengths of conversations.

Can Openings Help Predict Relative Lengths of Conversations?
- Features from very short conversations may contain both openings and closings, i.e., both a hello and a goodbye.

Supervised Classification of Types of Social Relationships
- We investigate the performance of classifiers on the following binary classes:
  - Residential vs. business
  - Family vs. all others
  - Family vs. other residential
  - Familiar vs. non-familiar


Lexical Statistics  Speakers who share close social ties are likely to engage in conversations on a wide variety of topics and this is likely to reflect in the entropy of their language use.

Lexical Statistics  It is shown that people talk longer, more rapidly and have wider range of language use when conversing with a familiar contact and/or family member.

Lexical Statistics  Only the speaking rate showed significant differences among the residential/business categories, with business conversations being conducted at a slower pace at least for the elderly demographic in our corpus.

Linguistic inquiry and Word Count  The categories have significant overlap and a given word can map to zero or more categories.

Linguistic inquiry and Word Count  The clear benefit of LIWC is that the word categories have very clear and pre-labeled meanings.

Latent Dirichlet allocation  Unsupervised clustering and feature selection can make use of data for which we have no labels.

Latent Dirichlet allocation  LDA models a conversation as a bag of words.  Experimentally, we found best cross-validation results were obtained when α and β were set to 0.01 and 0.1 respectively.

Latent Dirichlet allocation  Most interesting, when the number of clusters are reduced to two, the LDA model managed to segment residential and business conversations with relatively high accuracy (80%).

Classifying Types of Social Relationships  Before performing classification, we produce balanced datasets that have equal numbers of conversations for each category.

Conclusions  We show that the business calls can be separated from social calls with accuracies as high as 85%.

Conclusions  When compared to language use (entropy)and hand-crafted dictionaries (LIWC), posteriors over topics computed using a latent Dirichlet model provide superior performance.

Conclusions  openings of conversations were found to be more informative in classifying conversation than closings or random segments, when using automated transcripts.

Conclusions  In future work, we plan to examine subject specific language use, turn taking and affect to further improve the classification of social calls.