Presented By: Karan Parikh. Towards the Automated Social Analysis of Situated Speech Data. Wyatt, Choudhury, Bilmes, Kitts. CS546 Intelligent Embedded Systems.



Overview
- Objective
- Difficulties faced in achieving the objectives
- Dataset Formation
- Privacy Sensitive Speech Processing
- Analysis
- Application
- References

Objective
- An automated approach for studying fine-grained details of social interaction and relationships.
- Both the local behavior and the global structure of the group are analyzed.
- Conversation characteristics of a group of 24 people over 6 months are analyzed to study the relationship between conversational dynamics and network position.

Difficulties
- Requires more data than pure audio recording alone.
- Risk of recording audio of uninvolved parties, which is unethical and illegal.
- Risk of people changing their behavior if they know they are being recorded.

Dataset Formation
- Data collected from a group of 24 grad students attending a class.
- Each carried an MSB (multi-sensor board) consisting of a tri-axial accelerometer, barometer, microphone, and digital compass. The MSB was connected to a PDA carried in a bag.
- Data was collected over a period of 9 months.
- Subjects also submitted monthly reports about their conversations with the other subjects.

Privacy Sensitive Speech Processing
- Speech can be modeled with 2 separate components: the sound generated by the vocal cords (the source) and a filter (mouth, nose, tongue).
- Speech sounds are of 2 types: voiced, which carry a fundamental frequency, and unvoiced, with no fundamental frequency.
- Intonation, stress, and duration are described by the changes in fundamental frequency and energy over the course of the speech.

Privacy Sensitive Speech Processing
- Resonant peaks of the frequency response (formants) carry the information about phonemes, which form the basis for words. A minimum of 3 formants is required to reconstruct words; once the formants are removed (or never recorded), the words cannot be reconstructed.
- Features that are useful for extracting information from the recorded speech without capturing its content:
  - Non-initial maximum autocorrelation peak
  - Number of such peaks
  - Spectral relative entropy
- The speech is measured in frames at 60 Hz, known as voicing frames.
- For each frame, the relative entropy is calculated between the normalized power spectrum of the current voicing frame and a normalized running average of the power spectra of the last 500 voicing frames.
- The accuracy of conversation detection ranges from 96.1% to 99.2%.
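The voicing features above can be sketched per frame roughly as follows. This is a minimal illustration assuming NumPy and a single mono frame of samples; the function names and the local-peak definition are my own, not from the paper:

```python
import numpy as np

def noninitial_autocorr_peaks(frame):
    """Autocorrelation of one frame: return the height of the largest
    non-initial peak and the count of positive local maxima.  Voiced
    speech shows a strong peak at the pitch lag; unvoiced speech does not."""
    x = frame - frame.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]                       # normalize so lag 0 == 1
    peaks = [ac[i] for i in range(1, len(ac) - 1)
             if ac[i] > ac[i - 1] and ac[i] > ac[i + 1] and ac[i] > 0]
    return (max(peaks) if peaks else 0.0, len(peaks))

def spectral_relative_entropy(frame, running_mean_spectrum):
    """Relative entropy (KL divergence) between this frame's normalized
    power spectrum and a normalized running-average spectrum, e.g. the
    average over the last 500 voicing frames."""
    p = np.abs(np.fft.rfft(frame)) ** 2
    p = p / p.sum()
    q = running_mean_spectrum / running_mean_spectrum.sum()
    eps = 1e-12                           # guard against log(0)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

A pure 200 Hz tone yields a large non-initial autocorrelation peak, while white noise does not, which is what makes these features usable for voicing detection without recording formants.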

Privacy Sensitive Speech Processing
- A subject is marked active for the 20 seconds preceding and the 1 minute following their detected speech.
- If a subject is not marked active, that person is removed from the conversation.
- This is only a heuristic, intended to prevent false triggering of conversation recording.
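The active-subject window can be sketched as a simple interval merge. This is a hypothetical illustration; the function name and the merging of overlapping windows are my assumptions, not details from the paper:

```python
def active_intervals(voicing_times, before=20.0, after=60.0):
    """Mark a subject active from `before` seconds before to `after`
    seconds after each time (in seconds) their voicing was detected,
    merging windows that overlap."""
    intervals = []
    for t in sorted(voicing_times):
        start, end = t - before, t + after
        if intervals and start <= intervals[-1][1]:
            # overlaps the previous window: extend it
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals
```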

Analysis
- A network is constructed based on the face-to-face interactions.
- An edge is created when there are 20 minutes or more of conversation between 2 subjects; the network is formed from these edges.
- We then examine whether the network thus formed corresponds to the social relations from the survey and the feedback given by the subjects.
- The average aggregate agreement with the survey network came to 71.3%.
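The network construction and survey comparison can be sketched as below. This is a minimal sketch; the pair representation and the pairwise-agreement measure are my assumptions:

```python
def conversation_network(minutes, threshold=20):
    """Build the edge set: `minutes` maps a pair of subjects to their
    total conversation time; an edge exists at `threshold` minutes or more."""
    return {pair for pair, t in minutes.items() if t >= threshold}

def agreement(detected, survey, all_pairs):
    """Fraction of subject pairs on which the detected and self-reported
    networks agree (edge present in both, or absent from both)."""
    same = sum((p in detected) == (p in survey) for p in all_pairs)
    return same / len(all_pairs)
```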

Analysis
Correlation between speaking style and strength of ties
- Assumption: people's normal behavior is shaped more by their regular interaction partners than by rare partners.
- Hypothesis: people change their way of speaking less when interacting with their strong ties.
- The time spent in conversation by persons i and j is estimated to test the hypothesis.
- 4 features of a subject's speaking style are measured: rate, pitch, turn frequency, turn length.

Analysis
- b_{i/j}: mean of i's speech feature while talking to everyone except j
- b_{i->j}: mean of i's speech feature while talking to j
- s_i: standard deviation of i's feature
- d_ij = |b_{i/j} - b_{i->j}| / s_i: the amount by which i's speech feature changes when in conversation with j
- c_ij: time spent by i and j in conversation
- The correlation between c_ij and d_ij can then be measured. A negative correlation means that the more two people talk with each other, the less they change their speaking style.
- The table below supports this hypothesis:
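These quantities can be computed as in the following NumPy sketch. How s_i is estimated (here, over all of i's feature values pooled together) is my assumption:

```python
import numpy as np

def style_change(feat_to_j, feat_not_j):
    """d_ij = |b_{i/j} - b_{i->j}| / s_i: the shift in i's speech feature
    when talking to j, in units of i's overall standard deviation."""
    b_to_j = np.mean(feat_to_j)                    # b_{i->j}
    b_not_j = np.mean(feat_not_j)                  # b_{i/j}
    s_i = np.std(np.concatenate([feat_to_j, feat_not_j]))
    return abs(b_not_j - b_to_j) / s_i

def tie_style_correlation(c, d):
    """Pearson correlation between conversation times c_ij and style
    changes d_ij; a negative value supports the hypothesis."""
    return float(np.corrcoef(c, d)[0, 1])
```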

Analysis Correlation between change in speech features and tie strength

Analysis
Correlation between speaking style and network centrality
- Assumption: people's normal behavior is shaped more by their regular interaction partners than by rare partners.
- Hypothesis: people change their way of speaking more when speaking with a person who is more central to the network.
- The length of the edge i->j is calculated as (1 - c_ij), where c_ij is the time spent in conversation between i and j. For no conversation the edge length is infinite; the longer two people converse, the shorter the edge.
- The centrality of person i is the multiplicative inverse of the mean distance from i to all other points.
- d_ik denotes the change in the speech features of the partners k who speak with i. The higher the incoming mean, the more people change their style when talking to i. Thus d_ik is correlated with the centrality of the subject.
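The centrality described above is closeness centrality over shortest paths. A small sketch follows; using Floyd-Warshall for the shortest paths is my choice, and c[i][j] is assumed to be the fraction of time i and j spend in conversation:

```python
def closeness_centrality(c):
    """Edge length between i and j is 1 - c[i][j]; no conversation means
    an infinite length.  Centrality of i is the multiplicative inverse of
    i's mean shortest-path distance to all other subjects."""
    n = len(c)
    inf = float("inf")
    d = [[0.0 if i == j else (1.0 - c[i][j] if c[i][j] > 0 else inf)
          for j in range(n)] for i in range(n)]
    for k in range(n):            # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return [(n - 1) / sum(d[i][j] for j in range(n) if j != i)
            for i in range(n)]
```

In a star-shaped group where subject 0 talks with everyone and the others only talk with subject 0, subject 0 gets the highest centrality, as expected.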

Analysis Correlation between change in speech features and tie strength

Application
- More than sociological applications: beyond the caller's identity, the content of the conversation can be noted and interruptibility can be predicted.
- Can be used in mobile phones to divert incoming calls or messages to voicemail during an important conversation.
- Can be used to study differences in conversational characteristics between males and females, or between people across different geographical regions.

Questions?

References
- Danny Wyatt, Tanzeem Choudhury, Henry Kautz, "Conversation Detection and Speaker Segmentation in Privacy-Sensitive Situated Speech Data."
- Jeff Bilmes, Danny Wyatt, Tanzeem Choudhury, Henry Kautz.