Creating Dynamic Social Network Models from Sensor Data
Tanzeem Choudhury (Intel Research / Affiliate Faculty, CSE), Dieter Fox (CSE), Henry Kautz (CSE), James Kitts (Sociology)

What are we doing? Why are we doing it? How are we doing it?

Social Network Analysis
Work across the social and physical sciences increasingly studies the structure of human interaction:
o 1967 – Stanley Milgram – six degrees of separation
o 1973 – Mark Granovetter – the strength of weak ties
o 1977 – International Network for Social Network Analysis founded
o 1992 – Ronald Burt – structural holes: the social structure of competition
o 1998 – Watts & Strogatz – small-world graphs

Social Networks
Social networks are naturally represented and analyzed as graphs.

Example Network Properties
o Degree of a node
o Eigenvector centrality – global importance of a node
o Average clustering coefficient – degree to which the graph decomposes into cliques
o Structural holes – opportunities for gain by bridging disconnected subgraphs
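As a concrete illustration, two of these properties can be computed directly from an adjacency-set representation of the graph. This is a minimal sketch on an invented four-node network (the graph and node names are hypothetical, not from the study):

```python
# Sketch: degree and local clustering coefficient on an adjacency-set graph.

def degree(adj, v):
    """Number of edges incident on v."""
    return len(adj[v])

def clustering(adj, v):
    """Fraction of v's neighbor pairs that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

# Triangle A-B-C plus a pendant edge C-D: C bridges the clique and D.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}

print(degree(adj, "C"))        # 3
print(clustering(adj, "A"))    # 1.0 -- A's neighbors B and C are linked
print(clustering(adj, "C"))    # ~0.33 -- only one of C's three neighbor pairs is linked
```

A low clustering coefficient at a high-degree node like C is exactly the "structural hole" pattern: C's neighbors are connected mainly through C.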

Applications
Many practical applications:
o Business – discovering organizational bottlenecks
o Health – modeling the spread of communicable diseases
o Architecture & urban planning – designing spaces that support human interaction
o Education – understanding the impact of peer groups on educational advancement
Much recent theory on finding random-graph models that fit empirical data.

The Data Problem
Traditionally, data comes from manual surveys of people's recollections:
o Very hard to gather
o Questionable accuracy
o Few published data sets
o Almost no longitudinal (dynamic) data
1990s – social network studies based on electronic communication

Social Network Analysis of Science, 6 Jan 2006

Limits of E-Data
E-data is cheap and accurate, but it misses:
o Face-to-face speech – the vast majority of human interaction, especially complex communication
o The physical context of communication – useless for studying the relationship between environment and interaction
Can we gather data on face-to-face communication automatically?

Research Goal
Demonstrate that we can…
Model social network dynamics by gathering large amounts of rich face-to-face interaction data automatically
o using wearable sensors
o combined with statistical machine learning techniques
Find simple and robust measures derived from sensor data
o that are indicative of people's roles and relationships
o that capture the connections between physical environment and network dynamics

Questions We Want to Investigate
Changes in social networks over time:
o How do interaction patterns dynamically relate to structural position in the network?
o Why do people sharing relationships tend to be similar?
o Can one predict the formation or break-up of communities?
Effect of location on social networks:
o What are the spatio-temporal distributions of interactions?
o How do locations serve as hubs and bridges?
o Can we predict the popularity of a particular location?

Other Applications of Such Data
Research on the emotional content of speech
o Need for "natural" data
Medical applications
o Speaking rate is an indicator of mental activity
o Overly rapid speech is a symptom of mania
o Asperger's syndrome: abnormal conversational dynamics
Meeting understanding
o Interruptions indicate status & dominance

Support
Human and Social Dynamics – one of five new priority areas for NSF
o $800K award to the UW / Intel / Georgia Tech team
o Intel participating at no cost – Intel Research donating hardware and internships
Leveraging work on sensors & localization from other NSF & DARPA projects

Procedure
Test group
o 32 first-year incoming CSE graduate students
o Units worn 5 working days each month
o Collect data over one year
Units record
o Wi-Fi signal strength, to determine location
o Audio features adequate to determine when conversation is occurring
Subjects answer a short monthly survey
o Selective ground truth on number of interactions
o Research interests
All data stored securely
o Indexed by a code number assigned to each subject

Privacy
UW Human Subjects Division approved the procedures after 6 months of review and revisions.
The major concern was privacy, addressed by:
o A procedure for recording audio features without recording conversational content
o Procedures for handling data afterwards

Data Collection
[Diagram: the Intel multi-modal sensor board performs real-time audio feature extraction; audio features and WiFi signal strength flow into a coded database, indexed by code identifier]

Recording Units

Data Collection
o The multi-sensor board sends its sensor data stream to an iPAQ
o The iPAQ computes audio features and WiFi node identifiers and signal strengths
o The iPAQ writes the audio and WiFi features to an SD card
o Each day, the subject uploads the data under his or her code number to the coded database

Speech Detection
From the audio signal, we want to extract features that can be used to determine:
o Speech segments
o Number of different participants (but not the identity of participants)
o Turn-taking style
o Rate of conversation (fast versus slow speech)
But the features must not allow the audio to be reconstructed!

Speech Production: The Source-Filter Model
[Diagram: a source excitation passed through the vocal-tract filter]
The fundamental frequency (F0, or pitch) and the formant frequencies (F1, F2, …) are the most important components for speech synthesis.

Speech Production
o Voiced sounds: fundamental frequency (i.e., harmonic structure) and energy in the lower frequencies
o Unvoiced sounds: no fundamental frequency, and energy concentrated in the higher frequencies
Our approach: detect speech by reliably detecting voiced regions.
We do not extract or store any formant information; at least three formants are required to produce intelligible speech.*
* 1. Donovan, R. (1996). Trainable Speech Synthesis. PhD thesis, Cambridge University. 2. O'Shaughnessy, D. (1987). Speech Communication – Human and Machine. Addison-Wesley.

Goal: Reliably Detect Voiced Chunks in Audio Stream

Speech Features Computed
1. Spectral entropy
2. Relative spectral entropy
3. Total energy
4. Energy below 2 kHz (low frequencies)
5. Autocorrelation peak values and number of peaks
6. High-order MEL-frequency cepstral coefficients
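A sketch of how two of these features separate voiced from unvoiced frames, using only the standard library. The 8 kHz sample rate, 160-sample frame, and test signals are assumptions for illustration; a real implementation would use an FFT rather than the naive DFT below:

```python
import cmath
import math
import random

RATE = 8000   # assumed sample rate, Hz
FRAME = 160   # assumed frame length (20 ms)

def dft_magnitudes(frame):
    """Naive DFT magnitude spectrum (stand-in for an FFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectral_entropy(frame):
    """Entropy of the normalized magnitude spectrum; voiced frames,
    having more harmonic structure, score lower."""
    mags = dft_magnitudes(frame)
    total = sum(mags) or 1.0
    return -sum((m / total) * math.log(m / total) for m in mags if m > 0)

def autocorr_peak(frame):
    """Largest non-initial normalized autocorrelation value; close to 1
    for periodic (voiced) frames."""
    n = len(frame)
    r0 = sum(x * x for x in frame) or 1.0
    return max(sum(frame[t] * frame[t - lag] for t in range(lag, n)) / r0
               for lag in range(1, n // 2))

random.seed(0)
voiced = [math.sin(2 * math.pi * 200 * t / RATE) for t in range(FRAME)]  # periodic
unvoiced = [random.uniform(-1.0, 1.0) for _ in range(FRAME)]             # noise-like

print(spectral_entropy(voiced) < spectral_entropy(unvoiced))   # True
print(autocorr_peak(voiced) > autocorr_peak(unvoiced))         # True
```

Note that neither feature requires storing the spectrum itself, which is what keeps the recorded data unintelligible.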

Features Used: Autocorrelation
[Figure: autocorrelation of (a) an unvoiced frame and (b) a voiced frame]
Voiced chunks have a higher non-initial autocorrelation peak and fewer peaks.

Features Used: Spectral Entropy
[Figure: FFT magnitude of (a) an unvoiced frame, spectral entropy 4.21, and (b) a voiced frame, spectral entropy 3.74]
Voiced chunks have lower entropy than unvoiced chunks, because voiced chunks have more structure.

Features Used: Energy
o Energy in voiced chunks is concentrated in the lower frequencies
o Higher-order MEL cepstral coefficients contain pitch (F0) information; the lower-order coefficients are NOT stored

Segmenting Speech Regions

Multi-Person Conversation Model
Group state G_t – who is holding the floor (the main speaker):
o 1 to N: instrumented subjects
o N+1: silence
o N+2: any unmiked speaker

Multi-Person Conversation Model
Individual state M_i,t – true if subject i is speaking
o P(M | G) is set so as to disfavor people talking simultaneously
o U is true if an unmiked subject is speaking

Multi-Person Conversation Model
Voicing states V_i,t – true if the sound from mike i is a human voice
o P(V_i,t | M_i,t) = 1
o P(V_i,t | not M_i,t) = 0.5
o AV_t is the logical OR of the voicing nodes

Multi-Person Conversation Model
Observations O_i,t – acoustic features from mike i that are useful for detecting speech
o P(O | V) is a 3D Gaussian with a covariance matrix learned from speaker-independent data

Multi-Person Conversation Model
Energy E_i,j,t – a 2D variable containing the log energies of mikes i and j
o Associates voiced regions with a speaker
o If subject i talks at time t, then the energy of mike i should be higher than that of mike j

Determining Miked Speaker

Multi-Person Conversation Model
Entropy H_e,t – entropy of the log-energy distribution across all N microphones
o When an unmiked subject speaks, all microphones receive similar energy, so the entropy across microphones will be high
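A minimal numeric sketch of this entropy (the energy values are invented): a peaked distribution, where one subject's own mike dominates, has low entropy, while similar levels at every mike push the entropy toward its maximum of log N:

```python
import math

def mike_entropy(energies):
    """Entropy of the normalized energy distribution across microphones."""
    total = sum(energies)
    return -sum((e / total) * math.log(e / total) for e in energies if e > 0)

miked = [9.0, 0.5, 0.5, 0.5]    # subject 1's own mike dominates: peaked
unmiked = [2.6, 2.4, 2.5, 2.5]  # every mike hears roughly the same level

print(mike_entropy(miked) < mike_entropy(unmiked))   # True
```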

Determining Unmiked Speaker

Results

Results

Analyzing Results of DBN Inference
o Compute the number of conversations between subjects
o Create a weighted graph
o Visualize with multi-dimensional scaling
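The first two steps amount to counting detected conversations per pair. A sketch with an invented conversation list (the multi-dimensional scaling step is omitted):

```python
# Sketch: aggregating inferred conversations into weighted-graph edges.
from collections import Counter

# Hypothetical output of DBN inference: one entry per detected conversation.
conversations = [("A", "B"), ("B", "A"), ("B", "C"), ("A", "C")]

# Sort each pair so (A, B) and (B, A) count as the same undirected edge.
weights = Counter(tuple(sorted(pair)) for pair in conversations)

print(weights[("A", "B")])   # 2 -- A and B talked twice
print(weights[("B", "C")])   # 1
```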

Modeling Influence
Goal: model the influence of subject j on subject i's conversational style.
Formally:
o P(S_i,t | S_i,t-1) = self-transition probability (the probability of continuing to speak or remaining silent)
o Question: for a particular conversation, how much of P(S_i,t | S_i,t-1, S_j,t-1) is explained by P(S_i,t | S_j,t-1)?
o Create a mixed-memory Markov chain model and infer its parameters
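A sketch of the mixed-memory idea. The transition tables and mixture weight below are invented for illustration (the slide does not give the actual parameters or learning procedure): subject i's next speaking state is a mixture of a self-transition table and a cross-transition table conditioned on subject j, with the mixture weight alpha playing the role of j's influence on i:

```python
# Transition tables over states {0: silent, 1: speaking} (toy numbers).
self_trans = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # P(S_i,t | S_i,t-1)
cross_trans = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.9, 1: 0.1}}  # P(S_i,t | S_j,t-1)

def mixed_prob(s_now, s_i_prev, s_j_prev, alpha):
    """P(S_i,t | S_i,t-1, S_j,t-1) under the mixed-memory mixture."""
    return ((1 - alpha) * self_trans[s_i_prev][s_now]
            + alpha * cross_trans[s_j_prev][s_now])

# With alpha = 0, j has no influence: i follows its own chain.
print(mixed_prob(1, 1, 0, 0.0))   # 0.7
# With alpha = 0.5, j having just spoken suppresses i (turn-taking).
print(mixed_prob(1, 1, 1, 0.5))   # 0.5*0.7 + 0.5*0.1 = 0.4
```

Fitting alpha per ordered pair (j, i) on real conversations would then yield a directed influence score.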

Influence

GISTS
Inferring what a conversation is about (its "gist"):
o Apply speech recognition
o Use the OpenMind commonsense knowledge database to associate words with classes of events ("buying lunch")
o Use a simple naïve Bayes "bag of words" model to infer the gist and select key words
o Improve by conditioning on location
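A sketch of the naïve Bayes "bag of words" step. The event classes, word counts, and priors here are invented for illustration; in the real system the word-to-event associations come from OpenMind:

```python
import math
from collections import Counter

# Toy word counts per event class, as if derived from a knowledge base.
class_words = {
    "buying lunch": Counter({"food": 4, "menu": 3, "pay": 2, "meeting": 1}),
    "project meeting": Counter({"deadline": 4, "code": 3, "meeting": 3, "pay": 1}),
}
priors = {"buying lunch": 0.5, "project meeting": 0.5}

def gist(words):
    """Most probable event class given recognized words (add-one smoothing)."""
    vocab = {w for c in class_words.values() for w in c}
    best, best_lp = None, -math.inf
    for cls, counts in class_words.items():
        total = sum(counts.values()) + len(vocab)
        lp = math.log(priors[cls]) + sum(
            math.log((counts[w] + 1) / total) for w in words)
        if lp > best_lp:
            best, best_lp = cls, lp
    return best

print(gist(["food", "pay"]))          # buying lunch
print(gist(["code", "deadline"]))     # project meeting
```

Conditioning on location would simply replace the uniform priors with location-specific ones (e.g., a higher prior on "buying lunch" in a breakout area).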

Example

Next Step: Locations
Wi-Fi signal strength can be used to determine the approximate location of each speech event:
o 5-meter accuracy
o Location computation done off-line
Raw locations are converted to nodes in a coarse topological map before further analysis.

Topological Location Map
Nodes in the map are identified by area type:
o Hallway
o Breakout area
o Meeting room
o Faculty office
o Student office
Detected conversations are associated with their area type.
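Associating a conversation with an area type can be as simple as snapping the raw location estimate to the nearest map node. A sketch with invented node coordinates (the real map and node placement are not given on the slide):

```python
import math

# Hypothetical map: node coordinates (meters) -> area type.
nodes = {
    (0.0, 0.0): "hallway",
    (4.0, 1.0): "breakout area",
    (10.0, 3.0): "meeting room",
}

def area_type(x, y):
    """Area type of the map node nearest to a raw location estimate."""
    nearest = min(nodes, key=lambda n: math.hypot(n[0] - x, n[1] - y))
    return nodes[nearest]

print(area_type(3.5, 0.5))   # breakout area
```

Since the localization itself is only accurate to about 5 meters, a coarse nearest-node assignment like this loses little information.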

Goal: Dynamic Social Network Model
People, places, conversations, time
Nodes:
o Subjects (wearing sensors, who have given consent)
o Places (e.g., a particular breakout area)
o Instances of conversations
Edges:
o Between subjects and conversations
o Between places and conversations
Replicate over data collection sessions (as in a DBN).
Compute influences between sessions: e.g., if A-B and B-C are strong at t, then A-C is likely to be strong at t+1.