SocialWeaver: Collaborative Inference of Human Conversation Networks Using Smartphones Chengwen Luo and Mun Choon Chan School of Computing National University.


SocialWeaver: Collaborative Inference of Human Conversation Networks Using Smartphones
Chengwen Luo and Mun Choon Chan, School of Computing, National University of Singapore
SenSys 2013
Presenter: Woonghee Lee

Contents
• INTRODUCTION
• DESIGN OF SOCIALWEAVER
  • OVERALL DESIGN OF SOCIALWEAVER
  • PROXIMITY MANAGEMENT MODULE
  • SPEAKER CLASSIFICATION MODULE
  • CONVERSATION CLUSTERING MODULE
  • ENERGY CONTROL
• EVALUATION OF SOCIALWEAVER
• CONCLUSION

INTRODUCTION
• Understanding how people communicate with each other plays an important role in many research disciplines
• The authors designed and implemented SocialWeaver, a smartphone-based conversation sensing system that performs conversation clustering and constructs conversation networks among its users
• The system aims to meet the following objectives
  • First, it should detect simultaneous conversation groups
  • Second, it should be robust and accurate
  • Third, it should not need specialized hardware and should require minimal user intervention
  • Finally, it must respect user privacy

OVERALL DESIGN OF SOCIALWEAVER
• System Workflow
  • When a new neighbor joins the proximity group at t2, a new clustering process is triggered
  • Each phone re-computes the clusters based on the aggregated speaker vectors available between t1 and t2
  • At t4, when one neighbor leaves the proximity group, clustering is performed again based on the speaker vectors available between t2 and t4

OVERALL DESIGN OF SOCIALWEAVER
1. PROXIMITY MANAGEMENT
2. SPEAKER CLASSIFICATION
3. CONVERSATION CLUSTERING
4. ENERGY CONTROL

PROXIMITY MANAGEMENT MODULE
• The basic function of the proximity management module is to decide whether devices are "physically close"
• A proximity group is defined as the set of devices that are neighbors on the Bluetooth network
• Each phone periodically performs neighbor discovery and keeps a TTL for every entry in its neighbor list; in each round an entry's TTL is refreshed to its initial value if the device is found again, and decreased by one otherwise
• Once the TTL reaches zero, the entry is deleted from the neighbor list and the neighbor is no longer in the proximity group
• The RSSI threshold is set to -90 dBm, which covers a range of up to 11 m in the environments measured
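The TTL bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `NeighborTable`, the initial TTL of 3, and the choice to treat a reading exactly at the threshold as in range are all assumptions; only the -90 dBm threshold comes from the slide.

```python
from dataclasses import dataclass, field

INITIAL_TTL = 3            # assumed value; the slides do not state it
RSSI_THRESHOLD_DBM = -90   # threshold quoted on the slide


@dataclass
class NeighborTable:
    """Tracks the proximity group as a map from device id to remaining TTL."""
    ttl: dict = field(default_factory=dict)

    def discovery_round(self, scan: dict) -> None:
        """Apply one discovery round; `scan` maps device id -> RSSI in dBm."""
        # Only devices at or above the RSSI threshold count as found.
        seen = {dev for dev, rssi in scan.items() if rssi >= RSSI_THRESHOLD_DBM}
        # Refresh devices found in this round (new or known) to the initial TTL.
        for dev in seen:
            self.ttl[dev] = INITIAL_TTL
        # Decrease the TTL of known devices that were missed, dropping them at zero.
        for dev in list(self.ttl):
            if dev not in seen:
                self.ttl[dev] -= 1
                if self.ttl[dev] <= 0:
                    del self.ttl[dev]

    def proximity_group(self) -> set:
        """Return the ids of all devices currently considered neighbors."""
        return set(self.ttl)
```

A neighbor that stops responding therefore stays in the group for `INITIAL_TTL - 1` further rounds, which smooths over occasional missed Bluetooth scans.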

SPEAKER CLASSIFICATION MODULE
• The speaker classification module takes the microphone signal as input and determines whether a given voice segment belongs to the current phone user
• The first step is to determine whether a recorded audio segment is a voice segment
• VAD (Voice Activity Detection) is applied to each segment of raw audio data to filter out non-voice inputs
• After a voice segment is detected and extracted, a hybrid classification scheme with two classifiers performs user/background classification, deciding whether the voice belongs to the current phone user or to background users in the same proximity group

SPEAKER CLASSIFICATION MODULE
• 1. Histogram-based Classifier
  • The assumption is that, averaged over time, the voice of the phone owner is usually louder than the voices recorded from other users
  • Loudness Adaptation
  • A voice segment is classified as belonging to the phone's owner if its loudness level exceeds the threshold $Thresh_{abs}$
• 2. Probabilistic Classifier
  • The key to accurate speaker identification is to characterize the speaker using speaker-dependent features and to build a discriminative model that distinguishes the speaker from all background speakers
  • The Gaussian Mixture Model (GMM) has been widely used and proven effective in speaker identification systems
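The loudness test at the heart of the histogram-based classifier might look like the sketch below. It assumes loudness is measured as the RMS level in dB relative to full scale and uses a fixed, hypothetical threshold of -20 dB; the slides specify neither the loudness measure nor how $Thresh_{abs}$ is adapted, so both function names and the constant are illustrative only.

```python
import math

THRESH_ABS_DB = -20.0  # hypothetical threshold; the real value is adapted per phone


def loudness_db(samples):
    """RMS level of a segment in dB relative to full scale (samples in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Clamp to avoid log10(0) on a perfectly silent segment.
    return 20 * math.log10(max(rms, 1e-12))


def is_owner_segment(samples, thresh_abs_db=THRESH_ABS_DB):
    """Classify a voice segment as the owner's if it is louder than the threshold."""
    return loudness_db(samples) > thresh_abs_db
```

For example, a segment with amplitude 0.5 sits at roughly -6 dB and passes the test, while one with amplitude 0.01 sits at -40 dB and is treated as background.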

SPEAKER CLASSIFICATION MODULE
• Collaborative Learning
  • SocialWeaver trains two GMMs for speaker classification, one for the current phone user and one for background users
  • Once a voice segment is accepted by the histogram-based classifier, it becomes a candidate for the next level of validation
  • SocialWeaver uses a voting mechanism among all neighbors in the proximity group to verify the validity of training samples
  • The voting request contains two parts: the timestamp of the voice segment and the loudness level of the segment as heard by the requesting phone
  • Since all neighbors in the same proximity group are physically close to each other in the same environment, it is unlikely that a phone belonging to a non-speaking user receives voice samples that are significantly louder than those received by all the other phones
  • Each phone votes based on its local observation and the remote loudness value
  • Once the requester receives positive votes from all neighbors, it saves the voice segment as a training sample for its own user, and all other phones save their local samples as background training samples
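The voting step can be sketched as below. The vote rule used here (a neighbor votes yes when the requester's reported loudness exceeds its own loudness at the same timestamp) is an assumption consistent with the slide, not the authors' exact rule; `margin_db` and the decision to reject when there are no neighbors are likewise hypothetical.

```python
def cast_vote(remote_loudness_db, local_loudness_db, margin_db=0.0):
    """A neighbor votes yes if the requester heard the segment louder than it did."""
    return remote_loudness_db > local_loudness_db + margin_db


def accept_training_sample(requester_loudness_db, neighbor_loudness_db):
    """Accept the segment as an owner training sample only on a unanimous yes.

    `neighbor_loudness_db` maps neighbor id -> loudness that neighbor recorded
    at the requested timestamp. With no neighbors there is nothing to verify
    against, so this sketch rejects the sample (an assumption).
    """
    votes = [
        cast_vote(requester_loudness_db, local)
        for local in neighbor_loudness_db.values()
    ]
    return bool(votes) and all(votes)
```

Requiring unanimity keeps mislabeled segments out of the owner model at the cost of discarding some valid samples.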

SPEAKER CLASSIFICATION MODULE [figure slide; no text in the transcript]

SPEAKER CLASSIFICATION MODULE
• Hybrid Speaker Classification
  • For each voice segment detected, the two classifiers work independently to decide whether the voice belongs to the speaker or to background neighbors
  • Each voice segment is classified as belonging either to the speaker or to background users
  • At the end of the classification window, SocialWeaver computes the speaker score as $S_{hybrid} = w_e N_e + w_p N_p$
  • $w_e$ and $w_p$ are the weights of the histogram-based classifier and the probabilistic classifier respectively, with $w_e + w_p = 1$
  • $N_e$ and $N_p$ are the numbers of segments in this window marked by each classifier

SPEAKER CLASSIFICATION MODULE
• Hybrid Speaker Classification
  • SocialWeaver decides that the current phone user speaks during the window if $S_{hybrid} > c_h \cdot N$, where $c_h$ is the classification coefficient controlling the acceptance of voice segments
  • The classification window has size $N = 15$ segments (approximately 1 second), and $c_h = 0.5$ is used to accept the current window
  • Once the voice segments in a window are accepted as a speaker utterance, a speaker vector $[T_s, T_e]$ is generated to indicate that the phone user speaks from time $T_s$ to $T_e$
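The window decision can be illustrated with a short sketch. It assumes the speaker score is the weighted sum $w_e N_e + w_p N_p$ of the per-classifier segment counts, which is what the definitions of the weights and counts imply; the equal weights are a hypothetical default, while $N = 15$ and $c_h = 0.5$ are the values quoted on the slide.

```python
def speaker_score(n_e, n_p, w_e=0.5):
    """Weighted count of segments marked as the owner's by each classifier.

    n_e: segments marked by the histogram-based classifier
    n_p: segments marked by the probabilistic (GMM) classifier
    w_e: weight of the histogram-based classifier (hypothetical default);
         the probabilistic weight is 1 - w_e so the two sum to 1.
    """
    w_p = 1.0 - w_e
    return w_e * n_e + w_p * n_p


def user_speaks(n_e, n_p, window_size=15, c_h=0.5, w_e=0.5):
    """Accept the window if the score exceeds c_h * N."""
    return speaker_score(n_e, n_p, w_e) > c_h * window_size
```

With the defaults, a window in which the two classifiers mark 12 and 10 of the 15 segments scores 11 and is accepted, since the acceptance bar is 7.5.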

CONVERSATION CLUSTERING MODULE
• Speaker Vector Aggregation and Sharing
  • A speaker vector generated by one phone represents the start time and end time of one speech segment from the phone owner
  • These speaker vectors are shared with other users
  • Local aggregation merges adjacent speaker vectors
  • The aggregated vectors are periodically broadcast to neighboring devices
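Local aggregation amounts to merging time intervals that touch or nearly touch. The sketch below assumes speaker vectors are `(start, end)` tuples in seconds and that vectors separated by at most `max_gap` seconds count as adjacent; the 0.5 s tolerance is a hypothetical value, not one given in the slides.

```python
def aggregate_speaker_vectors(vectors, max_gap=0.5):
    """Merge speaker vectors (start, end) whose gap is at most `max_gap` seconds."""
    merged = []
    for start, end in sorted(vectors):
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the previous vector instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Merging before broadcast reduces the number of vectors each phone has to send over Bluetooth.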

CONVERSATION CLUSTERING MODULE [figure slide; no text in the transcript]

CONVERSATION CLUSTERING MODULE
• Conversation Score
  • For devices 1 and 2, $\{p_r = 0.94, p_c = 0, p_s = 0.06\}$; for device 1 and the local user, $\{p_r = 0.25, p_c = 0.56, p_s = 0.19\}$
  • Based on these speaking patterns, users 1 and 2 are much more likely to be having a conversation than user 1 and the local user

CONVERSATION CLUSTERING MODULE
• Conversation Clustering
  • Assumptions: each member of a proximity group can be involved in only one conversation group; several conversation groups may exist within one proximity group; the conversation groups consist of disjoint sets of users
• (1) Split
  • The proximity group is split into disjoint initial clusters $S$, each consisting of a single user
  • All nodes in the initial clusters become cluster heads for future merge operations, and every pair of cluster heads belongs to different conversation groups
• (2) Merge
  • $CS_{ij}$ is the conversation score between user $j$ and the initial member (head) of cluster $i$
  • The merge step finds a clustering that maximizes the total conversation score of the system

CONVERSATION CLUSTERING MODULE
• (1) Split: $S$ = {Local, 1}
• (2) Merge: $S'$ = {(Local, 3), (1, 2)}
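The merge step in this example can be sketched as a per-user assignment to the best-scoring cluster head. The sketch takes the cluster heads from the split step as given (the slides do not spell out how heads are chosen) and uses made-up conversation scores; because each remaining user joins exactly one cluster, picking the maximum score per user also maximizes the total score.

```python
def merge_clusters(heads, users, cs):
    """Assign each non-head user j to the head i with the largest CS_ij.

    heads: cluster heads produced by the split step
    users: remaining users in the proximity group
    cs:    dict mapping (head, user) -> conversation score (hypothetical values)
    """
    clusters = {h: [h] for h in heads}
    for j in users:
        best_head = max(heads, key=lambda i: cs[(i, j)])
        clusters[best_head].append(j)
    return clusters


# Illustrative scores only; they are not taken from the paper.
example_scores = {
    ("Local", 2): 0.10, (1, 2): 0.90,
    ("Local", 3): 0.80, (1, 3): 0.20,
}
example = merge_clusters(["Local", 1], [2, 3], example_scores)
```

With these scores `example` groups user 3 with the local user and user 2 with user 1, matching the slide's $S'$ = {(Local, 3), (1, 2)}.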

ENERGY CONTROL
• Partially Observable Markov Decision Process (POMDP)
  • At each discrete time step $t$, one action $A_t$ is taken, upon which the state changes from $S_t$ to $S_{t+1}$
  • State transition function: $T = p(S_{t+1} \mid S_t, A_t)$
  • The agent receives a reward for each action performed in a state
  • Reward function: $R(S_t, A_t)$
  • The goal is to find a control policy $\pi_r$ that maps the current belief to actions that maximize the expected discounted sum of rewards, i.e., $E\left[\sum_{t=0}^{\infty} \gamma^t R(S_t, A_t)\right]$
  • $\gamma \in (0, 1)$ is a discount factor that ensures convergence of the model
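The role of the discount factor can be seen in a small sketch that evaluates the discounted sum of a reward sequence. This only illustrates the objective; it does not solve the POMDP, and the function name and reward values are illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over a finite sequence of rewards."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total
```

For a constant reward r the sum approaches r / (1 - gamma) as the horizon grows, which is why a discount factor below 1 keeps the objective finite.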

ENERGY CONTROL [figure slide; no text in the transcript]

ENERGY CONTROL
• Reward: [equation not captured in the transcript], where $f_m = M_t \cdot V_t + B_t \cdot P_t$ and [definition of $f_p$ not captured in the transcript]
  • $f_m$ and $f_p$ reflect the rewards for using high and low duty cycles respectively
  • $c_1$ and $c_2$ are empirically determined coefficients that adjust the weights of the Bluetooth and microphone actions
  • When plenty of energy is left, $f_m$ dominates, and the phone is encouraged to use a higher duty cycle to increase the reward
  • When energy becomes scarce, $f_p$ becomes more important, and a good policy must take $V_t$ into account and use a high sampling rate only when $V_t$ is high

EVALUATION OF SOCIALWEAVER
• Implementation
  • SocialWeaver runs as an Android service in the background and has been tested on Samsung Galaxy S2, Samsung Galaxy Nexus GT-I9250 and HTC Desire phones
  • SocialWeaver samples the microphone at 16 kHz, and by default Bluetooth neighbor discovery is performed periodically
• Dataset
  • The overall system performance of SocialWeaver is evaluated through 2 real-life user studies
  • The first is a controlled experiment in which the interaction patterns can be easily verified: the interactions among 10 graduate students are tracked over 5 days
  • The second is an uncontrolled experiment in which people behave naturally: it was conducted during an actual one-hour class in which different groups of students each gave a 5 to 10 minute presentation

EVALUATION OF SOCIALWEAVER
• Controlled Experiment
  • Over a 5-day period, 10 participants carried the phones with them whenever they were on campus
  • 5 participants (ID1~ID5) belonged to the same research group and worked in the same lab
  • The other 5 participants (ID6~ID10) worked in different labs
  • The 5 participants ID1~ID5 met on Monday morning for a group discussion
  • 6 of the participants (ID5~ID10) were social friends who met for lunch and dinner every day

EVALUATION OF SOCIALWEAVER
• Controlled Experiment
  • The thickness of an edge represents the interaction level between adjacent nodes
  • The conversation network derived from the information collected by SocialWeaver accurately reflects the real-world social connections being measured
  • Figure: The conversation network generated by SocialWeaver

EVALUATION OF SOCIALWEAVER
• Uncontrolled Experiment
  • Conversation clustering is measured in a classroom setting
  • There were about 30 students in the class, 10 of whom participated in the experiment
  • During class, each group of 2 or 3 students gave a 5 to 10 minute presentation
  • There were 11 groups in total, and the 10 participants belonged to 4 different groups
  • At the beginning of class, smartphones with SocialWeaver installed were given to the participants
  • Participants carried the phones in different ways: some placed them on the table, and the rest put them in shirt or pants pockets
  • With 3 additional phones carried by the teaching staff and the authors, there were 13 participants in total

EVALUATION OF SOCIALWEAVER
• Uncontrolled Experiment
  • (1) Besides {ID6, ID7, ID8}, the clustering shows 4 distinct conversation groups, {ID1, ID2}, {ID3, ID4, ID5}, {ID10, ID11} and {ID12, ID13}, which are active over different time periods
  • (2) These 4 groups are the dominant speakers at different times, most likely when they were presenting
  • {ID10, ID11} is the most active during the 0 to 5 min interval, {ID13} from 20 to 25 min, {ID3, ID4, ID5} from 25 to 30 min, and {ID1, ID2} from 35 to 40 min

EVALUATION OF SOCIALWEAVER
• Uncontrolled Experiment
  • (3) ID8 is the most active throughout, and is most likely the lecturer
  • (4) There can be multiple active conversation groups at the same time
  • From 0 to 10 min, while the members of {ID10, ID11} are the dominant speakers, there are three other active conversation groups
  • (5) Some groups continue their discussion after presenting
  • {ID1, ID2} were the dominant speakers from 35 to 40 min and then continued their discussion for another 20 minutes until the end of class

EVALUATION OF SOCIALWEAVER
• Uncontrolled Experiment
  • To verify clustering performance, the inferred clusters are mapped to the presentation schedule and to the conversation clusters manually tagged by an on-site observer
  • The accuracy over all the conversation clustering shown in the figure is 81.9%

CONCLUSIONS
• The authors presented SocialWeaver, a sensing system running on smartphones that performs conversation clustering and builds real-time conversation networks
• SocialWeaver exploits collaboration among users to build proximity groups, classify speakers, aggregate information and perform conversation group clustering
• SocialWeaver provides a practical and effective platform for understanding human communication, with the potential to extract real-world social interactions and to support many future applications