
“SoundSense: Scalable Sound Sensing for People-Centric Applications on Mobile Phones”
Authors: Hong Lu, Wei Pan, Nicholas D. Lane, Tanzeem Choudhury and Andrew T. Campbell, Department of Computer Science, Dartmouth College
Published in: ACM International Conference on Mobile Systems, Applications, and Services (MobiSys '09), June 22-25, 2009
Benjamin Stokes, Presenter, 3/28/11, for CS 546: Intelligent Embedded Systems

SoundSense: Overview
A framework for real-time activity inference from audio, running on the phone without servers. The microphone is often overlooked as a sensor for inference (though it is heavily used for communication).

Design Goals
1. At scale: to a large number of users; avoid burdensome training requirements.
2. With diverse audio environments (robust).
3. While respecting the device: do not interfere with the phone's operation.

Noteworthy Constraints
- Privacy: people are sensitive about having their phone calls captured, so server-side processing should be limited.
- Audio codecs: optimized for the human voice and typically sampled at ~8 kHz, so they cannot capture content above 4 kHz.
- Physics of audio: sounds are often layered and remixed; the loudest (highest-energy) sound dominates, and TV audio vs. live audio is not easily differentiated.
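The 4 kHz ceiling is the Nyquist limit: sampling at a rate f_s can only represent frequencies up to f_s / 2, and anything higher folds back (aliases) into that band. A minimal Python sketch (the helper names are illustrative, not from SoundSense) shows that a 5 kHz tone sampled at 8 kHz produces exactly the same samples as a 3 kHz tone. Audio front ends typically low-pass filter before sampling, so such content is usually removed rather than aliased; either way it is not available to a classifier.

```python
import math

SAMPLE_RATE_HZ = 8000
NYQUIST_HZ = SAMPLE_RATE_HZ / 2  # highest representable frequency: 4 kHz


def alias_frequency(freq_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Frequency in [0, sample_rate / 2] at which a pure tone appears after sampling."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def sample_tone(freq_hz, n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sample cos(2*pi*f*t) at the given rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(n_samples)]


if __name__ == "__main__":
    print(NYQUIST_HZ)             # 4000.0
    print(alias_frequency(5000))  # 3000
    high = sample_tone(5000, 16)
    low = sample_tone(3000, 16)
    # The 5 kHz and 3 kHz sample streams are indistinguishable.
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low)))  # True
```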

What does it look like?

Who is behind this? (The same people who made CenceMe.)

Architecture

Pre-processing
- Drops frames that are too quiet or have too much entropy (e.g., white noise).
- Uses non-overlapping frames to minimize CPU burden.
- Frame length is 64 ms (speech recognition uses 25-46 ms).
- Once an event is recognized, up to 5 seconds of silence are admitted, since the pause might be part of a conversation.
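The pre-processing rules above can be sketched as a per-frame gate plus a silence hangover. This is a minimal Python sketch under stated assumptions: the thresholds are hypothetical, "entropy" is taken to mean normalized spectral entropy (computed with a naive DFT so only the standard library is needed), and the 5-second allowance is implemented as a countdown of 78 frames (5 s / 64 ms, rounded down).

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000
FRAME_SIZE = 512                 # 512 samples at 8 kHz = 64 ms, non-overlapping
RMS_THRESHOLD = 0.01             # hypothetical "too quiet" cutoff (samples in [-1, 1])
ENTROPY_THRESHOLD = 0.8          # hypothetical "too noise-like" cutoff (normalized)
HANGOVER_S = 5.0                 # silence tolerated after an event
HANGOVER_FRAMES = int(HANGOVER_S * SAMPLE_RATE_HZ / FRAME_SIZE)  # 78 frames


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def spectral_entropy(frame):
    """Shannon entropy of the power spectrum, normalized to [0, 1].

    Near 0 for a pure tone, near 1 for a flat (white-noise-like) spectrum.
    """
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):  # naive DFT, non-negative frequency bins only
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(power))


def admit(frame):
    """Admit a frame only if it is loud enough and not noise-like."""
    return rms(frame) >= RMS_THRESHOLD and spectral_entropy(frame) <= ENTROPY_THRESHOLD


class AdmissionControl:
    """Per-frame gate with a hangover: after an admitted frame, up to
    HANGOVER_FRAMES following rejected frames (about 5 s) are still passed on,
    so pauses inside a conversation are not cut out."""

    def __init__(self):
        self.remaining = 0

    def step(self, frame):
        if admit(frame):
            self.remaining = HANGOVER_FRAMES
            return True
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False
```

Checking RMS first is a deliberate ordering: the cheap energy test short-circuits the expensive spectral computation for the many silent frames.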

Coarse Category Classification
- Feature extraction: 8 methods.
- Decision tree classifier, followed by Markov models for each of the three sound event categories (speech, music, ambient sound).
- Trained with 559 manually labeled sound clips (1 GB).
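The second step, one Markov model per category applied to the decision tree's per-frame labels, can be sketched as a maximum-likelihood recognizer. In this minimal Python sketch the labels are assumed to come from an already-trained decision tree, the models are first-order, and all probabilities are invented for illustration rather than learned from the 559 training clips. A sequence is assigned to whichever category's model gives it the highest log-likelihood, so an isolated misclassified frame is outvoted by its neighbors.

```python
import math

CATEGORIES = ("speech", "music", "ambient")


def make_model(category):
    """Hypothetical first-order Markov model for one category.

    Returns (initial, transition) probabilities over the decision tree's
    frame labels. Sequences from `category` tend to start on that label,
    stay on it, and return to it after a stray frame.
    """
    initial = {c: 0.6 if c == category else 0.2 for c in CATEGORIES}
    transition = {}
    for prev in CATEGORIES:
        if prev == category:
            # Mostly stay on the category's own label.
            transition[prev] = {nxt: 0.8 if nxt == category else 0.1 for nxt in CATEGORIES}
        else:
            # From a stray label: usually return, sometimes stay, rarely switch elsewhere.
            transition[prev] = {
                nxt: 0.5 if nxt == category else (0.4 if nxt == prev else 0.1)
                for nxt in CATEGORIES
            }
    return initial, transition


MODELS = {c: make_model(c) for c in CATEGORIES}


def log_likelihood(labels, model):
    """Log-probability of a label sequence under one Markov model."""
    initial, transition = model
    ll = math.log(initial[labels[0]])
    for prev, nxt in zip(labels, labels[1:]):
        ll += math.log(transition[prev][nxt])
    return ll


def classify(labels):
    """Pick the category whose model best explains the buffered frame labels."""
    return max(CATEGORIES, key=lambda c: log_likelihood(labels, MODELS[c]))


print(classify(["music", "music", "speech", "music", "music"]))  # music
```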

Stage 2: Fine-grained (by category)
- Unsupervised learning to discover significant sounds (each placed into a “bin”).
- Additional classification, esp. using Mel Frequency Cepstral Coefficients (MFCCs), which mimic the human ear.
- SoundRank: determines whether a sound is “interesting” (frequency > 40 mins; duration).
- Allow users to label interesting sounds; hide private ones.
- Expunge old and uninteresting sounds.
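MFCCs “mimic the human ear” by spacing their filters on the mel scale, which is roughly linear at low frequencies and logarithmic at high ones. A minimal Python sketch using the common mel formula 2595 · log10(1 + f / 700) (one of several variants; the slides do not say which one SoundSense uses) computes filter-bank center frequencies across the 0 to 4 kHz band an 8 kHz stream can carry. The filter count of 20 is an assumption.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, low_hz=0.0, high_hz=4000.0):
    """Center frequencies (Hz) of n_filters filters spaced evenly in mels.

    n_filters + 2 evenly spaced mel points span the band; the outer two are
    the band edges and the inner n_filters are the filter centers.
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * i) for i in range(1, n_filters + 1)]


centers = mel_center_frequencies(20)  # hypothetical filter count
print([round(f) for f in centers[:3]], "...", [round(f) for f in centers[-3:]])
```

The centers crowd together at low frequencies and spread out toward 4 kHz, mirroring the ear's finer pitch resolution at low frequencies.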

Implementation: Refining the Decision Tree Classifier (First Layer)
(1) The learned tree has 17 nodes and a depth of 6 levels (see right).
(2) Continuous 8 kHz, 16-bit, mono audio samples; each frame = 512 samples (64 ms); 3-frame buffer.
(3) Jail-broken iPhones were used to allow background processing (presumably now possible).
(4) Power savings: when silent, processing is reduced to 1 in 10 frames (every 0.64 seconds).
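Point (4) amounts to duty cycling: 10 frames × 64 ms = 0.64 s between checks while the environment is silent. The slide does not give the exact switching rule, so this minimal Python sketch assumes one: stay in the cheap 1-in-10 mode while the last processed frame was silent, and process every frame once sound is found. The per-frame `silent_flags` input is a hypothetical stand-in for the real silence test.

```python
FRAME_SIZE = 512
SAMPLE_RATE_HZ = 8000
FRAME_S = FRAME_SIZE / SAMPLE_RATE_HZ        # 0.064 s per frame
SILENT_STRIDE = 10                           # while silent, look at 1 frame in 10
CHECK_INTERVAL_S = SILENT_STRIDE * FRAME_S   # 0.64 s between checks


def duty_cycle(silent_flags, stride=SILENT_STRIDE):
    """Return the indices of frames that get processed.

    silent_flags[i] is True if frame i is silent. After a processed frame is
    found silent, the next stride - 1 frames are skipped; after a non-silent
    frame, the very next frame is processed too.
    """
    processed = []
    skip = 0
    for i, silent in enumerate(silent_flags):
        if skip > 0:
            skip -= 1
            continue
        processed.append(i)
        skip = stride - 1 if silent else 0
    return processed


print(duty_cycle([True] * 10 + [False] * 5 + [True] * 15))
```

The stride trades energy for responsiveness: a sound that starts just after a silent check waits up to 0.64 s to be noticed, and a burst shorter than that can be missed entirely.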

Setting Buffer Sizes
(1) Buffer for the Markov models, after the decision tree classifier (a buffer size of 5 is optimal).
(2) MFCC frame length (second-stage classifier, which mimics the human ear).

Evaluation
- CPU: total usage of SoundSense plus the other iPhone system software peaked below 60%.
- Memory context: the iPhone allows 30 MB of memory per app.

Evaluation: Classification Performance
Evaluated on data separate from the training set; each clip was annotated by hand with a label.

Gender Classifier…

Evaluation: Classification (now adding the Markov models)…

SoundRank: Top Events
Training: users wear the iPhone around their neck (!!) for several days, collecting 1 hr of sound a day.

App: Audio Daily Diary
- Goal: users can find out how much time they spend doing different things.
- Implementation: sound is continuously sampled.
- Data from one participant over two weeks…
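The diary's core computation is a per-label time total. A minimal Python sketch, assuming the classifiers emit non-overlapping `(start_s, end_s, label)` segments; the segment format and the sample day below are hypothetical, not the participant data from the slide.

```python
from collections import defaultdict


def time_per_activity(segments):
    """Total minutes per label from (start_s, end_s, label) segments.

    Segments are assumed non-overlapping; labels would come from the coarse
    and fine-grained classifiers (e.g. "speech", "music", or a user-named bin).
    """
    totals = defaultdict(float)
    for start_s, end_s, label in segments:
        if end_s < start_s:
            raise ValueError(f"segment ends before it starts: {(start_s, end_s, label)}")
        totals[label] += (end_s - start_s) / 60.0
    return dict(totals)


day = [  # hypothetical one-day log, times in seconds since midnight
    (8 * 3600, 8 * 3600 + 1800, "speech"),
    (9 * 3600, 10 * 3600, "music"),
    (12 * 3600, 12 * 3600 + 900, "speech"),
]
print(time_per_activity(day))  # {'speech': 45.0, 'music': 60.0}
```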

App: Audio Daily Diary

App: Music Detector
- Goal: crowdsource data collection on nearby music (participatory sensing using audio).
- Implementation: when music is detected, prompt users to take a photo for a community website.

Reflections
- Overall, their framework proposal makes sense: it mixes some pre-training with some automated learning.
- Feature extraction uses 8 methods, with some supporting literature, but there is no fully convincing argument for why 8 (not 7 or 9) or why these particular ones.
- The two sample apps are well chosen.
- Memory footprint is fine, but CPU consumption is unacceptable (this should have been discussed more). Of course, this will pass with time.
- Big Q: when will this be integrated into the mobile infrastructure, e.g., in the operating system or on its own chip?
- Better “ground truth” is needed, e.g., is the classifier picking up a car idling, or only a car in motion?
- Compared to “post-mortems,” the apps offer few insights into the design trade-offs others will likely encounter.
- Do differently? Add the ability for users to intervene to compensate for missed opportunities, e.g., a doorbell sound.
- They claim to contribute a “Framework” with architecture and algorithms, but their code is private; is most of the value lost?

Find out more… SoundSense group online: cts.html

Extension Opportunity: Test a voice-recognition game: (3pm this Zemeckis)