Multi-Speaker Detection By Matt Fratkin EE 6820 3/9/05.



Background
A great deal of current work addresses speaker recognition, but a new problem arises when more than one speaker is present.

Uses for Multi-Speaker Detection
Multi-speaker detection matters whenever there is dialog among several people, such as in a meeting, debate, conference, or court hearing. It can also help when a particular key speaker needs to be tracked throughout a speech or debate.

Attempted Solutions
The first attempt at solving this problem was to transcribe the events, but that can cost up to $400/hr. A second attempt gave each speaker their own microphone, so that each microphone would represent one speaker. This proved unsuccessful because the other microphones would pick up crosstalk from nearby speakers.

Possible Methods for Multi-Speaker Detection
Pattern Recognition
Dual Pitch Tracking
Speaker Segmentation

Pattern Recognition
This method starts by hand-marking the overlaps and then calculating features for each one. Possible features include critical-band loudness values, energy, and zero-crossing rate. A classifier would then be built to try to separate the two classes, overlap and non-overlap.
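
As a rough illustration of the feature step, the sketch below computes two of the features named above, short-time energy and zero-crossing rate, for each frame of a signal, and then assigns a feature vector to the nearer of two class centroids. The frame length, hop size, function names, and the nearest-centroid classifier are all assumptions made for illustration; the slides do not say which classifier would be used, and critical-band loudness is left out.

```python
import math

def frame_features(samples, frame_len=400, hop=160):
    """Return one (energy, zero_crossing_rate) pair per frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean of the squared sample values.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings / (frame_len - 1)))
    return features

def centroid(rows):
    """Mean feature vector of a list of hand-labelled feature tuples."""
    return tuple(sum(r[i] for r in rows) / len(rows) for i in range(len(rows[0])))

def classify(feature, centroid_single, centroid_overlap):
    """Hypothetical nearest-centroid rule: pick the closer class mean."""
    d_single = math.dist(feature, centroid_single)
    d_overlap = math.dist(feature, centroid_overlap)
    return "overlap" if d_overlap < d_single else "single"
```

In practice the two centroids would be estimated from the hand-marked overlap and non-overlap frames, and the features would likely need scaling so that energy does not dominate the distance.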

Dual Pitch Tracking
Because a single speaker's voice has only one pitch at a time, a comb filter tuned to that pitch can cancel its harmonics. Two comb filters tuned to different pitches would be able to cancel overlapping vowels, which normally have different pitches. Frames in which the second comb filter cancels the most energy would therefore indicate two different pitches in the frame.
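
The idea can be sketched with the simplest cancelling comb, y[n] = x[n] - x[n - T], which nulls every harmonic of a pitch whose period is T samples. The code below is a minimal sketch under stated assumptions, not the method proposed in the slides: it searches candidate periods for the best single comb and for the best cascade of two combs, then compares the residual energies. The function names, the exhaustive search, and the candidate period range are hypothetical.

```python
import math

def comb_cancel(samples, period):
    """Feed-forward comb y[n] = x[n] - x[n - period]."""
    return [samples[n] - samples[n - period] for n in range(period, len(samples))]

def energy(samples):
    """Mean squared value of the samples."""
    return sum(x * x for x in samples) / len(samples)

def best_single(frame, periods):
    """(residual_energy, period) for the single comb leaving the least energy."""
    return min((energy(comb_cancel(frame, p)), p) for p in periods)

def best_pair(frame, periods):
    """(residual_energy, p1, p2) for the best cascade of two combs, p1 < p2."""
    best = None
    for i, p1 in enumerate(periods):
        stage1 = comb_cancel(frame, p1)  # first comb removes one pitch
        for p2 in periods[i + 1:]:
            e = energy(comb_cancel(stage1, p2))  # second comb removes another
            if best is None or e < best[0]:
                best = (e, p1, p2)
    return best

def second_comb_gain(frame, periods):
    """Energy cancelled by adding a second comb; large values suggest two pitches."""
    e_single, _ = best_single(frame, periods)
    e_pair, _, _ = best_pair(frame, periods)
    return e_single - e_pair
```

For a frame containing one periodic voice, the single comb already leaves almost nothing, so the second comb has little to cancel. For two overlapping voiced sounds, the single-comb residual stays large and the second comb removes most of it. Real speech is only approximately periodic, so a threshold on this difference would have to be tuned on labelled data.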

Speaker Segmentation
Conventional speaker segmentation makes it possible to examine the boundaries between different speakers. It would also be possible to examine the events in which a speaker is interrupted, to see whether these scenarios fit the classification of an overlap.
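
One way to picture the boundary check: if a segmenter is assumed to output labelled intervals of the form (speaker, start, end) in seconds, candidate overlaps are the stretches where intervals from two different speakers intersect. The interval format and the function name below are assumptions for illustration; a conventional segmenter may instead emit only change points, in which case the regions around each boundary would be examined.

```python
def find_overlaps(segments):
    """Return (start, end, speaker_a, speaker_b) for each cross-speaker overlap.

    segments: list of (speaker, start_sec, end_sec) tuples (assumed format).
    """
    ordered = sorted(segments, key=lambda s: s[1])  # sort by start time
    overlaps = []
    for i, (spk_a, start_a, end_a) in enumerate(ordered):
        for spk_b, start_b, end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later segment can overlap this one
            if spk_b != spk_a:
                overlaps.append((start_b, min(end_a, end_b), spk_a, spk_b))
    return overlaps
```

For example, if speaker B starts at 4.5 s while speaker A is still talking until 5.0 s, the function reports the interval from 4.5 s to 5.0 s as a candidate overlap.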

Sound Sources
The audio for this project will be taken from Professor Ellis's ICSI Meeting Recorder project. It contains examples from a recorded meeting, making it possible to examine real-world overlaps.

Conclusion
By examining the three methods for multi-speaker detection, I will be able to choose the one that detects multiple speakers with the highest accuracy. I can then investigate that method more closely, in hopes of proposing new ideas to improve it.
