Iowa State University Developmental Robotics Laboratory
Unsupervised Segmentation of Audio Speech using the Voting Experts Algorithm
Matthew Miller, Alexander Stoytchev
Developmental Robotics Lab
Department of Electrical and Computer Engineering
Iowa State University

Language: A Grand Challenge
  A working example
  Automatically acquires language
  Well studied

Statistical Learning Experiments
  Saffran et al. (1996): 8-month-olds can segment speech.
  Artificial Language: tupiro golabu bedaku padoti
  Language: tu pi ro go la bu be da ku
  Transition Prob: [Figure labels: Acclimate, Novel, Word]
  Hypothesis: Infants use local minima in single-syllable transition probabilities to segment speech streams.
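The hypothesis on this slide can be illustrated with a minimal sketch. It estimates syllable-to-syllable transition probabilities from raw counts and places a break wherever the probability is a strict local minimum. The word order in `ORDER` and all function names are assumptions made for illustration; they are not taken from Saffran et al. or from the slides.

```python
from collections import Counter

# The four words of the artificial language, split into syllables.
WORDS = {
    "tupiro": ["tu", "pi", "ro"],
    "golabu": ["go", "la", "bu"],
    "bedaku": ["be", "da", "ku"],
    "padoti": ["pa", "do", "ti"],
}

# Hypothetical word order (an assumption): every word is followed by
# at least two different words, so cross-word transitions are uncertain.
ORDER = ["tupiro", "golabu", "bedaku", "padoti",
         "golabu", "tupiro", "padoti", "bedaku",
         "tupiro", "bedaku", "golabu", "padoti"]


def make_stream(word_order):
    """Concatenate words into one unbroken syllable stream."""
    return [syl for word in word_order for syl in WORDS[word]]


def transition_probs(syllables):
    """Estimate P(next | current) for adjacent syllables from counts."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def breaks_at_local_minima(syllables):
    """Return indices i such that a break is placed before syllables[i]."""
    probs = transition_probs(syllables)
    tp = [probs[(a, b)] for a, b in zip(syllables, syllables[1:])]
    breaks = []
    for i in range(1, len(tp) - 1):
        # tp[i] is the transition from syllables[i] to syllables[i + 1].
        if tp[i] < tp[i - 1] and tp[i] < tp[i + 1]:
            breaks.append(i + 1)
    return breaks


stream = make_stream(ORDER)
print(breaks_at_local_minima(stream))
```

In this toy stream every within-word transition has probability 1 and every cross-word transition is lower, so the local minima fall on the word boundaries.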

Voting Experts
  An algorithm for unsupervised segmentation
  Key Idea: Natural “chunks” have:
    – Low Internal Information
    – High Boundary Entropy
  itwasabrightcolddayinaprilandtheclockswere
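The two quantities in the key idea can be sketched as follows. Here internal information is taken to be the negative log of a chunk's relative frequency among substrings of the same length, and boundary entropy is the entropy of the symbol that follows the chunk. Both definitions, and the function names, are assumptions for illustration; the sketch also omits any normalization across chunk lengths.

```python
import math
from collections import Counter


def occurrences(chunk, text):
    """Start positions of every (possibly overlapping) occurrence of chunk."""
    return [i for i in range(len(text) - len(chunk) + 1)
            if text.startswith(chunk, i)]


def internal_information(chunk, text):
    """-log2 of the chunk's relative frequency among same-length substrings.

    Lower values mean the chunk is more frequent, hence more word-like.
    Assumes the chunk occurs at least once in text.
    """
    total = len(text) - len(chunk) + 1
    return -math.log2(len(occurrences(chunk, text)) / total)


def boundary_entropy(chunk, text):
    """Entropy in bits of the symbol that follows chunk in text.

    Higher values mean the continuation is harder to predict, which
    suggests that a chunk boundary follows.
    """
    followers = Counter(text[i + len(chunk)]
                        for i in occurrences(chunk, text)
                        if i + len(chunk) < len(text))
    total = sum(followers.values())
    return -sum(c / total * math.log2(c / total) for c in followers.values())


text = "abcabdabc"
print(internal_information("ab", text), internal_information("bd", text))
print(boundary_entropy("a", text), boundary_entropy("ab", text))
```

In the toy string, "ab" is frequent (low internal information) and is followed by either "c" or "d" (non-zero boundary entropy), while "a" is always followed by "b" (zero boundary entropy).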

VE Implementation (Cohen 2006)
  1. Build an n-gram trie from text.
  2. Slide a window along the text sequence.
  3. Two experts vote how to break the window:
    1. One minimizes internal info
    2. Other maximizes boundary entropy
  4. Break at vote peaks.
  i t w a s a b r i g h t c o l d d a y i n a p r i l
  Window 1, Window 2 (successive positions of the sliding window)
  i | t | w | a | s | a | b | r | i | g | h | t | c | o | l | d
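The four steps can be sketched roughly as below. This is a simplified illustration, not Cohen's implementation: a flat `Counter` stands in for the n-gram trie, scores are not standardized, and the window size, the `threshold` parameter, and the peak rule are all assumptions.

```python
import math
from collections import Counter


def build_counts(text, max_n):
    """Count every n-gram up to max_n (a flat stand-in for the trie)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def internal_info(chunk, counts, text_len):
    """-log2 relative frequency of chunk among n-grams of its length."""
    return -math.log2(counts[chunk] / (text_len - len(chunk) + 1))


def boundary_entropy(chunk, counts):
    """Entropy of the next symbol, from counts of chunk plus one symbol."""
    ext = [c for g, c in counts.items()
           if len(g) == len(chunk) + 1 and g.startswith(chunk)]
    total = sum(ext)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in ext)


def voting_experts(text, window=4, threshold=2):
    """Return (votes, breaks); votes[i] counts votes to break before text[i]."""
    counts = build_counts(text, window)
    votes = [0] * (len(text) + 1)
    n = len(text)
    for start in range(n - window + 1):
        win = text[start:start + window]
        splits = range(1, window)
        # Expert 1: split that minimizes summed internal information.
        freq_k = min(splits, key=lambda k: internal_info(win[:k], counts, n)
                     + internal_info(win[k:], counts, n))
        # Expert 2: split after the prefix with maximal boundary entropy.
        ent_k = max(splits, key=lambda k: boundary_entropy(win[:k], counts))
        votes[start + freq_k] += 1
        votes[start + ent_k] += 1
    # Break at local vote peaks that reach the threshold.
    breaks = [i for i in range(1, n)
              if votes[i] >= threshold
              and votes[i] > votes[i - 1] and votes[i] >= votes[i + 1]]
    return votes, breaks


text = "itwasabrightcolddayinaprilandtheclockswere"
votes, breaks = voting_experts(text)
print(breaks)
```

Each expert casts exactly one vote per window, so a position covered by several windows can collect several votes. On a string this short the n-gram statistics are too sparse to expect meaningful breaks; the sketch only shows the mechanics.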

VE Results
  Results are surprisingly good on text
    – Especially given its simplicity
    – Accuracy and Hit rate about 75%
  Seems to capture something about the nature of “chunks”
  Can we use this algorithm to segment real audio?
  Itwasabright

Acoustic Model
  Cluster spectral features using a GGSOM
  Collapse state sequence
  Run VE to get breaks
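The "collapse state sequence" step can be sketched as below, assuming it means merging consecutive frames that were assigned to the same cluster into a single symbol. That reading, the integer state labels, and the function names are assumptions; the clustering itself is not shown. Keeping the first frame of each run lets breaks found on the collapsed sequence be mapped back to positions in the audio.

```python
from itertools import groupby


def collapse_states(frame_states):
    """Merge runs of identical states into (state, first_frame, run_length)."""
    collapsed = []
    frame = 0
    for state, run in groupby(frame_states):
        length = sum(1 for _ in run)
        collapsed.append((state, frame, length))
        frame += length
    return collapsed


def breaks_to_frames(collapsed, breaks):
    """Map break indices in the collapsed sequence to frame indices."""
    return [collapsed[i][1] for i in breaks]


# Hypothetical per-frame cluster labels.
frames = [3, 3, 3, 7, 7, 2, 3, 3]
collapsed = collapse_states(frames)
symbols = [state for state, _, _ in collapsed]
print(symbols)
print(breaks_to_frames(collapsed, [1, 3]))
```

The `symbols` list is what a segmenter such as Voting Experts would then receive in place of text characters.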

Experiments and Results
  Used the model to segment “1984”
    – CD 1 of audio book (40 mins)
    – Chosen for length, consistency
    – Evaluation: Human graders

New Experiments
  Trained on infant datasets
  Tested on manually generated keys
  Stream A: tupiro golabu bedaku padoti
  Stream B: dapiku tilado pagotu burobi
  [Diagram 1 labels: Train, Test, Acoustic Model A, Acoustic Model B, VE Model A, VE Model B, Key A, Key B]
  [Diagram 2 labels: Test, Acoustic Model A, Acoustic Model B, VE Model A, VE Model B, Key B, Key A]

Results
  Experiment 1
    – Accuracy: 50% on all induced breaks
    – Hit Rate: 75% of word breaks
    – Significantly better than chance
  Experiment 2
    – Accuracy: 16% on all induced breaks
    – Hit Rate: 1% of word breaks
    – Worse than chance
    – 18 breaks, 3 correct
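The two measures reported here can be computed as in the sketch below. It assumes that accuracy is the fraction of induced breaks that match a true word break and that hit rate is the fraction of true word breaks that were found, which is how the slide wording reads. The `tolerance` parameter and the example break positions are hypothetical.

```python
def score_breaks(induced, true_breaks, tolerance=0):
    """Return (accuracy, hit_rate) for induced breaks against a key.

    accuracy: share of induced breaks within tolerance of a true break.
    hit_rate: share of true breaks within tolerance of an induced break.
    """
    correct = sum(1 for b in induced
                  if any(abs(b - t) <= tolerance for t in true_breaks))
    hits = sum(1 for t in true_breaks
               if any(abs(b - t) <= tolerance for b in induced))
    accuracy = correct / len(induced) if induced else 0.0
    hit_rate = hits / len(true_breaks) if true_breaks else 0.0
    return accuracy, hit_rate


# Hypothetical break positions, not data from the experiments.
induced = [3, 6, 10, 12]
key = [3, 6, 9, 12]
print(score_breaks(induced, key))
```

Under this definition, 3 correct out of 18 induced breaks gives 3/18, or about 16.7%, which is in line with the 16% accuracy listed for Experiment 2.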

Conclusions and Future Work
  VE Model can be used to segment audio
  Can reproduce the results of infant studies
  May model part of the human chunking mechanism
  Have built more sophisticated acoustic models
    – Better results (nearly perfect)

Discussion Suggestions
  Why?
    – Can’t we just engineer the solution?
  What is really needed for unsupervised speech segmentation?
  Can this model be used for object discovery in other domains?

Thank You