STD Approach Two general approaches: word-based and phonetics-based Goal is to rapidly detect the presence of a term in a large audio corpus of heterogeneous.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

An Introduction of Support Vector Machine
Rapid and Accurate Spoken Term Detection David R. H. Miller BBN Technolgies 14 December 2006.
ASSESSING SEARCH TERM STRENGTH IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone Institute for Signal and Information Processing, Temple University.
1 Dynamic Match Lattice Spotting Spoken Term Detection Evaluation Queensland University of Technology Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof.
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
Building an Intelligent Web: Theory and Practice Pawan Lingras Saint Mary’s University Rajendra Akerkar American University of Armenia and SIBER, India.
ICASSP'06 1 S. Y. Kung 1 and M. W. Mak 2 1 Dept. of Electrical Engineering, Princeton University 2 Dept. of Electronic and Information Engineering, The.
Introduction to Machine Learning Approach Lecture 5.
Chapter 5: Information Retrieval and Web Search
Software Engineer Report What should contains the report?!
Text-To-Speech System for Marathi Miss. Deepa V. Kadam Indian Institute of Technology, Bombay.
DIVINES – Speech Rec. and Intrinsic Variation W.S.May 20, 2006 Richard Rose DIVINES SRIV Workshop The Influence of Word Detection Variability on IR Performance.
LLNL-PRES This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Walter Hop Web-shop Order Prediction Using Machine Learning Master’s Thesis Computational Economics.
Attention Deficit Hyperactivity Disorder (ADHD) Student Classification Using Genetic Algorithm and Artificial Neural Network S. Yenaeng 1, S. Saelee 2.
Keyphrase Extraction in Scientific Documents Thuy Dung Nguyen and Min-Yen Kan School of Computing National University of Singapore Slides available at.
Word-subword based keyword spotting with implications in OOV detection Jan “Honza” Černocký, Igor Szöke, Mirko Hannemann, Stefan Kombrink Brno University.
Rapid and Accurate Spoken Term Detection Owen Kimball BBN Technologies 15 December 2006.
Zero Resource Spoken Term Detection on STD 06 dataset Justin Chiu Carnegie Mellon University 07/24/2012, JHU.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
 Text Representation & Text Classification for Intelligent Information Retrieval Ning Yu School of Library and Information Science Indiana University.
Temple University QUALITY ASSESSMENT OF SEARCH TERMS IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone, PhD Department of Electrical and Computer.
Temple University QUALITY ASSESSMENT OF SEARCH TERMS IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone, PhD Department of Electrical and Computer.
A Weakly-Supervised Approach to Argumentative Zoning of Scientific Documents Yufan Guo Anna Korhonen Thierry Poibeau 1 Review By: Pranjal Singh Paper.
Experimentation Duration is the most significant feature with around 40% correlation. Experimentation Duration is the most significant feature with around.
Kernel Methods A B M Shawkat Ali 1 2 Data Mining ¤ DM or KDD (Knowledge Discovery in Databases) Extracting previously unknown, valid, and actionable.
Chapter 6: Information Retrieval and Web Search
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
Empirical Research Methods in Computer Science Lecture 7 November 30, 2005 Noah Smith.
Rapid and Accurate Spoken Term Detection Michael Kleber BBN Technologies 15 December 2006.
Experimentation Duration is the most significant feature with around 40% correlation. Experimentation Duration is the most significant feature with around.
1 Automatic Classification of Bookmarked Web Pages Chris Staff Second Talk February 2007.
A Phonetic Search Approach to the 2006 NIST Spoken Term Detection Evaluation Roy Wallace, Robbie Vogt and Sridha Sridharan Speech and Audio Research Laboratory,
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Detecting Influenza Outbreaks by Analyzing Twitter Messages By Aron Culotta Jedsada Chartree 02/28/11.
Survey of Approaches to Information Retrieval of Speech Message Kenney Ng Spoken Language Systems Group Laboratory for Computer Science Massachusetts Institute.
Data Mining In contrast to the traditional (reactive) DSS tools, the data mining premise is proactive. Data mining tools automatically search the data.
PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL Seo Seok Jun.
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
Speaker Verification Speaker verification uses voice as a biometric to determine the authenticity of a user. Speaker verification systems consist of two.
Audient: An Acoustic Search Engine By Ted Leath Supervisor: Prof. Paul Mc Kevitt School of Computing and Intelligent Systems Faculty of Engineering University.
Temple University QUALITY ASSESSMENT OF SEARCH TERMS IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone Department of Electrical and Computer Engineering.
Classification Derek Hoiem CS 598, Spring 2009 Jan 27, 2009.
TUH EEG Corpus Data Analysis 38,437 files from the Corpus were analyzed. 3,738 of these EEGs do not contain the proper channel assignments specified in.
Experimentation Duration is the most significant feature with around 40% correlation. Experimentation Duration is the most significant feature with around.
Data Mining and Decision Support
26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
Toward Semantic Search: RDFa based facet browser Jin Guang Zheng Tetherless World Constellation.
English vs. Mandarin: A Phonetic Comparison The Data & Setup Abstract The focus of this work is to assess the performance of new variational inference.
APPLICATIONS OF DIRICHLET PROCESS MIXTURES TO SPEAKER ADAPTATION Amir Harati and Joseph PiconeMarc Sobel Institute for Signal and Information Processing,
ASSESSING SEARCH TERM STRENGTH IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone Institute for Signal and Information Processing, Temple University.
Author :K. Thambiratnam and S. Sridharan DYNAMIC MATCH PHONE-LATTICE SEARCHES FOR VERY FAST AND ACCURATE UNRESTRICTED VOCABULARY KEYWORD SPOTTING Reporter.
ASSESSING SEARCH TERM STRENGTH IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone Institute for Signal and Information Processing, Temple University.
A NONPARAMETRIC BAYESIAN APPROACH FOR
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Machine Learning overview Chapter 18, 21
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
College of Engineering
3.0 Map of Subject Areas.
Consonant variegations in first words: Infants’ actual productions of
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
OpenWorld 2018 Audio Recognition Using Oracle Data Science Platform
Machine Learning Week 1.
Supervised Classification
EEG Recognition Using The Kaldi Speech Recognition Toolkit
feature extraction methods for EEG EVENT DETECTION
Search Engine Architecture
Machine Learning overview Chapter 18, 21
THE ASSISTIVE SYSTEM SHIFALI KUMAR BISHWO GURUNG JAMES CHOU
Presentation transcript:

STD Approach Two general approaches: word-based and phonetics-based Goal is to rapidly detect the presence of a term in a large audio corpus of heterogeneous speech material The effectiveness of a deployed STD system is a tradeoff between processing resource requirements and detection accuracy STD Approach Two general approaches: word-based and phonetics-based Goal is to rapidly detect the presence of a term in a large audio corpus of heterogeneous speech material The effectiveness of a deployed STD system is a tradeoff between processing resource requirements and detection accuracy Abstract Voice keyword search is an extension of text- based searching that allows users to type keywords and search audio files containing spoken language for their existence. Performance dependent on many external factors such as the acoustic channel, language and the confusability of the search term. Goal is to develop an algorithm for predicting the reliability or strength of a search term. Main objective is to find a predictive function to forecast the quality of given search term. Application is to improve the user experience for multimedia information retrieval. Assessment is based on the NIST 2006 Spoken Term Detection (STD) evaluation. Abstract Voice keyword search is an extension of text- based searching that allows users to type keywords and search audio files containing spoken language for their existence. Performance dependent on many external factors such as the acoustic channel, language and the confusability of the search term. Goal is to develop an algorithm for predicting the reliability or strength of a search term. Main objective is to find a predictive function to forecast the quality of given search term. Application is to improve the user experience for multimedia information retrieval. Assessment is based on the NIST 2006 Spoken Term Detection (STD) evaluation. Conclusions Data-driven approaches to modeling search terms appear promising Though the NIST 2006 evaluation data set is small, sufficiently rich generalizations can be made Statistical methods such as decision trees will be useful for clustering and evaluating features The final prediction function will be based on nonlinear mapping functions trained using machine learning approaches (e.g., neural networks or maximum margin classifiers) References 1.J. Fiscus, et al., “Results of the 2006 Spoken Term Detection Evaluation,” Interspeech The 2006 Spoken Term Detection Evaluation Plan, D. Wang, Out of Vocabulary Spoken Term Detection, PhD Dissertation, Edinburgh University, Keyword Search Term Prediction, (project web site) Acknowledgments I am thankful to Sundar Srinivasan and Tao Mao, of Mississippi State University, for their advice and assistance. Conclusions Data-driven approaches to modeling search terms appear promising Though the NIST 2006 evaluation data set is small, sufficiently rich generalizations can be made Statistical methods such as decision trees will be useful for clustering and evaluating features The final prediction function will be based on nonlinear mapping functions trained using machine learning approaches (e.g., neural networks or maximum margin classifiers) References 1.J. Fiscus, et al., “Results of the 2006 Spoken Term Detection Evaluation,” Interspeech The 2006 Spoken Term Detection Evaluation Plan, D. Wang, Out of Vocabulary Spoken Term Detection, PhD Dissertation, Edinburgh University, Keyword Search Term Prediction, (project web site) Acknowledgments I am thankful to Sundar Srinivasan and Tao Mao, of Mississippi State University, for their advice and assistance. J. Smith, M. Smith, and P. Smith Advisers: J. Doe and M. Doe Team Thingamabob College of Engineering, Temple University, Philadelphia, Pennsylvania Quality Assessment For Search Terms in Spoken Systems Figure 5. TWV vs. number of syllables Figure 6. TWV vs. number of consonants Applications of Spoken Term Detection Simple keyword search Complex queries (e.g., logical operators) Prioritization of documents to be reviewed Spoken document retrieval Content-based Internet searches Topic Spotting for national security Clustering/categorization of spoken documents Applications of Spoken Term Detection Simple keyword search Complex queries (e.g., logical operators) Prioritization of documents to be reviewed Spoken document retrieval Content-based Internet searches Topic Spotting for national security Clustering/categorization of spoken documents System Block Diagram System Block Diagram Figure 4. A block diagram of a KWS system. Methods Understanding the relationship between phonetic structure and TWV Analyze training set in terms of broad phonetic class and CVC sequences Investigate the relationship between different features such as number of syllables, syllable structure and consonant position Using linear regression and analysis of variance to select best features Using decision trees to understand relationships between features and how to cluster features Methods Understanding the relationship between phonetic structure and TWV Analyze training set in terms of broad phonetic class and CVC sequences Investigate the relationship between different features such as number of syllables, syllable structure and consonant position Using linear regression and analysis of variance to select best features Using decision trees to understand relationships between features and how to cluster features Figure 4. TWV Vs. Terms (from NIST 2006 STD evaluation) Table 1. Broad Phonetic Class Performance Measurement Types of Errors: false alarms / missed detections Detection error tradeoff (DET) curves are used to evaluate performance. A term-weighted value (TWV) criterion was adopted to balance many practical issues.. Performance Measurement Types of Errors: false alarms / missed detections Detection error tradeoff (DET) curves are used to evaluate performance. A term-weighted value (TWV) criterion was adopted to balance many practical issues.. Figure 3. Typical STD system performance. Figure 2. Generic STD System Architecture. A typical STD system divides the task into two separate phases: indexing and searching. Figure 1. Block Diagram of STD System Table 2. TWV vs. initial phoneme type Table 3. TWV vs. final phoneme type The maximum possible TWV is 1.0, corresponding to a ‘perfect’ system output: no misses and no false alarms. Search terms should have at least 2 consonants Search terms with 4 consonants give improved performance Search terms should have at least 2 consonants Search terms with 4 consonants give improved performance Monosyllabic search terms have poor performance Performance improves dramatically for bisyllabic and trisyllabic terms Monosyllabic search terms have poor performance Performance improves dramatically for bisyllabic and trisyllabic terms

Initial Phoneme TypeAverage TWV Consonant-0.25 Vowel0.44

Final Phoneme TypeAverage TWV Consonant0.43 Vowel-1.42