IBM Haifa Research Lab © 2008 IBM Corporation

Retrieving Spoken Information by Combining Multiple Speech Transcription Methods

Jonathan Mamou
Joint work with Ron Hoory, David Carmel, Yosi Mass, Bhuvana Ramabhadran, Benjamin Sznajder

Motivation

Spoken data is everywhere!
- Conference meetings
- Broadcast news
- Surveillance & security
- Call centers

IR Tasks on Speech Data

- Spoken Document Retrieval (SDR)
  - Traditional search-engine approach: find spoken documents relevant to a query.
- Spoken Term Detection (STD)
  - Detect occurrences of a phrase in spoken documents.
  - NIST STD evaluation

Approaches to Speech Information Retrieval

- Keyword spotting
  - Based on direct detection of a predefined set of keywords in the speech data.
- Building an index from automatic transcription output
  - Based on full transcription of the audio and indexing of the transcription output.
  - This is the approach we use.

Part of this work was done in the framework of SAPIR, an EU FP6 project on search in audiovisual content over P2P networks.

Overview

[Pipeline diagram] Speech data → Automatic Speech Recognition (driven by a vocabulary, a language model, and an acoustic model) → Index → Search engine: a query is matched against the index to produce ranked results.

Why Is It Different from Classic Text IR?

- The classic text-IR solution would be to index and search the 1-best word transcript.
- However, two main issues arise during transcription of the speech data:
  - Errors (substitutions, deletions, insertions) can occur during transcription.
  - Out-of-vocabulary (OOV) terms can be present in the spoken data and in the query.
    - OOV words are words missing from the ASR system's vocabulary.
    - They are replaced in the output transcript by alternatives that are probable given the acoustic model, vocabulary, and language model of the ASR system.
    - e.g., TALIBAN → TELL A BAND
    - Over 10% of user queries can be OOV terms (especially named entities).

Influence of the WER on Retrieval

- Substitutions and deletions mean that a term appearing in the speech signal is not recognized.
  - Impact on the recall of the search (the fraction of documents relevant to the query that are successfully retrieved).
- Substitutions and insertions mean that a term that is not part of the speech signal appears in the transcript.
  - Impact on the precision of the search (the fraction of retrieved documents that are relevant).
- These issues can dramatically reduce the effectiveness of the retrieval and prevent a "naïve" search engine from retrieving the information.

Technical Approach

We have developed algorithms
- to improve search effectiveness in the presence of errors, and
- to allow OOV queries.

Two ingredients:
- Indexing of the Word Confusion Network (WCN), including word alternatives and their confidences, for in-vocabulary (IV) terms.
- Phonetic indexing and fuzzy search.

Retrieval Model

Word Search

- We index the Word Confusion Network (WCN) [Mangu et al., 2000].
- A WCN is a compact representation of a word lattice: the different word hypotheses that occur at the same time are aligned.
  - A vertex is associated with a timestamp.
  - An edge is labeled with
    - a word hypothesis, and
    - its posterior probability: the probability of the word given the signal.

A fragment of a WCN (each column of aligned hypotheses sums to 100%):

[have 61% | and 39%] [my 100%] [glasses 27% | graphic 22% | impressions 19% | graphics 13% | interested 9% | impresses 7% | grass 3%] [on 100%] [screen 99% | seen 1%]
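The slot structure of a WCN can be sketched in code. This is a minimal illustration, not the paper's implementation: a WCN is modeled as a list of confusion slots, each holding aligned word hypotheses with their posterior probabilities (the data is hypothetical, modeled on the fragment above).

```python
# A Word Confusion Network (WCN) as a list of "slots": each slot holds
# the aligned word hypotheses for one time span, with their posterior
# probabilities. Each slot's probabilities sum to 1.0.
wcn = [
    [("have", 0.61), ("and", 0.39)],
    [("my", 1.00)],
    [("glasses", 0.27), ("graphic", 0.22), ("impressions", 0.19),
     ("graphics", 0.13), ("interested", 0.09), ("impresses", 0.07),
     ("grass", 0.03)],
    [("on", 1.00)],
    [("screen", 0.99), ("seen", 0.01)],
]

def one_best(wcn):
    """1-best transcript: the top hypothesis of every slot."""
    return [max(slot, key=lambda h: h[1])[0] for slot in wcn]

def occurrences(wcn, term):
    """All (offset, posterior, rank) occurrences of `term` in the WCN.
    Rank 1 is the most probable hypothesis of the slot."""
    occs = []
    for offset, slot in enumerate(wcn):
        ranked = sorted(slot, key=lambda h: h[1], reverse=True)
        for rank, (word, prob) in enumerate(ranked, start=1):
            if word == term:
                occs.append((offset, prob, rank))
    return occs

print(one_best(wcn))             # ['have', 'my', 'glasses', 'on', 'screen']
print(occurrences(wcn, "graphic"))  # [(2, 0.22, 2)]
```

Indexing all hypotheses (not just `one_best`) is what lets the search engine later find "graphic" even though "glasses" won the decoding.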

Improving Retrieval Effectiveness Using WCNs

- Recall is enhanced by expanding the 1-best transcript with extra words, taken from the other alternatives provided by the WCN.
  - These alternatives may have been spoken but were not the ASR's top choice.
- However, such an expansion will probably decrease precision!
- Using an intelligent ranking model, we can still improve the mean average precision (MAP) of the search.
  - Average precision is the average of the precisions computed after truncating the result list after each relevant document in turn.
  - MAP rewards returning relevant documents earlier.

Improving Retrieval Effectiveness Using WCNs

We exploit two pieces of information provided by the WCN about each occurrence of a term to improve our ranking model:
- the posterior probability of the hypothesis given the signal, and
- the rank of the hypothesis among the other alternatives.

Posterior Probability of the Hypothesis: Confidence Level

- The posterior probability of the hypothesis given the signal reflects the ASR's confidence in the hypothesis.
- The retrieval process boosts documents in which the query term occurs with higher probability.
- We denote by Pr(t|o,D) the posterior probability of a term t at offset o in the WCN of a document D.

Rank of the Hypothesis: Relative Importance

- The rank of the hypothesis among the other alternatives reflects the importance of the term relative to those alternatives.
- A document in which a query term is ranked higher should be preferred over a document where the same term is ranked lower.
- We denote by rank(t|o,D) the rank of a term t at offset o in the WCN of a document D.
- A boosting vector B = (B_1, …, B_l) associates a boosting factor with each rank of the different hypotheses.

Scoring

- Our scoring is based on the Vector Space Model (VSM) [Salton and McGill, 1986].
- It is an algebraic model that represents documents as vectors of words.
  - Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is its tf-idf weight.
- Relevance ranking is computed by comparing the cosine of the angle between each document vector and the query vector, where the query is represented as the same kind of vector as the documents.
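As an illustration, cosine relevance ranking over sparse term-weight vectors can be sketched as follows; the vectors and weights are made-up examples, not values from the experiments.

```python
import math

def cosine(doc_vec, query_vec):
    """Cosine similarity between two sparse term-weight vectors
    (dicts mapping term -> tf-idf weight)."""
    dot = sum(w * query_vec.get(t, 0.0) for t, w in doc_vec.items())
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_d * norm_q)

# Hypothetical tf-idf vectors for two documents and a query.
d1 = {"speech": 0.8, "retrieval": 0.5}
d2 = {"speech": 0.1, "weather": 0.9}
q  = {"speech": 1.0, "retrieval": 1.0}

# d1 points in nearly the same direction as q, so it ranks higher.
print(cosine(d1, q) > cosine(d2, q))  # True
```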

Scoring

- Term frequency–inverse document frequency (tf-idf)
  - This weight is a statistical measure of how important a word is to a document in a corpus.
  - The importance increases proportionally to the number of times the word appears in the document (term frequency, tf) but is offset by the frequency of the word in the corpus (inverse document frequency, idf).

Term Frequency

- The term frequency is evaluated by summing the posterior probabilities of all the occurrences of the term over the document.
- Each occurrence is boosted by the rank of the term among the other hypotheses:

  tf(t,D) = Σ over o in occ(t,D) of B_rank(t|o,D) · Pr(t|o,D)

  where occ(t,D) is the sequence of all the occurrences of t in D.
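A minimal sketch of this boosted term frequency, assuming the occurrence list and the boosting vector B are supplied by the indexer (the concrete boost factors and posteriors below are hypothetical):

```python
def boosted_tf(occurrences, boost):
    """tf(t, D): sum over occurrences of B[rank] * Pr(t | o, D).
    `occurrences` is a list of (posterior, rank) pairs for term t in
    document D; `boost` maps rank (1-based) to a boosting factor.
    Ranks beyond the boosting vector contribute nothing."""
    return sum(boost.get(rank, 0.0) * prob for prob, rank in occurrences)

# Hypothetical boosting vector: top-ranked hypotheses count more.
B = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}

# Term seen twice: once as the top hypothesis (posterior 0.9),
# once as the second-ranked alternative (posterior 0.3).
occs = [(0.9, 1), (0.3, 2)]
print(boosted_tf(occs, B))  # 0.9*1.0 + 0.3*0.8 ≈ 1.14
```

A plain 1-best index would have counted only the first occurrence; the WCN-based tf credits the lower-ranked occurrence too, but at a discount.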

Phonetic Search

- Different kinds of phonetic transcripts:
  - sub-word decoding [Siohan and Bacchiani, 2005];
  - sub-word representation of the automatic 1-best word transcript.
  - A sub-word can be a word fragment, a syllable, or a phone.
- Sub-word transcripts have a high error rate.
  - Phonetic transcription cannot replace word transcripts, especially for in-vocabulary (IV) search.
  - That is why we combine word transcripts with phonetic transcripts.

Phonetic Search

- Relevant to both IV and OOV search.
- N-gram or sub-word based indexing.
- Retrieval approaches:
  - Exact search: high precision but low recall.
  - Fuzzy search: improves recall but decreases precision.
    - Using an intelligent ranking model, we can improve the mean average precision of the search.
    - Based on the edit distance between pronunciations.
    - We have implemented a fail-fast dynamic-programming algorithm for computing it.
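The slides do not spell out the fail-fast algorithm; the following is one standard way to realize the idea: a row-wise Levenshtein computation over phone sequences that abandons as soon as every possible alignment already exceeds a distance cap. The phone sequences and the cap are illustrative assumptions, not the paper's actual phone set.

```python
def edit_distance_capped(a, b, max_dist):
    """Levenshtein distance between two phone sequences, failing fast:
    returns None as soon as the distance must exceed `max_dist`."""
    if abs(len(a) - len(b)) > max_dist:
        return None
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        if min(curr) > max_dist:  # every alignment already too expensive
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_dist else None

def phonetic_similarity(ph1, ph2, max_dist=2):
    """Map a capped edit distance into a [0, 1] similarity score."""
    d = edit_distance_capped(ph1, ph2, max_dist)
    if d is None:
        return 0.0
    return 1.0 - d / max(len(ph1), len(ph2))

# Rough phone sequences for TALIBAN vs. the mis-recognition TELL A BAND.
taliban   = ["T", "AE", "L", "IH", "B", "AE", "N"]
tellaband = ["T", "EH", "L", "AH", "B", "AE", "N", "D"]
print(phonetic_similarity(taliban, tellaband, max_dist=3))  # 0.625
```

The early abandon is what makes fuzzy search over a large phonetic index affordable: most candidate pronunciations are rejected after a few DP rows.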

Scoring

- Our scoring model extends tf-idf.
- Consider a query represented by the phonetic pronunciation ph.
- sim(ph, ph') is the edit-distance-based similarity between two phonetic pronunciations ph and ph'.
- Term frequency: [formula not preserved in the transcript]
- Document frequency: [formula not preserved in the transcript]; N is the number of documents in the corpus.

Phonetic Query Expansion

- Compensates for
  - OOV terms, and
  - spelling variations.
- Each query term is converted to its phonetic pronunciations using a joint maximum-entropy N-gram model [Chen, 2003].
- Each pronunciation is associated with a score that reflects the probability of this pronunciation, normalized by the probability of the best pronunciation, given the spelling.

Phonetic Query Expansion

- Consider a query term t that is expanded to (ph_1, s_1), …, (ph_m, s_m), where ph_i is a pronunciation and s_i its associated score.
- The score of t in D is given by aggregating the scores of searching D for the pronunciations ph_i, weighted by their scores s_i.
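A sketch of this aggregation, assuming a weighted sum as the aggregation function (the slide leaves the exact aggregation unspecified); the pronunciations, scores, and the toy `phonetic_score` lookup are all hypothetical:

```python
def expanded_term_score(pronunciations, phonetic_score, doc):
    """Score of an expanded query term in `doc`: aggregate (here, a
    weighted sum) of the phonetic-search score of each pronunciation
    ph_i, weighted by its normalized pronunciation score s_i."""
    return sum(s_i * phonetic_score(ph_i, doc)
               for ph_i, s_i in pronunciations)

# Hypothetical expansion of the OOV term "taliban": two candidate
# pronunciations; the best pronunciation has normalized score 1.0.
pronunciations = [("T AE L IH B AE N", 1.0), ("T AH L IH B AAN", 0.6)]

def phonetic_score(ph, doc):
    # Stand-in for the real phonetic fuzzy search; here a toy lookup.
    return doc.get(ph, 0.0)

doc = {"T AE L IH B AE N": 0.5}
print(expanded_term_score(pronunciations, phonetic_score, doc))  # 0.5
```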

Combining Word Search with Phonetic Search

- Using the Threshold Algorithm [Fagin, 1996]
  - Merges the result lists of documents returned by word and phonetic search, each ordered by score.
- Using inverted indices with Boolean constraints
  - Merges posting lists extracted from the word and phonetic inverted indices, ordered by document identifier, according to Boolean constraints.
  - Based on query rewriting that combines the word and phonetic parts of the original query.
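The second option, Boolean-constraint merging, can be sketched as follows. This is an illustration only: the score combination (a plain sum) and the example postings are assumptions, and a real implementation would walk the two docid-ordered posting lists in lockstep rather than materialize sets.

```python
def merge_postings(word_hits, phone_hits, mode="OR"):
    """Merge two posting lists (dicts doc_id -> score): one from the
    word index, one from the phonetic index. Documents present in both
    lists get the sum of their scores (combination rule assumed)."""
    if mode == "AND":
        ids = set(word_hits) & set(phone_hits)
    else:  # "OR"
        ids = set(word_hits) | set(phone_hits)
    merged = {d: word_hits.get(d, 0.0) + phone_hits.get(d, 0.0)
              for d in ids}
    # Final result list, ordered by combined score, best first.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical hits: an IV term found in the word index, plus an OOV
# term found only in the phonetic index.
word_hits  = {"doc1": 0.9, "doc2": 0.4}
phone_hits = {"doc2": 0.5, "doc3": 0.7}
print(merge_postings(word_hits, phone_hits, "OR"))
print(merge_postings(word_hits, phone_hits, "AND"))  # only doc2 survives
```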

Experiments

Experimental Setup

- 2236 calls made to the IBM internal customer-support service.
- The calls deal with a large range of software and hardware problems.
- The average length of a call is 18 minutes.

Precision and Recall vs. WER

- As expected, indexing all WCN candidates improves recall while reducing precision.
- Both recall and precision decrease as WER increases.

Experiments with Several Retrieval Strategies over WCNs

- 1-best WCN TF: index the 1-best transcript obtained from the WCN; rank with classic tf-idf.
- All WCN TF: index all the WCN hypotheses; rank with classic tf-idf.
- 1-best WCN CL: index the 1-best transcript obtained from the WCN; rank with confidence levels.
- All WCN CL: index all the WCN hypotheses; rank with confidence levels.
- All WCN CL boost: index all the WCN hypotheses; rank with confidence levels and the rank among the other hypotheses.

MAP vs. WER

- Using confidence-level information provides a significant contribution.
- All WCN CL boost always outperforms the other models, especially at high WER.

Experimental Setup

- Data set provided by NIST for the STD evaluation: 3 hours of broadcast news.
- We built three different indices:
  - Word: a word index on the WCN.
  - WordPhone: a phonetic index of the phonetic representation of the 1-best word decoding.
  - Phone: a phonetic index of the 1-best word-fragment decoding.
- For phonetic retrieval, we compared two search methods: exact and fuzzy match.

MAP of Phonetic Query Expansion for OOV Search

- MAP of phonetic retrieval improves by up to 7.5% with query expansion, with respect to the baseline search approaches.

[Table: MAP of each phonetic search method (Exact, Exact+expansion, Fuzzy, Fuzzy+expansion) under the WordPhone, Phone, and Merge indices; the numeric values were not preserved in the transcript.]

MAP for Hybrid Search

- Queries combine IV and OOV terms under different query semantics.
- The merge approach improves on the word and phonetic approaches.

[Table: MAP under OR and AND query semantics for the Word, WordPhone, Phone, and Merge approaches; the numeric values were not preserved in the transcript.]

Conclusions

- The approach
  - A word-based approach suffers from the limited vocabulary of the recognition system.
  - A phonetic-based approach suffers from lower accuracy.
  - Our spoken-information-retrieval system combines both approaches.
- Recall and MAP are significantly improved by searching
  - all the hypotheses provided by the WCN, and
  - phonetic transcripts.
- This approach received the highest overall ranking for US English speech data in the last NIST Spoken Term Detection evaluation (December 2006).

References

- Jonathan Mamou, David Carmel, Ron Hoory. Spoken Document Retrieval from Call-Center Conversations. SIGIR 2006.
- Jonathan Mamou, Bhuvana Ramabhadran, Olivier Siohan. Vocabulary Independent Spoken Term Detection. SIGIR 2007.
- Walter Allasia, Francesco Gallo, Fabrizio Falchi, Mouna Kacimi, Aaron Kaplan, Jonathan Mamou, Yosi Mass, Nicola Orio. Audio-Visual Content Analysis in P2P: The SAPIR Approach. Workshop on Automated Information Extraction in Media Production, DEXA 2008.
- Jonathan Mamou, Yosi Mass, Bhuvana Ramabhadran, Benjamin Sznajder. Combination of Multiple Speech Transcription Methods for Vocabulary Independent Search. Search in Spontaneous Conversational Speech Workshop, SIGIR 2008.
- Jonathan Mamou, Bhuvana Ramabhadran. Phonetic Query Expansion for Spoken Document Retrieval. Interspeech 2008.

Thank you!