Language modeling for speaker recognition Dan Gillick January 20, 2004.


Outline
Author identification
Trying to beat Doddington's "idiolect" modeling strategy (speaker recognition)
My next project

Author ID (undergrad. thesis)
Problem:
–train models for each of k authors
–given some test text written by 1 of those authors, identify the correct author
Variations:
–different kinds of models
–different size test samples
–different k

Character n-gram models
What?
–27 tokens: a–z plus space
–some text generated from such a trigram model: "you orthad gool of anythilly uncand or prafecaustiont and to hing that put ably"
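The slide's gibberish example can be reproduced in a few lines. This is a minimal sketch, not the thesis code; the normalization to the 27-token alphabet and the sampling scheme are my own assumptions:

```python
import random
from collections import defaultdict

def train_char_trigrams(text):
    """Count character trigrams over the 27-symbol alphabet (a-z plus space)."""
    # Map everything outside a-z to a space, per the slide's token set.
    clean = "".join(c if c.isalpha() or c == " " else " " for c in text.lower())
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(clean) - 2):
        counts[clean[i:i + 2]][clean[i + 2]] += 1
    return counts

def generate(counts, length=40, seed="th"):
    """Sample characters from the trigram distribution, like the slide's example."""
    rng = random.Random(0)
    out = seed
    for _ in range(length):
        nexts = counts.get(out[-2:])
        if not nexts:
            break  # unseen 2-character history: stop generating
        chars, weights = zip(*nexts.items())
        out += rng.choices(chars, weights=weights)[0]
    return out
```

Trained on real text, the model produces word-like strings because trigram statistics capture local spelling patterns but nothing longer-range.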

Character n-gram models
Why?
–very simple
–data sparseness less troublesome than with word n-grams
–supposed to be state-of-the-art, or at least close to it (Khmelev, D. and Tweedie, F.J., "Using Markov Chains for the Identification of Writers," Literary and Linguistic Computing, 16(4))

Character n-grams: Setup
task: pick correct author from 10 possible authors
training data: 3 novels for each author
test data: text from a held-out novel
jack-knifing: 4 novels for each of 20 authors

Character n-grams: Results
task: picking 1 author from 10 possible authors
training data size: 3 novels

Character n-gram models
Why does it work?
–captures some word choice information
–picks up word endings (-ing, -tion, -ly, etc.)
–not hurt much by data sparseness issues

Key-list models
Incentive:
–ought to be able to beat character n-grams
–develop a new modeling method focused on what actually differentiates authors (characters and words are both useful for topic recognition, but that doesn't mean they are best for author recognition)

Key-list models
Idea:
–convert the text stream into a stream of only authorship-relevant symbols (I called these lists of symbols key-lists)
–each symbol is a regular expression to allow for broad definitions (/.*tion/ captures any nounification)
–text not accounted for by the key-list is represented by placeholder markers
–build n-gram models from these new streams

Key-list models
Sample trigram:

Sample key-list:

Regular Expression                          Description
(\w)(,)(\s)                                 comma
(\w)(\.)(\s)                                period
(\b)(of|for|to|around|after| … )(\b)        common prepositions
(\b)(was|were \w*ed)(\b)                    passive voice
(\b)(is|was|will|are|were|am)(\b)           "is" conjugations
(\b)(\w*ing)(\b)                            ends in -ing
(\b)(\w*ly)(\b)                             adverb
(\b)(and|but|or|not|if|then|else)(\b)       logical
(\b)(as)(\b)                                as
(\b)(would|should|could)(\b)                modal verbs
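A key-list converter can be sketched as below. The symbol names and the OTHER placeholder are my own (the slide's marker names were lost in the transcript), and only a subset of the table's patterns is shown:

```python
import re

# A few of the slide's key-list patterns, paired with symbolic names.
# Order matters: the first matching pattern wins.
KEY_LIST = [
    ("COMMA", re.compile(r",")),
    ("PERIOD", re.compile(r"\.")),
    ("PREP", re.compile(r"^(of|for|to|around|after)$")),
    ("IS_CONJ", re.compile(r"^(is|was|will|are|were|am)$")),
    ("ING", re.compile(r"^\w*ing$")),
    ("ADVERB", re.compile(r"^\w*ly$")),
    ("LOGICAL", re.compile(r"^(and|but|or|not|if|then|else)$")),
    ("MODAL", re.compile(r"^(would|should|could)$")),
]

def to_key_list(text):
    """Map each token to its first matching key-list symbol, else OTHER."""
    symbols = []
    for token in re.findall(r"\w+|[,.]", text.lower()):
        for name, pattern in KEY_LIST:
            if pattern.match(token):
                symbols.append(name)
                break
        else:
            symbols.append("OTHER")
    return symbols
```

N-gram models are then trained over the resulting symbol stream rather than over raw words or characters.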

Key-list models: Results
task: picking 1 author from 10 possible authors
training data size: 3 novels

Key-list models: Results
Some other interesting results:
–key-lists with just punctuation (plus the placeholder markers) performed almost as well as the best key-lists
–all key-lists were outperformed by the best n-letter model when test data size < 10,000 chars., but all key-list models eventually surpassed the n-letter models

Key-list models
Things I didn't do:
–vary amount of training data
–spend a long time trying different key-lists
–combine key-list results with each other or with the character results
–a lot of other stuff
The thesis is available on the web:

Outline
Author identification
Trying to beat Doddington's "idiolect" modeling strategy (speaker recognition)
My next project

G. Doddington's LM strategy
create LMs with a limited vocabulary of the 2000 most commonly occurring bigrams
to smooth out zeroes, boost each bigram prob. by
score by calculating: logprob(test|target) – logprob(test|bkg)
logprobs are joint probabilities: logprob(AB) = logprob(A) + logprob(B|A)
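The scoring recipe can be sketched as follows. The probability floor here stands in for the slide's boost value (the actual constant was lost in the transcript), and the training details are assumptions:

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Relative-frequency unigram and bigram probability estimates."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    return {
        "uni": {w: c / n for w, c in uni.items()},
        "bi": {(a, b): c / uni[a] for (a, b), c in bi.items()},
    }

def bigram_logprob(tokens, model, floor=1e-6):
    """Joint log-probability: logprob(AB...) = logprob(A) + logprob(B|A) + ..."""
    lp = math.log(model["uni"].get(tokens[0], floor))
    for prev, cur in zip(tokens, tokens[1:]):
        lp += math.log(model["bi"].get((prev, cur), floor))
    return lp

def gd_score(test_tokens, target_model, background_model):
    """Doddington-style score: logprob(test|target) - logprob(test|bkg)."""
    return (bigram_logprob(test_tokens, target_model)
            - bigram_logprob(test_tokens, background_model))
```

A positive score means the test side looks more like the target speaker than like the background population.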

G. Doddington's LM: Setup
Switchboard 1 data:
–collected in the early '90s from all over the US
–2,400 (~5 min.) conversations among 543 speakers
–corpus divided into 6 splits and tested using jack-knifing through the splits
–manual transcripts provided by Mississippi State
Task:
–8 conversation sides used as training data to build models for each target speaker
–1 conversation side used as test data
–background model built from 3 splits of held-out data
–jack-knifing allowed for almost 10,000 trials

G. Doddington's LM: Results
Notes:
–these results are my own attempt to replicate the original experiments
–SRI reported EER = 8.65% for this same experiment

Adapted bigram models
Incentive:
–adapting target models from a much larger background model should yield better estimates of the probabilities in the language models
Specifically:
–use the same 2000-bigram vocabulary
–target probabilities are a mixture of training probabilities and background probabilities
–mixture weight is 2:1 target data to background data
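The 2:1 mixture amounts to interpolating each bigram probability with weight 2/3 on the target estimate; a minimal sketch, with the exact adaptation mechanics assumed:

```python
def adapt_bigrams(target_probs, background_probs, target_weight=2/3):
    """Interpolate target and background bigram probabilities.

    The slide's 2:1 target:background mixture corresponds to
    target_weight = 2/3. Bigrams unseen on one side fall back to the
    other side's (down-weighted) estimate.
    """
    vocab = set(target_probs) | set(background_probs)
    return {
        bg: target_weight * target_probs.get(bg, 0.0)
            + (1 - target_weight) * background_probs.get(bg, 0.0)
        for bg in vocab
    }
```

Because the background model is trained on far more data, the interpolated estimates should be less noisy than the raw 8-conversation counts.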

Adapted bigram models: Results
Notes:
–nearly identical performance
–combination of the 2 systems yields almost no improvement
–why isn't the adapted version better?

Can anything improve on 8.68?
Trigrams?
–use the same count threshold to make a list of the top 700 trigrams ("a lot of" and "I don't know" were among the most common)
Character models?
–worked well for authorship…
–included all character combinations (no limited vocabulary)
–tried bigram and trigram models

Scores and combinations
adapt. word bigrams: EER = 8.89%
adapt. word trigrams: EER = 11.88%
adapt. char. bigrams: EER = 13.73%
adapt. char. trigrams: EER = 17.92%
adapted words: EER = 8.46%
adapted words + adapted characters: EER = 7.89%
adapted characters: EER = 13.24%
GD bigrams: EER = 8.68%
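EER, the figure of merit used throughout, is the operating point where the false-accept and false-reject rates are equal. A simple threshold sweep recovers it from raw trial scores (a sketch; real evaluations interpolate on the DET curve):

```python
def equal_error_rate(target_scores, impostor_scores):
    """Sweep a decision threshold over all observed scores and return the
    point where false-reject and false-accept rates are closest to equal."""
    best_gap, best_eer = None, None
    for t in sorted(target_scores + impostor_scores):
        # False reject: a true-speaker trial scoring below the threshold.
        fr = sum(s < t for s in target_scores) / len(target_scores)
        # False accept: an impostor trial scoring at or above the threshold.
        fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(fr - fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fr + fa) / 2
    return best_eer
```

Lower is better: the 7.89% for the combined word + character system beats the 8.68% GD baseline on the same trials.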

Final Comparison

What about less training data?
1 conversation-side training:
–character models might provide more of an advantage with less data?
–not so:
GD EER = 22.5%
adapted character EER = 30%
adapted word EER = 20%
–maybe these character models pick up on the topic of that 1 conversation
–haven't tried any other size training data

Outline
Author identification
Trying to beat GD's result
My next project

Key-lists for speaker recognition
key-list n-grams picked up on phrasing (comma and period were valuable tokens)
–automatic transcripts don't have punctuation, but they do have pause and duration information
–use reg. exps. and duration info. to capture idiosyncratic speaker phrasing
–capture other speech information in key-lists? (energy, f0, etc.)

Acknowledgements
Thanks to:
Anand and Luciana at SRI for trying to help me replicate their results
Barbara for providing advice
Barry and Kofi for helping with computers and stuff
George