Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval. Johns Hopkins University Center of Language and Speech Processing, Summer Workshop 2000.


1 Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval Johns Hopkins University Center of Language and Speech Processing Summer Workshop 2000 The MEI Team August 23, 2000

2 MEI Team
Senior Members
–Helen Meng, Chinese University of Hong Kong
–Erika Grams, Advanced Analytic Tools
–Sanjeev Khudanpur, Johns Hopkins University
–Gina-Anne Levow, University of Maryland
–Douglas Oard, University of Maryland
–Patrick Schone, US Department of Defense
–Hsin-Min Wang, Academia Sinica, Taiwan
Students
–Berlin Chen, National Taiwan University
–Wai-Kit Lo, Chinese University of Hong Kong
–Karen Tang, Princeton University
–Jianqiang Wang, University of Maryland

3 Outline Motivation Background The Multi-scale Paradigm –multi-scale query processing –multi-scale document indexing –multi-scale retrieval The Perfect Retrieval Myth Experiments and Findings Conclusions and Future Work

4 Motivation Monolingual speech retrieval applications are emerging, e.g. –http://speechbot.research.compaq.com [Chart: Internet-accessible radio and television stations. Source: www.real.com, Feb 2000]

5 Motivation (cont): Internet User Population [Chart: English and Chinese Internet user populations, 2000 and 2005. Source: Global Reach]

6 MEI: The Big Picture [Diagram] English Text Query (Exemplar) → English-to-Chinese Translation → Retrieval Engine → Ranked List of Mandarin Spoken Documents. Mandarin Audio News Broadcasts → Mandarin Audio Indexing (ASR) → Retrieval Engine. Also shown: Interactive Refinement; Speech-to-Speech Translation; English Spoken Documents.

7 Concept Demo Karen, Erika

8 Two Prevailing Problems in CL-SDR Translation problem –out-of-vocabulary (OOV) in translation –too many translations Recognition problem –OOV in recognition –acoustic confusions Solution: subword units may help –transliteration, e.g. Northern Ireland /bei3 ai4 er3 lan2/ (in query) –recognition of subword units, e.g. Iraq --> a rock (in document)

9 Background for Mandarin Speech Recognition 400 syllables –full phonological coverage in Mandarin Chinese 6,800 characters –full textual coverage in written Chinese (GB-coded) –each character pronounced as a syllable Unknown number of Chinese words –one to several characters per word –character combinations create different meanings –ambiguity in word tokenization

10 OOV and Acoustic Confusions in Mandarin SDR Query: …Iraq...

11 Subwords for Retrieval
Pros
–Character n-grams: robust to word-level mismatches due to different tokenization
–Syllable n-grams: robust to word/character-level mismatches due to homophones
–Partial matches possible
Con
–Subwords contain reduced lexical knowledge cf. words
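The partial-match property can be illustrated with a minimal sketch. It assumes overlapping n-grams joined with a hyphen and a simple overlap ratio; both the function names and the overlap measure are illustrative choices, not something the slides specify. The syllables are two of the "Kosovo" transliterations quoted on slide 32, with tones dropped.

```python
def ngrams(units, n=2):
    """Return overlapping n-grams over a sequence of units (characters or syllables)."""
    return ["-".join(units[i:i + n]) for i in range(len(units) - n + 1)]


def overlap(query_units, doc_units, n=2):
    """Fraction of the query's n-grams that also occur in the document (illustrative measure)."""
    q = set(ngrams(query_units, n))
    d = set(ngrams(doc_units, n))
    return len(q & d) / len(q) if q else 0.0


# Two transliterations of "Kosovo" (tones dropped).
query = ["ke", "suo", "wo"]
doc = ["ke", "suo", "fo"]

print(ngrams(query))        # ['ke-suo', 'suo-wo']
print(overlap(query, doc))  # 0.5
```

A whole-word comparison of the two syllable strings would fail, while the shared bigram "ke-suo" still yields a partial match.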

12 The MEI Investigation Use of a multi-scale representation for crosslingual spoken document retrieval (CL-SDR) Words and subwords Research Challenges Multi-scale query translation Multi-scale audio indexing Multi-scale retrieval

13 Query by Example
English Newswire Exemplars: President Bill Clinton and Chinese President Jiang Zemin engaged in a spirited, televised debate Saturday over human rights and the Tiananmen Square crackdown, and announced a string of agreements on arms control, energy and environmental matters. There were no announced breakthroughs on American human rights concerns, including Tibet, but both leaders accentuated the positive …
Mandarin Audio Stories: 美国总统克林顿的助手赞扬中国官员允许电视现场直播克林顿和江泽民在首脑会晤后举行的联合记者招待会。。特别是一九八九镇压民主运动的决定。他表示镇压天安门民主运动是错误的,他还批评了中国对西藏精神领袖达 国家安全事务助理伯格表示,这次直播让中国人第一次在种公开的论坛上听到围绕敏感的人权问题的讨论。在记者招待会上 …

14 Evaluation Collection
Development Collection: TDT-2
–2265 manually segmented stories
–17 topics, variable number of exemplars
Evaluation Collection: TDT-3
–3371 manually segmented stories
–56 topics, variable number of exemplars
Timeline markers: Jan 98, Mar 98, Jun 98, Oct 98, Dec 98
English text topic exemplars: Associated Press, New York Times
Mandarin audio broadcast news: Voice of America
Exhaustive relevance assessment based on event overlap

15 Cross-Language Speech Retrieval American English Text Exemplar Ranked List of News Stories Mandarin Chinese Broadcast News Abstract Task Model

16 Evaluation of Ranked Lists VOA 0427.22 VOA 0521.14 VOA 0604.39 VOA 0419.12 VOA 0527.13 VOA 0513.17 … Relevant Not Relevant Not Relevant … Relevance Judgments

17 Recall-Precision Graph

18 Variation Across Exemplars

19 Average Across Exemplars 0.353

20 Variation Across Topics [Chart: mean uninterpolated average precision (0.0 to 1.0) by topic]
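For reference, uninterpolated average precision for a single ranked list can be computed as in the sketch below. The document IDs follow the style of the ranked list on slide 16; the relevance judgments and the function names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def average_precision(ranked_ids, relevant_ids):
    """Uninterpolated average precision: mean of precision at each relevant document's rank."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Average over (ranked_ids, relevant_ids) pairs, e.g. one pair per exemplar or topic."""
    scores = [average_precision(ranked, rel) for ranked, rel in runs]
    return sum(scores) / len(scores) if scores else 0.0


ranked = ["VOA0427.22", "VOA0521.14", "VOA0604.39", "VOA0419.12"]
relevant = {"VOA0427.22", "VOA0419.12"}  # hypothetical judgments

print(average_precision(ranked, relevant))  # (1/1 + 2/4) / 2 = 0.75
```

Averaging this value over exemplars, and then over topics, gives the kind of summary figure plotted on these slides.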

21 Comparing Two Systems [Chart: per-topic comparison of two systems]

22 Significance Testing Statistical significance –Null hypothesis: mean average precision across topics is drawn from same distribution –Paired 2-tailed t-test, significant if p<0.05 For System A vs. System B, p=0.94 Meaningful differences –Rule of thumb: 5-10% relative For System A vs. System B, relative difference is <1%
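The two checks on this slide can be sketched with the Python standard library alone. The per-topic scores below are hypothetical and the function names are illustrative. The sketch computes the paired t statistic and the relative difference; the two-tailed p-value would then be read from a Student t distribution with the returned degrees of freedom (if SciPy is installed, `scipy.stats.ttest_rel` returns the statistic and p-value directly).

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1


def relative_difference(a, b):
    """Relative difference in mean average precision of system A with respect to system B."""
    return (mean(a) - mean(b)) / mean(b)


# Hypothetical per-topic average precision for two systems (not from the slides).
system_a = [0.50, 0.30, 0.70, 0.20]
system_b = [0.48, 0.33, 0.69, 0.21]

t, dof = paired_t(system_a, system_b)
rel = relative_difference(system_a, system_b)
print(f"t = {t:.3f}, dof = {dof}, relative difference = {rel:+.2%}")
```

With these made-up scores the relative difference is under 1%, which by the slide's rule of thumb would not count as a meaningful difference regardless of the p-value.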

23 Translingual and Multi-Scale Query Processing

24 [System diagram] Query side: English Exemplar (President Bill Clinton and…), Named Entity Tagging, Term Selection, Term Translation, Bilingual Term List, Query Construction. Document side: Mandarin Audio, Speech Recognition, Story Boundaries, Document Construction. Retrieval and scoring: Mandarin IR System, Ranked List, Relevance Judgments, Evaluation (Mean Uninterpolated Average Precision). Resource providers labeled in the diagram: BBN, U Mass, LDC, Cornell, Dragon, LDC.

25 Multi-Scale Query Translation Words and Phrases (Gina, Sanjeev) Subwords (Helen, Wai-Kit, Berlin, Karen)

26 Bilingual Term List
Combination of
–LDC English-Chinese bilingual term list
–Chinese-English Translation Assistance File (CETA) [inverted]
Total English Terms: 199,444
Total Translation Pairs: 395,216
Phrasal Terms: 81,127
Phrasal Translation Pairs: 105,750
Term (# translations): human (7), right(s) (30), human rights (1)

27 Query Term Selection Tagged named entities (BBN IdentiFinder) –Person: partners of Goldman, Sachs, & Co. –Organization: UN Security Council Dictionary-based “phrases” –translatable multi-word units, e.g. “Wall Street”, “best interests”, “guiding principles”, “human rights” –automatic tagging: greedy, left-to-right, max match Chi-squared filtering –Compared to English background model
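A greedy, left-to-right, maximum-match phrase tagger can be sketched as follows. The phrase set reuses the examples from the slide; the function name, the lower-casing, and the `max_len` cap are assumptions made for illustration and say nothing about how the MEI tagger was actually implemented.

```python
def tag_phrases(tokens, phrase_dict, max_len=4):
    """Greedy left-to-right tagging: at each position take the longest dictionary phrase."""
    out = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate first, down to two-word phrases.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if " ".join(tokens[i:i + n]).lower() in phrase_dict:
                match_len = n
                break
        if match_len:
            out.append(" ".join(tokens[i:i + match_len]))
            i += match_len
        else:
            out.append(tokens[i])
            i += 1
    return out


# Phrase examples taken from the slide; a real system would consult the bilingual term list.
phrases = {"wall street", "best interests", "guiding principles", "human rights"}
tokens = "debate over human rights on Wall Street".split()

print(tag_phrases(tokens, phrases))
# ['debate', 'over', 'human rights', 'on', 'Wall Street']
```

Because the scan is greedy, an early long match can block a later overlapping phrase; that is the usual trade-off of max-match tagging.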

28 Query Term Translation Named entities –if absent from dictionary, translate individual terms e.g. “Security Council” versus “First Bank of Siam” Numeric Expressions –special processing for digits e.g. “12:30 pm, June 15, 1969” Remaining terms –Consult bilingual term list, lemmatize if necessary e.g. “televised” translates as “television”

29 Query Construction Unbalanced queries –Use all plausible translations for each term Balanced queries –Pseudo-term weight: average of translations’ weights Structured queries –Recompute pseudo-term weight from translations’ term frequency and document frequency
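The three strategies differ in how the translations of one English term are combined into a single contribution to a document's score. The sketch below is only one reading of the slide: the tf-idf weighting, its smoothing, the postings layout, and all names are assumptions, since the slides do not give the retrieval system's actual weighting function.

```python
import math


def term_weight(tf, df, n_docs):
    """Assumed tf-idf weight with add-one smoothing (not the formula used by the MEI system)."""
    return tf * math.log((n_docs + 1) / (df + 1))


def score_term(translations, postings, doc_id, n_docs, mode):
    """Contribution of one English query term, given its list of Chinese translations.

    postings maps each translation to {doc_id: term frequency}.
    """
    tfs = [postings.get(t, {}).get(doc_id, 0) for t in translations]
    dfs = [len(postings.get(t, {})) for t in translations]
    if mode == "structured":
        # Treat all translations as one pseudo-term: pool tf, and count
        # documents containing any of the translations for df.
        docs = set()
        for t in translations:
            docs |= set(postings.get(t, {}))
        return term_weight(sum(tfs), len(docs), n_docs)
    weights = [term_weight(tf, df, n_docs) for tf, df in zip(tfs, dfs)]
    if mode == "unbalanced":
        return sum(weights)  # every translation counts in full
    if mode == "balanced":
        return sum(weights) / len(weights) if weights else 0.0
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical postings for three translations "a", "b", "c" of one English term.
postings = {"a": {"d1": 2, "d2": 1}, "b": {"d1": 1}}
translations = ["a", "b", "c"]

for mode in ("unbalanced", "balanced", "structured"):
    print(mode, round(score_term(translations, postings, "d1", 3, mode), 4))
```

In the unbalanced case a term with many translations can dominate the query simply because it has more entries in the term list; averaging or pooling removes that bias.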

30 Strategies in Query Translation Phrase based translation is significantly better Named entities and numeral translations are (barely) helpful Balanced translation matches Structured queries –also extends easily to subword units

31 Untranslatable Terms
Term (# of occurrences): suharto (97), netanyahu (88), starr (62), arafat (50), bjp (45), vajpayee (44), estrada (44), …, hsu (19), zemin (7)
Terms, # (by token): total 87,004; OOV 3,028
Terms, # (by type): total 12,402; OOV 1,122

32 Subword Transliteration English Query Exemplar: ……..Kosovo…... Mandarin Audio Document: …../ke-suo-fo/…. Sound alike --> match in phonetic space? Kosovo (/ke1-suo3-wo4/, /ke1-suo3-fo2/, /ke1-suo3-fu1/, /ke1-suo3-fu2/)

33 Subword Transliteration Procedure (1)
Named Entities
–PinYin / WadeGiles spellings, e.g. Wang Jianqiang, Wang Hsinmin: convert to syllables, e.g. wang jian qiang, wang xin min
–Otherwise, acquire English pronunciation: PRONLEX lookup or spelling-to-pron generation, e.g. christopher gives English phones /kk rr ih ss tt aa ff er/
Spelling-to-Pron Generation
–Trans. Error-Driven Learning [Brill 1994]
–PRONLEX, 85K (train), 4.5K (test)
–82% (phoneme), 45% (word)

34 Subword Transliteration Procedure (2)
English phones, e.g. /kk rr ih ss tt aa ff er/
Cross-lingual Phonological Rules
–Syllable nuclei insertion
–Handle consonant clusters
–Word-final consonants, etc.
–Result: /kk ax rr ih ss ax tt aa ff er/
Cross-lingual Phonetic Mapping
–English phones to Chinese “phones”
–Trans. Error-Driven Learning, 4800 words (train) [Chen H. H., NTU; WWW]
–FST aligns Eng / Chin phones, e.g. /k e l i s i t uo f u/
Chinese phone lattice generation
–Syllable bigram language model
–N-best syllable sequence hyp, N=1 (one-best hypothesis)
–/ji li si te fu/ (hyp) vs. /ke li si tuo fu/ (ref)
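The effect of syllable nuclei insertion on the slide's example can be reproduced with a toy rule: insert /ax/ between two adjacent consonants. This is a sketch under that single assumed rule; the slides do not list the actual phonological rules, and the vowel set below is a hypothetical subset covering only this example.

```python
# Hypothetical vowel inventory, limited to the symbols that occur in the example.
VOWELS = {"ih", "aa", "er", "ax"}


def insert_nuclei(phones, filler="ax"):
    """Break up consonant clusters by inserting a filler vowel between adjacent consonants."""
    out = []
    for phone in phones:
        if out and out[-1] not in VOWELS and phone not in VOWELS:
            out.append(filler)
        out.append(phone)
    return out


english = "kk rr ih ss tt aa ff er".split()  # "christopher", from the slide
print(" ".join(insert_nuclei(english)))
# kk ax rr ih ss ax tt aa ff er
```

Word-final consonants, which the slide also mentions, are not handled here because the example ends in a vowel.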

35 Subword Transliteration
Input: /kk ax rr ih ss ax tt aa ff er/
Cross-lingual Phonetic Mapping
–English phones to Chinese “phones”
–Trans. Error-Driven Learning, 4800 words (train) [Chen H. H., NTU, WWW]
–FST-aligned phones, e.g. /k e l i s i t uo f u/
Chinese phone lattice generation
–Syllable bigram language model
–N-best syllable sequence hyp, N=1 (one-best hypothesis)
–/ji li si te fu/ (hyp) vs. /ke li si tuo fu/ (ref)
N.B. Character bigram language model can produce

36 Cross Lingual Phonetic Matching Documents are indexed with syllable bigrams (in addition to words and character bigrams if necessary) Query terms are translated as words where possible, phonetically where necessary

37 Multi-Scale Query Construction Helen

38 Multi-Scale Query Construction: Objectives
Bag of English query terms (selected) → Query Construction → Multi-scale query representation in Chinese
Multi-scale representation integrates:
–translated phrases, named entities, numeric expressions, translated terms
–transliterated syllables
–words, characters and syllable n-grams

39 Multi-Scale Query Construction Procedures
English Bag of Terms: Israeli Prime Minister Benjamin Netanyahu
Three query representations:
–words + syl bigrams: Chinese Translations and Transliteration ne-tan tan-ya ya-hu
–char + syl bigrams: Character bigrams and Transliterations ne-tan tan-ya ya-hu
–syl bigrams: Syllable bigrams and Transliterations yi-se se-lie shou-xiang ben-jie jie-ming ne-tan tan-ya ya-hu
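One way to read this slide is that translated terms are represented at the chosen scale, while terms that could only be transliterated always fall back to syllable bigrams. The sketch below follows that reading. The Chinese word forms are assumed for illustration (the slide text as transcribed shows only syllables), and the data layout and function names are hypothetical.

```python
def bigrams(units, sep="-"):
    """Overlapping bigrams within one term; a single unit is returned unchanged."""
    units = list(units)
    if len(units) < 2:
        return units
    return [sep.join(units[i:i + 2]) for i in range(len(units) - 1)]


# "word" is None when the term had no dictionary translation and was transliterated.
TERMS = [
    {"word": "以色列", "syl": ["yi", "se", "lie"]},      # Israeli (assumed word form)
    {"word": "首相", "syl": ["shou", "xiang"]},          # Prime Minister (assumed word form)
    {"word": "本杰明", "syl": ["ben", "jie", "ming"]},   # Benjamin (assumed word form)
    {"word": None, "syl": ["ne", "tan", "ya", "hu"]},    # Netanyahu, transliterated
]


def build_query(terms, scale):
    """Build the query term list at scale 'word', 'char', or 'syl'."""
    if scale not in ("word", "char", "syl"):
        raise ValueError(f"unknown scale: {scale}")
    query = []
    for term in terms:
        if scale == "syl" or term["word"] is None:
            query.extend(bigrams(term["syl"]))
        elif scale == "word":
            query.append(term["word"])
        else:  # character bigrams over the translated word
            query.extend(bigrams(term["word"], sep=""))
    return query


print(" ".join(build_query(TERMS, "syl")))
# yi-se se-lie shou-xiang ben-jie jie-ming ne-tan tan-ya ya-hu
```

Bigrams are formed within each term and do not cross term boundaries, which matches the syllable bigram string shown on the slide.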

40 Multi-Scale Audio Document Indexing Hsin-min, Helen, Berlin, and Wai-kit

41 Previous Chinese Example

42 Audio Document Indexing Objectives Augment words with subword-based indexing Dragon word recognition outputs are provided Character-based indexing –Characters derived from Dragon’s recognized words Syllable-based indexing –Syllables derived by pronunciation lookup using Dragon’s recognized words Address Dragon’s ASR errors –Augment with alternative (word/char/syl) hypotheses e.g. syllable lattice [Chen & Wang, ICASSP-2000]
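Deriving the character and syllable streams from recognized words can be sketched as below. The lexicon is a toy, hypothetical pronunciation dictionary with tones omitted, and the recognized word sequence is invented; neither comes from the Dragon output used in the workshop.

```python
# Toy pronunciation lexicon (hypothetical; tones omitted).
LEXICON = {
    "以色列": ["yi", "se", "lie"],
    "首相": ["shou", "xiang"],
}


def index_units(words, lexicon):
    """Split recognized words into word, character, and syllable streams for indexing."""
    chars = [ch for word in words for ch in word]
    # Words missing from the lexicon contribute no syllables in this sketch.
    syllables = [syl for word in words for syl in lexicon.get(word, [])]
    return {"word": list(words), "char": chars, "syl": syllables}


recognized = ["以色列", "首相"]  # hypothetical recognizer output
units = index_units(recognized, LEXICON)
print(units["char"])  # ['以', '色', '列', '首', '相']
print(units["syl"])   # ['yi', 'se', 'lie', 'shou', 'xiang']
```

Each stream can then be turned into n-grams and indexed separately or together, depending on the retrieval configuration.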

43 Syllable Lattice Development
Dragon’s recognition accuracies
–Evaluated against anchor scripts
–82.0% (word), 87.9% (char), 92.1% (syl)
–Syllable substitution errors (5.2%)
Develop a syllable recognizer to produce lattice representation
MEI’s syllable recognition accuracy
–Trained on Hub4 Mandarin (VOA, 11 hours, 1997)
–70.2% (syl) !!!
[Diagram: Dragon’s syl; Alternative syl]

44 Strategy Improve MEI’s syllable recognizer Design a structure for document indexing which incorporates –Dragon’s word / character / syllable hypotheses –MEI’s syllable hypotheses (hopefully complementary to Dragon’s syllables)

45 MEI Syllable Recognizer: Improve Acoustic Models
[Diagram] VOA Audio for Doc i + Dragon Outputs for Doc i → Forced Alignment → Speaker Adaptation (Baseline Acoustic Models → Speaker-Adapted Acoustic Models) → Syllable Recognition → MEI Syllables for Doc i
–Forced alignment with Dragon’s output for each document
–Blind speaker adaptation with Dragon’s syllables
–MEI syllable accuracy: 70.2% (original) → 87.7% !!!

46 MEI Syllable Recognizer: Incorporate Language Model
[Diagram] VOA Audio for Doc i + Dragon Outputs for Doc i → Forced Alignment → Speaker Adaptation (Baseline Acoustic Models → Speaker-Adapted Acoustic Models) → Syllable Recognition (with 1998 Xinhua Language Models) → MEI Syllables for Doc i
–Syllable trigram language model
–MEI syllable accuracy: 70.2% → 87.7% → 90.0% !!!

47 Audio Document Indexing with Multiple Syllable Recognition Outputs [Diagram] Two separate recognition outputs: Dragon’s syl, MEI’s syl. The revised syllable lattice: Dragon’s syl, MEI’s syl.

48 Multi-scale Audio Document Indexing [Diagram: Dragon’s word, Dragon’s chr, Dragon’s syl, MEI’s syl]

49 Fusion of Words and Subwords in Multi-Scale Retrieval Wai-Kit Lo, Pat Schone

50 Loose Coupling
Merging ranked lists from separate runs
For each query and document pair, the score is recalculated as
$$S(Q_i, D_j) = \sum_{k} w_k \, S_k(Q_i, D_j)$$
–w_k are the weights for different retrieval runs
–k denotes a retrieval run at some scale (word, characters, syllables, combinations)
–S_k(Q_i, D_j) is a rank-based score between query i and document j in retrieval run k
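Assuming the recalculated score is a weighted sum of per-run rank-based scores, loose coupling can be sketched as below. The reciprocal-rank score, the equal weights, and the run contents are all assumptions for illustration; the slide does not say which rank-based score or weights were used.

```python
def rank_scores(ranked_ids):
    """Assumed rank-based score: reciprocal rank (1 for the top document, 1/2 for the next, ...)."""
    return {doc_id: 1.0 / rank for rank, doc_id in enumerate(ranked_ids, start=1)}


def fuse(runs, weights):
    """Merge ranked lists from separate runs into one list via a weighted sum of rank scores."""
    fused = {}
    for name, ranked_ids in runs.items():
        for doc_id, score in rank_scores(ranked_ids).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weights[name] * score
    # Sort by descending fused score; break ties by document ID for determinism.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical ranked lists from word, character-bigram, and syllable-bigram runs.
runs = {
    "word": ["d1", "d2", "d3"],
    "char2": ["d2", "d1", "d4"],
    "syl2": ["d2", "d3", "d1"],
}
weights = {"word": 1.0, "char2": 1.0, "syl2": 1.0}

print(fuse(runs, weights))  # ['d2', 'd1', 'd3', 'd4']
```

A document missing from a run simply receives no contribution from that run, so documents found at several scales tend to rise in the merged list.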

51 Word Char2 Syl2 Word fusion

52 Tight Coupling Unified indexing of words and subword n-grams For query and documents –Combine terms at different scales to form a multi-scale query/document representation, e.g. yi-se se-lie shou-xiang ben-jie jie-ming ne-tan tan-ya ya-hu Multi-scale retrieval produces a single ranked list

53 Loose vs Tight Coupling Tight coupling combines document scores before ranking –may need weight optimization Loose coupling combines lists post-hoc –outperforms individual lists

54 The Perfect Retrieval Myth Erika, Helen, Hsin-Min, Jian Qiang, Berlin

55 The Perfect Retrieval Myth
100% Average Precision = ALL relevant docs and ZERO non-relevant docs retrieved
Is corrupted by...
–Query Processing (English Newswire Article): term selection, translation errors, translation ambiguity, OOV
–Document Processing (Mandarin Audio Files): speech recognition errors, word tokenization ambiguity, OOV
–Differences in News Sources

56 “Bounds” on Word-Based Systems Using Mandarin VOA documents as exemplars –matched condition Using Xinhua text documents as exemplars –source mismatch Using manual translations of NYT documents as exemplars

57 “Bounds” on Subword-Based Systems Character bigrams for indexing –marginally outperforms word-based systems Syllable bigrams –are quite competitive, though somewhat behind Mean average precision ~0.6 is a good CL-SDR target

58 TDT-2 Results

59 Retrieval Performance on TDT2

60 TDT-3 Results

61 Retrieval Performance on TDT3

62 Summary and Conclusions Novel multi-scale paradigm for CL-SDR –ameliorates the translation and recognition OOV problems Multi-scale query and document processing –cross-lingual subword transliteration procedure (CLPM) –query and document construction embeds words / characters / syllables –balanced and structured queries Multi-scale retrieval –tight and loose coupling strategies to fuse words and subwords for retrieval

63 Summary and Conclusions (2) Extensive experiments on TDT-2, TDT-3 –character bigrams typically outperform words or syllable bigrams in retrieval –fusion of word and subword units shows potential in multi-scale retrieval –syllable lattice needs further investigation

64 Future Work Word-subword fusion techniques merit further investigation Multi-scale query expansion for retrieval performance improvement (Wai-Kit) Incorporation of acoustic scores in syllable lattice representation for documents

65 END

