Mandarin-English Information (MEI) Johns Hopkins University Summer Workshop 2000 presented at the TDT-3 Workshop February 28, 2000 Helen Meng The Chinese.

1 Mandarin-English Information (MEI): Johns Hopkins University Summer Workshop 2000, presented at the TDT-3 Workshop, February 28, 2000. Helen Meng (The Chinese University of Hong Kong), Sanjeev Khudanpur (Johns Hopkins University), Douglas W. Oard (University of Maryland), Hsin-Min Wang (Academia Sinica, Taiwan)

2 Outline Background The MEI Project –Multiscale Retrieval –Multiscale Translation Using the TDT-3 collection Schedule

3 Motivation Emerging speech retrieval applications –E.g., http://speechbot.research.compaq.com Increasing need for translingual audio search –1,896 Internet-accessible radio & TV stations –529 of these (28%) are not in English (source: www.real.com)

4 The Big Picture Speech to Speech Translation Translingual Audio Browsing Translingual Audio Search English Query English Audio Select Examine MEI

5 Related Work TREC Spoken Document Retrieval –Close coupling of recognition and retrieval TREC Cross-Language Retrieval –Close coupling of translation and retrieval TDT-3 –Coupling recognition, translation and retrieval –Using baseline recognizer transcripts

6 The MEI Project Closely coupling recognition and translation –For the purpose of retrieval English text queries, Mandarin news audio Specific research issues: –Multi-scale retrieval –Multi-scale translation

7 Multi-scale Analysis of Mandarin Initial/Final Preme/Core Final Preme/Toneme /iang/ /ji/ /j/ /ang/ /i/ /a/ /ng/ /j/

8 Multi-scale Retrieval Subword-scale –Syllable lattice matching [Chen, Wang & Lee, 2000] –Overlapping syllable n-grams [Meng et al., 1999] –Skipped syllable pairs [Chen, Wang & Lee, 2000] –Syllable confusion matrix [Meng et al., 1999] Word-scale –Structured queries [Pirkola, 1998] Multi-scale –Unified retrieval using a merged feature set –Scale-optimized retrieval with result-set merging
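Two of the subword-scale features listed above, overlapping syllable n-grams and skipped syllable pairs, can be sketched in a few lines. This is a minimal illustration only, not the implementation from the cited papers; the function names and the `skip` parameter are assumptions introduced here, and the input is the tone-marked syllable string that appears on the Multi-scale Translation slide.

```python
from typing import List, Tuple


def overlapping_ngrams(syllables: List[str], n: int = 2) -> List[Tuple[str, ...]]:
    """Every window of n consecutive syllables, advancing one syllable at a time."""
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def skipped_pairs(syllables: List[str], skip: int = 1) -> List[Tuple[str, str]]:
    """Pairs of syllables separated by exactly `skip` intervening syllables."""
    gap = skip + 1
    return [(syllables[i], syllables[i + gap]) for i in range(len(syllables) - gap)]


# Syllable sequence taken from the slides; tone digits are kept as part of each unit.
syllables = "bei2 ai4 er3 lan2".split()
bigrams = overlapping_ngrams(syllables, 2)
pairs = skipped_pairs(syllables, 1)
print(bigrams)
print(pairs)
```

Both feature types can be indexed like ordinary terms, so a recognition error on one syllable damages only the n-grams and pairs that contain it.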

9 Why Multi-scale Retrieval? Word-based retrieval exploits lexical knowledge –Enhances precision Subword units achieve complete phonological coverage –Enhances recall Combination of evidence may beat either alone
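One simple way to combine evidence from the two scales is to normalize each result set's scores and interpolate them. The sketch below assumes min-max normalization and an equal weighting, neither of which is specified in the slides; the story identifiers and scores are made up for illustration.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two retrieval runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_result_sets(
    word_scores: Dict[str, float],
    subword_scores: Dict[str, float],
    word_weight: float = 0.5,  # assumption: equal weighting, not from the slides
) -> List[Tuple[str, float]]:
    """Linearly interpolate normalized scores; a story missing from a run scores 0 there."""
    w = min_max_normalize(word_scores)
    s = min_max_normalize(subword_scores)
    merged = {
        doc: word_weight * w.get(doc, 0.0) + (1.0 - word_weight) * s.get(doc, 0.0)
        for doc in set(w) | set(s)
    }
    # Highest score first; ties broken by story id for a stable order.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores from a word-scale run and a subword-scale run.
word_run = {"story_a": 4.0, "story_b": 2.0, "story_d": 0.0}
subword_run = {"story_b": 4.0, "story_c": 2.0, "story_a": 0.0}
ranking = merge_result_sets(word_run, subword_run)
print(ranking)
```

In this toy case story_b ranks first after merging because both runs support it, even though the word-scale run alone preferred story_a.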

10 Multi-scale Translation Word-scale –Dictionary-based [Levow & Oard, 2000] –Parallel corpora [Nie, 1999] –Comparable corpora [Fung, 1998] Subword-scale –Cross-language phonetic map [Knight & Graehl, 1997] Northern Ireland (/bei2 ai4 er3 lan2/); Kosovo (/ke1-sou3-wo4/, /ke1-sou3-fo2/, /ke1-sou3-fu1/, /ke1-sou3-fu2/)
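The Kosovo example shows one name surfacing as several syllable strings. As a rough illustration of why subword matching tolerates this, the sketch below strips tone digits and compares the variants with a plain syllable-level edit distance. This is a swapped-in technique for illustration, not the probabilistic phonetic map of Knight & Graehl; the function names are assumptions.

```python
from typing import List, Sequence


def strip_tones(syllables: Sequence[str]) -> List[str]:
    """Drop trailing tone digits, e.g. 'sou3' -> 'sou'."""
    return [s.rstrip("0123456789") for s in syllables]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance where each unit is a whole syllable."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


# The four transliterations of "Kosovo" listed on the slide.
variants = ["ke1-sou3-wo4", "ke1-sou3-fo2", "ke1-sou3-fu1", "ke1-sou3-fu2"]
toneless = [strip_tones(v.split("-")) for v in variants]
distances = [edit_distance(toneless[0], t) for t in toneless]
print(distances)
```

Every variant is at most one syllable substitution away from the first, and the last two become identical once tone is ignored.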

11 Using the TDT-3 Collection English queries formed from topic descriptions –2-4 words (simulated Web search) –Full topic description (simulated routing profile) Mandarin broadcast news audio (121 hours) –Story-boundary-known condition (4624 stories) –Baseline recognizer transcripts provide words

12 Schedule Dec Feb Apr Jun Aug Six Weeks: Summer Workshop Planning Meeting First MEI Team Planning Meeting Second MEI Team Planning Meeting

13 Things We Need Ideas –To sharpen our focus Connections –To build a community of interest Resources –To build on what others have done

14 Background: Chinese Many dialects (e.g., Mandarin and Cantonese) –differences in phonetics, vocabularies, syntax… Syllable-based language –~400 base syllables, 4 lexical tones + light tone Syllable structure (CG)V(X) –(CG): onset, optional, consonant + medial glide –V: nuclear vowel –X: coda, glide / alveolar nasal / velar nasal –~21 initials, 39 finals
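The initial/final view of a syllable can be made concrete with a small splitter for tone-numbered Pinyin. This is a simplified sketch under stated assumptions: it uses the 21 Pinyin initials, treats orthographic "y" and "w" as part of the final rather than as initials, and codes an unmarked (light) tone as 5. The function name is introduced here for illustration.

```python
from typing import Tuple

# The 21 Mandarin initials in Pinyin spelling. Two-letter initials come first
# so that "zh" is tried before "z".
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
)


def split_syllable(pinyin: str) -> Tuple[str, str, int]:
    """Split a tone-numbered syllable such as 'xing2' into (initial, final, tone)."""
    if pinyin and pinyin[-1].isdigit():
        base, tone = pinyin[:-1], int(pinyin[-1])
    else:
        base, tone = pinyin, 5  # assumption: no digit means light tone, coded 5
    for initial in INITIALS:
        if base.startswith(initial):
            return initial, base[len(initial):], tone
    return "", base, tone  # the onset is optional, so the initial may be empty


for syl in ["zhe4", "ai4", "xing2", "lan2"]:
    print(syl, split_syllable(syl))
```

A syllable such as "ai4" has no onset, which matches the optional (CG) slot in the (CG)V(X) structure.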

15 Background: Chinese (cont) Characters (written) -> syllables (spoken) Degenerate mapping – /hang2/, /hang4/, /heng2/ or /xing2/ –/fu4 shu4/ (LDC’s CALLHOME lexicon) Tokenization / Segmentation –/zhe4 yi1 wan3 hui4 ru2 chang2 ju3 xing2/
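The segmentation problem can be illustrated by enumerating every way a syllable string divides into lexicon entries. The lexicon below is a hypothetical toy built only for this example, not a real dictionary, and the recursive enumeration is a sketch rather than the project's segmenter.

```python
from typing import List, Set, Tuple

Word = Tuple[str, ...]


def all_segmentations(syllables: List[str], lexicon: Set[Word]) -> List[List[Word]]:
    """Return every split of `syllables` into consecutive lexicon entries."""
    if not syllables:
        return [[]]
    results: List[List[Word]] = []
    for end in range(1, len(syllables) + 1):
        head = tuple(syllables[:end])
        if head in lexicon:
            for rest in all_segmentations(syllables[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical toy lexicon keyed by syllable sequences.
TOY_LEXICON: Set[Word] = {
    ("zhe4", "yi1"),
    ("wan3",),
    ("hui4",),
    ("wan3", "hui4"),
    ("ru2", "chang2"),
    ("ju3", "xing2"),
}

utterance = "zhe4 yi1 wan3 hui4 ru2 chang2 ju3 xing2".split()
segmentations = all_segmentations(utterance, TOY_LEXICON)
for seg in segmentations:
    print(" | ".join("-".join(word) for word in seg))
```

Even with this tiny lexicon the utterance from the slide has two valid segmentations, which differ in whether /wan3 hui4/ is read as one word or two.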

