Applications (2 of 2): Recognition, Transduction, Discrimination, Segmentation, Alignment, etc. Kenneth Church, Dec 9, 2009

1 Applications (2 of 2): Recognition, Transduction, Discrimination, Segmentation, Alignment, etc. Kenneth Church Kenneth.Church@jhu.edu Dec 9, 2009

2 Applications
Recognition: Shannon’s Noisy Channel Model
– Speech, Optical Character Recognition (OCR), Spelling
Transduction
– Part of Speech (POS) Tagging
– Machine Translation (MT)
Parsing: ???
Ranking
– Information Retrieval (IR)
– Lexicography
Discrimination:
– Sentiment, Text Classification, Author Identification, Word Sense Disambiguation (WSD)
Segmentation
– Asian Morphology (Word Breaking), Text Tiling
Alignment: Bilingual Corpora, Dotplots
Compression
Language Modeling: good for everything

3 Speech  Language
Shannon’s Noisy Channel Model
I → Noisy Channel → O
$I' \approx \arg\max_I \Pr(I \mid O) = \arg\max_I \Pr(I)\,\Pr(O \mid I)$
Trigram Language Model
Word | Rank | More likely alternatives
We | 9 | The This One Two A Three Please In
need | 7 | are will the would also do
to | 1 |
resolve | 85 | have know do…
all | 9 | The This One Two A Three Please In
of | 2 | The This One Two A Three Please In
the | 1 |
important | 657 | document question first…
issues | 14 | thing point to
Channel Model
Application | Input | Output
Speech Recognition | writer | rider
OCR (Optical Character Recognition) | all | a1l
Spelling Correction | government | goverment
Language Model: Application Independent
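
The decision rule on this slide, picking the input I that maximizes Pr(I) Pr(O|I), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `noisy_channel_decode`, the dictionary-based language and channel models, and all probabilities are made up for the example and do not come from the slides.

```python
import math

def noisy_channel_decode(observed, candidates, prior, channel):
    """Return the candidate I that maximizes Pr(I) * Pr(O|I), scored in log space.

    prior:   dict mapping candidate I -> Pr(I)           (language model)
    channel: dict mapping (O, I)      -> Pr(O|I)         (channel model)
    Both tables are hypothetical stand-ins for real models.
    """
    def score(cand):
        p_i = prior.get(cand, 0.0)
        p_o_given_i = channel.get((observed, cand), 0.0)
        if p_i == 0.0 or p_o_given_i == 0.0:
            return float("-inf")  # impossible under one of the two models
        return math.log(p_i) + math.log(p_o_given_i)

    return max(candidates, key=score)

# Toy numbers, invented for illustration only.
prior = {"government": 1e-4, "governments": 2e-5}
channel = {
    ("goverment", "government"): 1e-3,   # one dropped letter
    ("goverment", "governments"): 1e-5,  # dropped letter plus a dropped "s"
}
best = noisy_channel_decode("goverment", ["government", "governments"], prior, channel)
print(best)
```

Only the channel table changes from one application to the next (acoustics, optics, typos); the prior plays the role of the application-independent language model.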

4 Using (Abusing) Shannon’s Noisy Channel Model: Part of Speech Tagging and Machine Translation
Speech
– Words → Noisy Channel → Acoustics
OCR
– Words → Noisy Channel → Optics
Spelling Correction
– Words → Noisy Channel → Typos
Part of Speech Tagging (POS):
– POS → Noisy Channel → Words
Machine Translation: “Made in America”
– English → Noisy Channel → French
Didn’t have the guts to use this slide at Eurospeech (Geneva)

6 Spelling Correction

11 Evaluation

12 Performance

13 The Task is Hard without Context

14 Easier with Context
actuall, actual, actually
– … in determining whether the defendant actually will die.
constuming, consuming, costuming
conviced, convicted, convinced
confusin, confusing, confusion
workern, worker, workers
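
One way to see why context helps is to add neighboring-word evidence to the noisy channel score. The sketch below is an assumption-laden illustration, not the context model from the talk (those slides did not survive in the transcript): `rank_corrections`, the bigram table, the `floor` fallback for unseen events, and every probability are hypothetical.

```python
import math

def rank_corrections(typo, left, right, candidates, prior, channel, bigram, floor=1e-9):
    """Rank candidate corrections c, best first, by
    log Pr(c) + log Pr(typo|c) + log Pr(left, c) + log Pr(c, right).

    Events missing from a table get the small constant `floor`
    (a crude stand-in for real smoothing).
    """
    def score(c):
        return (math.log(prior.get(c, floor))
                + math.log(channel.get((typo, c), floor))
                + math.log(bigram.get((left, c), floor))
                + math.log(bigram.get((c, right), floor)))

    return sorted(candidates, key=score, reverse=True)

# Toy numbers, invented for illustration only.
prior = {"convicted": 1e-5, "convinced": 2e-5}
channel = {("conviced", "convicted"): 1e-3, ("conviced", "convinced"): 1e-3}
bigram = {
    ("was", "convicted"): 1e-6, ("was", "convinced"): 1e-6,
    ("convicted", "of"): 1e-5, ("convinced", "of"): 1e-6,
    ("convinced", "that"): 1e-5,
}
candidates = ["convicted", "convinced"]

print(rank_corrections("conviced", "was", "of", candidates, prior, channel, bigram))
print(rank_corrections("conviced", "was", "that", candidates, prior, channel, bigram))
```

With these toy tables the channel cannot separate the two candidates, so the word to the right of the typo decides the ranking.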

15 Easier with Context

16 Context Model

21 Future Improvements
Add More Factors
– Trigrams
– Thesaurus Relations
– Morphology
– Syntactic Agreement
– Parts of Speech
Improve Combination Rules
– Shrink (Meaty Methodology)

23 Conclusion (Spelling Correction)
There has been a lot of interest in smoothing
– Good-Turing estimation
– Kneser-Ney
Is it worth the trouble?
Ans: Yes (at least for recognition applications)
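
For readers unfamiliar with the first smoothing method named above, here is a minimal sketch of the basic Good-Turing adjustment, r* = (r + 1) N_{r+1} / N_r, where N_r is the number of distinct items seen exactly r times. The function names and the toy counts are illustrative assumptions; practical implementations also smooth the N_r values themselves, which this sketch does not attempt.

```python
from collections import Counter

def good_turing_adjusted_counts(counts):
    """Map each observed count r to r* = (r + 1) * N_{r+1} / N_r.

    Falls back to the raw count r when N_{r+1} is zero, since the
    unsmoothed formula would otherwise give r* = 0.
    """
    freq_of_freq = Counter(counts.values())  # r -> N_r
    adjusted = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)
        adjusted[r] = (r + 1) * n_next / n_r if n_next else float(r)
    return adjusted

def unseen_mass(counts):
    """Good-Turing estimate of the total probability of unseen items: N_1 / N."""
    total = sum(counts.values())
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / total

# Toy counts, invented for illustration: N_1 = 3, N_2 = 2, N_3 = 1, N = 10.
counts = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3}
adjusted = good_turing_adjusted_counts(counts)
print(adjusted)
print(unseen_mass(counts))
```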

47 Aligning Words
