Applications (2 of 2): Recognition, Transduction, Discrimination, Segmentation, Alignment, etc. Kenneth Church, Dec 9, 2009


Applications

- Recognition: Shannon's Noisy Channel Model
  - Speech, Optical Character Recognition (OCR), Spelling
- Transduction
  - Part of Speech (POS) Tagging
  - Machine Translation (MT)
- Parsing: ???
- Ranking
  - Information Retrieval (IR)
  - Lexicography
- Discrimination
  - Sentiment, Text Classification, Author Identification, Word Sense Disambiguation (WSD)
- Segmentation
  - Asian Morphology (Word Breaking), Text Tiling
- Alignment: Bilingual Corpora, Dotplots
- Compression
- Language Modeling: good for everything

Speech → Language. Shannon's Noisy Channel Model:

    I → Noisy Channel → O
    I′ ≈ ARGMAX_I Pr(I|O) = ARGMAX_I Pr(I) Pr(O|I)

Trigram Language Model ("We need to resolve all of the important issues"):

    Word      | Rank | More likely alternatives
    We        | 9    | The This One Two A Three Please In
    need      | 7    | are will the would also do
    to        | 1    |
    resolve   | 85   | have know do …
    all       | 9    | The This One Two A Three Please In
    of        | 2    | The This One Two A Three Please In
    the       | 1    |
    important | 657  | document question first …
    issues    | 14   | thing point to

Channel Model:

    Application                         | Input      | Output
    Speech Recognition                  | writer     | rider
    OCR (Optical Character Recognition) | all        | a1l
    Spelling Correction                 | government | goverment

The channel model is application specific; the language model is application independent.
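The argmax above can be made concrete with a tiny decoder. This is a sketch under invented assumptions: a toy unigram corpus stands in for the trigram language model, and a crude edit-distance penalty stands in for the channel model Pr(O|I); none of the names, numbers, or the corpus come from the talk.

```python
from collections import Counter

# Toy corpus stands in for the (much larger) language-model training data.
CORPUS = "the government of the people by the people for the people".split()
LM = Counter(CORPUS)
TOTAL = sum(LM.values())

def p_word(w):
    """Stand-in for Pr(I): maximum-likelihood unigram probability."""
    return LM[w] / TOTAL

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def p_channel(observed, intended):
    """Crude stand-in for Pr(O|I): probability decays with edit distance."""
    return 0.9 if observed == intended else 0.1 ** edit_distance(observed, intended)

def correct(observed, candidates):
    """I' = argmax_I Pr(I) * Pr(O|I), searched over a candidate list."""
    return max(candidates, key=lambda i: p_word(i) * p_channel(observed, i))

print(correct("goverment", ["government", "governments", "goverment"]))
```

With the toy corpus, "goverment" itself has zero language-model probability, so the decoder prefers the in-vocabulary candidate one edit away, mirroring the Spelling Correction row of the channel-model table above.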

Speech → Language. Using (Abusing) Shannon's Noisy Channel Model: Part of Speech Tagging and Machine Translation

- Speech: Words → Noisy Channel → Acoustics
- OCR: Words → Noisy Channel → Optics
- Spelling Correction: Words → Noisy Channel → Typos
- Part of Speech Tagging (POS): POS → Noisy Channel → Words
- Machine Translation, "Made in America": English → Noisy Channel → French

(Didn't have the guts to use this slide at Eurospeech in Geneva.)
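The POS-tagging use of the channel (hidden tags generate observed words) can be sketched with a miniature HMM and a Viterbi search for argmax_T Pr(T) Pr(W|T). All tags, words, and probabilities here are invented for illustration; they are not taken from the slides.

```python
# Toy HMM: a bigram tag model plays the role of Pr(I); the emission
# table is the "channel" that turns tags into words. Numbers are made up.
TRANS = {  # Pr(tag | previous tag), with <s> as the start state
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.4,
    ("DET", "NOUN"): 0.9, ("DET", "DET"): 0.1,
    ("NOUN", "VERB"): 0.8, ("NOUN", "NOUN"): 0.2,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.5,
}
EMIT = {  # Pr(word | tag): the channel model
    ("DET", "the"): 0.7, ("NOUN", "dog"): 0.4, ("NOUN", "walk"): 0.1,
    ("VERB", "walk"): 0.5, ("VERB", "barks"): 0.3, ("NOUN", "barks"): 0.05,
}
TAGS = ["DET", "NOUN", "VERB"]

def viterbi(words):
    """argmax_T Pr(T) Pr(W|T) by dynamic programming over tag sequences."""
    best = {"<s>": (1.0, [])}  # state -> (best score, best tag path)
    for w in words:
        nxt = {}
        for tag in TAGS:
            e = EMIT.get((tag, w), 0.0)
            if e == 0.0:
                continue
            cands = [(p * TRANS.get((prev, tag), 0.0) * e, path + [tag])
                     for prev, (p, path) in best.items()]
            score, path = max(cands)
            if score > 0:
                nxt[tag] = (score, path)
        best = nxt
    return max(best.values())[1]

print(viterbi(["the", "dog", "barks"]))
```

The same argmax-over-hidden-input structure carries over to the MT use of the channel, with English sentences as the hidden input and French as the observation.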


Spelling Correction


Evaluation

Performance

The Task is Hard without Context

Easier with Context

- actuall → actual, actually
  - "… in determining whether the defendant actually will die."
- constuming → consuming, costuming
- conviced → convicted, convinced
- confusin → confusing, confusion
- workern → worker, workers
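One way to see why context narrows the choice: score each candidate correction by its smoothed bigram fit with the neighboring words. The toy corpus and the add-one smoothing here are my own stand-ins, not the context model from the talk.

```python
from collections import Counter
from itertools import tee

# Toy corpus stands in for real bigram counts, which in practice would
# come from a large corpus (e.g. newswire text).
CORPUS = ("the defendant actually will die . the actual verdict "
          "was read . she actually will appeal").split()

def bigrams(tokens):
    a, b = tee(tokens)
    next(b, None)
    return Counter(zip(a, b))

BIG = bigrams(CORPUS)
UNI = Counter(CORPUS)

def context_score(left, cand, right):
    """Rank a candidate by how well it fits its neighbors (add-one smoothed)."""
    v = len(UNI)
    p_left = (BIG[(left, cand)] + 1) / (UNI[left] + v)
    p_right = (BIG[(cand, right)] + 1) / (UNI[cand] + v)
    return p_left * p_right

def pick(left, right, candidates):
    return max(candidates, key=lambda c: context_score(left, c, right))

# "... whether the defendant ____ will die": context favors "actually".
print(pick("defendant", "will", ["actual", "actually"]))
```

In isolation "actual" and "actually" are both plausible fixes for "actuall"; the surrounding words break the tie, which is the point of the slide.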

Easier with Context

Context Model


Future Improvements

- Add More Factors
  - Trigrams
  - Thesaurus Relations
  - Morphology
  - Syntactic Agreement
  - Parts of Speech
- Improve Combination Rules
  - Shrink (Meaty Methodology)


Conclusion (Spelling Correction)

There has been a lot of interest in smoothing:
- Good-Turing estimation
- Kneser-Ney

Is it worth the trouble? Answer: yes, at least for recognition applications.
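As a rough illustration of the Good-Turing idea mentioned above (a bare-bones version, not the exact estimator from any particular paper), the adjusted count r* = (r+1) · N_{r+1} / N_r can be computed from frequency-of-frequency counts:

```python
from collections import Counter

def good_turing_discounts(counts):
    """Bare-bones Good-Turing: r* = (r+1) * N_{r+1} / N_r.

    counts maps item -> observed frequency r. Returns adjusted counts and
    the probability mass reserved for unseen events (N_1 / N). Practical
    implementations also smooth the N_r values, which this sketch skips.
    """
    freq_of_freq = Counter(counts.values())  # N_r: how many items occur r times
    adjusted = {}
    for item, r in counts.items():
        n_r, n_r1 = freq_of_freq[r], freq_of_freq.get(r + 1, 0)
        # Fall back to the raw count when N_{r+1} is zero (sparse high counts).
        adjusted[item] = (r + 1) * n_r1 / n_r if n_r1 else r
    total = sum(counts.values())
    p_unseen = freq_of_freq.get(1, 0) / total  # mass shifted to novel events
    return adjusted, p_unseen

counts = Counter("the cat sat on the mat the cat ran".split())
adj, p0 = good_turing_discounts(counts)
print(p0)
```

The once-seen words ("sat", "on", "mat", "ran") donate mass to unseen events, which is exactly what a recognition application needs when the channel proposes a word the language model has never observed.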


Aligning Words
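Word alignment over a bilingual corpus can be sketched with a simple association score between words that co-occur in aligned sentence pairs. This sketch uses the Dice coefficient over invented toy pairs; it is an illustration of the general idea, not the specific alignment method from the talk.

```python
from collections import Counter

# Toy sentence-aligned English/French pairs. Real alignment works from a
# large sentence-aligned bilingual corpus (e.g. parliamentary proceedings).
PAIRS = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the car", "la voiture"),
]

def dice_alignments(pairs):
    """Score candidate word pairs by Dice = 2*joint / (count_e + count_f)."""
    e_count, f_count, joint = Counter(), Counter(), Counter()
    for e_sent, f_sent in pairs:
        e_words, f_words = set(e_sent.split()), set(f_sent.split())
        e_count.update(e_words)
        f_count.update(f_words)
        joint.update((e, f) for e in e_words for f in f_words)
    return {pair: 2 * c / (e_count[pair[0]] + f_count[pair[1]])
            for pair, c in joint.items()}

scores = dice_alignments(PAIRS)
best_for_house = max((f for (e, f) in scores if e == "house"),
                     key=lambda f: scores[("house", f)])
print(best_for_house)
```

Words that reliably co-occur across the two halves of the corpus ("house"/"maison") score high, while accidental co-occurrences ("house"/"la") are discounted by their high marginal counts.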
