1 What are n-grams good for?. 600.465 – Intro to NLP – J. Eisner2 Language ID (see transparencies) Useful for search engines, indexers, etc. Useful for.

Slides:



Advertisements
Similar presentations
1 Drafting Preparing the Initial Composition. 2 Drafting Basics While prewriting activities and tools can assist the writer during initial drafting, the.
Advertisements

Teaching Order of the Tenses; Backshift
Information Retrieval and Organisation Chapter 12 Language Models for Information Retrieval Dell Zhang Birkbeck, University of London.
Chapter 5: Introduction to Information Retrieval
The Practical Value of Statistics for Sentence Generation: The Perspective of the Nitrogen System Irene Langkilde-Geary.
Introduction to Information Retrieval
This is a Mr. Levoy PowerPoint The United States Court System.
Bell Ringer Who was the Supreme Allied Commander of the Normandy Invasion? (hint: he was American) Which US President made the decision to drop the Atomic.
Multilingual Text Retrieval Applications of Multilingual Text Retrieval W. Bruce Croft, John Broglio and Hideo Fujii Computer Science Department University.
SPEECH RECOGNITION Kunal Shalia and Dima Smirnov.
1 Texmex – November 15 th, 2005 Strategy for the future Global goal “Understand” (= structure…) TV and other MM documents Prepare these documents for applications.
Morris LeBlanc.  Why Image Retrieval is Hard?  Problems with Image Retrieval  Support Vector Machines  Active Learning  Image Processing ◦ Texture.
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
Re-ranking Documents Segments To Improve Access To Relevant Content in Information Retrieval Gary Madden Applied Computational Linguistics Dublin City.
SIMS 202 Information Organization and Retrieval Prof. Marti Hearst and Prof. Ray Larson UC Berkeley SIMS Tues/Thurs 9:30-11:00am Fall 2000.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Language Model. Major role: Language Models help a speech recognizer figure out how likely a word sequence is, independent of the acoustics. A lot of.
Natural Language Processing Prof: Jason Eisner Webpage: syllabus, announcements, slides, homeworks.
The Bill of Rights First 10 Amendments to the U.S. Constitution.
Chapter 5: Information Retrieval and Web Search
By: Sudesh Kalyanswamy and Ross Carstens
SI485i : NLP Day 1 Intro to NLP. Assumptions about You You know… how to program Java basic UNIX usage basic probability and statistics (we’ll also review)
Evidence from Content INST 734 Module 2 Doug Oard.
Gen 3:15 And I will put enmity between you and the woman, and between your seed and her Seed; He will bruise your head, and you shall bruise His heel.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
ECA 228 Internet/Intranet Design I Meta Tags & Directories.
The Bill of Rights History Alive Chapter 15.
About the Author … George Orwell’s real name is Eric Blair.
Vitaly Klyuev, Software Engineering Lab, room 342-C Software Engineering Lab is looking for students who are interested in Software Engineering for Internet.
Evaluation David Kauchak cs160 Fall 2009 adapted from:
Text Analysis Everything Data CompSci Spring 2014.
1 Ling 569: Introduction to Computational Linguistics Jason Eisner Johns Hopkins University Tu/Th 1:30-3:20 (also this Fri 1-5)
Real-Time Speech Recognition Subtitling in Education Respeaking 2009 Dr Mike Wald University of Southampton.
Slide No. 1 Searching the Web H Search engines and directories H Locating these resources H Using these resources H Interpreting results H Locating specific.
Intro to NLP - J. Eisner1 Finite-State and the Noisy Channel.
1 Computational Linguistics Ling 200 Spring 2006.
AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
Question Answering.  Goal  Automatically answer questions submitted by humans in a natural language form  Approaches  Rely on techniques from diverse.
Natural Language Based Reformulation Resource and Web Exploitation for Question Answering Ulf Hermjakob, Abdessamad Echihabi, Daniel Marcu University of.
Natural Language Processing Daniele Quercia Fall, 2000.
Intro to NLP - J. Eisner1 Bayes’ Theorem.
Natural language processing tools Lê Đức Trọng 1.
Introduction to Computational Linguistics
NLP. Introduction to NLP Very important for language processing Example in speech recognition: –“recognize speech” vs “wreck a nice beach” Example in.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Higher Vision, language and movement. Strong AI Is the belief that AI will eventually lead to the development of an autonomous intelligent machine. Some.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Generating Query Substitutions Alicia Wood. What is the problem to be solved?
A Generation Model to Unify Topic Relevance and Lexicon-based Sentiment for Opinion Retrieval Min Zhang, Xinyao Ye Tsinghua University SIGIR
Intro to NLP - J. Eisner1 Finite-State and the Noisy Channel.
1 ICASSP Paper Survey Presenter: Chen Yi-Ting. 2 Improved Spoken Document Retrieval With Dynamic Key Term Lexicon and Probabilistic Latent Semantic Analysis.
 SEO Terms A few additional terms Search site: This Web site lets you search through some kind of index or directory of Web sites, or perhaps both an.
The Nuremberg Trials lasted a little over a year. 24 Nazis were put on trial. 12 were sentenced to death 2 died before the trial was over 3 were acquitted.
Language Model for Machine Translation Jang, HaYoung.
Christoph Prinz / Automatic Speech Recognition Research Progress Hits the Road.
NLP.
Search in Google's N-grams
User Modeling for Personal Assistant
Measuring Monolinguality
What are n-grams good for?
Fidel Castro: Information Retrieval for 37 years of Socialist Oratory
Basque language: is IT right on?
10/13/2017.
Boolean Retrieval Term Vocabulary and Posting Lists Web Search Basics
Multimedia Information Retrieval
World War I Do Nows US History 11B.
Word embeddings (continued)
Natural Language Processing (NLP) Systems Joseph E. Gonzalez
START Grammar Rules and Exercises Márton Lévai May 19, 2017
Presentation transcript:

1 What are n-grams good for?

– Intro to NLP – J. Eisner2 Language ID (see transparencies) Useful for search engines, indexers, etc. Useful for text-to-speech (how to pronounce the name “Jan Lukasiewicz”?)

– Intro to NLP – J. Eisner3 How to Build a Language IDer? Histogram of letters? Histogram of bigrams? Compare to histograms from known language text But how to aggregate the evidence? –Could steal techniques from information retrieval –Treat every language as a huge document and input text as a query Nicer solution: How likely is this text to be generated randomly?

– Intro to NLP – J. Eisner4 Text Categorization Automatic Yahoo classification, etc. Similar to language ID … Topic 1 sample: In the beginning God created … Topic 2 sample: The history of all hitherto existing society is the history of class struggles. … Input text: Matt’s Communist Homepage. Capitalism is unfair and has been ruining the lives of millions of people around the world. The profits from the workers’ labor … Input text: And they have beat their swords to ploughshares, And their spears to pruning-hooks. Nation doth not lift up sword unto nation, neither do they learn war any more. …

– Intro to NLP – J. Eisner5 Topic Segmentation Break big document or media stream into indexable chunks From NPR’s All Things Considered: The U. N. says its observers will stay in Liberia only as long as West African peacekeepers do, but West African states are threatening to pull out of the force unless Liberia’s militia leaders stop violating last year’s peace accord after 7 weeks of chaos in the capital, Monrovia … Human rights groups cite peace troops as among those smuggling the arms. I’m Jennifer Ludden, reporting. Whitewater prosecution witness David Hale began serving a 28-month prison sentence today. The Arkansas judge and banker pleaded guilty two years ago to defrauding the Small Business Administration. Hale was the main witness in the Whitewater-related trial that led to the convictions …

– Intro to NLP – J. Eisner6 Contextual Spelling Correction Which is most probable? –… I think they’re okay … –… I think there okay … –… I think their okay … Which is most probable? –… by the way, are they’re likely to … –… by the way, are there likely to … –… by the way, are their likely to …

– Intro to NLP – J. Eisner7 Speech Recognition Listen carefully: what am I saying? How do you recognize speech? How do you wreck a nice beach? Put the file in the folder Put the file and the folder

– Intro to NLP – J. Eisner8 Language generation Choose randomly among outputs: –Visitant which came into the place where it will be Japanese has admired that there was Mount Fuji. Top 10 outputs according to bigram probabilities: –Visitors who came in Japan admire Mount Fuji. –Visitors who came in Japan admires Mount Fuji. –Visitors who arrived in Japan admire Mount Fuji. –Visitors who arrived in Japan admires Mount Fuji. –Visitors who came to Japan admire Mount Fuji. –A visitor who came in Japan admire Mount Fuji. –The visitor who came in Japan admire Mount Fuji. –Visitors who came in Japan admire Mount Fuji. –The visitor who came in Japan admires Mount Fuji. –Mount Fuji is admired by a visitor who came in Japan.

– Intro to NLP – J. Eisner9 Machine Translation good English? (n-gram) good match to French? Jon appeared in TV. Appeared on Jon TV. In Jon appeared TV. Jon is happy today. Jon appeared on TV. TV appeared on Jon. TV in Jon appeared. Jon was not happy.