CS 4705 Lecture 13 Corpus Linguistics I

From Knowledge-Based to Corpus-Based Linguistics
A Paradigm Shift begins in the 1980s
–Seeds planted in the 1950s (Harris, Firth)
–Cut off by Chomsky
–Renewal due to:
  –Interest in practical applications (ASR, MT, …)
  –Availability at major industrial labs of powerful machines and large amounts of storage
  –Increasing availability of large online texts and speech data
  –Crossover efforts with the ASR community, fostered by DARPA

–For many practical tasks, statistical methods perform better
–Less knowledge required of researchers

Next Word Prediction
An ostensibly artificial task: predicting the next word in a sequence.
From a NY Times story...
–Stocks plunged this …
–Stocks plunged this morning, despite a cut in interest rates
–Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall...
–Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began

–Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began trading for the first time since last …
–Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began trading for the first time since last Tuesday's terrorist attacks.

Human Word Prediction
Clearly, at least some of us have the ability to predict future words in an utterance. How?
–Domain knowledge
–Syntactic knowledge
–Lexical knowledge

Claim
A useful part of the knowledge needed to allow Word Prediction (guessing the next word) can be captured using simple statistical techniques.
In particular, we'll rely on the notion of the probability of a sequence (e.g., a sentence) and the likelihood of words co-occurring.
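
The claim can be made concrete with a small amount of code. The following is a minimal sketch, not part of the original slides: it estimates bigram probabilities from a tiny invented corpus and uses them to rank candidate next words and alternative sequences. The corpus, the function names, and the maximum-likelihood bigram estimate are choices made purely for illustration.

from collections import Counter

# Toy corpus; in practice the counts would come from a large text collection.
corpus = [
    "stocks plunged this morning despite a cut in interest rates",
    "stocks rose this morning as interest rates fell",
    "the federal reserve cut interest rates this week",
]

# Collect unigram and bigram counts over all sentences.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p_next(prev, word):
    """Maximum-likelihood estimate of P(word | prev) = count(prev word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def p_sequence(words):
    """Approximate P(w1 ... wn) by a product of bigram probabilities."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= p_next(prev, word)
    return p

# Rank alternative next words after "interest", and compare two alternative sequences.
print(sorted(["rates", "cut", "morning"], key=lambda w: -p_next("interest", w)))
print(p_sequence("stocks plunged this morning".split()),
      p_sequence("stocks plunged this week".split()))

With these toy counts, "rates" ranks first after "interest", and the sequence ending in "morning" scores higher than the one ending in "week", which is exactly the kind of ranking of alternative hypotheses discussed next.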

Why would we want to do this?
Why would anyone want to predict a word?
–If you can predict the next word, you can rank the likelihood of sequences containing various alternative words, i.e., rank alternative hypotheses
–You can assess the likelihood/goodness of a hypothesis

Many NLP problems can be modeled as mapping from one string of symbols to another.
In statistical language applications, knowledge of the source (e.g., a statistical model of word sequences) is referred to as a Language Model or a Grammar.

Why is this useful?
Example applications that employ language models:
–Speech recognition
–Handwriting recognition
–Spelling correction
–Machine translation systems
–Optical character recognizers

Real Word Spelling Errors
–They are leaving in about fifteen minuets to go to her house.
–The study was conducted mainly be John Black.
–The design an construction of the system will take more than a year.
–Hopefully, all with continue smoothly in my absence.
–Can they lave him my messages?
–I need to notified the bank of….
–He is trying to fine out.

Handwriting Recognition
Assume a note is given to a bank teller, which the teller reads as I have a gub. (cf. Woody Allen)
NLP to the rescue…
–gub is not a word
–gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank

For Spell Checkers
–Collect a list of commonly substituted words
  –piece/peace, whether/weather, their/there...
–Whenever you encounter one of these words in a sentence, construct the alternative sentence as well
–Assess the goodness of each and choose the word that yields the more likely sentence
–E.g.:
  –On Tuesday, the whether
  –On Tuesday, the weather
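
The procedure above is easy to sketch in code. This is a minimal illustration, not the original course material: a tiny hard-coded bigram table stands in for a real language model, and the confusion sets and probabilities are invented for the example.

# Pairs of real words that are often substituted for one another.
CONFUSION_SETS = [
    {"piece", "peace"},
    {"whether", "weather"},
    {"their", "there"},
]

# Toy stand-in for a language model: a few bigram probabilities.
# In practice these would be estimated from a large corpus, as sketched earlier.
TOY_BIGRAM_P = {
    ("the", "weather"): 0.01,
    ("the", "whether"): 0.0001,
}

def score_sentence(words):
    """Crude sentence score: product of bigram probabilities, with a tiny floor for unseen bigrams."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= TOY_BIGRAM_P.get((prev, word), 1e-6)
    return p

def correct_real_word_errors(words):
    """For each word that belongs to a confusion set, construct the alternative
    sentences and keep whichever variant the language model scores higher."""
    best = list(words)
    for i, w in enumerate(words):
        for conf in CONFUSION_SETS:
            if w in conf:
                variants = [best[:i] + [alt] + best[i + 1:] for alt in conf]
                best = max(variants, key=score_sentence)
    return best

print(correct_real_word_errors("on tuesday the whether turned cold".split()))
# -> ['on', 'tuesday', 'the', 'weather', 'turned', 'cold'] with these toy numbers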

The Noisy Channel Model
A probabilistic model developed by Claude Shannon to model communication (as over a phone line):
  I → Noisy Channel → O
Given the observed output O, choose the most likely input:
  Î = argmax_I Pr(I|O) = argmax_I Pr(I) Pr(O|I)
where
–Î is the most likely input
–Pr(I) is the prior probability of the input
–Pr(I|O) is the probability of input I given the observed output O
–Pr(O|I) is the probability that O is the output if I is the input
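
As a concrete illustration of the argmax above (again not from the original slides), the bank-note example can be scored candidate by candidate; the candidate list, the prior probabilities, and the letter-overlap channel model are all toy choices made up for this sketch.

# Observed (misread) string and some candidate intended words.
observed = "gub"
candidates = ["gun", "gum", "Gus", "gull", "gub"]

# Pr(I): prior probability of each word in a bank-note context (invented numbers);
# this is the role a language model plays in the noisy channel framework.
prior = {"gun": 0.5, "gum": 0.2, "Gus": 0.1, "gull": 0.1, "gub": 0.0}

# Pr(O|I): probability that sloppy handwriting turns the intended word into the
# observed string; here a crude guess based on character overlap.
def channel(obs, intended):
    matches = sum(a == b for a, b in zip(obs, intended))
    return matches / max(len(obs), len(intended))

# I_hat = argmax_I Pr(I) * Pr(O|I)
i_hat = max(candidates, key=lambda w: prior[w] * channel(observed, w))
print(i_hat)  # -> 'gun' with these toy numbers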

Review: Basic Probability
Prior probability (or unconditional probability)
–P(A), where A is some event
–Possible events: it raining, the next person you see being Scandinavian, a child getting the measles, the word ‘warlord’ occurring in the newspaper
Conditional probability
–P(A | B): the probability of A, given that we know B
–E.g., it raining, given that we know it’s October; the next person you see being Scandinavian, given that you’re in Sweden; the word ‘warlord’ occurring in a story about Afghanistan

Example (a diagram of a small population of Finns and skiers; figure not reproduced)
–P(Finn) = .6
–P(skier) = .5
–P(skier | Finn) = .67
–P(Finn | skier) = .8
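
A quick consistency check of these numbers (an added note, assuming the lost diagram showed a population of 10 individuals, 6 of them Finns and 5 of them skiers, with 4 Finnish skiers): P(skier | Finn) = 4/6 ≈ .67, and by Bayes' rule P(Finn | skier) = P(skier | Finn) P(Finn) / P(skier) = (.67 × .6) / .5 ≈ .8.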

Next class
–Midterm
–Next class:
  –Hindle & Rooth 1993
  –Begin studying semantics, Ch. 14