Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting Some slides taken from Helmut Schmid, Rada Mihalcea, Bonnie Dorr, Leila Kosseim, Peter Flach and others

Topic
- Statistical Natural Language Processing: applies machine learning / statistics to natural language processing
- Learning: the ability to improve one's behaviour at a specific task over time; involves the analysis of data (statistics)
- We follow parts of the book Foundations of Statistical Natural Language Processing (Manning and Schütze), MIT Press, 1999

Contents
- Motivation
- Zipf's law
- Some natural language processing tasks
- Non-probabilistic NLP models
  - Regular grammars and finite state automata
  - Context-free grammars
  - Definite clause grammars
- Motivation for statistical NLP
- Overview of the rest of this part

Rationalism versus Empiricism
- Rationalist (Noam Chomsky): innate language structures
  - AI: hand-coding NLP
  - Long the dominant view
  - Cf. e.g. Steven Pinker's The Language Instinct (popular science book)
- Empiricist: the ability to learn is innate
  - AI: language is learned from corpora
  - Now dominant and becoming increasingly important

Rationalism versus Empiricism
- Noam Chomsky: "But it must be recognized that the notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."
- Fred Jelinek (IBM, 1988): "Every time a linguist leaves the room the recognition rate goes up." (Alternative version: "Every time I fire a linguist the recognizer improves.")

This course
- Empiricist approach: the focus will be on probabilistic models for learning natural language
- No time to treat natural language in depth (though this would be quite useful and interesting); that deserves a full course by itself
- Covered in more depth in Logic, Language and Learning (SS 05, probably SS 06)

Ambiguity

NLP and Statistics
Statistical disambiguation:
- Define a probability model for the data
- Compute the probability of each alternative
- Choose the most likely alternative

NLP and Statistics
- Statistical methods deal with uncertainty: they predict the future behaviour of a system based on the behaviour observed in the past
- Statistical methods require training data; the data in statistical NLP are the corpora

Corpora
- Corpus: a text collection for linguistic purposes
- Tokens: how many words are contained in Tom Sawyer?
- Types: how many different words are contained in Tom Sawyer?
- Hapax legomena: words appearing only once

Word Counts
The most frequent words are function words:

    word  freq      word  freq
    the   3332      in     906
    and   2972      that   877
    a     1775      he     877
    to    1725      I      783
    of    1440      his    772
    was   1161      you    686
    it    1027      Tom    679

Word Counts
How many words appear f times?
[Table: frequency f versus n_f, the number of words occurring exactly f times]
- About half of the words occur just once
- About half of the text consists of the 100 most common words

Word Counts (Brown corpus)

Zipf's Law
Zipf's law: f ~ 1/r, i.e. f * r = const: a word's frequency f is roughly inversely proportional to its rank r in the frequency list. Zipf's suggested explanation: language users minimize effort.
[Table: word, frequency f, rank r, and product f * r in Tom Sawyer, from frequent words (the, and, a, he, but, be, there, one, about, more, never, Oh, two) to rare ones (turned, you'll, name, comes, group, lead, friends, begin, family, brushed, sins, Could, Applausive); f * r remains roughly constant]
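As a quick sanity check, here is a minimal Prolog sketch using the top entries of the word-count table above (the ranks follow that table's ordering; the predicate names are ours):

    % Toy check of Zipf's law: f * r should stay in the same ballpark.
    % freq(Word, Rank, Frequency), counts from the Tom Sawyer table above.
    freq(the, 1, 3332).
    freq(and, 2, 2972).
    freq(a,   3, 1775).
    freq(to,  4, 1725).
    freq(of,  5, 1440).

    zipf_constant(Word, C) :-
        freq(Word, R, F),
        C is F * R.

    % ?- zipf_constant(W, C).
    % W = the, C = 3332 ;
    % W = and, C = 5944 ;
    % W = a,   C = 5325 ;
    % W = to,  C = 6900 ;
    % W = of,  C = 7200.

The product drifts at the very top ranks, which is typical: Zipf's law is only a rough empirical regularity.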

Conclusions
- Overview of some probabilistic and machine learning methods for NLP
- Also very relevant to bioinformatics: there is an analogy between parsing a sentence and parsing a biological string (DNA, protein, mRNA, ...)

Language and sequences
- Natural language processing is concerned with the analysis of sequences of words / sentences and with the construction of language models
- Two types of models: non-probabilistic and probabilistic

Key NLP Problem: Ambiguity
Human language is highly ambiguous at all levels:
- acoustic level: "recognize speech" vs. "wreck a nice beach"
- morphological level: "saw" can be to see (past tense), saw (noun), or to saw (present tense, infinitive)
- syntactic level: "I saw the man on the hill with a telescope"
- semantic level: "One book has to be read by every student"

Language Model
A formal model about language. Two types:
- Non-probabilistic
  - Allows one to compute whether a certain sequence (a sentence or part thereof) is possible
  - Often grammar-based
- Probabilistic
  - Allows one to compute the probability of a certain sequence
  - Often extends grammars with probabilities

Example of a bad language model

A bad language model

A good language model
- Non-probabilistic: "I swear to tell the truth" is possible; "I swerve to smell de soup" is impossible
- Probabilistic: P(I swear to tell the truth) ≈ 0.0001; P(I swerve to smell de soup) ≈ 0

Why language models?
- Consider a Shannon game: predicting the next word in a sequence
  - "Statistical natural language ..."
  - "The cat is thrown out of the ..."
  - "The large green ..."
  - "Sue swallowed the large green ..."
  - ...
- We model language at the sentence level

Applications
- Spelling correction
- Mobile phone texting
- Speech recognition
- Handwriting recognition
- Disabled users
- ...

Spelling errors
- "They are leaving in about fifteen minuets to go to her house."
- "The study was conducted mainly be John Black."
- "Hopefully, all with continue smoothly in my absence."
- "Can they lave him my messages?"
- "I need to notified the bank of...."
- "He is trying to fine out."
Each error is itself a correctly spelled word, so a plain dictionary lookup will not catch it; context is needed.

Handwriting recognition
- Assume a note is given to a bank teller, which the teller reads as "I have a gub" (cf. Woody Allen)
- NLP to the rescue:
  - "gub" is not a word
  - gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank

For Spell Checkers
- Collect a list of commonly substituted words: piece/peace, whether/weather, their/there, ...
- Example: "On Tuesday, the whether ..." should be corrected to "On Tuesday, the weather ..."

Another dimension in language models
- Do we mainly want to infer the probabilities of legal sentences / sequences? (This is what we have considered so far.)
- Or do we want to infer properties of these sentences, e.g. the parse tree or part-of-speech tags? (Needed for understanding natural language.)
- Let's look at some tasks

Sequence Tagging
- Part-of-speech tagging (pronoun, verb, preposition, pronoun, noun):

    He   drives  with  his  bike
    PN   V       PR    PN   N

- Text extraction:

    The  job  is  that  of  a  programmer
    X    X    X   X     X   X  JobType

    The  seminar  is  taking  place  from  ...    to  ...
    X    X        X   X       X      X     Start  X   End
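As a toy illustration, here is a minimal Prolog sketch of tagging by pure lexicon lookup (the lexicon and predicate names are assumed; a real tagger must handle ambiguous and unseen words):

    % Toy lexicon-lookup tagger.
    tag(he, pn).   tag(drives, v).   tag(with, pr).
    tag(his, pn).  tag(bike, n).

    tag_sentence([], []).
    tag_sentence([W|Ws], [W/T|Ts]) :-
        tag(W, T),
        tag_sentence(Ws, Ts).

    % ?- tag_sentence([he, drives, with, his, bike], Tags).
    % Tags = [he/pn, drives/v, with/pr, his/pn, bike/n].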

Sequence Tagging
- Predicting the secondary structure of proteins, mRNA, ...

    X = A, F, A, R, L, M, M, A, ...
    Y = he, he, st, st, st, he, st, he, ...

Parsing
- Given a sentence, find its parse tree
- An important step in understanding natural language

Parsing
- In bioinformatics, parsing allows one to predict (elements of) structure from sequence

Language models based on Grammars
- Grammar types
  - Regular grammars and finite state automata
  - Context-free grammars
  - Definite clause grammars: a particular type of unification-based grammar (Prolog)
- Distinguish the lexicon from the grammar
  - The lexicon (dictionary) contains information about words, e.g. a word's possible tags (and possibly additional information); "flies" can be a V(erb) or a N(oun)
  - The grammar encodes the rules
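A minimal Prolog sketch of such a lexicon (the lex/2 predicate is hypothetical): an ambiguous word simply gets several entries.

    % Toy lexicon: a word may have several possible tags.
    lex(flies, v).
    lex(flies, n).
    lex(the,   det).
    lex(cat,   n).

    % ?- lex(flies, Tag).
    % Tag = v ;
    % Tag = n.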

Grammars and parsing
- The syntactic level is the best understood and formalized level
- Parsing: the derivation of grammatical structure (more than just recognition)
- The result of parsing is usually a parse tree showing the constituents of a sentence, e.g. verb or noun phrases
- Syntax is usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata
- Lexical information: which class does a word belong to?
  - Det(erminer), N(oun), Pn (pronoun), Adj (adjective)
  - Vi (intransitive verb): takes no argument
  - Vt (transitive verb): takes an argument
- Now accept "The cat slept": Det N Vi
- As a regular grammar:

    S  -> [Det] S1    % [ ] : terminal
    S1 -> [N] S2
    S2 -> [Vi]

- Lexicon: the - Det, cat - N, slept - Vi, ...
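This right-linear grammar can be run directly as a Prolog DCG (a minimal sketch; nonterminals are lowercased because capitalized names are variables in Prolog):

    s   --> det, s1.
    s1  --> n, s2.
    s2  --> vi.
    det --> [the].
    n   --> [cat].
    vi  --> [slept].

    % ?- phrase(s, [the, cat, slept]).
    % true.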

Finite State Automaton
- Sentences accepted:
  - "John smiles" - Pn Vi
  - "The cat disappeared" - Det N Vi
  - "These new shoes hurt" - Det Adj N Vi
  - "John liked the old cat" - Pn Vt Det Adj N

Phrase structure
The parse tree of "the dog chased a cat into the garden":

    [S [NP [D the] [N dog]]
       [VP [V chased]
           [NP [D a] [N cat]]
           [PP [P into] [NP [D the] [N garden]]]]]

Notation
- S: sentence
- D or Det: determiner (e.g., articles)
- N: noun
- V: verb
- P: preposition
- NP: noun phrase
- VP: verb phrase
- PP: prepositional phrase

Context Free Grammar

    S  -> NP VP
    NP -> D N
    VP -> V NP
    VP -> V NP PP
    PP -> P NP
    D  -> [the]
    D  -> [a]
    N  -> [dog]
    N  -> [cat]
    N  -> [garden]
    V  -> [chased]
    V  -> [saw]
    P  -> [into]

Terminals ~ lexicon
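The same grammar as a runnable Prolog DCG (a sketch; nonterminals lowercased):

    s  --> np, vp.
    np --> d, n.
    vp --> v, np.
    vp --> v, np, pp.
    pp --> p, np.
    d --> [the].    d --> [a].
    n --> [dog].    n --> [cat].    n --> [garden].
    v --> [chased]. v --> [saw].
    p --> [into].

    % ?- phrase(s, [the, dog, chased, a, cat, into, the, garden]).
    % true .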

Phrase structure
- Formalism of context-free grammars
  - Nonterminal symbols: S, NP, VP, ...
  - Terminal symbols: dog, cat, saw, the, ...
- Recursion: "The girl thought the dog chased the cat"

    VP -> V S
    N  -> [girl]
    V  -> [thought]

Top-down parsing

    S -> NP VP
    S -> Det N VP
    S -> The N VP
    S -> The dog VP
    S -> The dog V NP
    S -> The dog chased NP
    S -> The dog chased Det N
    S -> The dog chased the N
    S -> The dog chased the cat

Context-free grammar

    S  --> NP, VP.
    NP --> PN.            % proper noun
    NP --> Art, Adj, N.
    NP --> Art, N.
    VP --> VI.            % intransitive verb
    VP --> VT, NP.        % transitive verb
    Art --> [the].
    Adj --> [lazy].
    Adj --> [rapid].
    PN  --> [achilles].
    N   --> [turtle].
    VI  --> [sleeps].
    VT  --> [beats].

Parse tree
The parse tree of "achilles beats the rapid turtle":

    [S [NP [PN achilles]]
       [VP [VT beats]
           [NP [Art the] [Adj rapid] [N turtle]]]]

Definite Clause Grammars
Non-terminals may have arguments; here the argument enforces number agreement:

    S --> NP(N), VP(N).
    NP(N) --> Art(N), N(N).
    VP(N) --> VI(N).
    Art(singular) --> [a].
    Art(singular) --> [the].
    Art(plural)   --> [the].
    N(singular)   --> [turtle].
    N(plural)     --> [turtles].
    VI(singular)  --> [sleeps].
    VI(plural)    --> [sleep].
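In actual Prolog the same grammar looks as follows (a minimal sketch; nonterminals lowercased, and the noun nonterminal renamed noun//1 so that it is not confused with the variable N):

    s       --> np(Num), vp(Num).
    np(Num) --> art(Num), noun(Num).
    vp(Num) --> vi(Num).
    art(singular)  --> [a].
    art(singular)  --> [the].
    art(plural)    --> [the].
    noun(singular) --> [turtle].
    noun(plural)   --> [turtles].
    vi(singular)   --> [sleeps].
    vi(plural)     --> [sleep].

    % ?- phrase(s, [a, turtle, sleeps]).     % true
    % ?- phrase(s, [a, turtle, sleep]).      % false: number agreement fails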

DCGs
- Non-terminals may have arguments
  - Variables (start with a capital), e.g. Number, Any
  - Constants (start with lower case), e.g. singular, plural
  - Structured terms (start with lower case, and take arguments themselves), e.g. vp(V, NP)
- Parsing needs to be adapted: it uses unification

Unification in a nutshell (cf. AI course)
- Substitutions, e.g. {Num / singular}, {T / vp(V, NP)}
- Applying a substitution: simultaneously replace variables by the corresponding terms:

    S(Num) {Num / singular} = S(singular)

Unification
Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
- Art(singular) and Art(Num): gives {Num / singular}
- Art(singular) and Art(plural): fails
- Art(Num1) and Art(Num2): gives {Num1 / Num2}
- PN(Num, accusative) and PN(singular, Case): gives {Num / singular, Case / accusative}
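These examples can be tried directly at a Prolog top level, where =/2 unifies two terms (functors lowercased):

    ?- art(singular) = art(Num).
    Num = singular.

    ?- art(singular) = art(plural).
    false.

    ?- art(Num1) = art(Num2).
    Num1 = Num2.

    ?- pn(Num, accusative) = pn(singular, Case).
    Num = singular,
    Case = accusative.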

Parsing with DCGs
Now require successful unification at each step:

    S -> NP(N), VP(N)
    S -> Art(N), N(N), VP(N)
    S -> a N(singular), VP(singular)    {N / singular}
    S -> a turtle VP(singular)
    S -> a turtle sleeps
    S -> a turtle sleep                 fails

Case Marking

    PN(singular, nominative) --> [he] ; [she]
    PN(singular, accusative) --> [him] ; [her]
    PN(plural, nominative)   --> [they]
    PN(plural, accusative)   --> [them]
    S --> NP(Number, nominative), VP(Number)
    VP(Number) --> V(Number), NP(Any, accusative)
    NP(Number, Case) --> PN(Number, Case)
    NP(Number, Any)  --> Det, N(Number)

"He sees her." "She sees him." "They see her." But not "Them see he."
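A runnable sketch of the case-marking grammar (lowercased for Prolog; restricted to pronoun NPs for brevity):

    s             --> np(Num, nominative), vp(Num).
    vp(Num)       --> v(Num), np(_, accusative).
    np(Num, Case) --> pn(Num, Case).
    pn(singular, nominative) --> [he] ; [she].
    pn(singular, accusative) --> [him] ; [her].
    pn(plural, nominative)   --> [they].
    pn(plural, accusative)   --> [them].
    v(singular) --> [sees].
    v(plural)   --> [see].

    % ?- phrase(s, [he, sees, her]).    % true
    % ?- phrase(s, [them, see, he]).    % false: case marking fails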

DCGs
- Are strictly more expressive than CFGs
- Can represent, for instance, the language a^n b^n c^n:

    S(N) -> A(N), B(N), C(N)
    A(0) -> []
    B(0) -> []
    C(0) -> []
    A(s(N)) -> A(N), [A]
    B(s(N)) -> B(N), [B]
    C(s(N)) -> C(N), [C]
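A runnable version (a sketch; lowercased, and with each terminal emitted before the recursive call to avoid left recursion):

    s(N)    --> a(N), b(N), c(N).
    a(0)    --> [].
    a(s(N)) --> [a], a(N).
    b(0)    --> [].
    b(s(N)) --> [b], b(N).
    c(0)    --> [].
    c(s(N)) --> [c], c(N).

    % ?- phrase(s(N), [a, a, b, b, c, c]).
    % N = s(s(0)) .
    % ?- phrase(s(_), [a, a, b, c, c]).
    % false.

The language a^n b^n c^n is not context-free, so no CFG can do this.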

Probabilistic Models
- Traditional grammar models are very rigid: essentially a yes / no decision
- Probabilistic grammars
  - Define a probability model for the data
  - Compute the probability of each alternative
  - Choose the most likely alternative
- Illustrated on the Shannon game, spelling correction, and parsing

Some probabilistic models
- N-grams: predicting the next word
  - "Artificial intelligence and machine ..."
  - "Statistical natural language ..."
- Probabilistic versions of grammars:
  - Regular grammars (Markov models)
  - Hidden Markov models
  - Conditional random fields
  - Context-free grammars
  - (Stochastic) definite clause grammars
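A minimal bigram sketch in Prolog (the counts are toy numbers we made up, not from a real corpus): the maximum-likelihood estimate is P(w2 | w1) = count(w1 w2) / count(w1).

    % Toy bigram counts: count(Word1, Word2, HowOften).
    count(statistical, natural, 8).
    count(statistical, machine, 2).
    count(natural, language, 9).
    count(natural, selection, 1).

    % Maximum-likelihood estimate of P(W2 | W1).
    p_bigram(W1, W2, P) :-
        count(W1, W2, C12),
        aggregate_all(sum(C), count(W1, _, C), C1),   % total count of W1 (SWI-Prolog)
        P is C12 / C1.

    % ?- p_bigram(statistical, natural, P).
    % P = 0.8.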

Illustration
- Wall Street Journal corpus: ... words
- The correct parse tree for each sentence is known
  - Constructed by hand
  - Can be used to derive stochastic context-free grammars (SCFGs)
  - An SCFG assigns a probability to each parse tree
- Compute the most probable parse tree
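To make the idea concrete, here is a minimal PCFG sketch over the earlier dog/cat grammar (the rule probabilities are assumed toy values, not estimated from the WSJ corpus, and lexical probabilities are ignored): each derivation multiplies the probabilities of the rules it uses, and the most probable parse tree is the one with the highest product.

    % Toy PCFG: each nonterminal threads the probability of its subtree.
    s(P)    --> np(P1), vp(P2),    { P is 1.0 * P1 * P2 }.   % S  -> NP VP    (1.0)
    np(1.0) --> d, n.                                        % NP -> D N      (1.0)
    vp(P)   --> v, np(P1),         { P is 0.7 * P1 }.        % VP -> V NP     (0.7)
    vp(P)   --> v, np(P1), pp(P2), { P is 0.3 * P1 * P2 }.   % VP -> V NP PP  (0.3)
    pp(P)   --> p, np(P1),         { P is 1.0 * P1 }.        % PP -> P NP     (1.0)
    d --> [the].   d --> [a].
    n --> [dog].   n --> [cat].   n --> [garden].
    v --> [chased].
    p --> [into].

    % ?- phrase(s(P), [the, dog, chased, a, cat]).
    % P = 0.7 .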

Sequences are omnipresent
- Therefore, the techniques we will see also apply to:
  - Bioinformatics: DNA, proteins, mRNA, ... can all be represented as strings
  - Robotics: sequences of actions, states, ...
  - ...

Rest of the Course
- Limitations of traditional grammar models motivate probabilistic extensions
  - Regular grammars and finite state automata: Markov models using n-grams
  - (Hidden) Markov models
  - Conditional random fields (as an example of using undirected graphical models)
  - Probabilistic context-free grammars
  - Probabilistic definite clause grammars
- All use the principles of Part I on graphical models