Part II. Statistical NLP
Advanced Artificial Intelligence: Introduction and Grammar Models
Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting
Some slides taken from Helmut Schmid, Rada Mihalcea, Bonnie Dorr, Leila Kosseim, Peter Flach, and others.
Topic
Statistical Natural Language Processing applies machine learning / statistics to natural language processing.
- Learning: the ability to improve one's behaviour at a specific task over time; involves the analysis of data (statistics).
This part follows parts of the book Statistical NLP (Manning and Schuetze), MIT Press, 1999.
Contents
- Motivation
- Zipf's law
- Some natural language processing tasks
- Non-probabilistic NLP models: regular grammars and finite state automata, context-free grammars, definite clause grammars
- Motivation for statistical NLP
- Overview of the rest of this part
Rationalism versus Empiricism
- Rationalist (Noam Chomsky): innate language structures; in AI, hand-coding NLP. The dominant view 1960-1985. Cf. e.g. Steven Pinker's The Language Instinct (popular science book).
- Empiricist: the ability to learn is innate; in AI, language is learned from corpora. Dominant 1920-1960 and becoming increasingly important.
Rationalism versus Empiricism
- Noam Chomsky: "But it must be recognized that the notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."
- Fred Jelinek (IBM, 1988): "Every time a linguist leaves the room the recognition rate goes up." (Alternative: "Every time I fire a linguist the recognizer improves.")
This course
We take the empiricist approach; the focus is on probabilistic models for the learning of natural language. There is no time to treat natural language itself in depth (though this would be quite useful and interesting); that deserves a full course of its own and is covered in more depth in Logic, Language and Learning (SS 05, probably SS 06).
Ambiguity
NLP and Statistics
Statistical disambiguation:
- Define a probability model for the data
- Compute the probability of each alternative
- Choose the most likely alternative
NLP and Statistics
Statistical methods deal with uncertainty: they predict the future behaviour of a system based on the behaviour observed in the past. Statistical methods require training data; in statistical NLP, the data are corpora.
Corpora
A corpus is a text collection used for linguistic purposes.
- Tokens: how many words are contained in Tom Sawyer? 71,370
- Types: how many different words are contained in Tom Sawyer? 8,018
- Hapax legomena: words appearing only once
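A minimal sketch (not from the slides) of how such counts can be computed; the file name tom_sawyer.txt is a placeholder, the tokenizer is deliberately naive, and the exact numbers depend heavily on tokenization choices:

```python
# Count tokens, types, and hapax legomena in a plain-text file.
from collections import Counter
import re

def corpus_stats(path):
    with open(path, encoding="utf-8") as f:
        # Naive tokenization: lowercased runs of letters/apostrophes.
        tokens = re.findall(r"[a-z']+", f.read().lower())
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return len(tokens), len(counts), hapaxes

# tokens, types, hapaxes = corpus_stats("tom_sawyer.txt")
```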
Word Counts
The most frequent words are function words:

  word   freq      word   freq
  the    3332      in     906
  and    2972      that   877
  a      1775      he     877
  to     1725      I      783
  of     1440      his    772
  was    1161      you    686
  it     1027      Tom    679
Word Counts
How many words appear f times?

  f        n_f
  1        3993
  2        1292
  3        664
  4        410
  5        243
  6        199
  7        172
  8        131
  9        82
  10       91
  11-50    540
  51-100   99
  > 100    102

About half of the words occur just once. About half of the text consists of the 100 most common words.
Word Counts (Brown corpus)
Zipf's Law
Zipf's Law: f ~ 1/r (f * r ≈ const). Underlying idea: minimize effort.

  word     f      r     f*r        word         f    r      f*r
  the      3332   1     3332       turned       51   200    10200
  and      2972   2     5944       you'll       30   300    9000
  a        1775   3     5325       name         21   400    8400
  he       877    10    8770       comes        16   500    8000
  but      410    20    8200       group        13   600    7800
  be       294    30    8820       lead         11   700    7700
  there    222    40    8880       friends      10   800    8000
  one      172    50    8600       begin        9    900    8100
  about    158    60    9480       family       8    1000   8000
  more     138    70    9660       brushed      4    2000   8000
  never    124    80    9920       sins         2    3000   6000
  Oh       116    90    10440      Could        2    4000   8000
  two      104    100   10400      Applausive   1    8000   8000
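A minimal sketch (an addition, not from the slides) for reproducing such a rank-frequency table from any plain-text corpus; the file name is a placeholder, and the law holds only approximately:

```python
# Print word, f, r and f*r at selected ranks r, to check that
# f*r stays roughly constant as Zipf's law predicts.
from collections import Counter
import re

def zipf_table(path, ranks=(1, 2, 3, 10, 100, 1000, 8000)):
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    by_freq = Counter(tokens).most_common()   # sorted by decreasing f
    for r in ranks:
        if r <= len(by_freq):
            word, freq = by_freq[r - 1]       # rank r = r-th most frequent word
            print(f"{word:<12} f={freq:<6} r={r:<6} f*r={freq * r}")

# zipf_table("tom_sawyer.txt")
```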
Conclusions
This part gives an overview of some probabilistic and machine learning methods for NLP. These methods are also very relevant to bioinformatics: there is an analogy between parsing a sentence and parsing a biological string (DNA, protein, mRNA, ...).
Language and sequences
Natural language processing is concerned with the analysis of sequences of words / sentences and with the construction of language models. There are two types of models: non-probabilistic and probabilistic.
Key NLP Problem: Ambiguity
Human language is highly ambiguous at all levels:
- acoustic level: recognize speech vs. wreck a nice beach
- morphological level: saw: to see (past), saw (noun), to saw (present, infinitive)
- syntactic level: I saw the man on the hill with a telescope
- semantic level: One book has to be read by every student
Language Model
A language model is a formal model of language. There are two types:
- Non-probabilistic: allows one to compute whether a certain sequence (a sentence or part thereof) is possible; often grammar based.
- Probabilistic: allows one to compute the probability of a certain sequence; often extends grammars with probabilities.
Example of a bad language model
A bad language model
A good language model
- Non-probabilistic: "I swear to tell the truth" is possible; "I swerve to smell de soup" is impossible.
- Probabilistic: P(I swear to tell the truth) ≈ 0.0001; P(I swerve to smell de soup) ≈ 0.
Why language models?
Consider a Shannon game: predicting the next word in a sequence (see the sketch below).
- Statistical natural language ....
- The cat is thrown out of the ...
- The large green ...
- Sue swallowed the large green ...
This models language at the sentence level.
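As a preview of the probabilistic models treated later, a minimal bigram sketch that both scores whole sentences and plays the Shannon game; the three-sentence training corpus and the add-alpha smoothing are illustrative assumptions, not from the slides:

```python
# Minimal bigram language model: scores sentences and predicts the next word.
from collections import Counter

corpus = ["i swear to tell the truth",
          "the cat is thrown out of the house",
          "sue swallowed the large green pill"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    words = ["<s>"] + sent.split() + ["</s>"]
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p(w2, w1, alpha=0.1):
    # Add-alpha smoothing: unseen bigrams get a small, nonzero probability.
    return (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * len(unigrams))

def sentence_prob(sent):
    words = ["<s>"] + sent.split() + ["</s>"]
    prob = 1.0
    for w1, w2 in zip(words, words[1:]):
        prob *= p(w2, w1)
    return prob

def predict_next(context_word):
    # Shannon game: the most likely continuation of a single word.
    return max(unigrams, key=lambda w: p(w, context_word))

print(sentence_prob("i swear to tell the truth"))   # relatively high
print(sentence_prob("i swerve to smell de soup"))   # near zero
print(predict_next("the"))                          # one of the words seen after "the"
```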
Applications
- Spelling correction
- Mobile phone texting
- Speech recognition
- Handwriting recognition
- Disabled users
- ...
Spelling errors
- They are leaving in about fifteen minuets to go to her house.
- The study was conducted mainly be John Black.
- Hopefully, all with continue smoothly in my absence.
- Can they lave him my messages?
- I need to notified the bank of ....
- He is trying to fine out.
Handwriting recognition
Assume a note is given to a bank teller, which the teller reads as "I have a gub." (cf. Woody Allen). NLP to the rescue:
- gub is not a word
- gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank
For Spell Checkers
Collect a list of commonly substituted words: piece/peace, whether/weather, their/there, ...
Example: "On Tuesday, the whether ..." becomes "On Tuesday, the weather ..."
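A minimal sketch (an addition, not from the slides) of context-sensitive correction with such confusion sets: choose the alternative that is most frequent after the preceding word. The confusion sets and bigram counts below are made-up illustrations; a real system would estimate them from a large corpus:

```python
# Pick, within each confusion set, the member most likely after the
# left neighbour, using (invented) bigram counts.
CONFUSION_SETS = [{"whether", "weather"}, {"piece", "peace"}, {"their", "there"}]

BIGRAM_COUNTS = {("the", "weather"): 1200, ("the", "whether"): 3,
                 ("of", "peace"): 800, ("of", "piece"): 40}

def correct(words):
    out = list(words)
    for i in range(1, len(out)):
        for conf in CONFUSION_SETS:
            if out[i] in conf:
                # Replace by the confusion-set member most likely after out[i-1].
                out[i] = max(conf, key=lambda w: BIGRAM_COUNTS.get((out[i - 1], w), 0))
    return out

print(correct("on tuesday the whether".split()))  # ... 'weather'
```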
Another dimension in language models
So far, we mainly wanted to infer (probabilities of) legal sentences / sequences. Alternatively, we may want to infer properties of these sentences, e.g., a parse tree or a part-of-speech tagging, which are needed for understanding natural language. Let's look at some tasks.
Sequence Tagging
Part-of-speech tagging:
  He drives with his bike
  N  V      PR   PN  N      (noun, verb, preposition, pronoun, noun)
Text extraction:
  The job is that of a programmer
  X   X   X  X    X  X JobType
  The seminar is taking place from 15.00 to 16.00
  X   X       X  X      X     X    Start  X  End
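Such taggers are typically built with the probabilistic sequence models introduced later in this part (e.g., hidden Markov models). As a preview, a minimal Viterbi decoding sketch over a toy HMM; every probability below is invented purely for illustration:

```python
# Minimal Viterbi decoder for POS tagging with a toy HMM.
TAGS = ["PN", "V", "PR", "N"]
START = {"PN": 0.5, "V": 0.1, "PR": 0.1, "N": 0.3}
TRANS = {("PN", "V"): 0.6, ("V", "PR"): 0.4, ("PR", "PN"): 0.4, ("PN", "N"): 0.5}
EMIT = {("PN", "he"): 0.3, ("V", "drives"): 0.2, ("PR", "with"): 0.5,
        ("PN", "his"): 0.2, ("N", "bike"): 0.1}

def viterbi(words):
    # best[t] = (probability, tag sequence) of the best path ending in tag t.
    best = {t: (START[t] * EMIT.get((t, words[0]), 1e-6), [t]) for t in TAGS}
    for w in words[1:]:
        new = {}
        for t in TAGS:
            prev = max(TAGS, key=lambda s: best[s][0] * TRANS.get((s, t), 1e-6))
            p = best[prev][0] * TRANS.get((prev, t), 1e-6) * EMIT.get((t, w), 1e-6)
            new[t] = (p, best[prev][1] + [t])
        best = new
    return max(best.values())[1]

print(viterbi("he drives with his bike".split()))  # ['PN', 'V', 'PR', 'PN', 'N']
```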
Sequence Tagging
The same task appears in bioinformatics, e.g., predicting the secondary structure of proteins, mRNA, ...:
  X = A, F, A, R, L, M, M, A, ...
  Y = he, he, st, st, st, he, st, he, ...
Parsing
Given a sentence, find its parse tree. This is an important step in understanding natural language.
Parsing
In bioinformatics, parsing allows one to predict (elements of) structure from sequence.
Language models based on Grammars
Grammar types:
- Regular grammars and finite state automata
- Context-free grammars
- Definite clause grammars: a particular type of unification-based grammar (Prolog)
Distinguish the lexicon from the grammar:
- The lexicon (dictionary) contains information about words, e.g., a word's possible tags (and possibly additional information): flies - V(erb) - N(oun)
- The grammar encodes the rules
Grammars and parsing
The syntactic level is the best understood and formalized. The derivation of grammatical structure is called parsing (more than just recognition). The result of parsing is mostly a parse tree, showing the constituents of a sentence, e.g., verb or noun phrases. Syntax is usually specified in terms of a grammar consisting of grammar rules.
Regular Grammars and Finite State Automata
Lexical information: which categories do the words belong to?
- Det(erminer)
- N(oun)
- Vi (intransitive verb): takes no argument
- Pn (pronoun)
- Vt (transitive verb): takes an argument
- Adj (adjective)
Now accept "The cat slept": Det N Vi
As a regular grammar:
  S  -> [Det] S1    % [ ] : terminal
  S1 -> [N] S2
  S2 -> [Vi]
Lexicon:
  the - Det
  cat - N
  slept - Vi
  ...
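The same grammar can be run as a finite state automaton. A minimal sketch (an addition, not from the slides), with the states S, S1, S2 from the rules above and the three-word toy lexicon:

```python
# States S, S1, S2; transitions are labelled with lexical categories,
# and FINAL marks acceptance.
LEXICON = {"the": "Det", "cat": "N", "slept": "Vi"}
TRANSITIONS = {("S", "Det"): "S1", ("S1", "N"): "S2", ("S2", "Vi"): "FINAL"}

def accepts(sentence):
    state = "S"
    for word in sentence.lower().split():
        tag = LEXICON.get(word)            # look the word up in the lexicon
        state = TRANSITIONS.get((state, tag))
        if state is None:                  # no transition: reject
            return False
    return state == "FINAL"

print(accepts("The cat slept"))   # True
print(accepts("The cat the"))     # False
```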
Finite State Automaton
Sentences:
- John smiles - Pn Vi
- The cat disappeared - Det N Vi
- These new shoes hurt - Det Adj N Vi
- John liked the old cat - Pn Vt Det Adj N
Phrase structure
Parse tree for "the dog chased a cat into the garden":
  [S [NP [D the] [N dog]]
     [VP [V chased]
         [NP [D a] [N cat]]
         [PP [P into] [NP [D the] [N garden]]]]]
Notation
- S: sentence
- D or Det: determiner (e.g., articles)
- N: noun
- V: verb
- P: preposition
- NP: noun phrase
- VP: verb phrase
- PP: prepositional phrase
Context Free Grammar
  S  -> NP VP
  NP -> D N
  VP -> V NP
  VP -> V NP PP
  PP -> P NP
  D  -> [the]
  D  -> [a]
  N  -> [dog]
  N  -> [cat]
  N  -> [garden]
  V  -> [chased]
  V  -> [saw]
  P  -> [into]
Terminals ~ lexicon
Phrase structure
The formalism of context-free grammars:
- Nonterminal symbols: S, NP, VP, ...
- Terminal symbols: dog, cat, saw, the, ...
Recursion, as in "The girl thought the dog chased the cat":
  VP -> V, S
  N  -> [girl]
  V  -> [thought]
Top-down parsing
  S -> NP VP
  S -> Det N VP
  S -> The N VP
  S -> The dog VP
  S -> The dog V NP
  S -> The dog chased NP
  S -> The dog chased Det N
  S -> The dog chased the N
  S -> The dog chased the cat
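A minimal top-down (recursive descent) recognizer for the toy CFG above; an illustrative addition, with the generator-based formulation being one implementation choice among many:

```python
# Recursive descent recognition: parse(cat, words) yields every possible
# remainder of the word list after recognizing category cat as a prefix.
GRAMMAR = {"S": [["NP", "VP"]], "NP": [["D", "N"]],
           "VP": [["V", "NP"], ["V", "NP", "PP"]], "PP": [["P", "NP"]]}
LEXICON = {"the": "D", "a": "D", "dog": "N", "cat": "N", "garden": "N",
           "chased": "V", "saw": "V", "into": "P"}

def parse(cat, words):
    # Terminal case: consume one word if its category matches.
    if words and LEXICON.get(words[0]) == cat:
        yield words[1:]
    for rhs in GRAMMAR.get(cat, []):       # try each rule cat -> rhs
        yield from parse_seq(rhs, words)

def parse_seq(cats, words):
    if not cats:
        yield words
    else:
        for rest in parse(cats[0], words):
            yield from parse_seq(cats[1:], rest)

def accepts(sentence):
    return any(rest == [] for rest in parse("S", sentence.lower().split()))

print(accepts("the dog chased the cat"))                # True
print(accepts("the dog chased a cat into the garden"))  # True
```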
Context-free grammar
  S  --> NP, VP.
  NP --> PN.            % proper noun
  NP --> Art, Adj, N.
  NP --> Art, N.
  VP --> VI.            % intransitive verb
  VP --> VT, NP.        % transitive verb
  Art --> [the].
  Adj --> [lazy].
  Adj --> [rapid].
  PN --> [achilles].
  N  --> [turtle].
  VI --> [sleeps].
  VT --> [beats].
Parse tree
Parse tree for "achilles beats the rapid turtle":
  [S [NP [PN achilles]]
     [VP [VT beats]
         [NP [Art the] [Adj rapid] [N turtle]]]]
Definite Clause Grammars
Non-terminals may have arguments. Number agreement:
  S --> NP(N), VP(N).
  NP(N) --> Art(N), N(N).
  VP(N) --> VI(N).
  Art(singular) --> [a].
  Art(singular) --> [the].
  Art(plural) --> [the].
  N(singular) --> [turtle].
  N(plural) --> [turtles].
  VI(singular) --> [sleeps].
  VI(plural) --> [sleep].
DCGs
Non-terminals may have arguments:
- Variables (start with a capital), e.g., Number, Any
- Constants (start with lower case), e.g., singular, plural
- Structured terms (start with lower case and take arguments themselves), e.g., vp(V,NP)
Parsing needs to be adapted: it uses unification.
Unification in a nutshell (cf. AI course)
Substitutions, e.g., {Num / singular}, {T / vp(V,NP)}.
Applying a substitution: simultaneously replace the variables by the corresponding terms:
  S(Num) {Num / singular} = S(singular)
Unification
Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
- Art(singular) and Art(Num) gives {Num / singular}
- Art(singular) and Art(plural) fails
- Art(Num1) and Art(Num2) gives {Num1 / Num2}
- PN(Num, accusative) and PN(singular, Case) gives {Num / singular, Case / accusative}
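A minimal sketch (an addition, not from the slides) of unification restricted to flat terms such as Art(singular) or PN(Num, accusative); variables are modelled as capitalized strings, and structured sub-terms and the occurs check are deliberately omitted:

```python
# Unify the argument lists of two non-terminals with the same functor.
def unify(args1, args2):
    if len(args1) != len(args2):
        return None                               # different arity: fail
    subst = {}
    for a, b in zip(args1, args2):
        a, b = subst.get(a, a), subst.get(b, b)   # apply bindings so far
        if a == b:
            continue
        if a[0].isupper():                        # a is a variable: bind it
            subst[a] = b
        elif b[0].isupper():                      # b is a variable: bind it
            subst[b] = a
        else:
            return None                           # two different constants: fail
    return subst

print(unify(["singular"], ["Num"]))        # {'Num': 'singular'}
print(unify(["singular"], ["plural"]))     # None (fail)
print(unify(["Num", "accusative"], ["singular", "Case"]))
# {'Num': 'singular', 'Case': 'accusative'}
```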
Parsing with DCGs
Now require successful unification at each step:
  S -> NP(N), VP(N)
  S -> Art(N), N(N), VP(N)        {N/singular}
  S -> a N(singular), VP(singular)
  S -> a turtle VP(singular)
  S -> a turtle sleeps
"a turtle sleep" fails.
Case Marking
  PN(singular, nominative) --> [he] ; [she]
  PN(singular, accusative) --> [him] ; [her]
  PN(plural, nominative) --> [they]
  PN(plural, accusative) --> [them]
  S --> NP(Number, nominative), VP(Number)
  VP(Number) --> V(Number), NP(Any, accusative)
  NP(Number, Case) --> PN(Number, Case)
  NP(Number, Any) --> Det, N(Number)
He sees her. She sees him. They see her. But not: Them see he.
DCGs
DCGs are strictly more expressive than CFGs. They can represent, for instance, the language a^n b^n c^n, which is not context-free:
  S(N) -> A(N), B(N), C(N)
  A(0) -> []
  B(0) -> []
  C(0) -> []
  A(s(N)) -> A(N), [A]
  B(s(N)) -> B(N), [B]
  C(s(N)) -> C(N), [C]
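A minimal recognizer for this language (an illustrative addition), threading the count through the three segments just as the DCG threads N through A, B, and C:

```python
def count_prefix(s, ch):
    # Mirrors A(N): consume a maximal run of ch and return its length.
    n = 0
    while n < len(s) and s[n] == ch:
        n += 1
    return n, s[n:]

def accepts(s):
    # S(N) -> A(N), B(N), C(N): the same count must fit all three runs.
    n, rest = count_prefix(s, "a")
    for ch in ("b", "c"):
        m, rest = count_prefix(rest, ch)
        if m != n:
            return False
    return rest == ""

print(accepts("aabbcc"))   # True
print(accepts("aabbbcc"))  # False
```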
Probabilistic Models
Traditional grammar models are very rigid: essentially a yes / no decision. Probabilistic grammars instead:
- Define a probability model for the data
- Compute the probability of each alternative
- Choose the most likely alternative
Illustrated on the Shannon game, spelling correction, and parsing.
Some probabilistic models
- N-grams: predicting the next word ("Artificial intelligence and machine ....", "Statistical natural language ....")
- Probabilistic regular grammars (Markov models)
- Hidden Markov models
- Conditional random fields
- (Stochastic) context-free grammars
- (Stochastic) definite clause grammars
Illustration
The Wall Street Journal Corpus: 3,000,000 words for which the correct parse tree of each sentence is known (constructed by hand). It can be used to derive stochastic context-free grammars (SCFGs): an SCFG assigns a probability to each parse tree, and one computes the most probable parse tree.
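A minimal sketch (an addition, not from the slides) of how SCFG rule probabilities can be estimated from hand-parsed trees: count rule occurrences and normalize per left-hand side. The two toy trees below stand in for the Wall Street Journal treebank:

```python
# Estimate P(rule) = count(rule) / count(lhs) from a tiny toy treebank.
# Trees are (label, children...) tuples.
from collections import Counter, defaultdict

def rules(tree):
    label, *children = tree
    if children and isinstance(children[0], tuple):
        yield (label, tuple(c[0] for c in children))   # e.g. S -> NP VP
        for c in children:
            yield from rules(c)
    else:
        yield (label, tuple(children))                 # lexical rule, e.g. N -> dog

treebank = [
    ("S", ("NP", ("D", "the"), ("N", "dog")),
          ("VP", ("V", "sleeps"))),
    ("S", ("NP", ("D", "the"), ("N", "cat")),
          ("VP", ("V", "chased"), ("NP", ("D", "a"), ("N", "dog")))),
]

counts = Counter(r for t in treebank for r in rules(t))
lhs_totals = defaultdict(int)
for (lhs, rhs), c in counts.items():
    lhs_totals[lhs] += c

probs = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
print(probs[("VP", ("V",))])        # P(VP -> V)    = 0.5
print(probs[("VP", ("V", "NP"))])   # P(VP -> V NP) = 0.5
```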
Sequences are omnipresent
Therefore the techniques we will see also apply to:
- Bioinformatics: DNA, proteins, mRNA, ... can all be represented as strings
- Robotics: sequences of actions, states, ...
- ...
Rest of the Course
The limitations of traditional grammar models motivate probabilistic extensions:
- Regular grammars and finite state automata
- Markov models using n-grams
- (Hidden) Markov models
- Conditional random fields, as an example of using undirected graphical models
- Probabilistic context-free grammars
- Probabilistic definite clause grammars
All use the principles of Part I on graphical models.