1
Linguistically Rich Statistical Models of Language
Joseph Smarr, M.S. Candidate, Symbolic Systems Program
Advisor: Christopher D. Manning
December 5th, 2002
2
Grand Vision
Talk to your computer like another human (HAL, Star Trek, etc.)
Ask your computer a question, it finds the answer: “Who’s speaking at this week’s SymSys Forum?”
Computer can read and summarize text for you: “What’s the cutting edge in NLP these days?”
3
We’re Not There (Yet)
Turns out behaving intelligently is difficult. What does it take to achieve the grand vision?
General Artificial Intelligence problems: knowledge representation, common sense reasoning, etc.
Language-specific problems: complexity, ambiguity, and flexibility of language
Always underestimated because language is so easy for us!
4
Are There Useful Sub-Goals?
The grand vision is still too hard, but we can solve simpler problems that are still valuable:
Filter news for stories about new tech gadgets
Take the SSP talk email and add it to my calendar
Dial my cell phone by speaking my friend’s name
Automatically reply to customer service e-mails
Find out which episode of The Simpsons is on tonight
Two approaches to understanding language:
Theory-driven: Theoretical Linguistics
Task-driven: Natural Language Processing
5
Theoretical Linguistics vs. NLP
Theoretical Linguistics
Goal: Understand people’s knowledge of language
Method: Rich logical representations of language’s hidden structure and meaning
Guiding principles:
Separation of (hidden) knowledge of language and (observable) performance
Grammaticality is categorical (all or none)
Describe what are possible and impossible utterances
Natural Language Processing
Goal: Develop practical tools for analyzing speech / text
Method: Simple, robust models of everyday language use that are sufficient to perform tasks
Guiding principles:
Exploit (empirical) regularities and patterns in examples of language in text collections
Sentence “goodness” is gradient (better or worse)
Deal with the utterances you’re given, good or bad
6
Theoretical Linguistics vs. NLP
(side-by-side comparison table: Linguistics | NLP)
7
Linguistic Puzzle
When dropping an argument, why do some verbs keep the subject and some keep the object?
John sang the song / John sang
John broke the vase / The vase broke
Not just “quirkiness of language”: similar patterns show up in other languages, and it seems to involve deep aspects of verb meaning
Rules to account for this phenomenon:
Two classes of verbs (unergative & unaccusative)
Remaining argument must be realized as subject
8
Exception: Imperatives
“Open the pod bay doors, HAL”
Different goals lead to study of different problems. In NLP...
Need to recognize this as a command
Need to figure out what specific action to take
Irrelevant how you’d say it in French
Describing language vs. working with language, but both tasks clearly share many sub-problems
9
Theoretical Linguistics vs. NLP
Potential for much synergy between linguistics and NLP; however, historically they have remained quite distinct.
Chomsky (founder of generative grammar): “It must be recognized that the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.”
Karttunen (founder of finite state technologies at Xerox), on linguists’ reaction to NLP: “Not interested. You do not understand Theory. Go away you geek.”
Jelinek (former head of IBM speech project): “Every time I fire a linguist, the performance of our speech recognition system goes up.”
10
Potential Synergies
Lexical acquisition (unknown words): statistically infer new lexical entries from context
Modeling “naturalness” and “conventionality”: use corpus data to weight constructions
Dealing with ungrammatical utterances: find “most similar / most likely” correction
Richer patterns for finding information in text: use argument structure / semantic dependencies
More powerful models for speech recognition: progressively build parse tree while listening
11
Finding Information in Text
US Government has sponsored lots of research in “information extraction” from news articles:
Find mentions of terrorists and which locations they’re targeting
Find which companies are being acquired by which others and for how much
Progress driven by simplifying the models used:
Early work used rich linguistic parsers, but was unable to robustly handle natural text
Modern work is mainly finite state patterns; regular expressions are very practical and successful
12
Web Information Extraction
How much does that textbook cost on Amazon?
Learn patterns for finding relevant fields:
Concept: Book
Title: Foundations of Statistical Natural Language Processing
Author(s): Christopher D. Manning & Hinrich Schütze
Price: $58.45
Learned pattern: Our Price: $##.##
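A learned template like “Our Price: $##.##” is essentially a small regular expression applied to page text. A minimal sketch (the page snippet here is invented for illustration, not real Amazon markup):

```python
import re

# Hypothetical product-page text (illustrative only)
page = """
Foundations of Statistical Natural Language Processing
by Christopher D. Manning & Hinrich Schutze
Our Price: $58.45
"""

# The template "Our Price: $##.##" rendered as a regular expression
price_pattern = re.compile(r"Our Price:\s*\$(\d+\.\d{2})")

match = price_pattern.search(page)
if match:
    print(match.group(1))  # the extracted price field: 58.45
```

The same finite-state idea scales to other fields (title, author) by learning one pattern per field from labeled examples.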
13
Improving IE Performance on Natural Text Documents
How can we scale IE back up for natural text? Need to look elsewhere for regularities to exploit.
Idea: Consider grammatical structure
Run shallow parser on each sentence
Flatten output into sequence of “typed chunks”
Example of tagged sentence: “Uba2p is located largely in the nucleus.” → NP_SEG VP_SEG PP_SEG NP_SEG
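One way to sketch the flattening step: collapse per-token chunk labels (as a shallow parser might emit them) into a sequence of typed segments. The per-token chunk assignments below are illustrative, not the output of any particular parser:

```python
def flatten_chunks(tagged_tokens):
    """Collapse (word, chunk-type) pairs into typed segments,
    merging adjacent tokens that share a chunk type."""
    segments = []
    for word, chunk in tagged_tokens:
        if segments and segments[-1][0] == chunk:
            segments[-1][1].append(word)  # extend the current segment
        else:
            segments.append((chunk, [word]))  # start a new segment
    return [f"{chunk}_SEG" for chunk, _ in segments]

# The slide's example sentence with hypothetical shallow-parse chunk tags
sent = [("Uba2p", "NP"), ("is", "VP"), ("located", "VP"),
        ("largely", "PP"), ("in", "PP"), ("the", "NP"), ("nucleus", "NP")]
print(flatten_chunks(sent))  # ['NP_SEG', 'VP_SEG', 'PP_SEG', 'NP_SEG']
```

The resulting segment sequence is what the finite-state extraction patterns then match against, instead of raw tokens.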
14
Power of Linguistic Features
(chart: adding linguistic features yields 21%, 65%, and 45% increases)
15
Linguistically Rich(er) IE
Exploit more grammatical structure for patterns, e.g. Tim Grow’s work on IE with PCFGs
(parse tree figure over an acquisition sentence, “Union Corp will acquire First Sheland Bank Inc for three million dollars”, with constituents annotated {pur}, {acq}, {amt} for purchaser, acquired company, and amount)
16
Classifying Unknown Words
Which of the following is the name of a city?
Cotrimoxazole
Wethersfield
Alien Fury: Countdown to Invasion
Most linguistic grammars assume a fixed lexicon. How do humans learn to deal with new words?
Context (“I spent a summer living in Wethersfield”)
Makeup of the word itself (“phonesthetics”)
Idea: Learn distinguishing letter sequences
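The “distinguishing letter sequences” idea can be sketched as extracting character n-grams from a word, with boundary markers so that prefixes and suffixes (like the “-field” of place names or the “-oxazole” of drug names) show up as features a classifier can weight:

```python
def char_ngrams(word, n=3):
    """Character n-grams with boundary markers '^' and '$',
    so word-initial and word-final sequences become visible features."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("Wethersfield"))
# includes '^we' (word start) and 'ld$' (word end) among its 12 trigrams
```

A classifier trained on lists of city names, drug names, etc. can then learn which of these letter sequences are distinctive for each class.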
17
What’s in a Name?
(table of distinguishing letter sequences, e.g. “oxa” in drug names vs. “field” in place names)
18
Generative Model of PNPs
Length n-gram model and word model:
P(pnp | c) = P_ngram(word-lengths(pnp)) × ∏_{w_i ∈ pnp} P(w_i | word-length(w_i))
Word model: mixture of character n-gram model and common word model:
P(w_i | len) = λ_len · P_ngram(w_i | len)^(k/len) + (1 − λ_len) · P_word(w_i | len)
N-gram models: deleted interpolation:
P_0-gram(symbol | history) = uniform distribution
P_n-gram(s | h) = λ_C(h) · P_empirical(s | h) + (1 − λ_C(h)) · P_(n−1)-gram(s | h)
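The deleted-interpolation recursion on the slide can be sketched directly in code. This is a minimal illustration, not the talk’s actual model: the count-based mixing weight λ_C(h) = C(h)/(C(h)+1) is a hypothetical stand-in, and the training words are invented:

```python
from collections import Counter

class InterpolatedCharNgram:
    """Character n-gram model smoothed by deleted interpolation:
    P_n(s|h) = lam(h) * P_emp(s|h) + (1 - lam(h)) * P_(n-1)(s|h),
    bottoming out in a uniform 0-gram distribution."""

    def __init__(self, words, n=3):
        self.n = n
        self.alphabet = sorted({c for w in words for c in w} | {"$"})
        self.counts = Counter()       # (history, symbol) counts
        self.hist_counts = Counter()  # history counts C(h)
        for w in words:
            padded = "^" * (n - 1) + w + "$"
            for i in range(n - 1, len(padded)):
                for k in range(n):  # collect counts for every order at once
                    h = padded[i - k:i]
                    self.counts[(h, padded[i])] += 1
                    self.hist_counts[h] += 1

    def prob(self, s, h):
        if h is None:  # 0-gram base case: uniform over the alphabet
            return 1.0 / len(self.alphabet)
        c = self.hist_counts[h]
        lam = c / (c + 1.0)  # hypothetical count-based weight for lambda_C(h)
        emp = self.counts[(h, s)] / c if c else 0.0
        shorter = h[1:] if h else None  # empty history backs off to uniform
        return lam * emp + (1.0 - lam) * self.prob(s, shorter)

# Toy training vocabulary (illustrative only)
model = InterpolatedCharNgram(["wethersfield", "springfield", "oxford"], n=3)
print(model.prob("d", "el"))  # a probability strictly between 0 and 1
```

Because unseen histories get λ = 0, the model falls back smoothly to shorter histories and ultimately to the uniform distribution, so no character sequence ever receives probability zero.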
19
Experimental Results
(charts: classification accuracy in the pairwise, 1-all, and n-way settings)
20
Knowledge of Frequencies
Linguistics traditionally assumes Knowledge of Language doesn’t involve counting
Letter frequencies are clearly an important source of knowledge for unknown words
Similarly, we saw before that there are regular patterns to exploit in grammatical information
Take-home point: Combining statistical NLP methods with richer linguistic representations is a big win!
21
Language is Ambiguous!
Ban on Nude Dancing on Governor’s Desk (from a Georgia newspaper column discussing current legislation)
Lebanese chief limits access to private parts (talking about an Army General’s initiative)
Death may ease tension (an article about the death of Colonel Jean-Claude Paul in Haiti)
Iraqi Head Seeks Arms
Juvenile Court to Try Shooting Defendant
Teacher Strikes Idle Kids
Stolen Painting Found By Tree
22
Language is Ambiguous!
Local HS Dropouts Cut in Half
Obesity Study Looks for Larger Test Group
British Left Waffles on Falkland Islands
Red Tape Holds Up New Bridges
Man Struck by Lightning Faces Battery Charge
Clinton Wins on Budget, but More Lies Ahead
Hospitals Are Sued by 7 Foot Doctors
Kids Make Nutritious Snacks
23
Coping With Ambiguity
Categorical grammars like HPSG provide many possible analyses for sentences: 455 parses for “List the sales of the products produced in 1973 with the products produced in 1972.” (Martin et al., 1987)
In most cases, only one interpretation is intended
Initial solution was hand-coded preferences among rules:
Hard to manage as the number of rules increases
Need to capture interactions among rules
24
Statistical HPSG Parse Selection
HPSG provides deep analyses of sentence structure and meaning, useful for NLP tasks like question answering
Need to solve the disambiguation problem to make using these richer representations practical
Idea: Learn statistical preferences among constructions from a hand-disambiguated collection of sentences
Result: Correct analysis chosen >80% of the time
StatNLP methods + linguistic representation = win
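The “learned preferences among constructions” idea can be sketched as a linear scoring model: each candidate analysis is described by the constructions it uses, each construction carries a learned weight, and the best-scoring parse wins. The feature names and weights below are hypothetical, purely to show the mechanism:

```python
# Hypothetical construction features with weights learned from a
# hand-disambiguated treebank (values invented for illustration)
weights = {"low_attach_pp": 0.9, "high_attach_pp": -0.4, "passive": 0.1}

def score(parse_features):
    """Linear score: sum of learned weights for the constructions
    a candidate analysis uses; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in parse_features)

# Two candidate analyses of an ambiguous sentence, as feature bags
candidates = {
    "parse_a": ["low_attach_pp"],
    "parse_b": ["high_attach_pp", "passive"],
}
best = max(candidates, key=lambda p: score(candidates[p]))
print(best)  # parse_a
```

Because the weights are learned jointly from disambiguated data, interactions among rules are captured automatically instead of being hand-coded one preference at a time.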
25
Towards Semantic Extraction
HPSG provides a representation of meaning: who did what to whom?
Computers need meaning to do inference
Can we extend information extraction methods to extract meaning representations from pages?
Current project: IE for the semantic web
Large project to build rich ontologies to describe the content of web pages for intelligent agents
Use IE to extract new instances of concepts from web pages (as opposed to manual labeling)
student(Joseph), univ(Stanford), at(Joseph, Stanford)
26
Towards the Grand Vision?
Collaboration between Theoretical Linguistics and NLP is an important step forward: practical tools with sophisticated language power
How can we ever teach computers enough about language and the world?
Hawking: Moore’s Law is sufficient
Moravec: mobile robots must learn like children
Kurzweil: reverse-engineer the human brain
The experts agree: Symbolic Systems is the future!
27
Upcoming Convergence Courses
Ling 139M Machine Translation (Win)
Ling 239E Grammar Engineering (Win)
CS 276B Text Information Retrieval (Win)
Ling 239A Parsing and Generation (Spr)
CS 224N Natural Language Processing (Spr)
Get Involved!!