1
Natural Language Processing (NLP)
– Overview and history of the field
– Knowledge of language
– The role of ambiguity
– Models and algorithms
– Eliza, Turing, and conversational agents
– History of speech and language processing
– Regular expressions
2
Computer Speech and Language Processing
What is it? Getting computers to perform useful tasks involving human languages, whether for:
– Enabling human-machine communication
– Improving human-human communication
– Doing useful things with language objects (text and speech)
Examples:
– Question answering
– Machine translation
– Spoken conversational agents
3
Knowledge needed to build a speaking computer
Speech recognition and synthesis
– Dictionaries (how words are pronounced)
– Phonetics (how to recognize/produce each sound of English)
Natural language understanding
– Knowledge of the English words involved: what they mean, and how they combine (what is a ‘pod bay door’?)
– Knowledge of syntactic structure: “I’m I do, Sorry that afraid Dave I’m can’t” is only a scrambled “I’m sorry, Dave. I’m afraid I can’t do that.”
4
Eliza: Weizenbaum (1966)
User: You are like my father in some ways.
ELIZA: What resemblance do you see?
User: You are not very aggressive but I think you don’t want me to notice that.
ELIZA: What makes you think I am not aggressive?
User: You don’t argue with me.
ELIZA: Why do you think I don’t argue with you?
User: You are afraid of me.
ELIZA: Does it please you to believe I am afraid of you?
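ELIZA worked by matching the user’s input against keyword patterns and echoing fragments back with first and second person swapped. The Python sketch below is a toy, not Weizenbaum’s program: the rules, the `REFLECTIONS` table, and the names `reflect` and `respond` are hypothetical, chosen only to reproduce the exchange above.

```python
import re

# Hypothetical pronoun swaps, applied to the echoed fragment so that
# the user's "me" becomes "you" in the reply, and so on.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "me", "your": "my"}

# Hypothetical (pattern, reply template) rules. The first pattern that
# matches wins, so more specific rules must come first.
RULES = [
    (r".*\blike\b", "What resemblance do you see?"),
    (r"you are not (?:very )?(\w+)", "What makes you think I am not {0}?"),
    (r"you are (.*)", "Does it please you to believe I am {0}?"),
    (r"you don't (.*)", "Why do you think I don't {0}?"),
]
FALLBACK = "Please go on."


def reflect(fragment: str) -> str:
    """Swap first- and second-person words in a captured fragment."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.split())


def respond(utterance: str) -> str:
    # Normalize: lowercase, straighten curly apostrophes, drop final punctuation.
    text = utterance.lower().replace("\u2019", "'").strip().rstrip(".!?")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return FALLBACK


print(respond("You are afraid of me"))
# Does it please you to believe I am afraid of you?
```

Rule order matters here: if `you are (.*)` came before the `like` rule, the first user line would get “Does it please you to believe I am like your father in some ways?” instead.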
5
Ambiguity
– Computational linguists are obsessed with ambiguity
– Ambiguity is a fundamental problem of computational linguistics
– Resolving ambiguity is a crucial goal
6
Ambiguity
Find at least 5 meanings of this sentence: I made her duck
7
Ambiguity
Find at least 5 meanings of this sentence: I made her duck
– I cooked waterfowl for her benefit (to eat)
– I cooked waterfowl belonging to her
– I created the (plaster?) duck she owns
– I caused her to quickly lower her head or body
– I waved my magic wand and turned her into undifferentiated waterfowl
8
Ambiguity is Pervasive
I caused her to quickly lower her head or body
– Lexical category: “duck” can be a noun (N) or a verb (V)
I cooked waterfowl belonging to her
– Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (plaster) duck statue she owns
– Lexical semantics: “make” can mean “create” or “cook”
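Lexical-category ambiguities multiply: each ambiguous word multiplies the number of possible tag sequences for the sentence. A minimal Python sketch, assuming a hypothetical four-word lexicon with Penn Treebank-style tags (PRP = personal pronoun, PRP$ = possessive pronoun, VBD = past-tense verb, NN = noun, VB = base-form verb); the names `LEXICON` and `tag_sequences` are illustrative.

```python
from itertools import product

# Hypothetical toy lexicon: each word lists the lexical categories it can
# take (Penn Treebank-style tags).
LEXICON = {
    "I": ["PRP"],            # personal pronoun
    "made": ["VBD"],         # past-tense verb
    "her": ["PRP$", "PRP"],  # possessive ("of her") or personal ("for her")
    "duck": ["NN", "VB"],    # noun or base-form verb
}


def tag_sequences(sentence: str) -> list:
    """Every combination of one tag per word (the Cartesian product)."""
    return list(product(*(LEXICON[word] for word in sentence.split())))


for tags in tag_sequences("I made her duck"):
    print(tags)
```

Two ambiguous words yield 2 × 2 = 4 sequences. Not all are grammatical (a possessive “her” followed by the verb “duck” has no reading), so a tagger or parser still has to choose among the rest.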
9
Ambiguity is Pervasive
Grammar: “make” can be:
– Transitive (verb has a noun direct object): I cooked [waterfowl belonging to her]
– Ditransitive (verb has two noun objects): I made [her] (into) [undifferentiated waterfowl]
– Action-transitive (verb has a direct object and another verb): I caused [her] [to move her body]
10
Ambiguity is Pervasive
Phonetics! The sounds of “I made her duck” can also be heard as:
– I mate or duck
– I’m eight or duck
– Eye maid; her duck
– Aye mate, her duck
– I maid her duck
– I’m aid her duck
– I mate her duck
– I’m ate her duck
– I’m ate or duck
11
Models and Algorithms
Models: formalisms used to capture the various kinds of linguistic structure
– State machines (finite-state automata, transducers, Markov models)
– Formal rule systems (context-free grammars, feature systems)
– Logic (predicate calculus, inference)
– Probabilistic versions of all of these, plus others (Gaussian mixture models, probabilistic relational models, etc.)
Algorithms: used to manipulate representations to create structure
– Search (A*, dynamic programming)
– Supervised learning, etc.
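As a concrete instance of the first kind of model, here is a minimal deterministic finite-state automaton in Python. The language it recognizes (whole strings matching /o+h!/, i.e. one or more o’s followed by h!) and the names `TRANSITIONS` and `fsa_accepts` are illustrative choices, not taken from the slides.

```python
import re

# Hypothetical DFA for whole strings matching /o+h!/.
# A missing (state, char) entry means "no transition": reject.
TRANSITIONS = {
    (0, "o"): 1,
    (1, "o"): 1,  # self-loop: any number of additional o's
    (1, "h"): 2,
    (2, "!"): 3,
}
START, ACCEPTING = 0, {3}


def fsa_accepts(s: str) -> bool:
    state = START
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


# Compare against Python's regex engine on a few strings.
for s in ["oh!", "oooh!", "h!", "oh", "ohh!"]:
    print(s, fsa_accepts(s), re.fullmatch(r"o+h!", s) is not None)
```

Regular expressions and finite-state automata describe the same class of languages (Kleene’s theorem), which is why the loop can check the automaton against `re.fullmatch`.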
12
Language, Thought, Understanding
A Gedanken (thought) experiment: the Turing Test
– The question “can a machine think?” is not operational
– Operational version: two people and a computer
– An interrogator talks to a contestant and to the computer via teletype
– The machine’s task is to convince the interrogator that it is human
– The contestant’s task is to convince the interrogator that she, and not the machine, is human
13
History: foundational insights, 1940s–1950s
Automata:
– Turing (1936)
– McCulloch-Pitts neuron (1943): http://diwww.epfl.ch/mantra/tutorial/english/mcpits/html/
– Kleene (1951/1956)
– Shannon (1948): link between automata and Markov models
– Chomsky (1956)/Backus (1959)/Naur (1960): context-free grammars (CFG)
Probabilistic/information-theoretic models:
– Shannon (1948)
– Bell Labs speech recognition (1952)
14
History: the two camps, 1957–1970
Symbolic
– Zellig Harris (1958): TDAP, the first parser, a cascade of finite-state transducers
– Chomsky
– AI workshop at Dartmouth (McCarthy, Minsky, Shannon, Rochester)
– Newell and Simon: Logic Theorist, General Problem Solver
Statistical
– Bledsoe and Browning (1959): Bayesian OCR
– Mosteller and Wallace (1964): Bayesian authorship attribution
– Denes (1959): ASR combining grammar and acoustic probability
15
Four paradigms: 1970–1983
Stochastic
– Hidden Markov Model (1972): independent application by Baker (CMU) and by the Jelinek/Bahl/Mercer lab (IBM), following work of Baum and colleagues at IDA
Logic-based
– Colmerauer (1970, 1975): Q-systems
– Definite Clause Grammars (Pereira and Warren 1980)
– Kay (1979): functional grammar; Bresnan and Kaplan (1982): unification
Natural language understanding
– Winograd (1972): SHRDLU
– Schank and Abelson (1977): scripts, story understanding
– Influence of the case-role work of Fillmore (1968) via Simmons (1973) and Schank
Discourse modeling
– Grosz and colleagues: discourse structure and focus
– Perrault and Allen (1980): BDI model
16
Finite-state approach, 1983–1993
Finite-state models
– Kaplan and Kay (1981): phonology/morphology
– Church (1980): syntax
Return of probabilistic models
– Corpora created for language tasks
– Early statistical versions of NLP applications (parsing, tagging, machine translation)
– Increased focus on methodological rigor: you can’t test your hypothesis on the data you used to build it!
– Training sets and test sets
17
The field comes together: 1994–2007
Statistical modeling, which NLP borrowed from speech recognition, is now standard. ACL conference:
– 1990: 39 articles, 1 statistical
– 2003: 62 articles, 48 statistical
Machine learning techniques are key
NLP has borrowed a focus on the web, search, and “bag of words” models from information retrieval
Unified field: NLP, MT, ASR, TTS, dialog, IR
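A “bag of words” represents a text only by how often each word occurs, discarding word order. A minimal Python sketch; the tokenizer (lowercased runs of the letters a–z) and the name `bag_of_words` are simplifying assumptions, not a standard IR implementation.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a text to word counts; word order is thrown away."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


a = bag_of_words("The dog bit the man")
b = bag_of_words("The man bit the dog")
print(a)       # identical counts for both sentences
print(a == b)  # True: the bag of words cannot tell them apart
```

This is what makes such models cheap and robust for search, and also why they are blind to who did what to whom.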
18
Regular expressions
A formal language for specifying text strings
How can we search for any of these?
– woodchuck
– woodchucks
– Woodchuck
– Woodchucks
19
Regular Expressions
Basic regular expression patterns
– Perl-based syntax (slightly different from other notations for regular expressions)
– Disjunctions: /[wW]oodchuck/ matches either “woodchuck” or “Woodchuck”
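Python’s `re` module uses essentially the same Perl-style syntax, minus the surrounding slashes, which are only delimiters. A minimal sketch applying the disjunction to the four target strings plus one non-example (the word list is illustrative); `search` looks for the pattern anywhere in a string, so the plurals match too.

```python
import re

pattern = re.compile(r"[wW]oodchuck")  # [wW]: either "w" or "W"
words = ["woodchuck", "woodchucks", "Woodchuck", "Woodchucks", "groundhog"]
found = [w for w in words if pattern.search(w)]
print(found)  # ['woodchuck', 'woodchucks', 'Woodchuck', 'Woodchucks']
```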
20
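Regular Expressions
– Ranges: [A-Z] matches any single uppercase letter
– Negations: [^Ss] matches any single character that is neither “S” nor “s” (the caret negates only when it is the first character inside the brackets)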
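A quick check of ranges and negations with Python’s `re` module; the example strings are arbitrary.

```python
import re

# Range: every uppercase letter in the string.
caps = re.findall(r"[A-Z]", "we should call it 'Drenched Blossoms'")
print(caps)  # ['D', 'B']

# Negation: the first character that is neither S nor s.
first = re.search(r"[^Ss]", "Ssoyfn").group()
print(first)  # o

# A caret that is not first inside the brackets is just a literal ^.
print(re.findall(r"[e^]", "e^x"))  # ['e', '^']
```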
21
Regular Expressions
Optional characters ?, * and +
– ? (0 or 1 of the previous character): /colou?r/ matches color or colour
– * (0 or more): /oo*h!/ matches oh! or ooh! or ooooh!
– + (1 or more): /o+h!/ matches oh! or ooh! or ooooh!
Wildcard . (any single character)
– /beg.n/ matches begin or began or begun
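The same patterns in Python. This sketch uses `re.fullmatch` (via a hypothetical helper `matches`) to test whole strings; the slide patterns, like `re.search`, would also match inside longer strings. Note that /oo*h!/ and /o+h!/ describe the same set of strings.

```python
import re


def matches(pattern: str, s: str) -> bool:
    """True if the whole string s matches pattern (not just a substring)."""
    return re.fullmatch(pattern, s) is not None


print(matches(r"colou?r", "color"), matches(r"colou?r", "colour"))  # True True
print(matches(r"oo*h!", "oh!"), matches(r"oo*h!", "ooooh!"))        # True True
print(matches(r"o+h!", "h!"))  # False: + needs at least one o
print(matches(r"beg.n", "begun"), matches(r"beg.n", "begn"))        # True False
```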
22
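Regular Expressions
Anchors ^ and $
– /^[A-Z]/ matches “Ramallah, Palestine” (starts with a capital letter)
– /^[^A-Z]/ matches “¿verdad?” and “really?” (starts with anything but a capital letter)
– /\.$/ matches “It is over.” (ends with a literal period)
– /.$/ matches the last character of any non-empty line, because an unescaped . is the wildcard
Boundaries \b and \B
– /\bon\b/ matches “on” in “on my way” but not in “Monday”
– /\Bon\b/ matches “on” in “automaton”
Disjunction |
– /yours|mine/ matches both “yours” and “mine” in “it is either yours or mine”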
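Each anchor, boundary, and disjunction example run through Python’s `re.search`, which returns the first match (or `None`). The `?` test string for /.$/ is an illustrative addition.

```python
import re

examples = [
    (r"^[A-Z]", "Ramallah, Palestine"),   # starts with a capital
    (r"^[^A-Z]", "¿verdad?"),             # starts with a non-capital
    (r"^[^A-Z]", "really?"),
    (r"\.$", "It is over."),              # ends with a literal period
    (r".$", "It is over?"),               # unescaped . : any final char
    (r"\bon\b", "on my way"),             # "on" as a whole word
    (r"\bon\b", "Monday"),                # no match: "on" is inside a word
    (r"\Bon\b", "automaton"),             # "on" ending a longer word
    (r"yours|mine", "it is either yours or mine"),
]
results = []
for pattern, text in examples:
    m = re.search(pattern, text)
    found = m.group() if m else None
    results.append(found)
    print(f"{pattern:14} {text!r:32} -> {found!r}")
```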
23
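Disjunction, Grouping, Precedence
Column 1 Column 2 Column 3 … How do we express this?
– /Column [0-9]+ */ is wrong: the * applies only to the space, so it matches a single column
– /(Column [0-9]+ +)*/ groups the whole column pattern so that * repeats it
Precedence, from highest to lowest:
– Parentheses: ()
– Counters: * + ? {}
– Sequences and anchors: the ^my end$
– Disjunction: |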
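A check of the precedence rules with `re.fullmatch`. The sample string (with a trailing space, which the grouped pattern requires) and the guppy/guppies pattern are illustrative additions.

```python
import re

text = "Column 1 Column 2 Column 3 "
# * binds tighter than the sequence, so it repeats only the space.
print(re.fullmatch(r"Column [0-9]+ *", text))                 # None
# Parentheses make * repeat the whole column pattern.
print(re.fullmatch(r"(Column [0-9]+ +)*", text) is not None)  # True

# Disjunction has the lowest precedence: guppy|ies means "guppy" OR "ies".
print(re.fullmatch(r"guppy|ies", "guppies"))                  # None
print(re.fullmatch(r"gupp(y|ies)", "guppies") is not None)    # True
```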
24
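Example
Find me all instances of the word “the” in a text.
– /the/ misses capitalized examples
– /[tT]he/ incorrectly returns other or theology
– /\b[tT]he\b/ uses word boundaries, but \b treats digits and underscores as word characters, so “the25” is missed
– /[^a-zA-Z][tT]he[^a-zA-Z]/ requires a non-letter on each side, but misses “the” at the start of a line
– /(^|[^a-zA-Z])[tT]he[^a-zA-Z]/ also allows the start of a line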
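Running each successive pattern over one sample sentence (an illustrative choice, containing three genuine instances of “the”) shows how the count converges.

```python
import re

text = "The cat saw the other theology student by the door."
patterns = [
    r"the",                           # misses "The"
    r"[tT]he",                        # also hits "other", "theology"
    r"\b[tT]he\b",                    # whole words only
    r"[^a-zA-Z][tT]he[^a-zA-Z]",      # misses "The" at the very start
    r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]",  # allows the start of the line
]
counts = [len(list(re.finditer(p, text))) for p in patterns]
print(counts)  # [4, 5, 3, 2, 3]
```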
25
Errors
The process we just went through was based on fixing two kinds of errors:
– Matching strings that we should not have matched (there, then, other): false positives
– Not matching things that we should have matched (The): false negatives
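The two error types can be counted mechanically against a set of correct answers. A minimal sketch, assuming a hypothetical word list and gold set, with each word tested in isolation rather than in running text; the helper name `errors` is illustrative.

```python
import re


def errors(pattern, words, gold):
    """Return (false positives, false negatives) for pattern over words."""
    predicted = {w for w in words if re.search(pattern, w)}
    false_pos = sorted(predicted - gold)  # matched, but shouldn't have been
    false_neg = sorted(gold - predicted)  # should have matched, but didn't
    return false_pos, false_neg


words = ["The", "the", "there", "then", "other", "theology"]
gold = {"The", "the"}  # the strings we actually want
for p in [r"the", r"[tT]he", r"\b[tT]he\b"]:
    print(p, errors(p, words, gold))
```

Each refinement in the previous example removes one kind of error without introducing the other.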
26
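More complex RE example
Regular expressions for prices (a bare $ is the end-of-line anchor, so the dollar sign must be escaped as \$)
– /\$[0-9]+/ doesn’t deal with fractions of dollars
– /\$[0-9]+\.[0-9][0-9]/ doesn’t allow $199, and is not word-aligned
– /(^|\W)\$[0-9]+(\.[0-9][0-9])?\b/ makes the cents optional and requires the start of the line or a non-word character before the $; a leading \b would not work, because a space and $ are both non-word characters, so there is no word boundary between them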
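The price patterns in Python. The sample sentence and the name `PRICE` are illustrative; the final pattern uses Python’s non-capturing groups `(?:...)` so that `findall` returns only the captured price.

```python
import re

text = "It costs $199 or $24.99, not 5$ or US$10."

# A bare $ is the end-of-line anchor, so this finds nothing.
print(re.findall(r"$[0-9]+", text))     # []
# \b before \$ needs a word character on its left: only the wrong match.
print(re.findall(r"\b\$[0-9]+", text))  # ['$10']

# Start of line or non-word char before \$; cents optional.
PRICE = re.compile(r"(?:^|\W)(\$[0-9]+(?:\.[0-9][0-9])?)\b")
print(PRICE.findall(text))              # ['$199', '$24.99']
```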
27
Advanced operators