LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing Dan Jurafsky 1/16/2019 LING 138/238 Autumn 2004.


1 LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing
Dan Jurafsky 1/16/2019 LING 138/238 Autumn 2004

2 Today 9/28 Week 1
Overview and history of the field
  Knowledge of language
  The role of ambiguity
  Models and Algorithms
  Eliza, Turing, and conversational agents
  History of speech and language processing
Administration
  Overview of course topics
  1 week on each course in NLP+Speech+Dialog!
Regular expressions
Start of finite automata

3 Computer Speech and Language Processing
What is it?
  Getting computers to perform useful tasks involving human languages, whether for:
    Enabling human-machine communication
    Improving human-human communication
    Doing stuff with language objects
Examples:
  Question Answering
  Machine Translation
  Spoken Conversational Agents

4 Kinds of knowledge needed?
Consider the following interaction with HAL the computer from 2001: A Space Odyssey:
  Dave: Open the pod bay doors, Hal.
  HAL: I’m sorry Dave, I’m afraid I can’t do that.

5 Knowledge needed to build HAL?
Speech recognition and synthesis
  Dictionaries (how words are pronounced)
  Phonetics (how to recognize/produce each sound of English)
Natural language understanding
  Knowledge of the English words involved
    What they mean
    How they combine (what is a ‘pod bay door’?)
  Knowledge of syntactic structure
[Figure: syntax tree for “I’m sorry Dave, I’m afraid I can’t do that”]

6 What’s needed?
Dialog and pragmatic knowledge
  “open the door” is a REQUEST (as opposed to a STATEMENT or information-question)
  It is polite to respond, even if you’re planning to kill someone.
  It is polite to pretend to want to be cooperative (I’m afraid, I can’t…)
  What is ‘that’ in ‘I can’t do that’?
Even a system to book airline flights needs much of this kind of knowledge

7 Question Answering
What does “door” mean?
What year was Abraham Lincoln born?
How many states were in the United States when Lincoln was born?
Was there a military draft during the Hoover administration?
What do US scientists think about whether human cloning should be legal?

8 Machine Translation
Dai-yu alone on bed top think-of-with-gratitude Bao-chai
again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop
clear cold penetrate curtain
not feeling again fall down tears come

As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

9 Machine Translation
The Story of the Stone = The Dream of the Red Chamber (Cao Xueqin 1792)
Issues:
  Breaking up into sentences
  Zero-anaphora
  Penetrate -> penetrated
  Bamboo tip plantain leaf -> bamboos and plantains
  Curtain -> curtains of her bed

10 Ambiguity
Find at least 5 meanings of this sentence:
  I made her duck

11 Ambiguity
Find at least 5 meanings of this sentence:
  I made her duck
    I cooked waterfowl for her benefit (to eat)
    I cooked waterfowl belonging to her
    I created the (plaster?) duck she owns
    I caused her to quickly lower her head or body
    I waved my magic wand and turned her into undifferentiated waterfowl
At least one other meaning that’s inappropriate for gentle company.

12 Ambiguity is Pervasive
I caused her to quickly lower her head or body
  Grammar: “duck” can be a noun (waterfowl) or a verb (move body)
I cooked waterfowl belonging to her.
  Grammar: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (plaster) duck statue she owns
  Meaning: “make” can mean “create” or “cook”

13 Ambiguity is Pervasive
Grammar: “make” can be:
  Transitive (verb has a noun direct object):
    I cooked [waterfowl belonging to her]
  Ditransitive (verb has 2 noun objects):
    I made [her] (into) [undifferentiated waterfowl]
  Action-transitive (verb has a direct object and another verb):
    I caused [her] [to move her body]

14 Ambiguity is Pervasive
Phonetics!
  I mate or duck
  I’m eight or duck
  Eye maid; her duck
  Aye mate, her duck
  I maid her duck
  I’m aid her duck
  I mate her duck
  I’m ate her duck
  I’m ate or duck

15 Models and Algorithms
Models: formalisms used to capture the various kinds of linguistic structure.
  State machines (FSAs, transducers, Markov models)
  Formal rule systems (context-free grammars, feature systems)
  Logic (predicate calculus, inference)
  Probabilistic versions of all of these + others (Gaussian mixture models, probabilistic relational models, etc.)
Algorithms: used to manipulate representations to create structure.
  Search (A*, dynamic programming)
  Supervised learning, etc.

16 Language, Thought, Understanding
A Gedanken Experiment: Turing Test
  The question “can a machine think?” is not operational.
  Operational version:
    2 people and a computer
    Interrogator talks to contestant and computer via teletype
    Task of machine is to convince interrogator it is human
    Task of contestant is to convince interrogator that she, and not the machine, is human.

17 Eliza: Weizenbaum (1966)
User: You are like my father in some ways.
ELIZA: What resemblance do you see
User: You are not very aggressive but I think you don’t want me to notice that.
ELIZA: What makes you think I am not aggressive
User: You don’t argue with me
ELIZA: Why do you think I don’t argue with you
User: You are afraid of me
ELIZA: Does it please you to believe I am afraid of you

18 History: foundational insights 1940s-1950s
Automaton:
  Turing 1936
  McCulloch-Pitts neuron (1943)
  Kleene (1951/1956)
  Shannon (1948) link between automata and Markov models
  Chomsky (1956)/Backus (1959)/Naur (1960): CFG
Probabilistic/Information-theoretic models
  Shannon (1948)
  Bell Labs speech recognition (1952)

19 History: the two camps: 1957-1970
Symbolic
  Zellig Harris 1958 TDAP first parser?
    Cascade of finite-state transducers
  Chomsky
  AI workshop at Dartmouth (McCarthy, Minsky, Shannon, Rochester)
  Newell and Simon: Logic Theorist, General Problem Solver
Statistical
  Bledsoe and Browning (1959): Bayesian OCR
  Mosteller and Wallace (1964): Bayesian authorship attribution
  Denes (1959): ASR combining grammar and acoustic probability

20 Four paradigms: 1970-1983
Stochastic
  Hidden Markov Model 1972
    Independent application of Baker (CMU) and Jelinek/Bahl/Mercer lab (IBM) following work of Baum and colleagues at IDA
Logic-based
  Colmerauer (1970, 1975) Q-systems
  Definite Clause Grammars (Pereira and Warren 1980)
  Kay (1979) functional grammar, Bresnan and Kaplan (1982) unification
Natural language understanding
  Winograd (1972) Shrdlu
  Schank and Abelson (1977) scripts, story understanding
  Influence of case-role work of Fillmore (1968) via Simmons (1973), Schank.
Discourse Modeling
  Grosz and colleagues: discourse structure and focus
  Perrault and Allen (1980) BDI model

21 Empiricism and Finite State Redux: 1983-1993
Finite State Models
  Kaplan and Kay (1981): Phonology/Morphology
  Church (1980): Syntax
Return of Empiricism:
  Probabilistic models return to language processing
  Corpora created for language tasks
  Early statistical versions of NLP applications (parsing, tagging, machine translation)
  Training sets and test sets

22 The field comes together: 1994-2004
Statistical models standard
  ACL conference:
    1990: 39 articles, 1 statistical
    …: … articles, … statistical
Machine learning techniques key
Information retrieval meets NLP
Unified field: IR, NLP, MT, ASR, TTS, Dialog

23 How this course fits in
This is our new introductory course in natural language, speech, and dialog processing
Other courses:
This course will cover 1 week each on material from these other courses!

24 Requirements and Grading
Readings:
  Selected chapters from Speech and Language Processing by Jurafsky and Martin, Prentice-Hall 2000
    We are writing the 2nd edition, so you get to be the guinea pigs!
  A few conference and journal papers
Best 7 of 8 assignments
Grading
  Homework: 84%
  Participation: 16%

25 Overview of the course http://www.stanford.edu/class/linguist238

26 Some brief demos
Machine Translation:
TTS:
QA:

27 Regular Expressions and Text Searching
Emacs, vi, perl, grep, etc.
  //: search delimiter
  []: character disjunction
  [a-f]: character range disjunction
  [^a]: character negation
  ?: zero or one instance of previous
  *: Kleene star, zero or more instances of prev.
  ^: anchors start of line
  \b: anchors word boundary
  |: disjunction
  (): grouping, precedence
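
The operators listed on this slide can be tried out directly. The following is a minimal sketch using Python's `re` module rather than the Perl-style `/.../` delimiters shown above; the sample patterns and strings are illustrative assumptions and do not come from the slides.

```python
import re

# Each entry: (pattern, text, whether we expect a match somewhere in text).
# Patterns and strings are made-up illustrations of each operator.
examples = [
    (r"[wW]oodchuck", "Woodchuck", True),   # []  character disjunction
    (r"[a-f]", "xyz", False),               # [a-f]  character range
    (r"[^a]", "aaa", False),                # [^a]  negation: needs a non-'a' character
    (r"colou?r", "color", True),            # ?  zero or one of the previous item
    (r"ba*", "b", True),                    # *  Kleene star: zero or more
    (r"^The", "In The end", False),         # ^  anchors the start of the line
    (r"\bthe\b", "other", False),           # \b  word boundary on both sides
    (r"cat|dog", "hotdog", True),           # |  disjunction
    (r"gupp(y|ies)", "guppies", True),      # ()  grouping limits the scope of |
]

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

results = [matches(pattern, text) for pattern, text, _ in examples]

for (pattern, text, _), found in zip(examples, results):
    print(f"{pattern!r:20} on {text!r:15} -> {found}")
```

`re.search` looks for a match anywhere in the string, which is the closest analogue to the search behavior of grep or an editor.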

28 Example
Find me all instances of the word “the” in a text.
  /the/
    Misses capitalized examples
  /[tT]he/
    Returns other or theology
  /\b[tT]he\b/
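
The three successive patterns can be compared on a small text. This is a sketch in Python's `re` module; the sample sentence is an assumption chosen so that it contains "The", "the", "other", "theology", "Then" and "there".

```python
import re

# Illustrative sample text (not from the slides).
text = ("The other day the theology student said the end is near. "
        "Then there was silence.")

patterns = [r"the", r"[tT]he", r"\b[tT]he\b"]

# Collect every non-overlapping match of each pattern.
hits = {pattern: re.findall(pattern, text) for pattern in patterns}

for pattern in patterns:
    print(f"{pattern:15} {len(hits[pattern])} matches: {hits[pattern]}")
```

On this text the first pattern skips the capitalized "The", the second picks it up but also matches inside "other", "theology", "Then" and "there", and only the word-boundary version is left with the three standalone occurrences.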

29 Errors
The process we just went through was based on fixing two kinds of errors:
  Matching strings that we should not have matched (there, then, other)
    False positives
  Not matching things that we should have matched (The)
    False negatives

30 Errors cont.
We’ll be telling the same story for many tasks, all quarter.
Reducing the error rate for an application often involves two antagonistic efforts:
  Increasing accuracy (minimizing false positives)
  Increasing coverage (minimizing false negatives).
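
To make the two error types concrete, false positives and false negatives can be counted for each version of the "the" pattern. This is only a sketch: the token list, the gold standard (a token counts as correct if it is exactly "the" in any capitalization) and the helper name `evaluate` are assumptions made for illustration.

```python
import re

# Illustrative tokens (not from the slides).
tokens = "The other day the theology student said the end".split()

# Assumed gold standard: positions of tokens that really are the word "the".
gold = {i for i, tok in enumerate(tokens) if tok.lower() == "the"}

def evaluate(pattern):
    """Return (false_positives, false_negatives) for a pattern over the tokens."""
    predicted = {i for i, tok in enumerate(tokens) if re.search(pattern, tok)}
    false_pos = len(predicted - gold)   # matched, but should not have been
    false_neg = len(gold - predicted)   # should have been matched, but was not
    return false_pos, false_neg

for pattern in [r"the", r"[tT]he", r"\b[tT]he\b"]:
    fp, fn = evaluate(pattern)
    print(f"{pattern:15} false positives: {fp}  false negatives: {fn}")
```

On these tokens the first revision removes the false negative ("The") without touching the false positives ("other", "theology"), and the second revision removes the false positives, mirroring the two steps described above.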

31 More complex RE example
Regular expressions for prices
  /\$[0-9]+/
    Doesn’t deal with fractions of dollars
  /\$[0-9]+\.[0-9][0-9]/
    Doesn’t allow $199, not word-aligned
  /\b\$[0-9]+(\.[0-9][0-9])?\b/
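
The price patterns can be checked the same way. The sketch below uses Python's `re` module, where a literal dollar sign is written `\$` (an unescaped `$` is the end-of-line anchor) and `(?:...)` is used for grouping so that `findall` returns whole matches. The sample text is an assumption. One caveat the sketch makes visible: in Python a leading `\b` directly before `\$` only matches when a word character precedes the dollar sign, so the last variable below comes back empty for prices preceded by a space.

```python
import re

# Illustrative sample text (not from the slides).
text = "Tickets cost $199, snacks $4.50, and a mug $7."

# Whole dollars only: cuts "$4.50" down to "$4".
whole_dollars = re.findall(r"\$[0-9]+", text)

# Mandatory cents: finds "$4.50" but no longer allows "$199" or "$7".
with_cents = re.findall(r"\$[0-9]+\.[0-9][0-9]", text)

# Optional cents plus a trailing word boundary.
optional_cents = re.findall(r"\$[0-9]+(?:\.[0-9][0-9])?\b", text)

# Same pattern with a leading \b: there is no word boundary between
# a space and "$", so nothing matches in this text.
leading_boundary = re.findall(r"\b\$[0-9]+(?:\.[0-9][0-9])?\b", text)

print(whole_dollars)
print(with_cents)
print(optional_cents)
print(leading_boundary)
```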

32 RE substitution, memory, ELIZA
s/.* you are (depressed|sad) .*/I am sorry to hear you are \1/
s/.* you are (depressed|sad) .*/Why do you think you are \1/
s/.* all .*/In what way/
s/.* always .*/Can you think of a specific example/
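
A cascade of substitutions like these can be sketched in a few lines of Python. This is not Weizenbaum's program, only an illustration of the pattern-plus-memory idea: the rule list reuses three of the substitutions above (only the first of the two "you are" rules, since the first matching rule wins), and the function name `respond` and the fallback reply "Please go on" are assumptions.

```python
import re

# (pattern, reply template) pairs; \1 refers back to the first group.
RULES = [
    (r".* you are (depressed|sad) .*", r"I am sorry to hear you are \1"),
    (r".* all .*", r"In what way"),
    (r".* always .*", r"Can you think of a specific example"),
]

def respond(utterance, default="Please go on"):
    """Apply the first rule whose pattern matches the whole utterance."""
    for pattern, template in RULES:
        match = re.fullmatch(pattern, utterance)
        if match:
            # expand() fills \1 with the text captured by the group.
            return match.expand(template)
    return default  # hypothetical fallback when no rule applies

for line in ["I think you are sad today",
             "Men are all alike",
             "They are always bugging us",
             "Hello"]:
    print(f"User:  {line}")
    print(f"ELIZA: {respond(line)}")
```

Because the patterns require surrounding spaces and lowercase text, inputs such as "You are sad" fall through to the fallback; a fuller version would normalize case and punctuation first.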

