LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing Dan Jurafsky 1/16/2019 LING 138/238 Autumn 2004.


Today 9/28 Week 1
Overview and history of the field
  Knowledge of language
  The role of ambiguity
  Models and Algorithms
  Eliza, Turing, and conversational agents
  History of speech and language processing
Administration
  Overview of course topics
  1 week on each course in NLP+Speech+Dialog!
Regular expressions
Start of finite automata

Computer Speech and Language Processing
What is it? Getting computers to perform useful tasks involving human languages, whether for:
  Enabling human-machine communication
  Improving human-human communication
  Doing stuff with language objects
Examples:
  Question Answering
  Machine Translation
  Spoken Conversational Agents

Kinds of knowledge needed?
Consider the following interaction with HAL the computer from 2001: A Space Odyssey
  Dave: Open the pod bay doors, Hal.
  HAL: I’m sorry Dave, I’m afraid I can’t do that.

Knowledge needed to build HAL?
Speech recognition and synthesis
  Dictionaries (how words are pronounced)
  Phonetics (how to recognize/produce each sound of English)
Natural language understanding
  Knowledge of the English words involved
    What they mean
    How they combine (what is a “pod bay door”?)
  Knowledge of syntactic structure
    “I’m sorry Dave, I’m afraid I can’t do that”, not the scrambled “I’m I do, sorry that afraid Dave I’m can’t”

What’s needed?
Dialog and pragmatic knowledge
  “open the door” is a REQUEST (as opposed to a STATEMENT or information-question)
  It is polite to respond, even if you’re planning to kill someone.
  It is polite to pretend to want to be cooperative (I’m afraid, I can’t…)
  What is “that” in “I can’t do that”?
Even a system to book airline flights needs much of this kind of knowledge

Question Answering
  What does “door” mean?
  What year was Abraham Lincoln born?
  How many states were in the United States when Lincoln was born?
  Was there a military draft during the Hoover administration?
  What do US scientists think about whether human cloning should be legal?

Machine Translation
Word-by-word gloss:
  Dai-yu alone on bed top think-of-with-gratitude Bao-chai
  again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop
  clear cold penetrate curtain
  not feeling again fall down tears come
English translation:
  As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

Machine Translation
The Story of the Stone = The Dream of the Red Chamber (Cao Xueqin 1792)
Issues:
  Breaking up into sentences
  Zero-anaphora
  Penetrate -> penetrated
  Bamboo tip plantain leaf -> bamboos and plantains
  Curtain -> curtains of her bed

Ambiguity
Find at least 5 meanings of this sentence: I made her duck
  I cooked waterfowl for her benefit (to eat)
  I cooked waterfowl belonging to her
  I created the (plaster?) duck she owns
  I caused her to quickly lower her head or body
  I waved my magic wand and turned her into undifferentiated waterfowl
  At least one other meaning that’s inappropriate for gentle company.

Ambiguity is Pervasive
I caused her to quickly lower her head or body
  Grammar: “duck” can be a noun (waterfowl) or a verb (move body)
I cooked waterfowl belonging to her.
  Grammar: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (plaster) duck statue she owns
  Meaning: “make” can mean “create” or “cook”

Ambiguity is Pervasive
Grammar: “make” can be:
  Transitive (verb has a noun direct object)
    I cooked [waterfowl belonging to her]
  Ditransitive (verb has 2 noun objects)
    I made [her] (into) [undifferentiated waterfowl]
  Action-transitive (verb has a direct object and another verb)
    I caused [her] [to move her body]

Ambiguity is Pervasive
Phonetics!
  I mate or duck
  I’m eight or duck
  Eye maid; her duck
  Aye mate, her duck
  I maid her duck
  I’m aid her duck
  I mate her duck
  I’m ate her duck
  I’m ate or duck

Models and Algorithms
Models: formalisms used to capture the various kinds of linguistic structure.
  State machines (FSAs, transducers, Markov models)
  Formal rule systems (context-free grammars, feature systems)
  Logic (predicate calculus, inference)
  Probabilistic versions of all of these, plus others (Gaussian mixture models, probabilistic relational models, etc.)
Algorithms: used to manipulate representations to create structure.
  Search (A*, dynamic programming)
  Supervised learning, etc.

Language, Thought, Understanding
A Gedanken Experiment: Turing Test
  Question “can a machine think” is not operational.
  Operational version:
    2 people and a computer
    Interrogator talks to contestant and computer via teletype
    Task of machine is to convince interrogator it is human
    Task of contestant is to convince interrogator she and not machine is human.

Eliza: Weizenbaum (1966)
  User: You are like my father in some ways.
  ELIZA: What resemblance do you see
  User: You are not very aggressive but I think you don’t want me to notice that.
  ELIZA: What makes you think I am not aggressive
  User: You don’t argue with me
  ELIZA: Why do you think I don’t argue with you
  User: You are afraid of me
  ELIZA: Does it please you to believe I am afraid of you

History: foundational insights 1940s-1950s
Automaton:
  Turing 1936
  McCulloch-Pitts neuron (1943)
    http://marr.bsee.swin.edu.au/~dtl/het704/lecture10/ann/node1.html
    http://diwww.epfl.ch/mantra/tutorial/english/mcpits/html/
  Kleene (1951/1956)
  Shannon (1948) link between automata and Markov models
  Chomsky (1956)/Backus (1959)/Naur (1960): CFG
Probabilistic/Information-theoretic models
  Shannon (1948)
  Bell Labs speech recognition (1952)

History: the two camps: 1957-1970
Symbolic
  Zellig Harris 1958 TDAP first parser?
    Cascade of finite-state transducers
  Chomsky
  AI workshop at Dartmouth (McCarthy, Minsky, Shannon, Rochester)
  Newell and Simon: Logic Theorist, General Problem Solver
Statistical
  Bledsoe and Browning (1959): Bayesian OCR
  Mosteller and Wallace (1964): Bayesian authorship attribution
  Denes (1959): ASR combining grammar and acoustic probability

Four paradigms: 1970-1983
Stochastic
  Hidden Markov Model 1972
    Independent application of Baker (CMU) and Jelinek/Bahl/Mercer lab (IBM), following work of Baum and colleagues at IDA
Logic-based
  Colmerauer (1970, 1975) Q-systems
  Definite Clause Grammars (Pereira and Warren 1980)
  Kay (1979) functional grammar, Bresnan and Kaplan (1982) unification
Natural language understanding
  Winograd (1972) SHRDLU
  Schank and Abelson (1977) scripts, story understanding
  Influence of case-role work of Fillmore (1968) via Simmons (1973), Schank.
Discourse Modeling
  Grosz and colleagues: discourse structure and focus
  Perrault and Allen (1980) BDI model

Empiricism and Finite State Redux: 1983-1993
Finite State Models
  Kaplan and Kay (1981): Phonology/Morphology
  Church (1980): Syntax
Return of Empiricism:
  Probabilistic models return to language processing
  Corpora created for language tasks
  Early statistical versions of NLP applications (parsing, tagging, machine translation)
  Training sets and test sets

The field comes together: 1994-2004
Statistical models standard
  ACL conference:
    1990: 39 articles, 1 statistical
    2003: 62 articles, 48 statistical
Machine learning techniques key
Information retrieval meets NLP
Unified field: IR, NLP, MT, ASR, TTS, Dialog

How this course fits in
This is our new introductory course in natural language, speech, and dialog processing
Other courses: http://www.stanford.edu/~jurafsky/nlpcourses.html
This course will cover 1 week each on material from these other courses!

Requirements and Grading
Readings:
  Selected chapters from Speech and Language Processing by Jurafsky and Martin, Prentice-Hall 2000
  We are writing the 2nd edition, so you get to be the guinea-pigs!
  A few conference and journal papers
Best 7 of 8 assignments
Grading
  Homework: 84%
  Participation: 16%

Overview of the course
http://www.stanford.edu/class/linguist238

Some brief demos
  Machine Translation: http://translate.google.com/translate_t
  TTS: http://www.rhetorical.com/cgi-bin/demo.cgi
  QA: http://www.languagecomputer.com/scripts/question.cgi

Regular Expressions and Text Searching
Emacs, vi, Perl, grep, etc.
  //: search delimiter
  []: character disjunction
  [a-f]: character range disjunction
  [^a]: character negation
  ?: zero or one instance of previous
  *: Kleene star, zero or more instances of previous
  ^: anchors start of line
  \b: anchors word boundary
  |: disjunction
  (): grouping, precedence
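
The operators listed above can be tried out in Python's standard re module. This is a minimal sketch, not part of the original slides: the sample strings are made up for illustration, and Python writes patterns as raw strings rather than between // delimiters. The grouping example uses a non-capturing group, (?:...), only so that re.findall returns whole matches.

```python
import re

# (operator, pattern, sample text) triples; the sample texts are illustrative only.
EXAMPLES = [
    ("[]", r"[wW]oodchuck", "Woodchuck or woodchuck"),   # character disjunction
    ("[a-f]", r"[a-f]", "big deal"),                      # character range
    ("[^a]", r"[^a]", "banana"),                          # character negation
    ("?", r"colou?r", "color colour"),                    # zero or one of previous
    ("*", r"ba*", "b ba baaa"),                           # zero or more of previous
    ("^", r"^The", "The end. The start."),                # start-of-line anchor
    (r"\b", r"\bthe\b", "the other theology"),            # word-boundary anchor
    ("|", r"cat|dog", "cat dog bird"),                    # disjunction
    ("()", r"gupp(?:y|ies)", "guppy guppies"),            # grouping
]


def run_examples():
    """Return a dict mapping each operator to the matches found in its sample text."""
    return {op: re.findall(pattern, text) for op, pattern, text in EXAMPLES}


if __name__ == "__main__":
    for op, matches in run_examples().items():
        print(f"{op:6} {matches}")
```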

Example
Find me all instances of the word “the” in a text.
  /the/
    Misses capitalized examples
  /[tT]he/
    Returns other or theology
  /\b[tT]he\b/
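
The three refinements can be compared side by side in Python. This is a sketch under stated assumptions: the sentence in TEXT is invented for the demonstration, and find is a hypothetical helper name.

```python
import re

# Invented sample sentence containing "The", "the", "other" and "theology".
TEXT = "The other day the student read theology."


def find(pattern, text=TEXT):
    """Return every substring of text matched by pattern."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    print(find(r"the"))           # misses "The"; also hits inside "other" and "theology"
    print(find(r"[tT]he"))        # finds "The", but still hits "other" and "theology"
    print(find(r"\b[tT]he\b"))    # only the standalone word, in either case
```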

Errors
The process we just went through was based on fixing two kinds of errors
  Matching strings that we should not have matched (there, then, other)
    False positives
  Not matching things that we should have matched (The)
    False negatives

Errors cont.
We’ll be telling the same story for many tasks, all quarter.
Reducing the error rate for an application often involves two antagonistic efforts:
  Increasing accuracy (minimizing false positives)
  Increasing coverage (minimizing false negatives).
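
The two error types can be counted for each version of the “the” pattern. The sketch below is an illustration rather than the course's own procedure: the sentence is invented, and the gold standard is computed by a simple whitespace tokenizer, which is an assumption that only holds for toy text like this.

```python
import re

# Invented sample sentence; same kinds of traps as in the slides ("other", "theology").
TEXT = "The other day the student read theology."

# Gold standard: start offsets of whitespace-separated tokens equal to "the"
# (ignoring case and trailing punctuation). A toy definition, assumed for this demo.
GOLD = {
    m.start()
    for m in re.finditer(r"\S+", TEXT)
    if m.group().strip(".,").lower() == "the"
}


def errors(pattern):
    """Return (false_positives, false_negatives) for pattern against GOLD."""
    found = {m.start() for m in re.finditer(pattern, TEXT)}
    false_positives = found - GOLD   # matched, but should not have been
    false_negatives = GOLD - found   # should have been matched, but were not
    return len(false_positives), len(false_negatives)


if __name__ == "__main__":
    for pattern in (r"the", r"[tT]he", r"\b[tT]he\b"):
        fp, fn = errors(pattern)
        print(f"{pattern:14} false positives={fp} false negatives={fn}")
```

On this sentence the first pattern makes both kinds of error, the second removes the false negative, and the third removes the false positives as well.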

More complex RE example
Regular expressions for prices
  /$[0-9]+/
    Doesn’t deal with fractions of dollars
  /$[0-9]+\.[0-9][0-9]/
    Doesn’t allow $199, not word-aligned
  /\b$[0-9]+(\.[0-9][0-9])?\b/
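
A Python rendering of the price patterns needs two adaptations, both of which are assumptions of this sketch rather than something stated on the slide. First, the dollar sign is escaped as \$, because an unescaped $ is an end-of-string anchor in Python's re module. Second, the leading \b is dropped, because in Python a \b immediately before a literal $ only matches when the preceding character is a word character. The sample sentence and constant names are invented.

```python
import re

# Invented sample text with a whole-dollar price and a dollars-and-cents price.
TEXT = "Tickets cost $199 or $24.99 each."

DOLLARS = r"\$[0-9]+"                        # whole dollars only
DOLLARS_CENTS = r"\$[0-9]+\.[0-9][0-9]"      # requires cents, so it misses $199
PRICE = r"\$[0-9]+(?:\.[0-9][0-9])?\b"       # cents optional, word boundary at the end

if __name__ == "__main__":
    print(re.findall(DOLLARS, TEXT))         # truncates $24.99 to $24
    print(re.findall(DOLLARS_CENTS, TEXT))   # finds only $24.99
    print(re.findall(PRICE, TEXT))           # finds both prices in full
```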

RE substitution, memory, ELIZA
  s/.* you are (depressed|sad) .*/I am sorry to hear you are \1/
  s/.* you are (depressed|sad) .*/Why do you think you are \1/
  s/.* all .*/In what way/
  s/.* always .*/Can you think of a specific example/
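
The substitution rules above can be sketched in Python with re.subn, where \1 in the replacement refers back to the first parenthesized group (the “memory”). This is a toy illustration, not Weizenbaum's program: the function name respond, the first-match-wins ordering, and the fallback reply are assumptions, and only one of the two “you are” rules is kept because the second could never fire under this ordering.

```python
import re

# (pattern, replacement) pairs taken from the slide; \1 recalls the captured word.
RULES = [
    (r".* you are (depressed|sad) .*", r"I am sorry to hear you are \1"),
    (r".* all .*", "In what way"),
    (r".* always .*", "Can you think of a specific example"),
]


def respond(utterance, default="Please go on"):
    """Apply the first rule whose pattern matches; otherwise return the default reply.

    The default reply is a hypothetical fallback added for this sketch.
    """
    for pattern, replacement in RULES:
        reply, count = re.subn(pattern, replacement, utterance, count=1)
        if count:
            return reply
    return default


if __name__ == "__main__":
    for line in ("I think you are sad today",
                 "They are all alike",
                 "He is always late",
                 "Hello"):
        print(f"User:  {line}")
        print(f"ELIZA: {respond(line)}")
```

Note that the patterns require a space on both sides of the keyword, so an utterance that begins with “You are sad” or ends with “sad” would fall through to the default.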