Statistical NLP Winter 2008 Lecture 1: Introduction Roger Levy (many thanks to Dan Klein for slides)

Course Info Meeting times Lectures: WF 12:45-2pm, AP&M 4301 Office hours: WF 2-3pm, AP&M 4220 Communication Web page: Class mailing list: TBA

Access / Computation Computing resources: I'll make some data and code available on the web. There is also a range of linguistic datasets at UCSD that we should make sure you can access. Major order of business: making sure you all have reasonable access to good computing environments with good computing power.

The Dream It'd be great if machines could: process our email (usefully); translate languages accurately; help us manage, summarize, and aggregate information; use speech as a UI (when needed); talk to us / listen to us. But they can't: language is complex, ambiguous, flexible, and subtle, and good solutions need both linguistics and machine learning knowledge.

What is NLP? Fundamental goal: deep understanding of broad language. Not just string processing or keyword matching! End systems that we want to build: Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering… Modest: spelling correction, text categorization…

Speech Systems Automatic Speech Recognition (ASR): audio in, text out. SOTA: 0.3% error for digit strings, 5% for dictation, 50%+ for TV. Text to Speech (TTS): text in, audio out. SOTA: totally intelligible (if sometimes unnatural). Speech systems currently: model the speech signal; model language (next class). In practice, speech interfaces are usually wired up to dialog systems.

Machine Translation Translation systems encode: Something about fluent language (next class) Something about how two languages correspond (middle of term) SOTA: for easy language pairs, better than nothing, but more an understanding aid than a replacement for human translators

Information Extraction Information Extraction (IE): unstructured text to database entries. SOTA: perhaps 70% accuracy for multi-sentence templates, 90%+ for single easy fields. Example text: "New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent." Extracted records (Person / Company / Post / State): Russell T. Lewis / New York Times newspaper / president and general manager / start; Russell T. Lewis / New York Times newspaper / executive vice president / end; Lance R. Primis / New York Times Co. / president and CEO / start.
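
As a toy illustration, a single hand-written pattern can pull one of those records out of the example sentence. Real IE systems learn their extractors rather than hard-coding them; the regex and field names below are purely illustrative (Python):

    import re

    text = ("New York Times Co. named Russell T. Lewis, 45, president and "
            "general manager of its flagship New York Times newspaper.")

    # one illustrative pattern: "named PERSON, AGE, POST of ..."
    pattern = re.compile(r"named ([A-Z][\w.]*(?: [A-Z][\w.]*)+), (\d+), ([a-z ]+) of")
    for person, age, post in pattern.findall(text):
        print({"person": person, "age": age, "post": post})
    # -> {'person': 'Russell T. Lewis', 'age': '45',
    #     'post': 'president and general manager'}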

Question Answering Question Answering: More than search Ask general comprehension questions of a document collection Can be really easy: “What’s the capital of Wyoming?” Can be harder: “How many US states’ capitals are also their largest cities?” Can be open ended: “What are the main issues in the global warming debate?” SOTA: Can do factoids, even when text isn’t a perfect match

What is nearby NLP? Computational Linguistics Using computational methods to learn more about how language works We end up doing this and using it Cognitive Science Figuring out how the human brain works Includes the bits that do language Humans: the only working NLP prototype! We’ll cover a bit of this near the end of the course Speech? Mapping audio signals to text Traditionally separate from NLP, converging? Two components: acoustic models and language models Language models in the domain of stat NLP We won’t cover speech, but early in the course we’ll do “speechy” stuff

What is this Class? Three aspects to the course: Linguistic Issues What are the range of language phenomena? What are the knowledge sources that let us disambiguate? What representations are appropriate? How do you know what to model and what not to model? Technical Methods Learning and parameter estimation Increasingly complex model structures Efficient algorithms: dynamic programming, search Engineering Methods Issues of scale Sometimes, very ugly hacks We’ll focus on what makes the problems hard, and what works in practice…

Outline of Topics Word level models N-gram models and smoothing Classification and clustering Sequences Part-of-speech tagging Trees Syntactic parsing Semantic representations Higher order units: discourse… Computational Psycholinguistics Unsupervised learning

Class Requirements and Goals Class requirements Uses a variety of skills / knowledge: Basic probability and statistics Basic linguistics background Decent coding skills Most people are probably missing one of the above We’ll address some review concepts as needed You will have to work on your own as well Class goals Learn the issues and techniques of statistical NLP Build first passes at the real tools used in NLP (language models, taggers, parsers) Be able to read current research papers in the field See where the holes in the field still are!

Course Work Readings: Texts Manning and Schütze (available online) Jurafsky and Martin, 2nd edition (so hot it's not yet off the presses!) Papers (on web page) Lectures Discussion (during lecture) Assignments/Grading Written assignments (~20% of your grade) Programming assignments (~40% of your grade) Final project (~40% of your grade) You get 5 late days to use at your discretion After that, you lose 10% per day

Assignments Written assignments will involve linguistics, math, and careful thinking (little or no computation) Programming assignments: all of the above plus programming Expect the programming assignments to take more time than the written assignments Final projects are up to your own devising You'll need to come up with: a model; data to examine; and a computer implementation of the model, fit to the data Start thinking about the project early, and start working on it early In all cases, collaboration is strongly encouraged!

Some BIG Disclaimers The purpose of this class is to train NLP researchers Some people will put in a LOT of time There will be a LOT of reading, some required, some not – doing it all would be time-consuming There will be a LOT of coding and running systems on non-toy data There will be a LOT of statistical modeling (though we do use a few basic techniques heavily) There will be discussion and questions in class that may push past what I’ve presented in lecture, and I’ll answer them Don’t say I didn’t warn you!

Some Early NLP History 1950s: Foundational work: automata, information theory, etc. First speech systems Machine translation (MT) hugely funded by military (imagine that) Toy models: MT using basically word-substitution Optimism! 1960s and 1970s: NLP Winter The Bar-Hillel (FAHQT) and ALPAC reports kill MT Work shifts to deeper models, syntax … but toy domains / grammars (SHRDLU, LUNAR) 1980s/1990s: The Empirical Revolution Expectations get reset Corpus-based methods become central Deep analysis often traded for robust and simple approximations Evaluate everything

NLP: Annotation Much of NLP is annotating text with structure that specifies how it's assembled, e.g., a parse of "John bought a blue car". Syntax: grammatical structure. Semantics: "meaning," either lexical or compositional.
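
For concreteness, here is one plausible bracketed syntactic annotation of that example sentence, rendered with NLTK's Tree class (the particular bracketing is an illustrative assumption, not the course's official analysis):

    from nltk.tree import Tree

    t = Tree.fromstring(
        "(S (NP (NNP John)) (VP (VBD bought) (NP (DT a) (JJ blue) (NN car))))")
    t.pretty_print()  # draws the tree as ASCII art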

What Made NLP Hard? The core problems: ambiguity, sparsity, scale, unmodeled variables.

Problem: Ambiguities Headlines: Iraqi Head Seeks Arms Ban on Nude Dancing on Governor’s Desk Juvenile Court to Try Shooting Defendant Teacher Strikes Idle Kids Stolen Painting Found by Tree Kids Make Nutritious Snacks Local HS Dropouts Cut in Half Hospitals Are Sued by 7 Foot Doctors Why are these funny?

Syntactic Ambiguities Maybe we're sunk on funny headlines, but normal, boring sentences are unambiguous? Fed raises interest rates 0.5% in a measure against inflation

Classical NLP: Parsing Write symbolic or logical rules and use deduction systems to prove parses from words. Minimal grammar on the "Fed raises" sentence: 36 parses. Simple 10-rule grammar: 592 parses. Real-size grammar: many millions of parses. This scaled very badly and didn't yield broad-coverage tools. Grammar (CFG): ROOT → S, S → NP VP, NP → DT NN, NP → NN NNS, NP → NP PP, VP → VBP NP, VP → VBP NP PP, PP → IN NP. Lexicon: NN → interest, NNS → raises, VBP → interest, VBZ → raises, …
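
A minimal sketch of the point, using NLTK's chart parser on a tiny hand-written CFG (the grammar below is illustrative, not the one behind the 36/592 counts): even a handful of rules and an ambiguous lexicon license several parses of the "Fed raises" sentence.

    import nltk

    grammar = nltk.CFG.fromstring("""
        ROOT -> S
        S -> NP VP
        NP -> NNP | NN | NN NNS | NNP NNS | NNS
        VP -> VBZ NP | VBP NP
        NNP -> 'Fed'
        NN -> 'Fed' | 'interest'
        NNS -> 'raises' | 'rates' | 'interest'
        VBZ -> 'raises'
        VBP -> 'interest'
    """)

    parser = nltk.ChartParser(grammar)
    parses = list(parser.parse("Fed raises interest rates".split()))
    print(len(parses), "parses")   # several, e.g. one reading "Fed raises" as a noun compound
    for tree in parses:
        print(tree)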

Dark Ambiguities Dark ambiguities: most analyses are shockingly bad (meaning, they don't have an interpretation you can get your mind around). Unknown words and new usages. Solution: we need mechanisms to focus attention on the best ones; probabilistic techniques do this. (The slide's tree shows the correct parse of "This will panic buyers!")

Semantic Ambiguities Even correct tree-structured syntactic analyses don’t always nail down the meaning Every morning someone’s alarm clock wakes me up John’s boss said he was doing better

Other Levels of Language Tokenization/morphology: What are the words, what is the sub-word structure? Often simple rules work (period after "Mr." isn't a sentence break). Relatively easy in English (text, not speech!); other languages are harder: Segmentation (Chinese); Morphology (Hungarian), e.g. haːz-unk-bɔn house-our-in 'in our house'. Discourse: how do sentences relate to each other? Pragmatics: what intent is expressed by the literal meaning, how to react to an utterance? Phonetics: acoustics and physical production of sounds. Phonology: how sounds pattern in a language.
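
As a tiny sketch of the "simple rules often work" point for English sentence splitting, the toy splitter below keeps "Mr." from ending a sentence; the abbreviation list is illustrative and far from complete.

    ABBREVS = {"Mr.", "Mrs.", "Dr.", "Prof.", "Inc.", "Co."}

    def split_sentences(text):
        sentences, current = [], []
        for token in text.split():
            current.append(token)
            # a period ends the sentence unless the token is a known abbreviation
            if token.endswith(".") and token not in ABBREVS:
                sentences.append(" ".join(current))
                current = []
        if current:
            sentences.append(" ".join(current))
        return sentences

    print(split_sentences("Mr. Smith went to Washington. He stayed a week."))
    # -> ['Mr. Smith went to Washington.', 'He stayed a week.']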

Disambiguation for Applications Sometimes life is easy Can do text classification pretty well just knowing the set of words used in the document, same for authorship attribution Word-sense disambiguation not usually needed for web search because of majority effects or intersection effects ("jaguar habitat" isn't the car) Sometimes only certain ambiguities are relevant Other times, all levels can be relevant (e.g., translation: "he hoped to record a world record")
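
A toy sketch of the "just the set of words" point for text classification: score each class by overlap with a small hand-built word list. The classes and word lists here are made up for illustration; real systems learn weights from labeled data.

    CLASS_WORDS = {
        "sports": {"game", "season", "coach", "score", "team"},
        "finance": {"rates", "stocks", "market", "earnings", "fed"},
    }

    def classify(document):
        words = set(document.lower().split())
        # pick the class whose word list overlaps the document most
        return max(CLASS_WORDS, key=lambda c: len(words & CLASS_WORDS[c]))

    print(classify("Fed raises interest rates as the market rallies"))  # finance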

Problem: Scale People did know that language was ambiguous! …but they hoped that all interpretations would be "good" ones (or ruled out pragmatically) …they didn't realize how bad it would be

Corpora A corpus is a collection of text: often annotated in some way, sometimes just lots of text. Balanced vs. uniform corpora. Examples: Newswire collections: 500M+ words; Brown corpus: 1M words of tagged "balanced" text; Penn Treebank: 1M words of parsed WSJ; Canadian Hansards: 10M+ words of aligned French / English sentences; The Web: billions of words of who knows what
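
Two of these corpora are easy to poke at from NLTK, which distributes the Brown corpus and a sample of the Penn Treebank (this assumes the nltk package and its 'brown' and 'treebank' data packages are installed):

    from nltk.corpus import brown, treebank

    print(len(brown.words()), "words of tagged, balanced text (Brown)")
    print(brown.tagged_words()[:3])    # [('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL')]

    print(len(treebank.parsed_sents()), "parsed WSJ sentences (Treebank sample)")
    print(treebank.parsed_sents()[0])  # a full phrase-structure tree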

Corpus-Based Methods A corpus like a treebank gives us three important tools. It gives us broad coverage: rules like ROOT → S, S → NP VP ., NP → PRP, VP → VBD ADJP can be read directly off the annotated sentences.

Corpus-Based Methods It gives us statistical information, e.g. the "subject-object asymmetry": how NP realizations are distributed over all NPs, NPs under S, and NPs under VP. This is a very different kind of subject/object asymmetry than the traditional domain of interest for linguists. However, there are connections to recent work with quantitative methods (e.g., Bresnan, Dingare, Manning 2003).
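
A rough sketch of the kind of count behind such a comparison (an assumption about what was plotted, not the slide's exact figures): how often an NP directly under S (roughly, a subject) versus directly under VP (roughly, an object) is just a pronoun, computed from NLTK's Penn Treebank sample.

    from collections import Counter
    from nltk.corpus import treebank
    from nltk.tree import Tree

    def np_kind(np):
        # crude: an NP dominating a single PRP leaf counts as a pronoun
        if len(np) == 1 and isinstance(np[0], Tree) and np[0].label() == "PRP":
            return "pronoun"
        return "full NP"

    counts = Counter()
    for sent in treebank.parsed_sents():
        for node in sent.subtrees(lambda t: t.label() in ("S", "VP")):
            for child in node:
                if isinstance(child, Tree) and child.label().startswith("NP"):
                    counts[(node.label(), np_kind(child))] += 1

    for key, value in sorted(counts.items()):
        print(key, value)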

Corpus-Based Methods It lets us check our answers!

Problem: Sparsity However, sparsity is always a problem. (Figure: rates of new unigrams (words), bigrams (word pairs), and rules as more newswire text is seen.)
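
The flavor of that plot can be reproduced in a few lines on NLTK's Brown corpus (a stand-in for newswire here; the token cutoffs are arbitrary): the stock of distinct unigrams grows slowly relative to tokens, while distinct bigrams keep piling up.

    from nltk.corpus import brown

    seen_uni, seen_bi = set(), set()
    prev, total = None, 0
    for word in brown.words()[:200000]:
        total += 1
        seen_uni.add(word)
        if prev is not None:
            seen_bi.add((prev, word))
        prev = word
        if total % 50000 == 0:
            print(f"{total:>7} tokens: "
                  f"{len(seen_uni) / total:.3f} distinct unigrams per token, "
                  f"{len(seen_bi) / total:.3f} distinct bigrams per token")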

The (Effective) NLP Cycle Pick a problem (usually some disambiguation) Get a lot of data (usually a labeled corpus) Build the simplest thing that could possibly work Repeat: See what the most common errors are Figure out what information a human would use Modify the system to exploit that information Feature engineering Representation design Machine learning/statistics We’re going to go through this cycle several times
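
For one classic disambiguation problem, part-of-speech tagging, "the simplest thing that could possibly work" might look like the sketch below: tag each word with its most frequent tag in training data, and unknown words with the overall most common tag. Training on NLTK's Brown news section is my own choice for illustration.

    from collections import Counter, defaultdict
    from nltk.corpus import brown

    tagged = brown.tagged_words(categories="news")
    tag_counts = defaultdict(Counter)
    for word, tag in tagged:
        tag_counts[word.lower()][tag] += 1
    default_tag = Counter(t for _, t in tagged).most_common(1)[0][0]

    def tag_sentence(words):
        # most-frequent-tag baseline; unseen words get the corpus-wide default
        return [(w, tag_counts[w.lower()].most_common(1)[0][0]
                 if w.lower() in tag_counts else default_tag)
                for w in words]

    print(tag_sentence("The Fed raises interest rates".split()))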

Language isn’t Adversarial One nice thing: we know NLP can be done! Language isn’t adversarial: It’s produced with the intent of being understood With some understanding of language, you can often tell what knowledge sources are relevant But most variables go unmodeled Some knowledge sources aren’t easily available (real-world knowledge, complex models of other people’s plans) Some kinds of features are beyond our technical ability to model (especially cross-sentence correlations)

What’s Next? Next class: language models (modeling event sequences) Start with very simple models of language, work our way up Some basic statistics concepts that will keep showing up If you don’t know what conditional probabilities and maximum-likelihood estimators are, read up! (M&S chapter 2) Reading for next time: M&S 6 (online), J&M 4 (handout) Programming assignment 1 will go out soon
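
If you want a quick reminder of what those two concepts look like in the bigram setting we'll start with, here is a tiny worked example (the toy corpus is made up): the maximum-likelihood estimate of P(w2 | w1) is just count(w1, w2) / count(w1).

    from collections import Counter

    corpus = "the cat sat on the mat the cat ate".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p_mle(w2, w1):
        # conditional probability estimated by relative frequency
        return bigrams[(w1, w2)] / unigrams[w1]

    print(p_mle("cat", "the"))  # 2/3: "the" occurs 3 times, twice followed by "cat"
    print(p_mle("mat", "the"))  # 1/3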