1 SIMS 290-2: Applied Natural Language Processing Marti Hearst August 30, 2004.

Presentation transcript:

1 SIMS 290-2: Applied Natural Language Processing Marti Hearst August 30, 2004

2 Today Motivation: SIMS student projects Course Goals Why NLP is difficult How to solve it? Corpus-based statistical approaches What we’ll do in this course

3 ANLP Motivation: SIMS Masters Projects Breaking Story (2002) Summarize trends in news feeds Needs categories and entities assigned to all news articles BriefBank (2002) System for entering legal briefs Needs a topic category system for browsing Chronkite (2003) Personalized RSS feeds Needs categories and entities assigned to all web pages Paparrazi (2004) Analysis of blog activity Needs categories assigned to blog content

10 Goals of this Course Learn about the problems and possibilities of natural language analysis: What are the major issues? What are the major solutions? –How well do they work –How do they work (but to a lesser extent than CS ) At the end you should: Agree that language is subtle and interesting! Feel some ownership over the algorithms Be able to assess NLP problems –Know which solutions to apply when, and how Be able to read papers in the field

11 Today Motivation: SIMS student projects Course Goals Why NLP is difficult How to solve it? Corpus-based statistical approaches What we’ll do in this course

12 We’ve passed the year 2001, but we are not close to realizing the dream (or nightmare …)

13 Dave Bowman: “Open the pod bay doors, HAL” HAL 9000: “I’m sorry Dave. I’m afraid I can’t do that.”

14 Why is NLP difficult? Computers are not brains There is evidence that much of language understanding is built into the human brain Computers do not socialize Much of language is about communicating with people Key problems: Representation of meaning Language presupposes knowledge about the world Language only reflects the surface of meaning Language presupposes communication between people

15 Adapted from Robert Berwick's 6.863J Hidden Structure English plural pronunciation Toy + s → toyz; add z Book + s → books; add s Church + s → churchiz; add iz Box + s → boxiz; add iz Sheep + s → sheep; add nothing What about new words? Bach + ’s → boxs; why not boxiz?
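The plural pattern on this slide can be sketched as a small rule table. This is only an illustration: the ending lists and the function name `plural_suffix_sound` are assumptions made here, and the rules key on spelling rather than on actual sounds, which is exactly where they break down.

```python
# Hypothetical, simplified sound classes based on spelling.
# Real phonology works on phonemes, not letters.
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")   # church, box -> "iz"
VOICELESS_ENDINGS = ("p", "t", "k", "f")         # book -> "s"
MEMORIZED_PLURALS = {"sheep": "sheep"}           # exceptions: add nothing


def plural_suffix_sound(word):
    """Return the sound added to form the plural: "z", "s", "iz", or ""."""
    word = word.lower()
    if word in MEMORIZED_PLURALS:
        return ""                     # memorized form wins over any rule
    if word.endswith(SIBILANT_ENDINGS):
        return "iz"
    if word.endswith(VOICELESS_ENDINGS):
        return "s"
    return "z"                        # default after vowels and voiced sounds


for w in ["toy", "book", "church", "box", "sheep", "Bach"]:
    print(w, "->", repr(plural_suffix_sound(w)))
```

Because the sketch looks only at letters, it treats "Bach" like "church" and answers "iz", whereas the slide points out that speakers add "s". The hidden structure is in the sounds, not the spelling.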

16 Language subtleties Adjective order and placement A big black dog A big black scary dog A big scary dog A scary big dog A black big dog Antonyms Which sizes go together? –Big and little –Big and small –Large and small –Large and little

17 Adapted from Robert Berwick's 6.863J World Knowledge is subtle He arrived at the lecture. He chuckled at the lecture. He arrived drunk. He chuckled drunk. He chuckled his way through the lecture. He arrived his way through the lecture.

18 Adapted from Robert Berwick's 6.863J Words are ambiguous (have multiple meanings) I know that. I know that block. I know that blocks the sun. I know that block blocks the sun.

19 Adapted from Robert Berwick's 6.863J Headline Ambiguity Iraqi Head Seeks Arms Juvenile Court to Try Shooting Defendant Teacher Strikes Idle Kids Kids Make Nutritious Snacks British Left Waffles on Falkland Islands Red Tape Holds Up New Bridges Bush Wins on Budget, but More Lies Ahead Hospitals are Sued by 7 Foot Doctors

20 The Role of Memorization Children learn words quickly As many as 9 words/day Often only need one exposure to associate meaning with word –Can make mistakes, e.g., overgeneralization “I goed to the store.” Exactly how they do this is still under study

21 The Role of Memorization Dogs can do word association too! Rico, a border collie in Germany Knows the names of each of 100 toys Can retrieve items called out to him with over 90% accuracy. Can also learn and remember the names of unfamiliar toys after just one encounter, putting him on a par with a three-year-old child.

22 Adapted from Robert Berwick's 6.863J But there is too much to memorize! establish establishment: the church of England as the official state church. disestablishment antidisestablishment antidisestablishmentarian antidisestablishmentarianism: a political philosophy that is opposed to the separation of church and state.

23 Rules and Memorization Current thinking in psycholinguistics is that we use a combination of rules and memorization However, this is very controversial Mechanism: If there is an applicable rule, apply it However, if there is a memorized version, that takes precedence. (Important for irregular words.) –Artists paint “still lifes”, not “still lives” –Past tense: think → thought, but blink → blinked This is a simplification; for more on this, see Pinker’s “Words and Rules” and “The Language Instinct”.
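The mechanism on this slide (apply the rule unless a memorized form exists) can be sketched in a few lines. The table contents and the name `past_tense` are illustrative assumptions, and the regular rule is deliberately crude.

```python
# Hypothetical memorized (irregular) forms; a real lexicon would be far larger.
IRREGULAR_PAST = {"think": "thought", "go": "went"}


def past_tense(verb, memorized=IRREGULAR_PAST):
    """Memorized form takes precedence; otherwise apply the regular rule."""
    if verb in memorized:
        return memorized[verb]
    if verb.endswith("e"):
        return verb + "d"        # bake -> baked
    return verb + "ed"           # blink -> blinked


print(past_tense("think"))               # memorized form
print(past_tense("blink"))               # regular rule
print(past_tense("go", memorized={}))    # rule without memory
```

Calling it with an empty memory reproduces the overgeneralization from the earlier slide: the rule alone produces "goed".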

24 Representation of Meaning I know that block blocks the sun. How do we represent the meanings of “block”? How do we represent “I know”? How does that differ from “I know that.”? Who is “I”? How do we indicate that we are talking about earth’s sun vs. some other planet’s sun? When did this take place? What if I move the block? What if I move my viewpoint? How do we represent this?

25 How to tackle these problems? The field was stuck for quite some time. A new approach started around 1990 Well, not really new, but the first time around, in the 50’s, they didn’t have the text, disk space, or GHz Main idea: combine memorizing and rules How to do it: Get large text collections (corpora) Compute statistics over the words in those collections Surprisingly effective Even better now with the Web
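As a rough sketch of "compute statistics over the words in those collections", the following counts single words and adjacent word pairs using only the Python standard library. The three-sentence corpus, the tokenizer, and the function names are made up for illustration; a real corpus would be far larger.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_and_bigram_counts(texts):
    """Count single words and adjacent word pairs over a list of texts."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


# Toy corpus, invented for this example.
toy_corpus = [
    "A big black dog barked.",
    "The big black dog slept.",
    "A big scary dog ran.",
]
unigrams, bigrams = unigram_and_bigram_counts(toy_corpus)
print(unigrams.most_common(3))
print(bigrams[("big", "black")], bigrams[("black", "big")])
```

Even on this tiny sample, "big black" is seen and "black big" is not, which is the kind of evidence the corpus-based approach relies on.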

26 Corpus-based Example: Pre-Nominal Adjective Ordering Important for translation and generation Examples: “big fat Greek wedding” sounds natural; “fat Greek big wedding” does not Some approaches try to characterize this as semantic rules, e.g.: Age < color, value < dimension Data-intensive approaches Assume adjective ordering is independent of the noun they modify Compare how often you see the order a b vs. the order b a Keller & Lapata, “The Web as Baseline”, HLT-NAACL’04
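The data-intensive idea above, comparing how often two adjectives appear in each order, might look like the sketch below. The counts are invented for illustration and `preferred_order` is a hypothetical helper, not code from the cited paper.

```python
from collections import Counter

# Hypothetical counts of adjacent adjective pairs, as if gathered from a corpus.
pair_counts = Counter({
    ("big", "fat"): 5,
    ("fat", "big"): 1,
    ("fat", "greek"): 4,
})


def preferred_order(a, b, counts):
    """Return the more frequent ordering of a and b, or None if undecided."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if ab > ba:
        return (a, b)
    if ba > ab:
        return (b, a)
    return None  # unseen pair or a tie: direct evidence does not decide


print(preferred_order("fat", "big", pair_counts))
print(preferred_order("greek", "big", pair_counts))
```

The second call returns `None` because "big" and "greek" were never seen next to each other in the toy counts, which is the unseen-pair problem.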

27 Corpus-based Example: Pre-Nominal Adjective Ordering Data-intensive approaches Compare how often you see {a, b} vs {b, a} What happens when you encounter an unseen pair? –Shaw and Hatzivassiloglou ’99 use transitive closures –Malouf ’00 uses a back-off bigram model: P(<a,b> | {a,b}) vs. P(<b,a> | {a,b}) He also uses morphological analysis, semantic similarity calculations and positional probabilities Keller and Lapata ’04 use just the very simple algorithm –But they use the web as their training set –Gets 90% accuracy on 1000 sequences –As good as or better than the complex algorithms Keller & Lapata, “The Web as Baseline”, HLT-NAACL’04
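One way to read the transitive-closure idea for unseen pairs: if "big" has been seen before "fat" and "fat" before "Greek", infer that "big" precedes "Greek". The sketch below is a generic closure over observed orderings with made-up data. It is not the published algorithm, which may differ in detail.

```python
def transitive_closure(observed):
    """Given a set of (a, b) pairs meaning "a precedes b", add every pair
    that follows by chaining: (a, b) and (b, c) imply (a, c)."""
    closure = set(observed)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical orderings observed directly in a corpus.
observed = {("big", "fat"), ("fat", "greek")}
closure = transitive_closure(observed)
print(("big", "greek") in closure)
```

The pair ("big", "greek") was never observed, yet the closure contains it. Conflicting evidence in real data (both orders observed somewhere) would need extra handling that this sketch leaves out.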

28 Adapted from Robert Berwick's 6.863J Real-World Applications of NLP Spelling Suggestions/Corrections Grammar Checking Synonym Generation Information Extraction Text Categorization Automated Customer Service Speech Recognition (limited) Machine Translation In the (near?) future: Question Answering Improving Web Search Engine results Automated Metadata Assignment Online Dialogs

29 NLP in the Real World Synonym generation for Suggesting advertising keywords Suggesting search result refinement and expansion

30 Synonym Generation

34 What We’ll Do in this Course Read research papers and tutorials Use NLTK (Natural Language ToolKit) to try out various algorithms Some homeworks will be to do some NLTK exercises Three mini-projects Two involve a selected collection The third is your choice, can also be on the selected collection

35 What We’ll Do in this Course Adopt a large text collection Use a wide range of NLP techniques to process it Release the results for others to use

36 Which Text Collection?

37 How to analyze a big collection? Your ideas go here

38 Python A terrific language Interpreted Object-oriented Easy to interface to other things (web, DBMS, Tk) Good stuff from: Java, Lisp, Tcl, Perl Easy to learn –I learned it this summer by reading “Learning Python” FUN!

39 Questions?