I256 Applied Natural Language Processing Fall 2009 Lecture 1 Introduction Barbara Rosario.

Slides:



Advertisements
Similar presentations
CILC2011 A framework for structured knowledge extraction and representation from natural language via deep sentence analysis Stefania Costantini Niva Florio.
Advertisements

1 Text categorization Feature selection: chi square test.
Methods in Computational Linguistics II Queens College Lecture 1: Introduction.
January 12, Statistical NLP: Lecture 2 Introduction to Statistical NLP.
The Unreasonable Effectiveness of Data Alon Halevy, Peter Norvig, and Fernando Pereira Kristine Monteith May 1, 2009 CS 652.
Shallow Processing: Summary Shallow Processing Techniques for NLP Ling570 December 7, 2011.
1 SIMS 290-2: Applied Natural Language Processing Marti Hearst Sept 20, 2004.
1/7 INFO60021 Natural Language Processing Harold Somers Professor of Language Engineering.
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
تمرين شماره 1 درس NLP سيلابس درس NLP در دانشگاه هاي ديگر ___________________________ راحله مکي استاد درس: دکتر عبدالله زاده پاييز 85.
Machine Learning in Natural Language Processing Noriko Tomuro November 16, 2006.
Natural Language Processing Prof: Jason Eisner Webpage: syllabus, announcements, slides, homeworks.
Lecture 1 Introduction: Linguistic Theory and Theories
(C) 2000, The University of Michigan 1 Database Application Design Handout #11 March 24, 2000.
Natural Language Processing Ellen Back, LIS489, Spring 2015.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
ELN – Natural Language Processing Giuseppe Attardi
CAREERS IN LINGUISTICS OUTSIDE OF ACADEMIA CAREERS IN INDUSTRY.
9/8/20151 Natural Language Processing Lecture Notes 1.
Introduction to Natural Language Processing Heshaam Faili University of Tehran.
1 Ling 569: Introduction to Computational Linguistics Jason Eisner Johns Hopkins University Tu/Th 1:30-3:20 (also this Fri 1-5)
Part II. Statistical NLP Advanced Artificial Intelligence Applications of HMMs and PCFGs in NLP Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
Computational Linguistics Yoad Winter *General overview *Examples: Transducers; Stanford Parser; Google Translate; Word-Sense Disambiguation * Finite State.
For Friday Finish chapter 23 Homework: –Chapter 22, exercise 9.
Natural Language Processing Introduction. 2 Natural Language Processing We’re going to study what goes into getting computers to perform useful and interesting.
Lecture 6 Hidden Markov Models Topics Smoothing again: Readings: Chapters January 16, 2013 CSCE 771 Natural Language Processing.
CHAPTER 13 NATURAL LANGUAGE PROCESSING. Machine Translation.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
Natural language processing tools Lê Đức Trọng 1.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.
CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?
ICS 482: Natural language Processing Pre-introduction
Natural Language Processing Menu Based Natural Language Interfaces -Kyle Neumeier.
For Monday Read chapter 24, sections 1-3 Homework: –Chapter 23, exercise 8.
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
Language and Statistics
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Introduction Chapter 1 Foundations of statistical natural language processing.
LING 001 Introduction to Linguistics Spring 2010 Syntactic parsing Part-Of-Speech tagging Apr. 5 Computational linguistics.
CS 188: Artificial Intelligence Spring 2009 Natural Language Processing Dan Klein – UC Berkeley 1.
Information Retrieval
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 1 (03/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Introduction to Natural.
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
October 2005CSA3180: Text Processing II1 CSA3180: Natural Language Processing Text Processing 2 Python and NLTK Shallow Parsing and Chunking NLTK Lite.
CS 4705 Lecture 17 Semantic Analysis: Robust Semantics.
1 An Introduction to Computational Linguistics Mohammad Bahrani.
For Monday Read chapter 26 Homework: –Chapter 23, exercises 8 and 9.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
King Faisal University جامعة الملك فيصل Deanship of E-Learning and Distance Education عمادة التعلم الإلكتروني والتعليم عن بعد [ ] 1 جامعة الملك فيصل عمادة.
Dan Roth University of Illinois, Urbana-Champaign 7 Sequential Models Tutorial on Machine Learning in Natural.
Computer Science I ISMAIL ABUMUHFOUZ | CS 180. CS 180 Description BRIEF SUMMARY: This course covers a study of the algorithmic approach and the object.
Problem Solving with NLTK MSE 2400 EaLiCaRA Dr. Tom Way.
Natural Language Processing (NLP)
Text Analytics Giuseppe Attardi Università di Pisa
Machine Learning in Natural Language Processing
WORDS Lab CSC 9010: Special Topics. Natural Language Processing.
Language and Statistics
CSE 635 Multimedia Information Retrieval
Language and Learning Introduction to Artificial Intelligence COS302
Natural Language Processing (NLP)
Information Retrieval
Natural Language Processing (NLP)
Presentation transcript:

I256 Applied Natural Language Processing Fall 2009 Lecture 1 Introduction Barbara Rosario

Introductions Barbara Rosario –iSchool alumni (class 2005) –Intel Labs Gopal Vaswani –iSchool master student (class 2010) You?

Today Introductions Administrivia What is NLP NLP Applications Why is NLP difficult Corpus-based statistical approaches Course goals What we’ll do in this course

Administrivia Books: –Foundations of Statistical NLP, Manning and Schuetze, MIT pressFoundations of Statistical NLP –Natural Language Processing with Python, Bird, Klein & Loper, O'Reilly. (also on line)Natural Language Processing with Python –See Web site for additional resources Work: –Individual coding assignments (Python & NLTK-Natural Language Toolkit) (4 or 5) –Final group project –Participation Office hours: –Barbara: Thursday 2:00-3:00 in Room 6 –Gopal: Tuesday 2:00-3:00 in Room 6 (to be confirmed)

Administrivia Communication: –My –Gopal : –Mailing list: Send an to with subscribe i256 in the Through intranet –Announcements: webpage and/or mailing list and/or Bspace (TBA) –Public discussion: Bspace(?) Related course: Statistical Natural Language Processing, Spring 2009, CS 288 – –Instructor: Dan Klein –Much more emphasis on statistical algorithms Questions?

Natural Language Processing Fundamental goal: deep understand of broad language –Not just string processing or keyword matching! End systems that we want to build: –Ambitious: speech recognition, machine translation, question answering… –Modest: spelling correction, text categorization… Slide taken from Klein’s course: UCB CS 288 spring 09

Example: Machine Translation

NLP applications Text Categorization –Classify documents by topics, language, author, spam filtering, information retrieval (relevant, not relevant), sentiment classification (positive, negative) Spelling & Grammar Corrections Information Extraction Speech Recognition Information Retrieval –Synonym Generation Summarization Machine Translation Question Answering Dialog Systems –Language generation

Why NLP is difficult A NLP system needs to answer the question “who did what to whom” Language is ambiguous –At all levels: lexical, phrase, semantic –Iraqi Head Seeks Arms Word sense is ambiguous (head, arms) –Stolen Painting Found by Tree Thematic role is ambiguous: tree is agent or location? –Ban on Nude Dancing on Governor’s Desk Syntactic structure (attachment) is ambiguous: is the ban or the dancing on the desk? –Hospitals Are Sued by 7 Foot Doctors Semantics is ambiguous : what is 7 foot?

Why NLP is difficult Language is flexible –New words, new meanings –Different meanings in different contexts Language is subtle –He arrived at the lecture –He chuckled at the lecture –He chuckled his way through the lecture –**He arrived his way through the lecture Language is complex!

Why NLP is difficult MANY hidden variables –Knowledge about the world –Knowledge about the context –Knowledge about human communication techniques Can you tell me the time? Problem of scale –Many (infinite?) possible words, meanings, context Problem of sparsity –Very difficult to do statistical analysis, most things (words, concepts) are never seen before Long range correlations

Why NLP is difficult Key problems: –Representation of meaning –Language presupposes knowledge about the world –Language only reflects the surface of meaning –Language presupposes communication between people

Meaning What is meaning? –Physical referent in the real world –Semantic concepts, characterized also by relations. How do we represent and use meaning –I am Italian From lexical database (WordNet) Italian =a native or inhabitant of Italy  Italy = republic in southern Europe [..] –I am Italian Who is “I”? –I know she is Italian/I think she is Italian How do we represent “I know” and “I think” Does this mean that I is Italian? What does it say about the “I” and about the person speaking? –I thought she was Italian How do we represent tenses?

Today Introductions Administrivia What is NLP NLP Applications Why is NLP difficult Corpus-based statistical approaches Course goals What we’ll do in this course

Corpus-based statistical approaches to tackle NLP problem –How can a can a machine understand these differences? Decorate the cake with the frosting Decorate the cake with the kids –Rules based approaches, i.e. hand coded syntactic constraints and preference rules: The verb decorate require an animate being as agent The object cake is formed by any of the following, inanimate entities (cream, dough, frosting…..) –Such approaches have been showed to be time consuming to build, do not scale up well and are very brittle to new, unusual, metaphorical use of language To swallow requires an animate being as agent/subject and a physical object as object –I swallowed his story –The supernova swallowed the planet

Corpus-based statistical approaches to tackle NLP problem A Statistical NLP approach seeks to solve these problems by automatically learning lexical and structural preferences from text collections (corpora) Statistical models are robust, generalize well and behave gracefully in the presence of errors and new data. So: –Get large text collections –Compute statistics over those collections –(The bigger the collections, the better the statistics)

Corpus-based statistical approaches to tackle NLP problem Decorate the cake with the frosting Decorate the cake with the kids From (labeled) corpora we can learn that: #(kids are subject/agent of decorate) > #(frosting is subject/agent of decorate) From (UN-labeled) corpora we can learn that: #(“the kids decorate the cake”) >> #(“the frosting decorates the cake”) #(“cake with frosting”) >> #(“cake with kids”) etc.. Given these “facts” we then need a statistical model for the attachment decision

Corpus-based statistical approaches to tackle NLP problem Topic categorization: classify the document into semantics topics Document 1 The U.S. swept into the Davis Cup final on Saturday when twins Bob and Mike Bryan defeated Belarus's Max Mirnyi and Vladimir Voltchkov to give the Americans an unsurmountable 3-0 lead in the best-of-five semi-final tie. Topic = sport Document 2 One of the strangest, most relentless hurricane seasons on record reached new bizarre heights yesterday as the plodding approach of Hurricane Jeanne prompted evacuation orders for hundreds of thousands of Floridians and high wind warnings that stretched 350 miles from the swamp towns south of Miami to the historic city of St. Augustine. Topic = disaster

Corpus-based statistical approaches to tackle NLP problem Topic categorization: classify the document into semantics topics Document 1 (sport) The U.S. swept into the Davis Cup final on Saturday when twins Bob and Mike Bryan … Document 2 (disasters) One of the strangest, most relentless hurricane seasons on record reached new bizarre heights yesterday as…. From (labeled) corpora we can learn that: #(sport documents containing word Cup) > #(disaster documents containing word Cup) -- feature We then need a statistical model for the topic assignment

Corpus-based statistical approaches to tackle NLP problem Feature extractions (usually linguistics motivated) Statistical models Data (corpora, labels, linguistic resources)

Goals of this Course Learn about the problems and possibilities of natural language analysis: –What are the major issues? –What are the major solutions? At the end you should: –Agree that language is difficult, interesting and important –Be able to assess language problems Know which solutions to apply when, and how Feel some ownership over the algorithms –Be able to use software to tackle some NLP language tasks –Know language resources –Be able to read papers in the field

What We’ll Do in this Course Linguistic Issues –What are the range of language phenomena? –What are the knowledge sources that let us disambiguate? –What representations are appropriate? Applications Software (Python and NLTK) Statistical Modeling Methods

What We’ll Do in this Course Read books, research papers and tutorials Final project –Your own ideas or chose from some suggestions I will provide –We’ll talk later during the couse about ideas/methods etc. but come talk to me if you have already some ideas Learn Python Learn/use NLTK (Natural Language ToolKit) to try out various algorithms

Python - Simple yet powerful The zen of python : The zen of python Very clear, readable syntax Strong introspection capabilities – (recommended) Intuitive object orientation Natural expression of procedural code Full modularity, supporting hierarchical packages Exception-based error handling Very high level dynamic data types Extensive standard libraries and third party modules for virtually every task –Excellent functionality for processing linguistic data. –NLTK is one such extensive third party module. Source : python.org Python

Source : nltk.org Language processing taskNLTK modulesFunctionality Accessing corporanltk.corpusstandardized interfaces to corpora and lexicons String processingnltk.tokenize, nltk.stemtokenizers, sentence tokenizers, stemmers Collocation discoverynltk.collocationst-test, chi-squared, point-wise mutual information Part-of-speech taggingnltk.tagn-gram, backoff, Brill, HMM, TnT Classificationnltk.classify, nltk.clusterdecision tree, maximum entropy, naive Bayes, EM, k-means Chunkingnltk.chunkregular expression, n-gram, named-entity Parsingnltk.parsechart, feature-based, unification, probabilistic, dependency This is not the complete list NLTK NLTK defines an infrastructure that can be used to build NLP programs in Python. It provides basic classes for representing data relevant to natural language processing. Standard interfaces for performing tasks such as part-of-speech tagging, syntactic parsing, and text classification. Standard implementations for each task which can be combined to solve complex problems. Resources: Download at Getting started with NLTK Chapter 1 NLP and NLTK talk at google

Topics Text corpora & other resources Words (Morphology, tokenization, stemming, part-of- speech, WSD, collocations, lexical acquisition, language models) Syntax: chunking, PCFG & parsing Statistical models (esp. for classification) Applications –Text classification –Information extraction –Machine translation –Semantic Interpretation –Sentiment Analysis –QA / Summarization –Information retrieval

Next Assignment Due before next class Tue Sep 1 –No turn-in Download and install Python and NLTK Download the NLTK Book Collection, as described at the beginning of chapter 1 of the book Natural Language Processing with Pythonchapter 1Natural Language Processing with Python Readings: –Chapter 1 of the book Natural Language Processing with PythonChapter 1Natural Language Processing with Python –Chapter 3 of Foundations of Statistical NLP Next class: –Linguistic Essentials –Python Introduction