Natural Language Processing Instructor: Paul Tarau, based on Rada Mihalcea’s original slides Fall 2013.

Slides:



Advertisements
Similar presentations
Natural Language Processing. According to research at an Elingsh uinervtisy, it deosn’t mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt.
Advertisements

For Friday No reading Homework –Chapter 23, exercises 1, 13, 14, 19 –Not as bad as it sounds –Do them IN ORDER – do not read ahead here.
1 Introduction to Natural Language Processing (Lecture for CS410 Text Information Systems) Jan 28, 2011 ChengXiang Zhai Department of Computer Science.
Natural Language and Speech Processing Creation of computational models of the understanding and the generation of natural language. Different fields coming.
CSE111: Great Ideas in Computer Science Dr. Carl Alphonce 219 Bell Hall Office hours: M-F 11:00-11:
Introduction to Computational Linguistics Lecture 2.
Inducing Information Extraction Systems for New Languages via Cross-Language Projection Ellen Riloff University of Utah Charles Schafer, David Yarowksy.
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
Machine Translation Prof. Alexandros Potamianos Dept. of Electrical & Computer Engineering Technical University of Crete, Greece May 2003.
Natural Language Query Interface Mostafa Karkache & Bryce Wenninger.
تمرين شماره 1 درس NLP سيلابس درس NLP در دانشگاه هاي ديگر ___________________________ راحله مکي استاد درس: دکتر عبدالله زاده پاييز 85.
Machine Learning in Natural Language Processing Noriko Tomuro November 16, 2006.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
Artificial Intelligence. Agenda StartEnd Introduction AI Future Recent Developments Turing Test Turing Test Evaluation.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
1 Pacific University Sheldon Liang, Ph D Computer Science Department.
CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes April 3and5, 2012.
ELN – Natural Language Processing Giuseppe Attardi
Fall 2004 Natural Language Processing Rada Mihalcea.
CAREERS IN LINGUISTICS OUTSIDE OF ACADEMIA CAREERS IN INDUSTRY.
Introduction to NLP.
9/8/20151 Natural Language Processing Lecture Notes 1.
Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software.
Natural Language Processing Rada Mihalcea Fall 2008.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
1 Natural Language Processing Gholamreza Ghassem-Sani.
CIG Conference Norwich September 2006 AUTINDEX 1 AUTINDEX: Automatic Indexing and Classification of Texts Catherine Pease & Paul Schmidt IAI, Saarbrücken.
For Friday Finish chapter 23 Homework: –Chapter 22, exercise 9.
CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.
Information Retrieval and Web Search Lecture 1. Course overview Instructor: Rada Mihalcea Class web page:
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
Natural Language Processing Introduction. Any Light at The End of The Tunnel ? Yahoo, Google, Microsoft  Information Retrieval Monster.com, HotJobs.com.
CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.
Introduction to CL & NLP CMSC April 1, 2003.
CS774. Markov Random Field : Theory and Application Lecture 19 Kyomin Jung KAIST Nov
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
GTRI.ppt-1 NLP Technology Applied to e-discovery Bill Underwood Principal Research Scientist “The Current Status and.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.
CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?
ICS 482: Natural language Processing Pre-introduction
For Monday Read chapter 24, sections 1-3 Homework: –Chapter 23, exercise 8.
For Friday Finish chapter 24 No written homework.
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Translingual Information Management Stephan Busemann Language Technology Lab German Research Center for Artificial Intelligence.
Natural Language Processing Chapter 1 : Introduction.
Information Retrieval and Web Search Course overview Instructor: Rada Mihalcea.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 1 (03/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Introduction to Natural.
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
NATURAL LANGUAGE PROCESSING Zachary McNellis. Overview  Background  Areas of NLP  How it works?  Future of NLP  References.
1 An Introduction to Computational Linguistics Mohammad Bahrani.
Computational Linguistics Courses Experiment Test.
For Monday Read chapter 26 Homework: –Chapter 23, exercises 8 and 9.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
Basics of Natural Language Processing Introduction to Computational Linguistics.
Natural Language Processing Tasneem Ghnaimat Spring 2013.
COSC 6336 Natural Language Processing
Natural Language Processing [05 hours/week, 09 Credits] [Theory]
Natural Language Processing
ITCS 6157/8157: Visual Database
Computational UIUC Lane Schwartz Student Orientation August 18, 2016.
Natural Language Processing
Machine Learning in Natural Language Processing
CSE 635 Multimedia Information Retrieval
CS246: Information Retrieval
Artificial Intelligence 2004 Speech & Natural Language Processing
Information Retrieval
Presentation transcript:

Natural Language Processing Instructor: Paul Tarau, based on Rada Mihalcea’s original slides Fall 2013

Any Light at The End of The Tunnel ? Yahoo, Google, Microsoft  Information Retrieval Monster.com, HotJobs.com (Job finders)  Information Extraction + Information Retrieval Systran powers Babelfish  Machine Translation Ask Jeeves  Question Answering Myspace, Facebook, Blogspot  Processing of User-Generated Content Tools for “ business intelligence ” All “ Big Guys ” have (several) strong NLP research labs: –IBM, Microsoft, AT&T, Xerox, Sun, etc. Academia: research in an university environment

Why Natural Language Processing ? Huge amounts of data –Internet = at least 20 billions pages –Intranet Applications for processing large amounts of texts require NLP expertise Classify text into categories Index and search large texts Automatic translation Speech understanding –Understand phone conversations Information extraction –Extract useful information from resumes Automatic summarization –Condense 1 book into 1 page Question answering Knowledge acquisition Text generations / dialogues

Natural? Natural Language? –Refers to the language spoken by people, e.g. English, Japanese, Swahili, as opposed to artificial languages, like C++, Java, etc. Natural Language Processing –Applications that deal with natural language in a way or another [Computational Linguistics –Doing linguistics on computers –More on the linguistic side than NLP, but closely related ]

Why Natural Language Processing? kJfmmfj mmmvvv nnnffn333 Uj iheale eleee mnster vensi credur Baboi oi cestnitze Coovoel2^ ekk; ldsllk lkdf vnnjfj? Fgmflmllk mlfm kfre xnnn!

Computers Lack Knowledge! Computers “ see ” text in English the same you have seen the previous text! People have no trouble understanding language –Common sense knowledge –Reasoning capacity –Experience Computers have –No common sense knowledge –No reasoning capacity

Where does it fit in the CS taxonomy? Computers Artificial Intelligence AlgorithmsDatabasesNetworking Robotics Search Natural Language Processing Information Retrieval Machine Translation Language Analysis SemanticsParsing

Linguistics Levels of Analysis Speech Written language –Phonology: sounds / letters / pronunciation –Morphology: the structure of words –Syntax: how these sequences are structured –Semantics: meaning of the strings Interaction between levels

Issues in Syntax “ the dog ate my homework ” - Who did what? 1.Identify the part of speech (POS) Dog = noun ; ate = verb ; homework = noun English POS tagging: 95% 2. Identify collocations mother in law, hot dog Compositional versus non-compositional collocates

Issues in Syntax Shallow parsing: “ the dog chased the bear ” “ the dog ” “ chased the bear ” subject - predicate Identify basic structures NP-[the dog] VP-[chased the bear]

Issues in Syntax Full parsing: John loves Mary Help figuring out (automatically) questions like: Who did what and when?

More Issues in Syntax Anaphora Resolution: “ The dog entered my room. It scared me ” Preposition Attachment “ I saw the man in the park with a telescope ”

Issues in Semantics Understand language! How? “ plant ” = industrial plant “ plant ” = living organism Words are ambiguous Importance of semantics? –Machine Translation: wrong translations –Information Retrieval: wrong information –Anaphora Resolution: wrong referents

The sea is at the home for billions factories and animals The sea is home to million of plants and animals English  French [commercial MT system] Le mer est a la maison de billion des usines et des animaux French  English Why Semantics?

Issues in Semantics How to learn the meaning of words? From dictionaries: plant, works, industrial plant -- (buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles") plant, flora, plant life -- (a living organism lacking the power of locomotion) They are producing about 1,000 automobiles in the new plant The sea flora consists in 1,000 different plant species The plant was close to the farm of animals.

Issues in Semantics Learn from annotated examples: –Assume 100 examples containing “ plant ” previously tagged by a human –Train a learning algorithm –How to choose the learning algorithm? –How to obtain the 100 tagged examples?

Issues in Information Extraction “ There was a group of about 8-9 people close to the entrance on Highway 75 ” Who? “ 8-9 people ” Where? “ highway 75 ” Extract information Detect new patterns: –Detect hacking / hidden information / etc. Gov./mil. puts lots of money put into IE research

Issues in Information Retrieval General model: –A huge collection of texts –A query Task: find documents that are relevant to the given query How? Create an index, like the index in a book More … –Vector-space models –Boolean models Examples: Google, Yahoo, Altavista, etc.

Issues in Information Retrieval Retrieve specific information Question Answering “ What is the height of mount Everest? ” 11,000 feet

Issues in Information Retrieval Find information across languages! Cross Language Information Retrieval “ What is the minimum age requirement for car rental in Italy? ” Search also Italian texts for “ eta minima per noleggio macchine ” Integrate large number of languages Integrate into performant IR engines

Issues in Machine Translations Text to Text Machine Translations Speech to Speech Machine Translations Most of the work has addressed pairs of widely spread languages like English-French, English- Chinese

Issues in Machine Translations How to translate text? –Learn from previously translated data  Need parallel corpora French-English, Chinese-English have the Hansards Reasonable translations Chinese-Hindi – no such tools available today!

Even More Discourse Summarization Subjectivity and sentiment analysis Text generation, dialog [pass the Turing test for some million dollars] – Loebner prize Knowledge acquisition [how to get that common sense knowledge] Speech processing

What will we study this semester? Intro to Perl –Great great for text processing –Fast: one person can do the work of ten others –Easy to pick up Some linguistic basics –Structure of English –Parts of speech, phrases, parsing Morphology N-grams –Also multi-word expressions Part of speech tagging Syntactic parsing Semantics –Word sense disambiguation –Semantic relations

What will we study this semester? Information Retrieval Question answering Text classification Text summarization Sentiment analysis Depending on time, we may touch on –Speech recognition –Dialogue –Text generation –Other topics of your interest

Administrivia Textbook: Speech and Language Processing, by Jurafsky and Martin (2 nd edition) –Recommended: Statistical Methods in NLP, by Manning and Schutze Other readings (papers) may be assigned throughout the semester Grading: Assignments, 2 exams, term project –Late submission policy for assignments: can submit up to three days late, with 10% penalty / day