COSC 6336 Natural Language Processing

Slides:



Advertisements
Similar presentations
For Friday No reading Homework –Chapter 23, exercises 1, 13, 14, 19 –Not as bad as it sounds –Do them IN ORDER – do not read ahead here.
Advertisements

Natural Language Processing Instructor: Paul Tarau, based on Rada Mihalcea’s original slides Fall 2013.
1 Introduction to Natural Language Processing (Lecture for CS410 Text Information Systems) Jan 28, 2011 ChengXiang Zhai Department of Computer Science.
For Monday Read Chapter 23, sections 3-4 Homework –Chapter 23, exercises 1, 6, 14, 19 –Do them in order. Do NOT read ahead.
Natural Language and Speech Processing Creation of computational models of the understanding and the generation of natural language. Different fields coming.
NLP and Speech Course Review. Morphological Analyzer Lexicon Part-of-Speech (POS) Tagging Grammar Rules Parser thethe – determiner Det NP → Det.
Introduction to Computational Linguistics Lecture 2.
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
Natural Language Query Interface Mostafa Karkache & Bryce Wenninger.
Machine Learning in Natural Language Processing Noriko Tomuro November 16, 2006.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
1 Pacific University Sheldon Liang, Ph D Computer Science Department.
CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes April 3and5, 2012.
ELN – Natural Language Processing Giuseppe Attardi
Fall 2004 Natural Language Processing Rada Mihalcea.
9/8/20151 Natural Language Processing Lecture Notes 1.
Natural Language Processing Rada Mihalcea Fall 2008.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
For Friday Finish chapter 23 Homework: –Chapter 22, exercise 9.
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
Natural Language Processing Introduction. Any Light at The End of The Tunnel ? Yahoo, Google, Microsoft  Information Retrieval Monster.com, HotJobs.com.
Language Technology I © 2005 Hans Uszkoreit Language Technology I 2005/06 Hans Uszkoreit Universität des Saarlandes and German Research Center for Artificial.
Natural Language Processing Artificial Intelligence CMSC February 28, 2002.
Introduction to CL & NLP CMSC April 1, 2003.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
GTRI.ppt-1 NLP Technology Applied to e-discovery Bill Underwood Principal Research Scientist “The Current Status and.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.
Introduction to Dialogue Systems. User Input System Output ?
CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?
For Monday Read chapter 24, sections 1-3 Homework: –Chapter 23, exercise 8.
For Friday Finish chapter 24 No written homework.
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
Natural Language Processing Chapter 1 : Introduction.
Data Mining: Text Mining
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
NATURAL LANGUAGE PROCESSING Zachary McNellis. Overview  Background  Areas of NLP  How it works?  Future of NLP  References.
1 An Introduction to Computational Linguistics Mohammad Bahrani.
For Monday Read chapter 26 Homework: –Chapter 23, exercises 8 and 9.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
NATURAL LANGUAGE PROCESSING
Natural Language Processing Tasneem Ghnaimat Spring 2013.
Natural Language Processing
Computational UIUC Lane Schwartz Student Orientation August 18, 2016.
Information Retrieval and Web Search
Natural Language Processing (NLP)
Natural Language Processing
Information Retrieval and Web Search
Information Retrieval and Web Search
Text Analytics Giuseppe Attardi Università di Pisa
Machine Learning in Natural Language Processing
Automatic Detection of Causal Relations for Question Answering
How to publish in a format that enhances literature-based discovery?
Introduction to Machine Translation
Natural Language Processing
Lecture 8 Information Retrieval Introduction
CS246: Information Retrieval
A User study on Conversational Software
Natural Language Processing (NLP)
CS224N Section 3: Corpora, etc.
Information Retrieval and Web Search
Artificial Intelligence 2004 Speech & Natural Language Processing
Information Retrieval
Extracting Why Text Segment from Web Based on Grammar-gram
Natural Language Processing (NLP) Chapter One Introduction to Natural Language Processing(NLP)
Natural Language Processing (NLP)
Presentation transcript:

COSC 6336 Natural Language Processing Fall 2016 Thamar Solorio Content on these slides was adapted from Rada Mihalcea

What this course is about

Why Natural Language Processing ? Huge amounts of data Internet = at least 14.5 trillion web pages Applications for processing large amounts of texts require NLP expertise Classify text into categories Index and search large texts Automatic translation Speech understanding Understand phone conversations Information extraction Extract useful information from resumes Automatic summarization Condense 1 book into 1 page Question answering Knowledge acquisition Text generation / dialogues

Natural? Natural Language? Natural Language Processing Refers to the language spoken by people, e.g. English, Japanese, Swahili, as opposed to artificial languages, like C++, Java, etc. Natural Language Processing Applications that deal with natural language in a way or another [Computational Linguistics Doing linguistics on computers More on the linguistic side than NLP, but closely related ]

Why Natural Language Processing? kJfmmfj mmmvvv nnnffn333 Uj iheale eleee mnster vensi credur Baboi oi cestnitze Coovoel2^ ekk; ldsllk lkdf vnnjfj? Fgmflmllk mlfm kfre xnnn!

Computers Lack Knowledge! People have no trouble understanding language Common sense knowledge Reasoning capacity Experience Computers have No common sense knowledge No reasoning capacity

Where does it fit in the CS taxonomy? Computers Databases Artificial Intelligence Algorithms Networking Robotics Natural Language Processing Search Information Retrieval Machine Translation Language Analysis Semantics Parsing

Linguistic Levels of Analysis Speech Written language Phonology: sounds / letters / pronunciation Morphology: the structure of words Syntax: how these sequences are structured Semantics: meaning of the strings Pragmatics: discourse Interaction between levels

Issues in Syntax “the dog ate my homework” - Who did what? Identify the part of speech (POS) Dog = noun ; ate = verb ; homework = noun English POS tagging: 95% 2. Identify collocations mother in law, hot dog

Issues in Syntax Shallow parsing: “the dog chased the bear” subject - predicate Identify basic structures NP-[the dog] VP-[chased the bear]

Issues in Syntax Full parsing: John loves Mary Help figuring out (automatically) questions like: Who did what and when?

More Issues in Syntax Anaphora Resolution: “The dog entered my room. It scared me” Preposition Attachment “I saw the man in the park with a telescope”

Issues in Semantics Understand language! How? “plant” = industrial plant “plant” = living organism Words are ambiguous Importance of semantics? Machine Translation: wrong translations Information Retrieval: wrong information Anaphora Resolution: wrong referents

Why Semantics? I am sick as an ostrich (I’m bored as an ostrich) English  Finnish  English

Issues in Semantics How to learn the meaning of words? From dictionaries: plant, works, industrial plant -- (buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles") plant, flora, plant life -- (a living organism lacking the power of locomotion) They are producing about 1,000 automobiles in the new plant The sea flora consists of 1,000 different plant species The plant was close to the animal farm

Issues in Semantics Learn from annotated examples: Assume 100 examples containing “plant” previously tagged by a human Train a learning algorithm How to choose the learning algorithm? How to obtain the 100 tagged examples?

Issues in Learning Semantics Assume a (large) amount of annotated data = training Assume a new text not annotated = test Learn from previous experience (training) to classify new data (test) Decision trees, memory based learning, neural networks Machine Learning

Issues in Information Extraction “There was a group of about 8-9 people close to the entrance on Highway 290” Who? “8-9 people” Where? “highway 290” Extract information Detect new patterns: Detect hacking / hidden information / etc. Gov./mil. puts lots of money into IE research

Issues in Information Retrieval General model: A huge collection of texts A query Task: find documents that are relevant to the given query How? Create an index, like the index in a book More … Vector-space models Boolean models Examples: Google, Yahoo, Baidu, etc.

Issues in Information Retrieval Retrieve specific information Question Answering “What is the height of mount Everest?” 11,000 feet

Issues in Information Retrieval Find information across languages! Cross Language Information Retrieval “What is the minimum age requirement for car rental in Italy?” Search also Italian texts for “eta minima per noleggio macchine” Integrate large number of languages Integrate into performant IR engines

Issues in Machine Translation Text to Text Machine Translation Speech to Speech Machine Translation Most of the work has addressed pairs of widely spread languages like English-French, English-Chinese

Issues in Machine Translation How to translate text? Learn from previously translated data  Need parallel corpora French-English, Chinese-English have the Hansards Reasonable translations Chinese-Hindi – no such tools available today!

Even More Discourse Summarization Subjectivity and sentiment analysis Text generation, dialog [pass the Turing test for some million dollars] – Loebner prize Knowledge acquisition [how to get that common sense knowledge] Speech processing

What will we study this semester? Some linguistic basics Structure of English Parts of speech, phrases, parsing N-grams Also multi-word expressions Part of speech tagging Syntactic parsing Semantics Word sense disambiguation Machine Translation Other higher level NLP tasks

Logistics Piazza website: piazza.com/uh/fall2016/cosc6336/home Weekly reading assignments from J&M, and/or NLTK Link to the 3rd edition of J&M: http://web.stanford.edu/~jurafsky/slp3/