Natural Language Processing


Natural Language Processing CS159 – Fall 2019 David Kauchak

Adding the course?

Administrivia. Read the “administrivia” handout! http://www.cs.pomona.edu/classes/cs159/ Everything will be posted there: office hours, schedule, assigned readings, assignments. ~7 assignments (in a variety of languages); 4 quizzes; final project for the last 3-4 weeks, in teams of 2-3 people; class participation. Readings. Academic honesty. Collaboration.

Administrivia. Assignment 0 posted already: shouldn’t take too long; due Friday by 5pm. Assignment 1 posted soon: won’t cover all material until next Monday; due Monday 2/4.

Videos before class

What to expect… This course will be challenging for many of you: assignments will be non-trivial and the content can be challenging. But it is a fun field! We’ll cover: basic linguistics; probability; the common problems; many techniques and algorithms; common machine learning techniques; some recent advances in neural networks for language processing; NLP applications.

Requirements and goals. Requirements: competent programmer (some assignments in Java, but I will allow/encourage other languages after the first few assignments); comfortable with mathematical thinking (we’ll use a fair amount of probability, which I will review, plus other basic concepts like logs, summation, etc.); data structures (trees, hashtables, etc.). Goals: learn the problems and techniques of NLP; build real NLP tools; understand what the current research problems are in the field.

What is NLP? Natural language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages. - Wikipedia

What is NLP? The goal of this new field is to get computers to perform useful tasks involving human language… - The book

Key: Natural text. “A growing number of businesses are making Facebook an indispensible part of hanging out their shingles. Small businesses are using …” Natural text is written by people, generally for people. Why do we even care about natural text in computer science?

Why do we need computers for dealing with natural text? Besides being a complex problem, there’s a lot of data (which is a good thing and a bad thing)! Some people may say: outsource it (India, the Philippines, etc.), and there are frameworks like Mechanical Turk. World population: ~7 billion. Let’s say you had a simple task to perform on each web page that only took 10 seconds to do. If every single person in the world worked on this non-stop, they would have to work for 20 days straight. What if it’s a task like translating the page or finding information that requires specialization? https://searchengineland.com/googles-search-indexes-hits-130-trillion-pages-documents-263378
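The scale argument can be checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: the page count is an assumption taken from the title of the linked article (about 130 trillion pages), and the population and per-page time are the rough figures from the slide.

```python
# Back-of-the-envelope estimate; every number here is an assumption.
PAGES = 130e12          # ~130 trillion pages (from the linked article's title)
SECONDS_PER_PAGE = 10   # the "simple task" from the slide
PEOPLE = 7e9            # world population, roughly 7 billion


def days_per_person(pages, seconds_per_page, people):
    """Days of non-stop work per person if the pages are split evenly."""
    total_seconds = pages * seconds_per_page
    return total_seconds / people / 86_400  # 86,400 seconds in a day


estimate = days_per_person(PAGES, SECONDS_PER_PAGE, PEOPLE)
print(f"{estimate:.1f} days of non-stop work per person")
```

With these particular inputs the estimate comes out near two days per person. The answer scales linearly with the page count and the time per page, so a 20-day figure would correspond to roughly ten times as much work; treat the exact number as illustrative. Either way, the task is out of reach for people alone.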

Web is just the start… Web pages are just the beginning: e-mail (~200-300 billion e-mails a day); Twitter (~500 million tweets a day); blogs (~200 million different blogs); corporate databases; Facebook.

Why is NLP hard?
Iraqi Head Seeks Arms
Juvenile Court to Try Shooting Defendant
Stolen Painting Found by Tree
Kids Make Nutritious Snacks
Local HS Dropouts Cut in Half
Obesity Study Looks for Larger Test Group
British Left Waffles on Falkland Islands
Red Tape Holds Up New Bridges
Hospitals Are Sued by 7 Foot Doctors

Why is NLP hard?
User: Where is Escape Room playing in the Claremont Area?
System: Escape Room is playing at the Edwards in La Verne.
User: When is it playing there?
System: It’s playing at 2pm, 5pm and 8pm.
User: I’d like 1 adult and 2 children for the first show. How much would that cost?

Why is NLP hard? Natural language: is highly ambiguous at many different levels; is complex and contains subtle use of context to convey meaning; is probabilistic?; involves reasoning about the world; is highly social; is a key part in how people interact. However, some NLP problems can be surprisingly easy.

Different levels of NLP. Pragmatics/discourse: how does the context affect the interpretation? Semantics: what does it mean? Syntax: phrases, how do words interact? Words: morphology, classes of words.

NLP problems and applications What are some places where you have seen NLP used? What are NLP problems?

NLP problems and applications: lots of problems of varying difficulty. Easier: word segmentation (where are the words?). Example: I would’ve liked Dr. Dave to finish early. But he didn’t.
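As a rough illustration of word segmentation on the example sentence, here is a minimal regex-based sketch. The abbreviation list and the token pattern are assumptions made for this example, not a standard tokenizer; real tokenizers handle many more cases.

```python
import re

# Hypothetical, deliberately tiny abbreviation list for this example.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}

# A word (allowing internal straight or curly apostrophes) with an optional
# trailing period, or any single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*\.?|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        # Detach a trailing period unless the token is a known abbreviation.
        if len(token) > 1 and token.endswith(".") and token not in ABBREVIATIONS:
            tokens.extend([token[:-1], "."])
        else:
            tokens.append(token)
    return tokens


print(tokenize("I would’ve liked Dr. Dave to finish early. But he didn’t."))
```

Even this easy problem needs decisions: contractions stay whole here, and the period in "Dr." is kept only because the abbreviation is listed.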


NLP problems and applications: lots of problems of varying difficulty. Easier: speech segmentation; sentence splitting (aka sentence breaking, sentence boundary disambiguation), e.g. “I would’ve liked Dr. Dave to finish early. But he didn’t.”; language identification, e.g. “Soy un maestro con queso.”
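Sentence splitting on the example above can be sketched with a simple rule: a sentence ends at ".", "!" or "?" followed by whitespace and an uppercase letter, unless the period belongs to a known abbreviation. The abbreviation list is a hypothetical stand-in, and the uppercase check only covers ASCII letters.

```python
import re

# Hypothetical abbreviation list (lowercase, without the period).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof"}


def split_sentences(text):
    """Split text into sentences using a punctuation-plus-capital heuristic."""
    sentences = []
    start = 0
    # Candidate boundary: ., ! or ? followed by whitespace and a capital letter.
    for match in re.finditer(r"[.!?](?=\s+[A-Z])", text):
        end = match.end()
        last_word = text[start:end].split()[-1].rstrip(".!?").lower()
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue  # "Dr." and friends do not end a sentence
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("I would’ve liked Dr. Dave to finish early. But he didn’t."))
```

The heuristic fails on cases it was not written for (for example, an abbreviation that really does end a sentence), which is why the task is called boundary disambiguation.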

NLP problems and applications: easier, continued. Truecasing, e.g. “i would’ve liked dr. dave to finish early. but he didn’t.”; spell checking, e.g. “Identifying mispellings is challenging especially in the dessert.”; OCR.

NLP problems and applications: moderately difficult. Morphological analysis/stemming (smarter, smartly, smartest, smart → smart); speech recognition; text classification (e.g. SPAM); sentiment analysis.
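To make the stemming example concrete, here is a deliberately naive suffix-stripping sketch. The suffix list and the minimum stem length are assumptions chosen to fit the "smart" example; this is not the Porter stemmer or any other standard algorithm.

```python
# Hypothetical suffix list; longer suffixes are tried first.
SUFFIXES = ("est", "er", "ly")
MIN_STEM_LENGTH = 3  # avoid stripping very short words down to nothing


def naive_stem(word):
    """Strip one known suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


for word in ["smarter", "smartly", "smartest", "smart"]:
    print(word, "->", naive_stem(word))
```

Rules this crude over-stem readily ("butter" loses its "er"), which is part of why morphological analysis sits in the moderately difficult group.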

NLP problems and applications: moderately difficult, continued. Text segmentation (break up text by topics); part of speech tagging (and inducing word classes); parsing. [Figure: parse tree for “I eat sushi with tuna”, with phrase nodes S, VP, NP, NP, PP and part-of-speech tags PRP, V, N, IN, N]

NLP problems and applications: moderately difficult, continued. Word sense disambiguation, e.g. “As he walked along the side of the stream, he spotted some money by the bank. The money had gotten muddy from being so close to the water.”; grammar correction; speech synthesis.
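One classic way to approach the "bank" example is to compare the context with a short description of each sense and pick the sense with the most shared words, in the spirit of the simplified Lesk algorithm. In the sketch below the two glosses, the sense labels, and the stopword list are all hypothetical, written just for this example.

```python
import re

# Hypothetical mini-glosses for two senses of "bank".
SENSES = {
    "bank/river": "sloping land beside a river stream or other body of water",
    "bank/finance": "institution that accepts deposits and lends money to customers",
}

# Hypothetical stopword list: function words that carry no sense information.
STOPWORDS = {
    "a", "an", "the", "of", "or", "and", "to", "that", "by", "he", "some",
    "had", "so", "from", "being", "as", "along", "other",
}


def content_words(text):
    """Lowercased words of the text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(context, senses=SENSES):
    """Return the sense whose gloss shares the most content words with the context."""
    context_set = content_words(context)
    return max(senses, key=lambda s: len(context_set & content_words(senses[s])))


context = (
    "As he walked along the side of the stream, he spotted some money by the "
    "bank. The money had gotten muddy from being so close to the water."
)
print(disambiguate(context))
```

The example is hard on purpose: "money" pulls toward the financial sense, while "stream" and "water" pull toward the river sense, and a word-overlap count is only as good as the glosses it is given.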

NLP problems and applications: hard (many of these contain many smaller problems). Machine translation: “The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.” 美国关岛国际机场及其办公室均接获一名自称沙地阿拉伯富商拉登等发出的电子邮件,威胁将会向机场等公众地方发动生化袭击後,关岛经保持高度戒备。

NLP problems and applications: information extraction. Example: IBM hired Fred Smith as president.
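For a sentence as regular as this one, extraction can be sketched with a single hand-written pattern that pulls out a (company, person, role) record. The pattern and field names below are assumptions for illustration; real text rarely follows one template.

```python
import re

# Hypothetical pattern for sentences of the form "<Company> hired <Person> as <role>".
HIRE_RE = re.compile(
    r"(?P<company>[A-Z][\w&]*(?: [A-Z][\w&]*)*) hired "
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+) as "
    r"(?P<role>[a-z]+(?: [a-z]+)*)"
)


def extract_hire(sentence):
    """Return a dict with company, person and role, or None if no match."""
    match = HIRE_RE.search(sentence)
    return match.groupdict() if match else None


print(extract_hire("IBM hired Fred Smith as president."))
```

A paraphrase such as "Fred Smith was named president of IBM" already defeats the pattern, which hints at why information extraction from free text is listed among the hard problems.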

NLP problems and applications: summarization.

NLP problems and applications: natural language understanding, i.e. text => semantic representation (e.g. logic, probabilistic relationships). Information retrieval and question answering: “How many programmers in the child care department make over $50,000?” “Who was the fourteenth president?” “How did he die?”

NLP problems and applications: text simplification. “Alfonso Perez Munoz, usually referred to as Alfonso, is a former Spanish footballer, in the striker position.” → “Alfonso Perez is a former Spanish football player.”

Where are we now? Many of the “easy” and “medium” problems have reasonable solutions: spell checkers; sentence splitters; word segmenters/tokenizers.

Where are we now? Parsing: Stanford Parser (http://nlp.stanford.edu:8080/parser/)

Where are we now? Machine translation: how is it?

Where are we now? Machine translation: http://translate.google.com. Getting better every year: good enough to get the gist of most content, but still nowhere near a human translation; better for some types of text. Many commercial versions: Systran, Language Weaver, …

Where are we now? Information extraction. Structured documents (very good!): www.dealtime.com, www.google.com/shopping. AKT technologies. Lots of these (FlipDog, WhizBang! Labs, …) work fairly well.

Where are we now? CMU’s NELL (Never Ending Language Learner) http://rtw.ml.cmu.edu/rtw/

Where are we now? Why do people do this?

Where are we now? Information retrieval/query answering. How are search engines? What are/aren’t they good at? How do they work?

Where are we now? Information retrieval/query answering. Search engines: pretty good for some things; do mostly pattern matching and ranking; no deep understanding; still require the user to “find” the answer.
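The "pattern matching and ranking" idea can be sketched in a few lines: score each document by how often the query words occur in it and sort by score. The scoring function and the toy documents are assumptions for illustration; production search engines use far richer signals.

```python
import re
from collections import Counter


def terms(text):
    """Lowercased alphanumeric terms of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents):
    """Return (doc_id, score) pairs, best first.

    The score is simply the number of occurrences of query terms in the
    document; documents with no matching term are left out.
    """
    query_terms = set(terms(query))
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(terms(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy collection.
docs = {
    "a": "Escape Room is playing at the Edwards in La Verne",
    "b": "Who was the fourteenth president",
    "c": "playing playing cards",
}
print(rank("playing", docs))
```

Document "c" outranks "a" for the query "playing" purely on word counts, even though only "a" is about a movie showing: matching and ranking without any understanding, with the user left to find the answer.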

Where are we now? Question answering: Wolfram Alpha

Question answering

Where are we now? Question answering. Many other systems: TREC question answering competition; Language Computer Corp; AnswerBus; …

Where are we now? Summarization: NewsBlaster (Columbia) http://newsblaster.cs.columbia.edu/

Where are we now? Voice recognition: pretty good, particularly with speaker training (Apple OS/Siri; Android/Google; Alexa, Google Assistant, etc.; IBM ViaVoice; Dragon Naturally Speaking). Speech generation: the systems can generate the words, but getting the subtle nuances right is still tricky (Apple OS; http://translate.google.com).

Other problems. Many problems untackled/undiscovered, e.g. “That’s What She Said: Double Entendre Identification”, ACL 2011, http://www.aclweb.org/anthology/P11-2016.pdf