Www.decideo.fr/bruley Natural Language Processing June 2013 Michel Bruley.

Presentation transcript:

Natural Language Processing June 2013 Michel Bruley

Natural Language Processing (NLP)
• NLP is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.
• NLP is considered a sub-field of artificial intelligence and has significant overlap with the field of computational linguistics. It is concerned with the interactions between computers and human (natural) languages.
  – Natural language generation systems convert information from computer databases into readable human language.
  – Natural language understanding systems convert human language into representations that are easier for computer programs to manipulate.
• NLP encompasses both text and speech, but work on speech processing has evolved into a separate field.

Where does it fit in the CS* taxonomy?
Computers
  Artificial Intelligence
    Robotics
    Search
    Natural Language Processing
      Information Retrieval
      Machine Translation
      Language Analysis
        Semantics
        Parsing
  Algorithms
  Databases
  Networking
* CS = Computer Science

Why Natural Language Processing?
Applications that process large amounts of text require NLP expertise:
• Classifying text into categories, and indexing and searching large texts: classifying documents by topic, language, or author; spam filtering; information retrieval (relevant, not relevant); sentiment classification (positive, negative)
• Extracting data from text: converting unstructured text into structured data
• Information extraction: discovering the names of people, and the events they participate in, from a document, …
• Automatic summarization: condensing 1 book into 1 page, …
• Speech processing, artificial voice: getting flight information or booking a hotel over the phone, …
• Question answering: finding answers to natural language questions in a text collection or database
• Spelling and grammar correction
• Plagiarism detection
• Automatic translation
• Etc.

The problem
• When people see text, they understand its meaning (by and large):
  According to research, it deosn’t mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt tihng is that the frist and lsat ltteer are in the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by islelf but the wrod as a wlohe.
• When computers see text, they get only character strings (and perhaps HTML tags)
• We'd like computer agents to see meanings and be able to intelligently process text
• These desires have led to many proposals for structured, semantically marked-up formats
• But often human beings still resolutely make use of text in human languages
• This problem isn’t likely to just go away
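The scrambled paragraph above is easy to reproduce, which makes the contrast concrete: to a program, "mttaer" and "matter" are simply two different character strings. The following is a minimal sketch, not taken from the slides; the function names and the fixed random seed are illustrative assumptions.

```python
import random


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters of a word, keeping first and last in place."""
    # Short words and tokens containing punctuation or digits are left untouched.
    if len(word) <= 3 or not word.isalpha():
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every whitespace-separated word in the text."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return " ".join(scramble_word(w, rng) for w in text.split())


if __name__ == "__main__":
    print(scramble_text("it does not matter in what order the letters in a word are"))
```

A human reader usually recovers the sentence at a glance, whereas a plain string comparison between the original and the scrambled word fails unless the two happen to be identical.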

Example: Natural language understanding
Raw speech signal
  ↓ Speech recognition
Sequence of words spoken
  ↓ Syntactic analysis, using knowledge of the grammar
Structure of the sentence
  ↓ Semantic analysis, using information about the meaning of words
Partial representation of the meaning of the sentence
  ↓ Pragmatic analysis, using information about context
Final representation of the meaning of the sentence
Natural language understanding process – Prof. Carolina Ruiz

Example detail: Syntactic Analysis
Syntactic analysis involves isolating phrases and sentences into a hierarchical structure, allowing the study of its constituents. For example, the sentence “the big cat is drinking milk” can be broken up into the following constituents:
The big cat is drinking milk
  Noun Phrase
    Determiner: The
    Adjective Phrase: big
    Noun: cat
  Verb Phrase
    Auxiliary: is
    Verb: drinking
    Noun Phrase: milk
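The constituent structure on this slide can be held in a small nested data structure. The sketch below hand-codes the tree for the example sentence (it does not parse anything); the tuple layout and the helper names `leaves` and `bracketed` are assumptions made for illustration.

```python
# Each node is (label, children); a leaf node is (label, word).
TREE = (
    "Sentence",
    [
        ("Noun Phrase", [
            ("Determiner", "The"),
            ("Adjective Phrase", "big"),
            ("Noun", "cat"),
        ]),
        ("Verb Phrase", [
            ("Auxiliary", "is"),
            ("Verb", "drinking"),
            ("Noun Phrase", "milk"),
        ]),
    ],
)


def leaves(node):
    """Return the words of the sentence in order, read off the tree."""
    _label, children = node
    if isinstance(children, str):
        return [children]
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def bracketed(node):
    """Render the tree in labeled-bracket notation."""
    label, children = node
    if isinstance(children, str):
        return f"[{label} {children}]"
    return f"[{label} " + " ".join(bracketed(c) for c in children) + "]"


if __name__ == "__main__":
    print(" ".join(leaves(TREE)))
    print(bracketed(TREE))
```

Walking the tree recovers the original word order, while the bracketed form shows which words group together into phrases.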

Why NLP is difficult
• Language is flexible
  – New words, new meanings
  – Different meanings in different contexts
• Language is subtle
  – He arrived at the lecture
  – He chuckled at the lecture
  – He chuckled his way through the lecture
  – **He arrived his way through the lecture
• Language is complex!

Why NLP is difficult
• MANY hidden variables
  – Knowledge about the world
  – Knowledge about the context
  – Knowledge about human communication techniques (e.g., “Can you tell me the time?”)
• Problem of scale
  – Many (infinite?) possible words, meanings, and contexts
• Problem of sparsity
  – Statistical analysis is very difficult because most things (words, concepts) have never been seen before
• Long-range correlations
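The sparsity problem can be made tangible by counting how many distinct words in a text occur exactly once. This is a minimal sketch under simple assumptions: the tokenizer is a crude lowercase regular expression, and the function name `singleton_fraction` is illustrative.

```python
import re
from collections import Counter


def singleton_fraction(text: str) -> float:
    """Fraction of distinct words that occur exactly once in the text."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (assumption)
    counts = Counter(words)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


if __name__ == "__main__":
    sample = "He arrived at the lecture. He chuckled at the lecture."
    print(f"{singleton_fraction(sample):.2f} of distinct words appear only once")
```

In real corpora a large share of the vocabulary typically appears only once, so a statistical model has very little evidence for most of the words it encounters.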

Why NLP is difficult
• Key problems:
  – Representation of meaning
  – Language presupposes knowledge about the world
  – Language only reflects the surface of meaning
  – Language presupposes communication between people

Patented Natural Language Processing (NLP) “Reads” Every Communication
• Each data feed is parsed through one or more of the 7 NLP engines
• …it is then deconstructed to provide context, subject, and other information regarding the customer (gender, name, etc.)
• Finally, each identified customer is matched back to the Discovery platform data to gain a full view
Natural language processing (NLP) is the study of the interactions between computers and natural languages (e.g., English, Polish). The crucial challenge that NLP addresses is deriving meaning from human or natural language input and allowing consumers to analyze parsed meanings in large volumes.

For Example…
“I bought an iPad2 for my mom last week. She loves the weight, but doesn’t like the color. She wishes it came in blue. She says if it came in blue, then she’d buy one for all her friends.”
• Entities (brands, people, locations, times, products…)
• Events and relationships (purchasing event, my mom…)
• Sentiment (product specifications)
• Suggestions (feature specifications)
• Intent (to purchase, to leave)
• Geo/Temporal
QUESTION: Why is this a big deal?
NLP takes a simple English statement, parses it into the categories above (and more categories), and VOILA… we get STRUCTURED DATA.
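To show what "structured data" could look like for this statement, here is a toy sketch that fills a few of the categories using hand-written regular expressions. It is only an illustration: the patterns, the `annotate` function, and the output keys are assumptions, and they are far simpler than what a real NLP engine does.

```python
import re

STATEMENT = (
    "I bought an iPad2 for my mom last week. She loves the weight, "
    "but doesn't like the color. She wishes it came in blue. She says if it "
    "came in blue, then she'd buy one for all her friends."
)

# Hypothetical mapping from an opinion phrase to a polarity label.
POLARITY = {"loves": "positive", "doesn't like": "negative"}


def annotate(text: str) -> dict:
    """Extract a few illustrative categories with hand-written patterns."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes
    sentiment = [
        {"aspect": aspect, "polarity": POLARITY[verb]}
        for verb, aspect in re.findall(r"\b(loves|doesn't like) the (\w+)", text)
    ]
    return {
        "entities": re.findall(r"\biPad\d*\b", text),
        "events": re.findall(r"\b(bought)\b", text),
        "sentiment": sentiment,
        "suggestions": re.findall(r"\bwishes it came in (\w+)", text),
        "intent": ["purchase"] if re.search(r"\b(?:would|'d) buy\b", text) else [],
    }


if __name__ == "__main__":
    for category, values in annotate(STATEMENT).items():
        print(category, values)
```

The result is a record with named fields that can be stored and queried like any other structured data, which is the point the slide makes.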

Architecture
[Diagram: ASTER DISCOVERY PLATFORM. Labeled elements: Attensity Pipeline (real-time annotated social media data feed: 150+ million social and online sources); Pipeline Connector; Other Unstructured Data (…; Surveys; CRM Notes; …); ASAS Wrapper (SQL MR); NLP ETL; “Now-structured” data; Customers / Sales / Other data; Churn Score (SQL MR); Predictive; Visualization (e.g., Tableau, MSTR).]

Aster + Attensity = Competitive Advantage
Attensity’s Semantic Annotation Server (ASAS) capabilities:
• This integration provides types, subtypes, and super types (“Savings”, “Checking”, “Investment”)
• Anaphora resolution: connecting a subject (George Harrison) to later references that do not repeat the full name (“He”, “Him”)
• Includes other languages besides English
• Entity Extraction: automatic detection and extraction of more than 35 entities such as Name, Place
• Uses Attensity Triples to create context on entities and identify verbs, relationships, actions
• Auto Classification: uses custom classification rules to classify articles by content, sort by relevance, and discover repeated information
• Exhaustive Extraction: application of linguistic principles to extract context, entities, and relationships, similar to how the human mind would
• Voice Tags: identify types of statements and auto-classify them (Question, Intent, Conditional)
• Creates a unique identifier for each entity for cross-reference

Structuring Unstructured Data: Process Flow
“The flight was delayed and flight attendant would not give us any new information.”

New Table: Customer Reactions

Database Record from a Customer Survey
  date:
  region: 0006
  rec?: 4
  source: telephone
  Why would you recommend/not recommend? “The flight was delayed and flight attendant would not give us any new information.”
  Who/What: flight
  Behavior: delay
  Fact/Triple: flight : delay

Same Record with Relational Facts Extracted from Notes Field
(Original Structured Data: date, region, source, rec?. Newly Structured Data Provided by Attensity: who-what, Behavior, Fact/Triple.)

  date | region | source    | rec? | who-what    | Behavior    | Fact/Triple
       |        | telephone | 4    | flight      | delay       | flight : delay
       |        | telephone | 4    | information | give [not]  | information : give [not]
       |        |           |      | i           | happy [not] | i : happy [not]
       |        |           |      | rep         | rude        | rep : rude
       |        |           |      | flight      | cancel      | flight : cancel

How Triples are Extracted & Structured
• Extract: extract relational facts and Triples from the Notes field
• Then Fuse: populate the new table with attribute values and fuse with the structured data
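The "extract, then fuse" step can be sketched in a few lines. In the sketch below the triples are supplied by hand, standing in for the output of the extraction engine, and the field names and the `fuse` helper are illustrative assumptions rather than Attensity's actual schema or API. The date value is not legible in the slide, so it is left as `None`.

```python
# Original structured record, as shown on the slide (date not recoverable).
RECORD = {
    "date": None,
    "region": "0006",
    "source": "telephone",
    "rec": 4,
    "notes": (
        "The flight was delayed and flight attendant would not give us "
        "any new information."
    ),
}

# (who/what, behavior) pairs assumed to come from an extraction step.
TRIPLES = [
    ("flight", "delay"),
    ("information", "give [not]"),
]


def fuse(record: dict, triples: list) -> list:
    """Create one output row per extracted fact, carrying over the structured fields."""
    base = {key: value for key, value in record.items() if key != "notes"}
    return [
        {**base, "who_what": who, "behavior": behavior,
         "fact_triple": f"{who} : {behavior}"}
        for who, behavior in triples
    ]


if __name__ == "__main__":
    for row in fuse(RECORD, TRIPLES):
        print(row)
```

Each row repeats the original structured fields and adds the newly structured ones, so the free-text note becomes queryable alongside the rest of the survey data.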

Team Power