Natural Language Processing

Presentation transcript:

Natural Language Processing Lecture 1 Sudeshna Sarkar 26 July 2007 11/19/2018

Notes adapted from Martin’s NLP slides

Text Books: Daniel Jurafsky and James H. Martin, "Speech and Language Processing", Prentice Hall, 2000. Other References: James Allen, "Natural Language Understanding", Second edition, Pearson. Christopher D. Manning and Hinrich Schütze, "Foundations of Statistical Natural Language Processing", The MIT Press, 1999.

Final Project: This will be a research-oriented project. The goal is to have a paper suitable for a conference submission. Projects will preferably be done in groups.

Natural Language Processing: What is it? We’re going to study what goes into getting computers to perform useful and interesting tasks involving human languages. We will be secondarily concerned with the insights that such computational work gives us into human processing of language.

Why Should You Care? Two trends: (1) An enormous amount of knowledge is now available in machine-readable form as natural language text. (2) Conversational agents are becoming an important form of human-computer communication.

Major Topics: Applications; Words; Syntax; Meaning; Dialog and Discourse.

Applications: First, what makes an application a language processing application (as opposed to any other piece of software)? It is an application that requires the use of knowledge about human languages. Example: Is Unix wc (word count) a language processing application?

Applications: Word count? When it counts words: Yes. To count words you need to know what a word is. That’s knowledge of language. When it counts lines and bytes: No. Lines and bytes are computer artifacts, not linguistic entities.
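
The distinction on this slide can be sketched in a few lines of Python. This is only an illustration, not how wc is implemented: the function names are hypothetical, and the regular expression is one assumed definition of "word" among many possible ones.

```python
import re

def wc_counts(text: str) -> dict:
    """Roughly mimic the three counts reported by Unix wc (a sketch, not wc itself)."""
    return {
        "lines": text.count("\n"),            # newline characters: a computer artifact
        "words": len(text.split()),           # whitespace-delimited chunks
        "bytes": len(text.encode("utf-8")),   # encoded size: a computer artifact
    }

def linguistic_word_count(text: str) -> int:
    """Count word tokens under an assumed definition of 'word':
    a run of letters, optionally joined by internal apostrophes or hyphens."""
    return len(re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", text))

# Whitespace splitting treats the stray "--" as a word; the linguistic count does not.
sample = "wait -- what?"
print(wc_counts(sample)["words"])       # 3
print(linguistic_word_count(sample))    # 2
```

The two counts disagree as soon as the text contains material that is delimited by whitespace but is not a word, which is exactly where knowledge of language enters.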

Big Applications: Question answering; Conversational agents; Summarization; Machine translation.

Big Applications: These kinds of applications require a tremendous amount of knowledge of language. Consider the following interaction with HAL, the computer from 2001: A Space Odyssey.

HAL. Dave: Open the pod bay doors, Hal. HAL: I’m sorry Dave, I’m afraid I can’t do that.

What’s needed? Speech recognition and synthesis. Knowledge of the English words involved: what they mean, and how they combine (bay vs. pod bay). How groups of words clump, and what the clumps mean.

What’s needed? Dialog: It is polite to respond, even if you’re planning to kill someone. It is polite to pretend to want to be cooperative (I’m afraid I can’t…).

Real Example: What is the Fed’s current position on interest rates? What or who is the “Fed”? What does it mean for it to have a position? How does “current” modify that?

Caveat: NLP has an AI aspect to it. We’re often dealing with ill-defined problems. We don’t often come up with perfect solutions/algorithms. We can’t let either of those facts get in our way.

Preparation: Familiarity with linguistics, psychology, and philosophy; basic algorithm and data structure analysis; ability to program; some exposure to logic; exposure to basic concepts in probability; ability to write well in English.

Topics: Linguistics. Word-level processing; syntactic processing; lexical and compositional semantics; discourse and dialog processing.

Topics: Techniques. Finite-state methods; context-free methods; augmented grammars; unification; logic; probabilistic versions; supervised machine learning.

Topics: Applications. Small: spelling correction. Medium: word-sense disambiguation, named entity recognition, information retrieval. Large: question answering, conversational agents, machine translation.

Commercial World: Lots of exciting stuff going on… Some samples: machine translation, question answering, buzz analysis.

Google/Arabic

Google/Arabic Translation

Web Q/A

Summarization: Current web-based Q/A is limited to returning simple fact-like (factoid) answers (names, dates, places, etc.). Multi-document summarization can be used to address more complex kinds of questions. Circa 2002: What’s going on with the Hubble?

NewsBlaster Example: The U.S. orbiter Columbia has touched down at the Kennedy Space Center after an 11-day mission to upgrade the Hubble observatory. The astronauts on Columbia gave the space telescope new solar wings, a better central power unit and the most advanced optical camera. The astronauts added an experimental refrigeration system that will revive a disabled infrared camera. "Unbelievable that we got everything we set out to do accomplished," shuttle commander Scott Altman said. Hubble is scheduled for one more servicing mission in 2004.

Weblog Analytics: Text mining of weblogs, discussion forums, user groups, and other forms of user-generated media. Uses: product marketing information; political opinion tracking; social network analysis; buzz analysis (what’s hot, what topics are people talking about right now).

Web Analytics

Umbria

Forms of Natural Language: The input/output of an NLP system can be written text (newspaper articles, letters, manuals, prose, …) or speech (read speech such as radio, TV, and dictations; conversational speech; commands; …). To process written text, we need lexical, syntactic, and semantic knowledge about the language, discourse information, and real-world knowledge. To process spoken language, we additionally need speech recognition and speech synthesis.

Components of NLP: Natural Language Understanding: mapping the given input in the natural language into a useful representation. Different levels of analysis are required: morphological analysis, syntactic analysis, semantic analysis, discourse analysis, … Natural Language Generation: producing output in the natural language from some internal representation. Different levels of synthesis are required: deep planning (what to say), syntactic generation. Which is harder?

Natural language understanding: Uncovering the mappings between the linear sequence of words (or phonemes) and the meaning that it encodes, and representing this meaning in a useful (usually symbolic) representation. By definition, this is heavily dependent on the target task: words and structures mean different things in different contexts, and the required target representation is different for different tasks. Why is NLU hard? The mapping between words, their linguistic structure, and the meaning that they encode is extremely complex and difficult to model and decompose. Natural language is very ambiguous. The goal of understanding is itself task dependent and very complex.

Why is NL understanding hard? Natural language is extremely rich in form and structure, and very ambiguous. How should meaning be represented, and which structures map to which meaning structures? Ambiguity: one input can mean many different things. Lexical (word-level) ambiguity: different meanings of words. Syntactic ambiguity: different ways to parse the sentence. Interpreting partial information: how to interpret pronouns. Contextual information: the context of a sentence may affect the meaning of that sentence. Many inputs can mean the same thing. Interaction among components of the input. Noisy input (e.g. speech).

Knowledge of Language: Phonology concerns how words are related to the sounds that realize them. Morphology concerns how words are constructed from more basic meaning units called morphemes; a morpheme is the primitive unit of meaning in a language. Syntax concerns how words can be put together to form correct sentences, and determines what structural role each word plays in the sentence and what phrases are subparts of other phrases. Semantics concerns what words mean and how these meanings combine in sentences to form sentence meaning; it is the study of context-independent meaning.

Knowledge of Language: Pragmatics concerns how sentences are used in different situations and how use affects the interpretation of the sentence. Discourse concerns how the immediately preceding sentences affect the interpretation of the next sentence, for example when interpreting pronouns and the temporal aspects of the information. World Knowledge includes general knowledge about the world, and what each language user must know about the other’s beliefs and goals.

Ambiguity: "At last, a computer that understands you like your mother." (1985 McDonnell-Douglas ad). Different interpretations: (1) The computer understands you as well as your mother understands you. (2) The computer understands that you like your mother. (3) The computer understands you as well as it understands your mother. Speech: "… a computer that understands your lie cured mother …"

Why is NLP difficult? Because natural language is highly ambiguous. Syntactic ambiguity: "The president spoke to the nation about the problem of drug use in the schools from one coast to the other." has 720 parses. Ex: "to the other" can attach to any of the previous NPs (e.g. "the problem") or the head verb → 6 places; "from one coast" has 5 places to attach; …

Why is NLP difficult? Word category ambiguity: book --> verb? or noun? Word sense ambiguity: bank --> financial institution? building? or river side? Words can mean more than the sum of their parts: make up a story. Fictitious worlds: People on Mars can fly. Defining scope: People like ice cream. Does this mean that all (or some?) people like ice cream? Language is changing and evolving: I’ll email you my answer. This new S.U.V. has a compartment for your mobile phone. Googling, …

Resolve Ambiguities: We will introduce models and algorithms to resolve ambiguities at different levels. Part-of-speech tagging: deciding whether duck is a verb or a noun. Word-sense disambiguation: deciding whether make means create or cook. Lexical disambiguation: resolution of part-of-speech and word-sense ambiguities are two important kinds of lexical disambiguation. Syntactic ambiguity: her duck is an example of syntactic ambiguity, and can be addressed by probabilistic parsing.

Resolve Ambiguities (cont.): I made her duck
Parse 1: [S [NP I] [VP [V made] [NP her] [NP duck]]]
Parse 2: [S [NP I] [VP [V made] [NP [DET her] [N duck]]]]
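
As a rough illustration of how a parser can expose this ambiguity, the sketch below counts the parses of "I made her duck" with a CKY-style chart. The toy grammar and lexicon are assumptions written only for this sentence; in particular, the helper symbol X is a hypothetical device for binarizing the double-object rule VP → V NP NP, and the verb reading of "duck" is left out.

```python
from collections import defaultdict

# Toy binary grammar (assumed for illustration). X binarizes VP -> V NP NP.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP", "X"],
    ("X", "NP"): ["VP"],
    ("DET", "N"): ["NP"],
}

# Toy lexicon: "her" is a pronoun NP or a determiner; "duck" is a noun or a bare NP.
LEXICON = {
    "I": ["NP"],
    "made": ["V"],
    "her": ["NP", "DET"],
    "duck": ["N", "NP"],
}

def count_parses(words, start="S"):
    """Return the number of distinct parse trees for `words` rooted in `start`."""
    n = len(words)
    # chart[(i, j)][symbol] = number of ways `symbol` can span words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1)][tag] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between left and right child
                for left, left_count in list(chart[(i, k)].items()):
                    for right, right_count in list(chart[(k, j)].items()):
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)][start]

print(count_parses("I made her duck".split()))  # 2
```

The two counted analyses correspond to the two trees on the slide: one where "her" and "duck" are separate objects of "made", and one where "her duck" is a single noun phrase.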

Dealing with Ambiguity: Three approaches. (1) Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels. (2) Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures (the "syntax proposes, semantics disposes" approach). (3) Probabilistic approaches based on making the most likely choices.
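
The idea of "making the most likely choice" can be sketched with a most-frequent-tag baseline for part-of-speech ambiguity. The tiny tagged corpus below is invented purely for this illustration, and the names `TOY_CORPUS`, `train` and `tag` are hypothetical; a real system would estimate its counts from a large annotated corpus and would also use context.

```python
from collections import Counter, defaultdict

# Invented toy corpus of (word, tag) pairs, for illustration only.
TOY_CORPUS = [
    [("the", "DET"), ("duck", "N"), ("swims", "V")],
    [("I", "PRON"), ("saw", "V"), ("her", "DET"), ("duck", "N")],
    [("they", "PRON"), ("duck", "V"), ("quickly", "ADV")],
]

def train(corpus):
    """Count how often each word appears with each tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag_label in sentence:
            counts[word][tag_label] += 1
    return counts

def tag(words, counts, default="N"):
    """Assign each word its most frequent tag; unseen words get `default`."""
    return [
        counts[w].most_common(1)[0][0] if w in counts else default
        for w in words
    ]

counts = train(TOY_CORPUS)
print(tag(["her", "duck"], counts))  # ['DET', 'N']
```

In this toy data "duck" is seen twice as a noun and once as a verb, so the baseline always picks the noun reading, which shows both the appeal and the limits of choosing the single most likely option without looking at context.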

Models to Represent Linguistic Knowledge: Different formalisms (models) are used to represent the required linguistic knowledge. State machines: FSAs, HMMs, ATNs, RTNs. Formal rule systems: context-free grammars, unification grammars, probabilistic CFGs. Logic-based formalisms: first-order predicate logic, some higher-order logic. Models of uncertainty: Bayesian probability theory.
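
To make the simplest of these formalisms concrete, here is a minimal sketch of a deterministic finite-state automaton (FSA). The example language, strings of the form "baa!", "baaa!", "baaaa!", and so on, is chosen only for illustration; the state numbering and the names `TRANSITIONS`, `ACCEPTING` and `accepts` are assumptions of this sketch.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of additional a's
    (3, "!"): 4,
}
ACCEPTING = {4}

def accepts(string: str) -> bool:
    """Return True if the FSA ends in an accepting state after reading `string`."""
    state = 0
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:      # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts("baaa!"))  # True
print(accepts("ba!"))    # False
```

The whole model is the transition table; the recognizer itself is a few lines of table lookup, which is part of what makes finite-state methods attractive for word-level processing.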

Algorithms to Manipulate Linguistic Knowledge: We will use algorithms to manipulate the models of linguistic knowledge to produce the desired behavior. Most of the algorithms we will study are transducers and parsers; these algorithms construct some structure based on their input. Since language is ambiguous at all levels, these algorithms are never simple processes. Most of the algorithms we will use fall into the following categories: state space search and dynamic programming.
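
As one small instance of dynamic programming, the sketch below computes the minimum edit distance between two strings, a quantity commonly used in spelling correction. It assumes unit costs for insertion, deletion and substitution (plain Levenshtein distance); other cost schemes are possible, and the function name is hypothetical.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `source` into `target` (unit costs assumed)."""
    m, n = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i              # delete all i characters
    for j in range(1, n + 1):
        dist[0][j] = j              # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```

Each table cell is filled once from three already-computed neighbours, so the exponentially many possible edit sequences are never enumerated explicitly.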

Language and Intelligence: The Turing Test. Participants: a Computer, a Human, and a Human Judge. The Human Judge asks tele-typed questions of the Computer and the Human. The Computer’s job is to act like a human. The Human’s job is to convince the Judge that he is not a machine. The Computer is judged “intelligent” if it can fool the Judge. Judgment of intelligence is linked to appropriate answers to questions from the system.

NLP - an inter-disciplinary Field: NLP borrows techniques and insights from several disciplines. Linguistics: How do words form phrases and sentences? What constrains the possible meanings of a sentence? Computational Linguistics: How is the structure of sentences identified? How can knowledge and reasoning be modeled? Computer Science: Algorithms for automata and parsers. Engineering: Stochastic techniques for ambiguity resolution. Psychology: What linguistic constructions are easy or difficult for people to learn to use? Philosophy: What is meaning, and how do words and sentences acquire it?

Some Buzz-Words: NLP – Natural Language Processing; CL – Computational Linguistics; SP – Speech Processing; HLT – Human Language Technology; NLE – Natural Language Engineering; SNLP – Statistical Natural Language Processing. Other Areas: Speech Generation, Text Generation, Speech Understanding, Information Retrieval, Dialogue Processing, Inference, Spelling Correction, Grammar Correction, Text Summarization, Text Categorization, …

Some NLP Applications: Machine Translation: translation between two natural languages (Babel Fish translation system, Systran). Information Retrieval: web search (uni-lingual or multi-lingual). Query Answering/Dialogue: natural language interface with a database system, or a dialogue system. Report Generation: generation of reports such as weather reports. Other Applications: Grammar Checking, Spell Checking, Spell Corrector.

The Big Picture: Source Language Speech Signal → Speech Recognition → Source Text → Analysis → Generation → Target Text → Speech Synthesis → Target Language Speech Signal

The Reductionist Approach
Source Language Analysis    |  Target Language Generation
Text Normalization          |  Text Rendering
Morphological Analysis      |  Morphological Synthesis
POS Tagging                 |  Phrase Generation
Parsing                     |  Role Ordering
Semantic Analysis           |  Lexical Choice
Discourse Analysis          |  Discourse Planning

Natural Language Understanding: Words → Morphological Analysis → Morphologically analyzed words (another step: POS tagging) → Syntactic Analysis → Syntactic structure → Semantic Analysis → Context-independent meaning representation → Discourse Processing → Final meaning representation

Natural Language Generation: Meaning representation → Utterance Planning → Meaning representations for sentences → Sentence Planning and Lexical Choice → Syntactic structures of sentences with lexical choices → Sentence Generation → Morphologically analyzed words → Morphological Generation → Words

Natural Language Generation: NLG is the process of constructing natural language outputs from non-linguistic inputs; it is the reverse process of NL understanding. An NLG system may have two main parts: Discourse Planning (what will be generated) and Surface Realization (realizes a sentence from its internal representation). Lexical Choice: selecting the correct words to describe the concepts.

Machine Translation: converting a text in language A into the corresponding text in language B (or speech). Different machine translation architectures: interlingua-based systems and transfer-based systems. How do we acquire the required knowledge resources, such as mapping rules and bilingual dictionaries? By hand, or automatically from corpora. Example-Based Machine Translation acquires the required knowledge (some or all of it) from corpora.

Some statistics (old): Business e-mail sent per day in the US: 2.1 billion. First-class mail per year: 107 billion. Text on Internet (2/99): > 6 TB (current: ?); indexed: 16% (Lawrence and Giles, Nature 400, 1999). Dialog (www.dialog.com): 9 TB. Average college library: 1 TB.

Languages: 39,000 languages and dialects (22,000 dialects in India alone). Top languages: Chinese/Mandarin (885M), Spanish (332M), English (322M), Bengali (189M), Hindi (182M), Portuguese (170M), Russian (170M), Japanese (125M). Source: www.sil.org/ethnologue, www.nytimes.com. Internet: English (128M), Japanese (19.7M), German (14M), Spanish (9.4M), French (9.3M), Chinese (7.0M). Usage: English (1999: 54%, 2001: 51%, 2003: 46%, 2005: 43%). Source: www.computereconomics.com