Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 2, 7/22/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 2 22 July 2005.

Similar presentations


Presentation on theme: "Lecture 2, 7/22/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 2 22 July 2005."— Presentation transcript:

1 Lecture 2, 7/22/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 2 22 July 2005

2 Lecture 2, 7/22/2005Natural Language Processing2 Today’s slides adapted from Ilyas Cicekli’s slide http://www.cs.ucf.edu/~ilyas/Courses/CAP6640 Martin & Jurafsky’s book

3 Lecture 2, 7/22/2005Natural Language Processing3 Text Books Daniel Jurafsky, and James H. Martin, "Speech and Language Processing", Prentice Hall, 2000. Other References  James Allen, "Natural Language Understanding", Second edition, The Benjamin/Cumings Publishing Company Inc., 1995  Christopher D. Manning, and Hinrich Schutze, "Foundations of Statistical Natural Language Processing", The MIT Press, 1999.

4 Lecture 2, 7/22/2005Natural Language Processing4 Why Should You Care? Trends 1.An enormous amount of knowledge is now available in machine readable form as natural language text 2.Human-computer communication is becoming increasingly in vogue – dialog based systems. Question answering, conversational agents, Machine translation Information Retrieval, Summarization, Fusion,

5 Lecture 2, 7/22/2005Natural Language Processing5 Ambiguity I made her duck. How many different interpretations does this sentence have? What are the reasons for the ambiguity? The categories of knowledge of language can be thought of as ambiguity resolving components. How can each ambiguous piece be resolved? Does speech input make the sentence even more ambiguous?  Yes – deciding word boundaries

6 Lecture 2, 7/22/2005Natural Language Processing6 Ambiguity Some interpretations of : I made her duck. 1.I cooked duck for her. 2.I cooked duck belonging to her. 3.I created a toy duck which she owns. 4.I caused her to quickly lower her head or body. 5.I used magic and turned her into a duck. duck – morphologically and syntactically ambiguous: noun or verb. her – syntactically ambiguous: dative or possessive. make – semantically ambiguous: cook or create. make – syntactically ambiguous:  Transitive – takes a direct object. => 2  Di-transitive – takes two objects. => 5  Takes a direct object and a verb. => 4

7 Lecture 2, 7/22/2005Natural Language Processing7 Why is NLP difficult? Because Natural Language is highly ambiguous.  Syntactic ambiguity  The president spoke to the nation about the problem of drug use in the schools from one coast to the other.  has 720 parses.  Ex: “to the other” can attach to any of the previous NPs (ex. “the problem”), or the head verb  6 places “from one coast” has 5 places to attach …

8 Lecture 2, 7/22/2005Natural Language Processing8 Ambiguity in a Bengali/Hindi Sentence Give examples Some interpretations of: Morphological Ambiguity: Semantic Ambiguity:

9 Lecture 2, 7/22/2005Natural Language Processing9 Why is NLP difficult?  Word category ambiguity  book --> verb? or noun?  Word sense ambiguity  bank --> financial institution? building? or river side?  Words can mean more than their sum of parts  make up a story  Fictitious worlds  People on mars can fly.  Defining scope  People like ice-cream.  Does this mean that all (or some?) people like ice cream?  Language is changing and evolving  I’ll email you my answer.  This new S.U.V. has a compartment for your mobile phone.  Googling, …

10 Lecture 2, 7/22/2005Natural Language Processing10 Resolve Ambiguities We will introduce models and algorithms to resolve ambiguities at different levels. part-of-speech tagging -- Deciding whether duck is verb or noun. word-sense disambiguation -- Deciding whether make is create or cook. lexical disambiguation -- Resolution of part-of-speech and word-sense ambiguities are two important kinds of lexical disambiguation. syntactic ambiguity -- her duck is an example of syntactic ambiguity, and can be addressed by probabilistic parsing.

11 Lecture 2, 7/22/2005Natural Language Processing11 Resolve Ambiguities (cont.) I made her duck S NPVP NP VP IVNPNP I V NP madeher duck made DET N herduck

12 Lecture 2, 7/22/2005Natural Language Processing12 Dealing with Ambiguity Three approaches:  Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.  Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.  Syntax proposes/semantics disposes approach  Probabilistic approaches based on making the most likely choices

13 Lecture 2, 7/22/2005Natural Language Processing13 Models to Represent Linguistic Knowledge Different formalisms (models) are used to represent the required linguistic knowledge. State Machines -- FSAs, HMMs, ATNs, RTNs Formal Rule Systems -- Context Free Grammars, Unification Grammars, Probabilistic CFGs. Logic-based Formalisms -- first order predicate logic, some higher order logic. Models of Uncertainty -- Bayesian probability theory.

14 Lecture 2, 7/22/2005Natural Language Processing14 Algorithms to Manipulate Linguistic Knowledge We will use algorithms to manipulate the models of linguistic knowledge to produce the desired behavior. Most of the algorithms we will study are transducers and parsers.  These algorithms construct some structure based on their input. Since the language is ambiguous at all levels, these algorithms are never simple processes. Categories of most algorithms that will be used can fall into following categories.  state space search  dynamic programming

15 Lecture 2, 7/22/2005Natural Language Processing15 Language and Intelligence Turing Test Computer Human Human Judge Human Judge asks tele-typed questions to Computer and Human. Computer’s job is to act like a human. Human’s job is to convince Judge that he is not machine. Computer is judged “intelligent” if it can fool the judge Judgment of intelligence is linked to appropriate answers to questions from the system.

16 Lecture 2, 7/22/2005Natural Language Processing16 NLP - an inter-disciplinary Field NLP borrows techniques and insights from several disciplines. Linguistics: How do words form phrases and sentences? What constraints the possible meaning for a sentence? Computational Linguistics: How is the structure of sentences are identified? How can knowledge and reasoning be modeled? Computer Science: Algorithms for automatons, parsers. Engineering: Stochastic techniques for ambiguity resolution. Psychology: What linguistic constructions are easy or difficult for people to learn to use? Philosophy: What is the meaning, and how do words and sentences acquire it?

17 Lecture 2, 7/22/2005Natural Language Processing17 Some Buzz-Words NLP – Natural Language Processing CL – Computational Linguistics SP – Speech Processing HLT – Human Language Technology NLE – Natural Language Engineering SNLP – Statistical Natural Language Processing Other Areas:  Speech Generation, Text Generation, Speech Understanding, Information Retrieval,  Dialogue Processing, Inference, Spelling Correction, Grammar Correction,  Text Summarization, Text Categorization,

18 Lecture 2, 7/22/2005Natural Language Processing18 Some NLP Applications Machine Translation – Translation between two natural languages.  Babel Fish translations system, Systran Information Retrieval – Web search (uni-lingual or multi-lingual). Query Answering/Dialogue – Natural language interface with a database system, or a dialogue system. Report Generation – Generation of reports such as weather reports. Other Applications –  Grammar Checking, Spell Checking, Spell Corrector

19 Lecture 2, 7/22/2005Natural Language Processing19 Brief History of NLP 1940s –1950s: Foundations  Development of formal language theory (Chomsky, Backus, Naur, Kleene)  Probabilities and information theory (Shannon) 1957 – 1970s:  Use of formal grammars as basis for natural language processing (Chomsky, Kaplan)  Use of logic and logic based programming (Minsky, Winograd, Colmerauer, Kay) 1970s – 1983:  Probabilistic methods for early speech recognition (Jelinek, Mercer)  Discourse modeling (Grosz, Sidner, Hobbs) 1983 – 1993:  Finite state models (morphology) (Kaplan, Kay) 1993 – present:  Strong integration of different techniques, different areas.

20 Lecture 2, 7/22/2005Natural Language Processing20 The Big Picture Speech recognitionSpeech Synthesis Source text Analysis Target text Generation Source Language Speech Signal Target Language Speech Signal

21 Lecture 2, 7/22/2005Natural Language Processing21 The Reductionist Approach Text Normalization Morphological Analysis POS Tagging Parsing Semantic Analysis Discourse Analysis Text Rendering Morphological Synthesis Phrase Generation Role Ordering Lexical Choice Discourse Planning Source Language AnalysisTarget Language Generation

22 Lecture 2, 7/22/2005Natural Language Processing22 Natural Language Understanding Words Morphological Analysis Morphologically analyzed words (another step: POS tagging) Syntactic Analysis Syntactic Structure Semantic Analysis Context-independent meaning representation Discourse Processing Final meaning representation

23 Lecture 2, 7/22/2005Natural Language Processing23 Natural Language Generation Meaning representation Utterance Planning Meaning representations for sentences Sentence Planning and Lexical Choice Syntactic structures of sentences with lexical choices Sentence Generation Morphologically analyzed words Morphological Generation Words

24 Lecture 2, 7/22/2005Natural Language Processing24 Natural Language Generation NLG is the process of constructing natural language outputs from non-linguistic inputs.  the reverse process of NL understanding. A NLG system may have two main parts:  Discourse Planning -- what will be generated,  Surface Realization -- realizes a sentence from its internal representation. Lexical Choice  selecting the correct words describing the concepts.

25 Lecture 2, 7/22/2005Natural Language Processing25 Machine Translation Machine Translation -- converting a text in language A into the corresponding text in language B (or speech). Different Machine Translation architectures:  interlingua based systems  transfer based systems How to acquire the required knowledge resources such as mapping rules and bi-lingual dictionary? By hand or acquire them automatically from corpora. Example Based Machine Translation acquires the required knowledge (some of it or all of it) from corpora.

26 Lecture 2, 7/22/2005Natural Language Processing26 Some statistics (old) Business e-mail sent per day in the US: 2.1Billion First class mail per year: 107 Billion Text on Internet  (2/99): > 6TB  Current: ? indexed: 16% (Lawrence and Giles, Nature 400, 1999) Dialog (www.dialog.com): 9 TB Average college library: 1 TB

27 Lecture 2, 7/22/2005Natural Language Processing27 Languages Languages: 39,000 languages and dialects (22,000 dialects in India alone) Top languages:  Chinese/Mandarin (885M),  Spanish (332M),  English (322M),  Bengali (189M),  Hindi (182M),  Portuguese (170M), Russian (170M), Japanese (125M) Source: www.sil.org/ethnologue, www.nytimes.com Internet: English (128M), Japanese (19.7M), German (14M), Spanish (9.4M), French (9.3M), Chinese (7.0M) Usage: English (1999-54%, 2001-51%, 2003-46%, 2005-43%) Source: www.computereconomics.com


Download ppt "Lecture 2, 7/22/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 2 22 July 2005."

Similar presentations


Ads by Google