Introduction to NLP
Chapter 1: Overview
Heshaam Faili, University of Tehran

General Themes
- Ambiguity of language
- Language as a formal system
- Rule-based vs. statistical methods
- The need for efficiency

Ambiguity of Language
- Phonetic: [raIt] = write, right, rite
- Lexical: can = noun, verb, modal
- Structural: I saw the man with the telescope
- Semantic: dish = physical plate, menu item
→ All of these make NLP difficult

Language as a Formal System
- We can treat parts of language formally
- Language = a set of acceptable strings
- Define a model to recognize/generate the language
- Works for different levels of language (phonology, morphology, etc.)
- Can use finite-state automata, context-free grammars, etc. to represent language

Rule-Based & Statistical Methods
- Theoretical linguistics captures abstract properties of language
- NLP can more or less follow theoretical insights
- Rule-based: model the system with linguistic rules
- Statistical: model the system with probabilities of what normally happens
- Hybrid models combine the two

The Need for Efficiency
- Simply writing down linguistic insights isn't sufficient for a working system
- Programs need to run in real time, i.e., be efficient
- There are thousands of grammar rules that might be applied to a sentence
- Use insights from computer science: to find the best parse, use chart parsing, a form of dynamic programming (a sketch follows below)
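To make the dynamic-programming idea concrete, here is a minimal CYK recognizer in Python. This is a sketch, not the slides' algorithm: CYK assumes a grammar in Chomsky normal form, and the toy rules below (modeled on Example 1 later in the chapter) are illustrative.

```python
# A minimal CYK recognizer: chart parsing as dynamic programming.
# Assumes a grammar in Chomsky normal form; the toy rules are illustrative.

# Lexical (unary) rules: nonterminal -> words it can cover directly
unary = {"NP": {"I", "he"}, "V": {"slept", "ate"}, "VP": {"slept", "ate"}}
# Binary rules: nonterminal -> set of right-hand-side pairs
binary = {"S": {("NP", "VP")}, "VP": {("V", "NP")}}

def cyk(words):
    n = len(words)
    chart = {}  # chart[(i, j)] = set of nonterminals deriving words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] = {A for A, ws in unary.items() if w in ws}
    for length in range(2, n + 1):          # span length, shortest first
        for i in range(n - length + 1):     # span start
            j = i + length
            chart[(i, j)] = {
                A
                for k in range(i + 1, j)    # split point
                for A, rhss in binary.items()
                for B, C in rhss
                if B in chart[(i, k)] and C in chart[(k, j)]
            }
    return "S" in chart[(0, n)]

print(cyk(["he", "slept"]))       # True
print(cyk(["he", "ate", "I"]))    # True: S -> NP VP, VP -> V NP
print(cyk(["slept", "he"]))       # False
```

Because each cell is filled once from strictly smaller spans, the work is polynomial in sentence length rather than exponential in the number of rule applications.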

Preview of Topics
1. Finding Syntactic Patterns in Human Languages: Language as Formal System
2. Meaning from Patterns
3. Patterns from Language in the Large
4. Bridging the Rationalist-Empiricist Divide
5. Applications
6. Conclusion

The Problem of Syntactic Analysis
- Assume an input sentence S in natural language L
- Assume you have rules (a grammar G) that describe syntactic regularities (patterns or structures) found in sentences of L
- Given S & G, find the syntactic structure of S
- Such a structure is called a parse tree

Example 1

Grammar:
  S → NP VP
  VP → V NP
  VP → V
  NP → I
  NP → he
  V → slept
  V → ate
  V → drinks

Parse tree for "he slept": (S (NP he) (VP (V slept)))

Parsing Example 1
  S → NP VP
  VP → V NP
  VP → V
  NP → I
  NP → he
  V → slept
  V → ate
  V → drinks
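As a concrete illustration, the Example 1 grammar can be written down and run with NLTK; the choice of toolkit is an assumption, not something the slides prescribe.

```python
# The Example 1 grammar in NLTK (requires `pip install nltk`).
import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    VP -> V NP
    VP -> V
    NP -> 'I' | 'he'
    V  -> 'slept' | 'ate' | 'drinks'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(['he', 'slept']):
    tree.pretty_print()   # prints the parse tree for "he slept"
```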

More Complex Sentences
- I can fish.
- I saw the elephant in my pajamas.
These sentences exhibit ambiguity; computers will have to find the acceptable or most likely meaning(s).

Example 2
[Figure not reproduced in the transcript]

Example 3
  NP → D Nom
  Nom → Nom RelClause
  Nom → N
  RelClause → RelPro VP
  VP → V NP
  D → the
  D → my
  V → is
  V → hit
  N → dog
  N → boy
  N → brother
  RelPro → who

Topics
1. Finding Syntactic Patterns in Human Languages
2. Meaning from Patterns
3. Patterns from Language in the Large
4. Bridging the Rationalist-Empiricist Divide
5. Applications
6. Conclusion

Meaning from a Parse Tree
- I can fish.
- We want to understand who does what: the canner is me, the action is canning, and the thing canned is fish, e.g. Canning(ME, FishStuff)
- This is a logical representation of meaning
- We can do this by associating meanings with lexical items in the tree, then using rules to figure out what the S as a whole means

Meaning from a Parse Tree (Details)
Let's augment the grammar with feature constraints:
- S → NP VP, with S.sem = [subj: *1, pred: *2, obj: *3]
- VP → V NP, with VP.sem = [pred: *2, obj: *3]
- Lexical entries: I → *1 [sem: ME]; can → *2 [pred: Canning]; fish → *3 [sem: FishStuff]
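A toy sketch of this composition in Python: the feature names follow the slide, while the lexicon encoding and helper functions are hypothetical simplifications of a real unification mechanism.

```python
# Toy sketch of composing a logical form bottom-up through the tree,
# following the slide's feature constraints. Encoding is hypothetical.

# Lexical semantics, as on the slide: *1, *2, *3.
LEXICON = {
    ('NP', 'I'):    {'sem': 'ME'},
    ('V',  'can'):  {'pred': 'Canning'},
    ('NP', 'fish'): {'sem': 'FishStuff'},
}

def vp_sem(v, np):
    # VP -> V NP : [pred: *2, obj: *3]
    return {'pred': v['pred'], 'obj': np['sem']}

def s_sem(np, vp):
    # S -> NP VP : [subj: *1, pred: *2, obj: *3]
    return {'subj': np['sem'], 'pred': vp['pred'], 'obj': vp['obj']}

s = s_sem(LEXICON[('NP', 'I')],
          vp_sem(LEXICON[('V', 'can')], LEXICON[('NP', 'fish')]))
print(f"{s['pred']}({s['subj']}, {s['obj']})")   # -> Canning(ME, FishStuff)
```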

Grammar Induction
- Start with a treebank = a collection of parsed sentences
- Extract the grammar rules corresponding to the parse trees, estimating the probability of each rule from its frequency:
  P(A → β | A) = Count(A → β) / Count(A)
- You then have a probabilistic grammar, derived from a corpus of parse trees (a sketch follows below)
- How does this grammar compare to grammars created by human intuition?
- How do you get the corpus?
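A small sketch of rule extraction and relative-frequency estimation, using NLTK's bracketed-tree reader over three made-up parses; the toy trees and the toolkit are assumptions, not from the slides.

```python
# Sketch: estimate P(A -> beta | A) = Count(A -> beta) / Count(A)
# from a tiny, made-up treebank.
from collections import Counter
import nltk

trees = [nltk.Tree.fromstring(s) for s in [
    "(S (NP he) (VP (V slept)))",
    "(S (NP I) (VP (V ate) (NP fish)))",
    "(S (NP he) (VP (V drinks)))",
]]

rule_counts = Counter()
lhs_counts = Counter()
for tree in trees:
    for prod in tree.productions():          # grammar rules read off the tree
        rule_counts[prod] += 1
        lhs_counts[prod.lhs()] += 1

for prod, n in rule_counts.items():
    print(prod, n / lhs_counts[prod.lhs()])  # relative-frequency estimate
```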

Finite-State Analysis
A finite-state machine for recognizing NPs:
- initial = 0; final = {2}
- 0 →N→ 2
- 0 →D→ 1
- 1 →N→ 2
- 2 →N→ 2
An equivalent regular expression for NPs: /D? N+/
A regular expression for recognizing simple sentences:
/(Prep D? A* N+)* (D? N) (Prep D? A* N+)* (V_tns|Aux V_ing) (Prep D? A* N+)*/
We can also "cheat" a bit in our linguistic analysis (a sketch follows below).
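In Python, this kind of tag-level pattern can be approximated by running an ordinary regular expression over a space-joined tag string; the tagged sentence below is made up for illustration.

```python
# Sketch: the slide's /D? A* N+/ pattern as an ordinary regex over a
# space-joined tag sequence (the tag string is made up).
import re

tags = "D N V Prep D A N N"

# Word boundaries keep single-letter tags from matching inside longer tags.
NP = r"\b(?:D )?(?:A )*N(?: N)*\b"

for m in re.finditer(NP, tags):
    print(m.group())        # -> "D N" and "D A N N"
```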

Topics
1. Finding Syntactic Patterns in Human Languages
2. Meaning from Patterns
3. Patterns from Language in the Large
4. Bridging the Rationalist-Empiricist Divide
5. Applications
6. Conclusion

Empirical Approaches to NLP
- Empiricism: knowledge is derived from experience
- Rationalism: knowledge is derived from reason
- NLP is, by necessity, focused on 'performance', in that naturally occurring linguistic data has to be processed: data characterized by false starts, hesitations, elliptical sentences, long and complex sentences, input in complex formats, etc.
- The methodology used is corpus-based linguistic analysis (phonological, morphological, syntactic, semantic, etc.) carried out on a fairly large scale
- Rules are derived by humans or machines from looking at phenomena in situ (with statistics playing an important role)

Which Words are the Most Frequent?
[Table: common words in Tom Sawyer (71,730 words), from Manning & Schütze p. 21]
- Will these counts hold in a different corpus (and genre, cf. Tom)?
- What happens if you have 8-9M words?
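A sketch of the underlying computation: counting word frequencies with Python's collections.Counter. The file name and the crude tokenizer are hypothetical.

```python
# Sketch: the kind of word-frequency count behind the Tom Sawyer table,
# for any plain-text file (the path is hypothetical).
from collections import Counter
import re

with open("tom_sawyer.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-z']+", f.read().lower())

for word, freq in Counter(words).most_common(10):
    print(f"{word}\t{freq}")
```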

Data Sparseness
- Many low-frequency words, fewer high-frequency words; only a few words will have lots of examples
- About 50% of word types occur only once
- Over 90% occur 10 times or less
[Table: frequency of word types in Tom Sawyer, from M&S p. 22]

Zipf's Law
Frequency is inversely proportional to rank: f ∝ 1/r, so the product f · r is roughly constant (a check is sketched below).
[Table: empirical evaluation of Zipf's Law on Tom Sawyer, with words from "turned" down to "applausive", from M&S p. 23]
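A sketch of an empirical check: if f ∝ 1/r, then f · r should stay in the same ballpark across ranks. Same hypothetical file and tokenizer as above.

```python
# Sketch: check that f * r is roughly constant across ranks (Zipf's law).
from collections import Counter
import re

with open("tom_sawyer.txt", encoding="utf-8") as f:
    counts = Counter(re.findall(r"[a-z']+", f.read().lower()))

ranked = counts.most_common()          # sorted by frequency, rank 1 first
for rank in (1, 10, 100, 1000):
    if rank <= len(ranked):
        word, freq = ranked[rank - 1]
        print(f"rank {rank:5d}  f*r = {freq * rank:7d}  ({word})")
```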

Illustration of Zipf's Law
[Figure: rank vs. frequency on a logarithmic scale, Brown Corpus, from M&S p. 30]

Empiricism: Part-of-Speech Tagging
- Word statistics are only so useful
- We want to be able to deduce linguistic properties of the text
- Part-of-speech (POS) tagging = assigning a POS (lexical category) to every word in a text
- Words can be ambiguous
- What is the best way to disambiguate?

Part-of-Speech Disambiguation
- Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
- The/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN is ...
- Given a sentence W1...Wn and a tagset of lexical categories, find the most likely tags C1...Cn for the words in the sentence
- Tagset: e.g., the Penn Treebank tagset (45 tags)
- Note that many of the words may have unambiguous tags
- The tagger also has to deal with unknown words

Penn Treebank Tagset
[Table not reproduced in the transcript]

A Statistical Method for POS Tagging
Example sentence: He will race

Lexical generation probabilities P(W | C):
  he|PRP 0.3, will|MD 0.8, will|NN 0.2, race|NN 0.4, race|VB 0.6

POS bigram probabilities P(C | C−1), by previous tag:
  start: PRP 1.0
  PRP:   MD 0.8, NN 0.2
  MD:    NN 0.4, VB 0.6
  NN:    NN 0.3, VB 0.7

Find the tag sequence C1...Cn that maximizes
  ∏ i=1..n  P(Wi | Ci) · P(Ci | Ci−1)
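A minimal Viterbi sketch for this toy model. The probabilities follow the tables above as reconstructed from the slide; the code itself and the start-symbol convention are my assumptions, not the slides'.

```python
# Sketch: Viterbi decoding for the toy bigram tagging model above.

TAGS = ["PRP", "MD", "NN", "VB"]

# P(word | tag): lexical generation probabilities from the slide
lex = {("he", "PRP"): 0.3, ("will", "MD"): 0.8, ("will", "NN"): 0.2,
       ("race", "NN"): 0.4, ("race", "VB"): 0.6}

# P(tag | previous tag); "<s>" marks the start of the sentence
bigram = {("<s>", "PRP"): 1.0, ("PRP", "MD"): 0.8, ("PRP", "NN"): 0.2,
          ("MD", "NN"): 0.4, ("MD", "VB"): 0.6,
          ("NN", "NN"): 0.3, ("NN", "VB"): 0.7}

def viterbi(words):
    # best[tag] = (probability, tag sequence) of the best path ending in tag
    best = {"<s>": (1.0, [])}
    for w in words:
        new = {}
        for t in TAGS:
            cands = [(p * bigram.get((prev, t), 0.0) * lex.get((w, t), 0.0),
                      seq + [t]) for prev, (p, seq) in best.items()]
            new[t] = max(cands)
        best = new
    return max(best.values())

print(viterbi(["he", "will", "race"]))  # -> (0.06912, ['PRP', 'MD', 'VB'])
```

Note how the verb reading wins: the path through VB scores 0.3 · 0.8 · 0.8 · 0.6 · 0.6 ≈ 0.069, beating the NN path's ≈ 0.031.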

Chomsky's Critique of Corpus-Based Methods
1. Corpora model performance, while linguistics aims to explain competence.
   - But if you define linguistics that way, linguistic theories will never be able to deal with actual, messy data.
2. Natural language is in principle infinite, whereas corpora are finite, so many examples will be missed.
   - An excellent point, which needs to be understood by anyone working with a corpus. But does that make corpora useless? Introspection is unreliable (prone to performance factors, cf. only short sentences) and pretty useless with child data. Insights from a corpus may support generalization/induction beyond the corpus, if the corpus is a good sample of the "text population".
3. Ungrammatical examples won't be available in a corpus.
   - That depends on the corpus, e.g., spontaneous speech, language learners, etc. And the notion of grammaticality is not that clear: Who did you see [pictures/?a picture/??his picture/*John's picture] of?

Topics
1. Finding Syntactic Patterns in Human Languages
2. Meaning from Patterns
3. Patterns from Language in the Large
4. Bridging the Rationalist-Empiricist Divide
5. Applications
6. Conclusion

The Annotation of Data
- If we want to learn linguistic properties from data, we need to annotate the data
  - Train on annotated data
  - Test methods on other annotated data
- Through the annotation of corpora, we encode linguistic information in a computer-usable way

An Annotation Tool
[Screenshot not reproduced in the transcript]

Knowledge Discovery Methodology
[Diagram: a raw corpus is run through an initial tagger and corrected in an annotation editor following annotation guidelines, yielding an annotated corpus; a machine learning program trained on the annotated corpus produces learned rules; applying those rules to new raw corpora yields further annotated corpora, possibly feeding a knowledge base]

Topics
1. Finding Syntactic Patterns in Human Languages
2. Meaning from Patterns
3. Patterns from Language in the Large
4. Bridging the Rationalist-Empiricist Divide
5. Applications
6. Conclusion

Application #1: Machine Translation
Using different techniques for linguistic analysis, we can:
- Parse the contents of one language
- Generate another language consisting of the same content

Machine Translation on the Web
nguist/GU-CLI/GU-CLI-home.html

If languages were all very similar...
...then MT would be easier:
- Dialects
- Spanish to Portuguese
- Spanish to French
- English to Japanese ...

MT Approaches
[Diagram: the MT pyramid (Vauquois triangle); analysis ascends from morphology through syntax to semantics and an interlingua, with direct, syntactic-transfer, and semantic-transfer paths between source and target]

MT Using Parallel Treebanks
[Figure not reproduced in the transcript]

Application #2: Understanding a Simple Narrative (Question Answering)
Yesterday Holly was running a marathon when she twisted her ankle. David had pushed her.
1. When did the running occur? Yesterday.
2. When did the twisting occur? Yesterday, during the running.
3. Did the pushing occur before the twisting? Yes.
4. Did Holly keep running after twisting her ankle? Maybe not.

Question Answering by Computer (Temporal Questions)
Yesterday Holly was running a marathon when she twisted her ankle. David had pushed her.
[Diagram of temporal relations: the twisting occurs during the running; the running finishes with or continues through the twisting; the pushing occurs before the twisting, during the running]
1. When did the running occur? Yesterday.
2. When did the twisting occur? Yesterday, during the running.
3. Did the pushing occur before the twisting? Yes.
4. Did Holly keep running after twisting her ankle? Maybe not.

Application #3: Information Extraction
Key: trigger word tagging; named entity tagging; chunk parsing (noun groups (NGs), verb groups (VGs), prepositions, conjunctions)

Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be shipped to Japan.
→ [Company]NG [Set-up]VG [Joint-Venture]NG with [Company]NG [Produce]VG [Product]NG

The joint venture, Bridgestone Sports Taiwan Cp., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and "metal wood" clubs a month.

Information Extraction: Filling Templates
Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be shipped to Japan.
→ Activity: Type: PRODUCTION; Company: (unfilled); Product: golf clubs; Start-date: (unfilled)

The joint venture, Bridgestone Sports Taiwan Cp., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and "metal wood" clubs a month.
→ Activity: Type: PRODUCTION; Company: Bridgestone Sports Taiwan Co; Product: iron and "metal wood" clubs; Start-date: DURING 1990

A toy sketch of template filling follows below.
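As a deliberately naive sketch, a single regular expression can fill parts of the second template from the example sentence. Real systems use the named-entity tagging and chunking shown above; the pattern and field names here are illustrative only.

```python
# Sketch: regex-only template filling for the Bridgestone example.
# A caricature of IE; the pattern is hand-built for this one sentence.
import re

text = ('The joint venture, Bridgestone Sports Taiwan Cp., capitalized at '
        '20 million new Taiwan dollars, will start production in January 1990 '
        'with production of 20,000 iron and "metal wood" clubs a month.')

pattern = re.compile(
    r"The joint venture, (?P<company>[A-Z][\w .]+?),.*?"
    r"start production in (?P<start>\w+ \d{4}).*?"
    r"of [\d,]+ (?P<product>.+?) a month", re.S)

m = pattern.search(text)
if m:
    print({"Type": "PRODUCTION", "Company": m.group("company"),
           "Product": m.group("product"), "Start-date": m.group("start")})
```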

Conclusion
NLP programs can carry out a number of very interesting tasks:
- Part-of-speech disambiguation
- Parsing
- Information extraction
- Machine translation
- Question answering
These programs have an impact on the way we communicate, and these capabilities also have important implications for cognitive science.