Introduction to NLP. Thanks to Hongning Wang@UVa for the original Text Mining course slides; slides slightly modified by Lei Chen.

What is NLP? كلب هو مطاردة صبي في الملعب. (Arabic text; roughly, "A dog is chasing a boy on the playground.") How can a computer make sense out of this string?
- Morphology: what are the basic units of meaning (words)? What is the meaning of each word?
- Syntax: how are words related with each other?
- Semantics: what is the "combined meaning" of words?
- Pragmatics: what is the "meta-meaning"? (speech act)
- Discourse: handling a large chunk of text
- Inference: making sense of everything

An example of NLP: "A dog is chasing a boy on the playground."
- Lexical analysis (part-of-speech tagging): Det Noun Aux Verb Det Noun Prep Det Noun
- Syntactic analysis (parsing): Noun Phrase, Complex Verb, Prep Phrase, Verb Phrase, Sentence
- Semantic analysis: Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).
- Inference: Scared(x) if Chasing(_,x,_). => Scared(b1)
- Pragmatic analysis (speech act): a person saying this may be reminding another person to get the dog back…
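The same stages can be run end to end with an off-the-shelf library. A minimal sketch using spaCy, assuming its small English model has been installed (python -m spacy download en_core_web_sm); note spaCy produces dependency arcs rather than the constituent tree drawn on the slide:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # tokenizer, tagger, dependency parser, NER
doc = nlp("A dog is chasing a boy on the playground.")

for token in doc:
    # lexical analysis (POS tag) plus syntactic analysis (dependency relation)
    print(token.text, token.pos_, token.dep_, "->", token.head.text)
```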

If we can do this for all the sentences in all languages, then we could automatically answer our emails, translate languages accurately, help us manage, summarize, and aggregate information, use speech as a UI (when needed), and talk to us / listen to us. BAD NEWS: unfortunately, we cannot do this right now. General NLP = "Complete AI".

NLP is difficult! Natural language is designed to make human communication efficient; therefore, we omit a lot of "common sense" knowledge, which we assume the hearer/reader possesses, and we keep a lot of ambiguities, which we assume the hearer/reader knows how to resolve. This makes EVERY step in NLP hard: ambiguity is a "killer", and common sense reasoning is a prerequisite.

An example of ambiguity: Get the cat with the gloves.

Examples of challenges:
- Word-level ambiguity: "design" can be a noun or a verb (ambiguous POS); "root" has multiple meanings (ambiguous sense)
- Syntactic ambiguity: "natural language processing" (modification); "A man saw a boy with a telescope." (PP attachment)
- Anaphora resolution: "John persuaded Bill to buy a TV for himself." (himself = John or Bill?)
- Presupposition: "He has quit smoking." implies that he smoked before.

Despite all the challenges, research in NLP has also made a lot of progress…

A brief history of NLP:
- Early enthusiasm (1950s): machine translation. Too ambitious: the Bar-Hillel report (1960) concluded that fully-automatic high-quality translation could not be accomplished without knowledge (dictionary + encyclopedia).
- Less ambitious applications (late 1960s and early 1970s): limited success, failed to scale up. Speech recognition, dialogue (Eliza), inference and domain knowledge (SHRDLU = "block world").
- Real-world evaluation (late 1970s - now): story understanding (late 1970s and early 1980s); large-scale evaluation of speech recognition, text retrieval, and information extraction (1980 - now); statistical approaches enjoy more success (first in speech recognition and retrieval, later in other areas).
- Current trend: the boundary between statistical and symbolic approaches is disappearing, and we need to use all the available knowledge. Application-driven NLP research (bioinformatics, Web, question answering…).
[Slide figure: the field's trajectory from deep understanding in limited domains toward shallow understanding, and from knowledge representation and robust component techniques toward statistical language models and applications.]

The state of the art: "A dog is chasing a boy on the playground"
- POS tagging: about 97% accurate (Det Noun Aux Verb Det Noun Prep Det Noun)
- Parsing: partial parsing (noun phrases, complex verbs, prep phrases, verb phrases, sentence), >90%
- Semantics: only some aspects, e.g., entity/relation extraction, word sense disambiguation, anaphora resolution
- Inference: ???
- Speech act analysis: ???

Machine translation

Dialog systems: Apple's Siri system, Google search.

Information extraction: Google Knowledge Graph, Wikipedia infoboxes.

Information extraction: the YAGO knowledge base, CMU's Never-Ending Language Learning (NELL).

Building a computer that 'understands' text: the NLP pipeline

Tokenization/segmentation: split text into words and sentences. Task: what is the most likely segmentation/tokenization?
"There was an earthquake near D.C. I've even felt it in Philadelphia, New York, etc."
→ There + was + an + earthquake + near + D.C.
→ I + 've + even + felt + it + in + Philadelphia, + New + York, + etc.
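A hedged sketch of this step with NLTK's off-the-shelf tokenizers (the punkt model must be downloaded first); whether "D.C." ends the first sentence is exactly the kind of ambiguity the segmenter has to resolve:

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download('punkt', quiet=True)  # sentence-boundary model trained on English

text = ("There was an earthquake near D.C. "
        "I've even felt it in Philadelphia, New York, etc.")
for sentence in sent_tokenize(text):
    print(word_tokenize(sentence))  # clitics are split, e.g. "I've" -> ["I", "'ve"]
```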

Part-of-speech tagging: marking up each word in a text (corpus) as corresponding to a particular part of speech. Task: what is the most likely tag sequence?
A + dog + is + chasing + a + boy + on + the + playground
Det Noun Aux Verb Det Noun Prep Det Noun
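With NLTK this is one call (the tagger model must be downloaded first); the output below uses Penn Treebank tags rather than the coarse labels above, and is shown as a plausible example:

```python
import nltk

nltk.download('averaged_perceptron_tagger', quiet=True)

tokens = "A dog is chasing a boy on the playground".split()
print(nltk.pos_tag(tokens))
# e.g. [('A', 'DT'), ('dog', 'NN'), ('is', 'VBZ'), ('chasing', 'VBG'), ('a', 'DT'),
#       ('boy', 'NN'), ('on', 'IN'), ('the', 'DT'), ('playground', 'NN')]
```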

Named entity recognition: determine which text spans map to proper names. Task: what is the most likely mapping?
"Its initial Board of Visitors included U.S. Presidents Thomas Jefferson, James Madison, and James Monroe."
→ Board of Visitors: Organization; U.S.: Location; Thomas Jefferson, James Madison, James Monroe: Person
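A minimal sketch with spaCy's pretrained NER (again assuming en_core_web_sm is installed); spaCy's own label inventory (PERSON, GPE, ORG) differs slightly from the slide's Person/Location/Organization:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Its initial Board of Visitors included U.S. Presidents "
          "Thomas Jefferson, James Madison, and James Monroe.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Thomas Jefferson PERSON", "U.S. GPE"
```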

Syntactic parsing: grammatical analysis of a given sentence, conforming to the rules of a formal grammar. Task: what is the most likely grammatical structure?
A + dog + is + chasing + a + boy + on + the + playground
Det Noun Aux Verb Det Noun Prep Det Noun
→ Noun Phrase ("a dog"), Complex Verb ("is chasing"), Noun Phrase ("a boy"), Prep Phrase ("on the playground"), Verb Phrase, Sentence
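For illustration, here is a toy context-free grammar written just for this one sentence, parsed with NLTK's chart parser; the nonterminal names mirror the slide, and the grammar itself is an assumption, not a general-purpose one:

```python
import nltk

grammar = nltk.CFG.fromstring("""
  S  -> NP VP
  VP -> CV NP PP
  CV -> Aux V
  NP -> Det N
  PP -> P NP
  Det -> 'A' | 'a' | 'the'
  N -> 'dog' | 'boy' | 'playground'
  Aux -> 'is'
  V -> 'chasing'
  P -> 'on'
""")
parser = nltk.ChartParser(grammar)
for tree in parser.parse("A dog is chasing a boy on the playground".split()):
    tree.pretty_print()  # prints the constituent tree from the slide
```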

Relation extraction: identify the relationships among named entities (shallow semantic analysis). "Its initial Board of Visitors included U.S. Presidents Thomas Jefferson, James Madison, and James Monroe." → 1. Thomas Jefferson Is_Member_Of Board of Visitors; 2. Thomas Jefferson Is_President_Of U.S.

Logic inference: convert chunks of text into more formal representations; deep semantic analysis, e.g., first-order logic structures. "Its initial Board of Visitors included U.S. Presidents Thomas Jefferson, James Madison, and James Monroe." → ∃x (Is_Person(x) & Is_President_Of(x,'U.S.') & Is_Member_Of(x,'Board of Visitors'))

Towards understanding of text: Who is Carl Lewis? Did Carl Lewis break any records?

Major NLP applications:
- Speech recognition: e.g., automatic telephone call routing
- Text mining (our focus): text clustering, text classification, text summarization, topic modeling
- Question answering
- Language tutoring: spelling/grammar correction
- Machine translation: cross-language retrieval, restricted natural language
- Natural language user interfaces

NLP & text mining: better NLP => better text mining; bad NLP => bad text mining? Robust, shallow NLP tends to be more useful than deep but fragile NLP, since errors in NLP can hurt text mining performance.

How much NLP is really needed? Ordering tasks from low to high dependency on NLP (and, correspondingly, from high to low scalability): classification, clustering, summarization, extraction, topic modeling, translation, dialogue, question answering, inference, speech act.

So, what NLP techniques are the most useful for text mining? Statistical NLP in general: the need for high robustness and efficiency implies the dominant use of simple models.

What is POS tagging? A POS tagger maps raw text to tagged text using a tag set (e.g., NNP: proper noun, CD: numeral, JJ: adjective).
Raw text: Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Tagged text: Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB the_DT board_NN as_IN a_DT nonexecutive_JJ director_NN Nov._NNP 29_CD ._.

Why POS tagging? POS tagging is a prerequisite for further NLP analysis:
- Syntax parsing: the basic unit for parsing
- Information extraction: an indication of names and relations
- Machine translation: the meaning of a particular word depends on its POS tag
- Sentiment analysis: adjectives are the major opinion holders (Good vs. Bad, Excellent vs. Terrible)

Challenges in POS tagging: words often have more than one POS tag, e.g., "back": the back door (adjective), on my back (noun), promised to back the bill (verb). A simple solution with dictionary look-up does not work in practice; one needs to determine the POS tag for a particular instance of a word from its context.

Define a tagset: we have to agree on a standard inventory of word classes, since taggers are trained on labeled corpora. The tagset needs to capture semantically or syntactically important distinctions that can easily be made by trained human annotators.

Word classes. Open classes: nouns, verbs, adjectives, adverbs. Closed classes: auxiliaries and modal verbs; prepositions and conjunctions; pronouns and determiners; particles and numerals.

Public tagsets in NLP:
- Brown corpus (Francis and Kučera, 1961): 500 samples, distributed across 15 genres in rough proportion to the amount published in 1961 in each of those genres; 87 tags.
- Penn Treebank (Marcus et al., 1993): hand-annotated corpus of the Wall Street Journal, 1M words; 45 tags, a simplified version of the Brown tag set. It is the standard for English now, and most statistical POS taggers are trained on this tagset.

How much ambiguity is there? Per the word-tag statistics from the Brown Corpus and the Penn Treebank (table on the slide), roughly 11% and 18% of word types have more than one POS tag.

Is POS tagging a solved problem? A baseline tagger tags every word with its most frequent tag and tags unknown words as nouns. Its accuracy is about 90% at the word level; at the sentence level, with an average English sentence length of 14.3 words, that is only 0.9^14.3 ≈ 22%. A state-of-the-art POS tagger reaches about 97% at the word level, i.e., 0.97^14.3 ≈ 65% at the sentence level.
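The sentence-level figures follow from treating per-word errors as independent over an average-length sentence; a quick arithmetic check:

```python
avg_len = 14.3  # average English sentence length, from the slide
for word_acc in (0.90, 0.97):
    # probability that every word in an average sentence is tagged correctly
    print(f"{word_acc:.2f} per word -> {word_acc ** avg_len:.0%} per sentence")
# 0.90 per word -> 22% per sentence (baseline)
# 0.97 per word -> 65% per sentence (state of the art)
```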

Building a POS tagger: rule-based solution. Take a dictionary that lists all possible tags for each word; assign to every word all its possible tags; then apply rules that eliminate impossible/unlikely tag sequences, leaving only one tag per word. Example: she/PRP promised/VBN,VBD to/TO back/VB,JJ,RB,NN!! the/DT bill/NN,VB. R1: a pronoun should be followed by a past-tense verb (promised → VBD). R2: a verb cannot follow a determiner (bill → NN). Such rules can also be learned via inductive learning.

Building a POS tagger: statistical POS tagging. What is the most likely sequence of tags t = t_1 t_2 … t_6 for the given sequence of words w = w_1 w_2 … w_6? t* = argmax_t p(t|w)

POS tagging with generative models. By Bayes' rule, t* = argmax_t p(t|w) = argmax_t p(w|t) p(t) / p(w) = argmax_t p(w|t) p(t), where p(w|t) p(t) is the joint distribution of tags and words. This is a generative model: a stochastic process that first generates the tags, and then generates the words based on these tags.

Hidden Markov models. Two assumptions for POS tagging:
- The current tag depends only on the previous k tags: p(t) = ∏_i p(t_i | t_{i-1}, t_{i-2}, …, t_{i-k}). When k = 1, this is the so-called first-order HMM.
- Each word in the sequence depends only on its corresponding tag: p(w|t) = ∏_i p(w_i | t_i).

Graphical representation of HMMs: transition probabilities p(t_i | t_{i-1}) connect the tags (ranging over the tagset) and emission probabilities p(w_i | t_i) connect each tag to a word (ranging over the vocabulary). Light circles: latent random variables; dark circles: observed random variables; arrows: probabilistic dependencies.

Finding the most probable tag sequence: t* = argmax_t p(t|w) = argmax_t ∏_i p(w_i | t_i) p(t_i | t_{i-1}). Complexity analysis: each word can have up to T tags, so for a sentence with N words there are up to T^N possible tag sequences. Key: exploit the special structure in HMMs!

Trellis: a special structure for HMMs. [Figure: a trellis over words w_1 … w_5 with candidate tags t_1 … t_7 at each position; e.g., word w_1 takes tag t_4. Two candidate sequences t(1) = t_4 t_1 t_3 t_5 t_7 and t(2) = t_4 t_1 t_3 t_5 t_2 share the prefix t_4 t_1 t_3 t_5, so computation can be reused!]

Viterbi algorithm. Store in T[j][i] the probability of the best tag sequence for w_1 … w_i that ends in tag t_j: T[j][i] = max p(w_1 … w_i, t_1 …, t_i = t_j). Recursively compute T[j][i] from the entries in the previous column T[k][i-1]: T[j][i] = P(w_i | t_j) · max_k { T[k][i-1] · P(t_j | t_k) }, where P(w_i | t_j) generates the current observation, P(t_j | t_k) is the transition from the previous best ending tag, and T[k][i-1] is the best tag sequence for the first i-1 words ending in t_k.

Viterbi algorithm: dynamic programming in O(T^2 N) time! The recursion T[j][i] = P(w_i | t_j) · max_k { T[k][i-1] · P(t_j | t_k) } is computed column by column over the trellis, in word order w_1 … w_N.

Decoding argmax_t p(t|w): take the highest-scoring entry in the last column of the trellis, and keep backpointers in each trellis cell to recover the most probable tag sequence.
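Putting the recursion and the backpointer decoding together, a minimal Python sketch of Viterbi (the parameter names and the dict-of-dicts layout are illustrative; a real tagger would work in log space to avoid underflow):

```python
import numpy as np

def viterbi(words, tags, pi, trans, emit):
    """Most probable tag sequence under a first-order HMM.

    pi[t]       : probability that a sentence starts with tag t
    trans[s][t] : transition probability p(t | s)
    emit[t][w]  : emission probability p(w | t); unseen words get a tiny floor
    """
    T, N = len(tags), len(words)
    trellis = np.zeros((T, N))          # trellis[j, i]: best score ending in tag j at word i
    back = np.zeros((T, N), dtype=int)  # backpointers to recover the best path
    for j, t in enumerate(tags):
        trellis[j, 0] = pi[t] * emit[t].get(words[0], 1e-8)
    for i in range(1, N):
        for j, t in enumerate(tags):
            scores = [trellis[k, i - 1] * trans[tags[k]][t] for k in range(T)]
            best_k = int(np.argmax(scores))
            back[j, i] = best_k
            trellis[j, i] = scores[best_k] * emit[t].get(words[i], 1e-8)
    j = int(np.argmax(trellis[:, -1]))  # highest-scoring entry in the last column
    path = [j]
    for i in range(N - 1, 0, -1):       # follow backpointers right to left
        j = int(back[j, i])
        path.append(j)
    return [tags[j] for j in reversed(path)]
```

Each of the N columns costs O(T^2) work, matching the complexity claimed above.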

Training an HMM tagger. Parameters of an HMM tagger: transition probabilities p(t_i | t_j) (T × T), emission probabilities p(w | t) (V × T), and initial state probabilities p(t | π) (T × 1), for the first tag in a sentence.

Training an HMM tagger: maximum likelihood estimation. Given a labeled corpus, e.g., the Penn Treebank, count how often each pair t_i t_j and w_i t_j occurs: p(t_j | t_i) = c(t_i, t_j) / c(t_i) and p(w_i | t_j) = c(w_i, t_j) / c(t_j). Proper smoothing is necessary!
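A counting sketch under these formulas, with simple add-alpha smoothing; the function name, the (word, tag) input format (as produced by, e.g., nltk.corpus.treebank.tagged_sents()), and the alpha default are all illustrative choices:

```python
from collections import Counter

def train_hmm(tagged_sentences, alpha=0.1):
    """Relative-frequency HMM estimates with add-alpha smoothing."""
    c_tag, c_trans, c_emit, c_init = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:               # each sent: [(word, tag), ...]
        c_init[sent[0][1]] += 1                 # first tag of the sentence
        for w, t in sent:
            c_tag[t] += 1
            c_emit[(t, w)] += 1
            vocab.add(w)
        for (_, s), (_, t) in zip(sent, sent[1:]):
            c_trans[(s, t)] += 1                # adjacent tag pairs
    tags = list(c_tag)
    # p(t|s) = (c(s,t) + alpha) / (c(s) + alpha*T); c(s) approximates transitions from s
    p_trans = {(s, t): (c_trans[(s, t)] + alpha) / (c_tag[s] + alpha * len(tags))
               for s in tags for t in tags}
    # p(w|t) = (c(w,t) + alpha) / (c(t) + alpha*V)
    p_emit = {(t, w): (c_emit[(t, w)] + alpha) / (c_tag[t] + alpha * len(vocab))
              for t in tags for w in vocab}
    # c_init holds raw counts; divide by the number of sentences to get pi
    return p_trans, p_emit, c_init
```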

Viterbi Algorithm Example

Viterbi Algorithm Example (Cont.)

Public POS taggers:
- Brill's tagger: http://www.cs.jhu.edu/~brill/
- TnT tagger: http://www.coli.uni-saarland.de/~thorsten/tnt/
- Stanford tagger: http://nlp.stanford.edu/software/tagger.shtml
- SVMTool: http://www.lsi.upc.es/~nlp/SVMTool/
- GENIA tagger: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/
- A more complete list at http://www-nlp.stanford.edu/links/statnlp.html#Taggers

What you should know: the definition of the POS tagging problem, its properties and challenges; public tag sets; the generative model for POS tagging (HMMs).

Lexical Semantics. Outline: lexical semantics (the meaning of words and the relations between different meanings); WordNet (an ontology of word senses and similarity between words); distributional semantics; word sense disambiguation.

What is the meaning of a word? Most words have many different senses: dog = animal or sausage? lie = to be in a horizontal position, or a false statement made with deliberate intent? And what are the relations of different words in terms of meaning? There are specific relations between senses (animal is more general than dog) and semantic fields (money is related to bank): "a set of words grouped, referring to a specific subject … not necessarily synonymous, but are all used to talk about the same general phenomenon" (Wikipedia).

Word senses: what does 'bank' mean?
- A financial institution: "US bank has raised interest rates."
- A particular branch of a financial institution: "The bank on Main Street closes at 5pm."
- The sloping side of any hollow in the ground, especially when bordering a river: "In 1927, the bank of the Mississippi flooded."
- A 'repository': "I donate blood to a blood bank."

Lexicon entries: a dictionary entry pairs a lemma with its senses. [Figure omitted in transcript.]

Some terminology:
- Word forms: runs, ran, running; good, better, best. Any, possibly inflected, form of a word.
- Lemma (citation/dictionary form): run; good. A basic word form (e.g., infinitive or singular nominative noun) that is used to represent all forms of the same word.
- Lexeme: RUN(V), GOOD(A), BANK1(N), BANK2(N). An abstract representation of a word (and all its forms), with a part of speech and a set of related word senses; often just written (or referred to) as the lemma, perhaps in a different FONT.
- Lexicon: a (finite) list of lexemes.

Make sense of word senses: polysemy. A lexeme is polysemous if it has different related senses: bank = financial institution or a building.

Make sense of word senses: homonyms. Two lexemes are homonyms if their senses are unrelated, but they happen to have the same spelling and pronunciation: bank = financial institution or river bank.

Relations between senses. Symmetric relations: synonyms (couch/sofa), two lemmas with the same sense; antonyms (cold/hot, rise/fall, in/out), two lemmas with opposite senses. Hierarchical relations: hypernyms and hyponyms (pet/dog), where the hyponym (dog) is more specific than the hypernym (pet); holonyms and meronyms (car/wheel), where the meronym (wheel) is a part of the holonym (car).

WordNet (George Miller, Cognitive Science Laboratory of Princeton University, 1985): a very large lexical database of English: 117K nouns, 11K verbs, 22K adjectives, 4.5K adverbs. Word senses are grouped into synonym sets ("synsets") linked into a conceptual-semantic hierarchy: 82K noun synsets, 13K verb synsets, 18K adjective synsets, 3.6K adverb synsets. Avg. # of senses: 1.23/noun, 2.16/verb, 1.41/adjective, 1.24/adverb. Conceptual-semantic relations: hypernym/hyponym.

A WordNet example: http://wordnet.princeton.edu/

Hierarchical synset relations: nouns.
- Hypernym/hyponym (between concepts): the more general 'meal' is a hypernym of the more specific 'breakfast'.
- Instance hypernym/hyponym (between concepts and instances): Austen (Jane Austen, 1775-1817, English novelist) is an instance hyponym of author.
- Member holonym/meronym (groups and members): professor is a member meronym of (a university's) faculty.
- Part holonym/meronym (wholes and parts): wheel is a part meronym of (is a part of) car.
- Substance meronym/holonym (substances and components): flour is a substance meronym of (is made of) bread.

WordNet hypernyms & hyponyms
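These relations are directly queryable through NLTK's WordNet interface; a small sketch, assuming the WordNet data has been downloaded (nltk.download('wordnet')), with the commented outputs as typical examples:

```python
from nltk.corpus import wordnet as wn

for s in wn.synsets('bank')[:3]:   # a few of the many senses of 'bank'
    print(s.name(), '-', s.definition())

dog = wn.synset('dog.n.01')
print(dog.hypernyms())             # more general synsets, e.g. canine.n.02
print(dog.hyponyms()[:3])          # more specific synsets
print(dog.member_holonyms())       # groups a dog belongs to, e.g. pack.n.06
```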

Hierarchical synset relations: verbs.
- Hypernym/troponym (between events): travel/fly, walk/stroll. Flying is a troponym of traveling: it denotes a specific manner of traveling; the troponym relation marks the presence of a 'manner' relation between two lexemes.
- Entailment (between events): snore/sleep; snoring entails (presupposes) sleeping.

WordNet similarity: path-based similarity measures between words (http://wn-similarity.sourceforge.net/).
- Shortest path between two concepts (Leacock & Chodorow, 1998): sim = 1/|shortest path|.
- Path length to the root node from the least common subsumer (LCS) of the two concepts, i.e., the most specific concept that is an ancestor of both (Wu & Palmer, 1994): sim = 2·depth(LCS) / (depth(w1) + depth(w2)).
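Both measures are also available through NLTK's WordNet interface (same wordnet download as above); a hedged sketch, with the printed values depending on the WordNet version:

```python
from nltk.corpus import wordnet as wn

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
print(dog.lowest_common_hypernyms(cat))  # the LCS, e.g. [Synset('carnivore.n.01')]
print(dog.path_similarity(cat))          # 1 / |shortest path|
print(dog.wup_similarity(cat))           # 2*depth(LCS) / (depth(w1) + depth(w2))
```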

WordNet::Similarity

Distributional hypothesis: what is tezgüino? "A bottle of tezgüino is on the table." "Everybody likes tezgüino." "Tezgüino makes you drunk." "We make tezgüino out of corn." The contexts in which a word appears tell us a lot about what it means.

Distributional semantics: use the contexts in which words appear to measure their similarity. Assumption: similar contexts => similar meanings. Approach: represent each word w as a vector of its contexts c. In this vector space representation, each dimension corresponds to a particular context c_n, and each element in the vector of w captures the degree to which the word w is associated with the context c_n. Similarity metric: cosine similarity.
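A toy end-to-end sketch: build ±k-word context vectors from a tiny corpus and compare them with cosine similarity (the helper names and the four-sentence corpus are made up for illustration):

```python
import math
from collections import Counter

def context_vector(target, sentences, k=2):
    """Count words occurring within +/-k positions of the target word."""
    vec = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w == target:
                for c in sent[max(0, i - k):i] + sent[i + 1:i + k + 1]:
                    vec[c] += 1
    return vec

def cosine(u, v):
    dot = sum(u[c] * v[c] for c in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

sents = [s.split() for s in [
    "a bottle of tezguino is on the table",
    "everybody likes tezguino",
    "a bottle of wine is on the table",
    "everybody likes wine"]]
print(cosine(context_vector("tezguino", sents),
             context_vector("wine", sents)))  # 1.0: identical contexts here
```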

How to define the contexts?
- Nearby words: w appears near c if c occurs within ±k words of w (within a sentence); this yields fairly broad thematic relations. Decide on a fixed vocabulary of N context words c_1 … c_N, preferring words that occur frequently enough in the corpus but not too frequently (i.e., avoid stopwords). Use the co-occurrence count of word w and context c as the corresponding element in the vector, or Pointwise Mutual Information (PMI).
- Grammatical relations: how often is w used as the subject of the verb c? This yields fine-grained thematic relations.

Mutual information measures the relatedness between two random variables: I(X;Y) = Σ_{y∈Y} Σ_{x∈X} p(x,y) log( p(x,y) / (p(x) p(y)) ).

Pointwise mutual information between w and c, using a fixed window of ±k words within a sentence: PMI(w,c) = log( p(w,c) / (p(w) p(c)) ), where p(w,c) is how often w and c co-occur inside a window, p(w) is how often w occurs, and p(c) is how often c occurs. (Note: unlike mutual information, PMI is the log-ratio for a single outcome pair, without the p(w,c) weighting.)
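A minimal sketch of computing PMI from window counts; the counts below are hypothetical:

```python
import math

def pmi(count_wc, count_w, count_c, total_windows):
    """PMI(w, c) = log( p(w,c) / (p(w) * p(c)) ) with MLE probabilities."""
    p_wc = count_wc / total_windows
    p_w = count_w / total_windows
    p_c = count_c / total_windows
    return math.log(p_wc / (p_w * p_c))

# hypothetical counts: w and c co-occur in 20 of 10000 windows
print(pmi(20, 100, 50, 10000))  # positive: w and c co-occur more than chance
```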

Word sense disambiguation: what does this word mean? "This plant needs to be watered each day." → living plant (cue: watered). "This plant manufactures 1000 widgets each day." → factory (cue: manufactures). Word sense disambiguation (WSD): identify the sense of content words (nouns, verbs, adjectives) in context, assuming a fixed inventory of word senses.

Dictionary-based methods: a dictionary/thesaurus contains glosses and examples of a word.
- bank1. Gloss: a financial institution that accepts deposits and channels the money into lending activities. Examples: "he cashed the check at the bank", "that bank holds the mortgage on my home".
- bank2. Gloss: sloping land (especially the slope beside a body of water). Examples: "they pulled the canoe up on the bank", "he sat on the bank of the river and watched the current".

Lesk algorithm: compare the context with the dictionary definitions of the senses. Construct the signature of a word in context from the signatures of its senses in the dictionary, where a signature is a set of context words (from the examples/gloss, or from the context). Assign the dictionary sense whose gloss and examples are the most similar to the context in which the word occurs; similarity = size of the intersection of the context signature and the sense signature.

Sense signatures:
- bank1. Gloss: a financial institution that accepts deposits and channels the money into lending activities. Examples: "he cashed the check at the bank", "that bank holds the mortgage on my home". Signature(bank1) = {financial, institution, accept, deposit, channel, money, lend, activity, cash, check, hold, mortgage, home}
- bank2. Gloss: sloping land (especially the slope beside a body of water). Examples: "they pulled the canoe up on the bank", "he sat on the bank of the river and watched the current". Signature(bank2) = {slope, land, body, water, pull, canoe, sit, river, watch, current}

Signature of the target word: "The bank refused to give me a loan."
- Simplified Lesk (words in context): Signature(bank) = {refuse, give, loan}
- Original Lesk (augmented signature of the target word): Signature(bank) = {refuse, reject, request, …, give, gift, donate, …, loan, money, borrow, …}
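A minimal sketch of simplified Lesk over the sense signatures from the slide above; note that the raw context {refuse, give, loan} overlaps neither signature, which is exactly why the original Lesk augments the signature with related words:

```python
def simplified_lesk(context_words, sense_signatures):
    """Pick the sense whose signature overlaps most with the context words."""
    context = set(context_words)
    return max(sense_signatures,
               key=lambda sense: len(sense_signatures[sense] & context))

signatures = {  # sense signatures for 'bank' from the earlier slide
    'bank1': {'financial', 'institution', 'accept', 'deposit', 'channel', 'money',
              'lend', 'activity', 'cash', 'check', 'hold', 'mortgage', 'home'},
    'bank2': {'slope', 'land', 'body', 'water', 'pull', 'canoe',
              'sit', 'river', 'watch', 'current'},
}
augmented_context = {'refuse', 'reject', 'request', 'give', 'gift', 'donate',
                     'loan', 'money', 'borrow'}
print(simplified_lesk(augmented_context, signatures))  # 'bank1', via 'money'
```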

What you should know: lexical semantics (relationships between words); WordNet; distributional semantics (similarity between words); word sense disambiguation.