NLP

Text Similarity: Text Kernels

Need for Kernels Sentences (or documents, or words) can be represented as vectors in many different ways, depending on the type of similarity that needs to be computed. In document retrieval, word order is often ignored (the so-called “bag of words” approach): for example, “… cat … dog” may be considered the same as “… dog … cat.” In sentence retrieval, however, word order may matter, e.g., compare “Utopia invades Asgård” and “Asgård invades Utopia.” A bag-of-words approach will treat these two sentences as identical.

The Basic Idea behind Kernels [Figure: “o” and “x” points plotted in the original input space, where the two classes are not linearly separable]

The Basic Idea behind Kernels [Figure: a mapping φ sends each point to φ(o) or φ(x) in a feature space where the two classes become linearly separable]
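To make this concrete, here is a minimal Python sketch of the kernel trick, using the classic degree-2 polynomial feature map for 2-D points; the map phi and the sample points are illustrative choices, not something from the slides:

```python
import math

def phi(p):
    """Map a 2-D point into a 3-D feature space (a classic polynomial map)."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def kernel(p, q):
    """Dot product in the feature space: K(p, q) = phi(p) . phi(q)."""
    return sum(a * b for a, b in zip(phi(p), phi(q)))

p, q = (1.0, 2.0), (3.0, 1.0)
# For this particular phi, the same value is obtained without ever
# computing phi explicitly: K(p, q) = (p . q)^2 -- the "kernel trick".
assert abs(kernel(p, q) - (1 * 3 + 2 * 1) ** 2) < 1e-9
```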

Unigram Kernels A unigram kernel is essentially the same as a bag of words. For example, we can split the sentence “Utopia invades Asgård” into the three unigrams: “Utopia” “invades” “Asgård” Similarly, we can split “Asgård invades Utopia” into: “Asgård” “invades” “Utopia” However, the vector representations of the two sentences will be identical (<1,1,1>) and therefore their cosine similarity will be computed as 1.
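As a quick sketch (plain Python, no libraries), here is how a unigram kernel scores the two sentences; the cosine of their count vectors comes out to 1:

```python
from collections import Counter
import math

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(c1[w] * c2[w] for w in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2)

s1 = Counter("Utopia invades Asgård".split())
s2 = Counter("Asgård invades Utopia".split())
print(cosine(s1, s2))  # 1.0 -- word order is lost
```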

Bigram Kernels Using a bigram kernel, we can split the sentence “Utopia invades Asgård” into the two bigrams: “Utopia invades” “invades Asgård” Similarly, we can split “Asgård invades Utopia” into: “Asgård invades” “invades Utopia” Using this representation, the vectors corresponding to the two sentences are <1,1,0,0> and <0,0,1,1>, respectively, and their cosine similarity is 0.
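The same comparison with bigram features, reusing the cosine() helper from the unigram sketch above, yields a similarity of 0:

```python
from collections import Counter

def bigrams(sentence):
    """Count the adjacent word pairs in a sentence."""
    words = sentence.split()
    return Counter(zip(words, words[1:]))

b1 = bigrams("Utopia invades Asgård")  # ('Utopia','invades'), ('invades','Asgård')
b2 = bigrams("Asgård invades Utopia")  # ('Asgård','invades'), ('invades','Utopia')
# cosine() as defined in the unigram sketch above:
print(cosine(b1, b2))  # 0.0 -- the sentences share no bigram
```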

Longer n-grams N-grams for n>2 can also be used for measuring similarity. Longer n-grams (shingles) are particularly useful in Information Retrieval, for fast retrieval of similar documents. In NLP, n-gram similarity methods are used for the evaluation of machine translation (BLEU; Papineni et al., 2002) and of text summarization (ROUGE; Lin, 2004). In both cases, an automatically generated text (a translation or a summary) is compared against a set of reference sentences.
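For illustration, here is a simplified sketch of the clipped (“modified”) n-gram precision at the core of BLEU; the full metric also combines several n-gram orders and applies a brevity penalty, both omitted here:

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

def modified_precision(candidate, references, n):
    """Candidate n-gram counts, clipped by the max count in any single reference."""
    cand = ngrams(candidate.split(), n)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / max(sum(cand.values()), 1)

refs = ["the cat is on the mat", "there is a cat on the mat"]
print(modified_precision("the the the cat", refs, n=1))  # 0.75: "the" clipped at 2
```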

BLEU Example Multiple translations of the same source sentence:
One hundred artists from 16 countries will exhibit 270 pieces of artistic work made on porcelain in the Palace of Arts in Cairo in a an exhibition inaugurated earlier this week and that will last for two weeks.
One hundred artists from 61 states are exhibiting 270 pieces of porcelain artwork at the Arts Palace in Cairo, as part of a two-week exhibition that opened at the beginning of this week.
100 artists from 16 countries are exhibiting 270 pieces of work on porcelain in Cairo's Arts Palace as part of a 2-week exhibition, which opened at the beginning of the week.
One hundred artists from 16 countries will display 270 pieces of art works undertaken on porcelain in the Palace of Arts in Cairo in an exhibition opened at the beginning of this week and which will continue for two weeks.
A hundred artists from 16 countries will exhibit 270 pieces of artistic works that are made on Porcelain in the arts palace in Cairo in a an exhibition inaugurated earlier this week, which will last for two weeks.
One hundred artists from 61 countries exhibited 270 types of art pieces done on porcelain in the Cultural Palace of Cairo, in an exhibition that was inaugurated at the beginning of the current week. The exhibition would last two weeks.
One hundred artists from 16 countries will exhibit 270 works of art made with porcelain in the Arts Castle in Cairo in an exhibition that opened earlier this week and will last for two weeks.
A hundred artists from 16 countries are displaying 270 porcelain art pieces in an exhibition that opened early in the week, and will continue over a span of two weeks.
A hundred artists from 16 countries display 270 artistic porcelain pieces at the Palace of Arts in Cairo, in an exhibition that was opened earlier this week and continuing for two weeks.

Letter and Substring Kernels Letter n-grams can be used for various applications such as spelling correction, language identification, and named entity recognition. For example, the word stop can be represented as the set of all of its substrings: s, t, o, p, st, to, op, sto, top, and stop. In this representation, sim(stop, stops) > sim(stop, plot), even though all three words are different.
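A small sketch of this idea using character bigrams and the Dice coefficient (one of several reasonable set-similarity choices):

```python
def char_ngrams(word, n=2):
    """The set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def dice(a, b):
    """Dice coefficient between two sets."""
    return 2 * len(a & b) / (len(a) + len(b))

print(dice(char_ngrams("stop"), char_ngrams("stops")))  # ~0.86
print(dice(char_ngrams("stop"), char_ngrams("plot")))   # 0.0
```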

Subsequences Unlike a substring, a subsequence doesn’t need to consist of contiguous words (or letters). For example, comp, cotr, opter, and cpute are all letter-based subsequences of computer. Subsequence kernels (over words, not letters) are most useful for measuring similarity between sentences.
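A quick way to check the claim: the following sketch tests whether one string is a (possibly gappy) subsequence of another, using a standard iterator idiom:

```python
def is_subsequence(sub, word):
    """True if the letters of `sub` appear in `word` in order (gaps allowed)."""
    it = iter(word)
    return all(ch in it for ch in sub)  # `in` consumes the iterator as it scans

for s in ("comp", "cotr", "opter", "cpute"):
    assert is_subsequence(s, "computer")        # the four examples from the slide
assert not is_subsequence("mco", "computer")    # letters out of order
```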

Quiz On the next slide, you will see a set of published headlines relating to the same scientific study, published in 2009. The study suggests that Wolfgang Amadeus Mozart may have died of complications caused by strep throat. Take the time to read all these headlines. What interesting (class-related) observations can you make based on your reading? What kernels would be most appropriate for clustering all these headlines together? In other words, the kernels should assign high pairwise similarity scores to the headlines in the group.

Did Mozart die of strep throat?
what killed mozart? study suggests strep infection
what killed mozart? strep, study suggests
what killed mozart? study suggest it might have been a strep infection (wstc/wnlk local news)
what killed mozart? a new theory emerges
strep throat may have led to mozart's death
mozart done in by strep throat?
study says mozart died of strep throat
dutch researchers suggest 'super-bug' as cause of mozart's death
new theory on what killed mozart
what killed mozart? study suggests strep
study suggests strep infection killed mozart
'strep throat may have killed mozart'
what killed mozart? study hints at complications from strep infection
what killed mozart? study suggests just strep throat
study suggests mozart died of strep infection
strep throat may have killed mozart
did strep throat kill mozart?
mozart may have died from strep throat: study
strep throat theory in mozart's death
medical study suggests mozart died of strep
mozart died of strep throat?
infection killed mozart?
did strep infection kill mozart?
did a strep infection cause mozart's death?
mozart may have died from strep
study reviews mozart's death
infection killed mozart - report
mozart's killer revealed: it was not salieri
infection killed mozart, says study
did poison or strep infection kill mozart?
mozart may have died from strep throat, says study
cause of mozart's death revealed
mozart died from strep
what really killed mozart? possibly strep
mozart may have been killed by strep throat

Answer to Quiz
Observations:
All headlines mention Mozart.
The syntax varies a lot (e.g., passive/active).
They all contain some word related to dying (e.g., “kill”/“die”/“disease”).
Suggestions:
An n-gram kernel will probably not work very well here.
Some semantic information should be encoded in the kernel.
Possibly, use word2vec.
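As a sketch of the word2vec suggestion: assuming gensim is installed and using its downloadable glove-wiki-gigaword-50 vectors (any pretrained embeddings would do, provided the headline words are in the vocabulary), two lexically different headlines get a high similarity score:

```python
import gensim.downloader  # assumes gensim is installed

# Any pretrained embedding would do; this small GloVe model is ~66 MB.
vectors = gensim.downloader.load("glove-wiki-gigaword-50")

h1 = "strep throat may have killed mozart".split()
h2 = "study suggests mozart died of strep infection".split()
# Cosine similarity between the means of the two sets of word vectors.
print(vectors.n_similarity(h1, h2))
```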

Dependencies [Figure: dependency tree for “John likes green apples”: the root “likes” has dependents “John” and “apples”; “apples” has the dependent “green”]

Dependencies In the phrase “green apple”: “green” is the modifier (the child) and “apple” is the head (the parent).

Dependency Structure [Figure: dependency analysis of the sentence “Unionized workers are usually better paid than their non-union counterparts.”, with its ten tokens numbered 1-10]
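For reference, a dependency analysis like this can be produced with an off-the-shelf parser. The sketch below, on the deck’s running example, assumes spaCy and its small English model are installed; the exact labels depend on the model:

```python
import spacy  # assumes: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
doc = nlp("John likes green apples")
for token in doc:
    # Each token points to its head; the root points to itself.
    print(f"{token.text:8s} <-{token.dep_:6s}- {token.head.text}")
```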

The Dependency Tree Kernel [Figure: the dependency tree of “John likes green apples” decomposed into its fragments, e.g., likes→John, likes→apples, apples→green, likes→(John, apples), and larger combinations]

The Syntactic Tree Kernel [Figure: a parse tree and its fragments] Note that siblings are not split up: every fragment keeps complete productions (Collins and Duffy, 2002).
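A minimal sketch of the counting recursion from Collins and Duffy (2002), representing a parse tree as (label, children) tuples; it keeps whole productions intact, as the slide notes, but omits the dynamic-programming tricks of the original:

```python
def common_subtrees(n1, n2, lam=1.0):
    """C(n1, n2): number of common tree fragments rooted at n1 and n2,
    following the recursion of Collins and Duffy (2002)."""
    label1, kids1 = n1
    label2, kids2 = n2
    if not kids1 or not kids2:  # bare leaves root no fragments
        return 0.0
    if label1 != label2 or [k[0] for k in kids1] != [k[0] for k in kids2]:
        return 0.0  # the two nodes expand with different productions
    score = lam
    for k1, k2 in zip(kids1, kids2):
        score *= 1.0 + common_subtrees(k1, k2, lam)
    return score

def tree_kernel(t1, t2, lam=1.0):
    """K(T1, T2): sum C(n1, n2) over all node pairs."""
    def nodes(t):
        yield t
        for kid in t[1]:
            yield from nodes(kid)
    return sum(common_subtrees(a, b, lam) for a in nodes(t1) for b in nodes(t2))

# NP -> (JJ green) (NN apples), compared with itself: 6 shared fragments.
np_tree = ("NP", [("JJ", [("green", [])]), ("NN", [("apples", [])])])
print(tree_kernel(np_tree, np_tree))  # 6.0
```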

Possible Improvements Add semantics, e.g., treat near-synonyms such as strong/powerful as related [Moschitti et al.] Other ideas?

NLP