CS548 Fall 2018 Text Mining Showcase

CS548 Fall 2018 Text Mining Showcase
By Charlie Parsons, Sola Shirai, Shyam Subramanian, Erin Teeple, and Jack Zhang
Showcasing work by Lei Yu, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman: "Deep Learning for Answer Sentence Selection"

References
[1] Lei Yu, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. "Deep Learning for Answer Sentence Selection". NIPS Deep Learning Workshop, 2014.
[2] Vasin Punyakanok, Dan Roth, and Wen-tau Yih. "Mapping Dependencies Trees: An Application to Question Answering". Proceedings of the 8th International Symposium on Artificial Intelligence and Mathematics, 2004.
[3] Danqi Chen and Christopher D. Manning. "A Fast and Accurate Dependency Parser using Neural Networks". Proceedings of EMNLP, 2014.
[4] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. "GloVe: Global Vectors for Word Representation". In EMNLP, 2014.
[5] Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. "What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA". In EMNLP-CoNLL, 2007.

What is Text Mining?
The application of artificial intelligence methods to text data for purposes such as information extraction, summarization, or classification. See, for example, the OpenMinTeD project for linking text mining workflows: http://www.nactem.ac.uk/openminted/
Outline:
1. Text Mining and NLP – Background
2. Abstract
3. DB Query vs. Open Domain Question Answering
4. Answer Sentence Selection

Example question: Who wrote the book Harry Potter?
Answer sentence: "Harry Potter is a series of fantasy novels written by British author J.K. Rowling"
Answer phrase: J.K. Rowling
Source: https://en.wikipedia.org/wiki/Harry_Potter

Answer Sentence Selection Task
To answer a given question, a method must:
1) Identify the question type and content
2) Retrieve relevant information from other sources
3) Select correct and appropriate information as the answer

Deep Learning for Answer Sentence Selection (Lei Yu, Karl Moritz Hermann, Phil Blunsom, Stephen Pulman)
Innovation: a neural network-based sentence model that is applicable to any language and does not require feature engineering or hand-coded resources beyond training the initial word embeddings.
Objective: the authors match questions with answers using semantic encoding to select an answer from among candidate sentences not seen during training.
Key idea, in the authors' words: "We propose a novel approach to solving this task via means of distributed representations, and learn to match questions with answers by considering their semantic encoding. This contrasts prior work on this task, which typically relies on classifiers with large numbers of hand-crafted syntactic and semantic features and various external resources."

TREC Answer Selection Dataset [5]
- TREC dataset: factoid questions plus a list of corresponding candidate answer sentences.
- TRAIN: answer candidates chosen by non-stopword counts and pattern matching.
- TRAIN-ALL: a larger, noisier training set in which answers are labelled automatically.
- Task: rank the candidate answers by their relatedness to the question.
- Metrics: MAP (mean average precision), which examines the ranks of all correct answers, and MRR (mean reciprocal rank), which uses the rank of the highest-ranked correct answer.
- Evaluation: the official TREC_eval scripts are used to allow performance comparisons with other models.

TREC_eval
https://www-nlpir.nist.gov/projects/trecvid/trecvid.tools/trec_eval_video/A.README
The standard tool used by researchers in TREC (Text REtrieval Conference) for evaluating information retrieval performance. Its outputs include:
1. Total number of documents over all queries: retrieved, relevant, and rel_ret (relevant and retrieved).
2. Precision: the percentage of retrieved documents that are relevant.
3. Mean average precision (MAP): the precision at each relevant document is averaged to score a query (a relevant document that is not retrieved contributes a precision of 0), and the per-query scores are averaged over all queries.
4. Mean reciprocal rank (MRR): the reciprocal rank of a query is the inverse of the rank of the top relevant document (1 for first, ½ for second, and so on); for example, if the first relevant document is ranked third, the reciprocal rank is 1/3. MRR is the average over all queries.
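
To make the two metrics concrete, here is a minimal Python sketch (not the official trec_eval tool) that computes MRR and MAP from ranked relevance labels; the `rankings` data below is invented for illustration.

```python
# Minimal sketch of MRR and MAP over ranked candidate answers.
# Each entry of `rankings` is the list of relevance labels (1 = correct)
# for one query, in ranked order.

def reciprocal_rank(labels):
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            return 1.0 / rank        # inverse rank of the first correct answer
    return 0.0

def average_precision(labels):
    hits, precisions = 0, []
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)   # precision at each correct answer
    return sum(precisions) / max(1, sum(labels))

rankings = [
    [0, 1, 0, 1],  # query 1: correct answers at ranks 2 and 4
    [1, 0, 0, 0],  # query 2: correct answer at rank 1
]
print("MRR:", sum(reciprocal_rank(r) for r in rankings) / len(rankings))
print("MAP:", sum(average_precision(r) for r in rankings) / len(rankings))
```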

Answer Selection Model: Setup
- A set of question sentences {q1, q2, ..., qn} and a set of answer sentences {a1, a2, ..., an}.
- Assumption: matching question-answer pairs have high semantic similarity.
- Approach: model sentences as vectors and evaluate their relatedness.

Word Embeddings
Word embeddings are vector representations of words.
Sentence 1: "Have a good day". Sentence 2: "Have a great day".
Vocabulary: {Have, a, good, great, day}
Simplest vector representation: one-hot encoding, where each word gets its own dimension, e.g. good = [0, 0, 1, 0, 0] and great = [0, 0, 0, 1, 0] over this vocabulary.
Problem: what is the similarity between the one-hot vectors for "good" and "great"? It is zero, even though the words are semantically close.
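
A minimal numpy sketch of the problem on this slide: with one-hot vectors over the five-word vocabulary above, "good" and "great" have zero cosine similarity.

```python
# One-hot vectors over {Have, a, good, great, day}: every word gets its own
# axis, so any two distinct words are orthogonal.
import numpy as np

vocab = ["Have", "a", "good", "great", "day"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(one_hot["good"], one_hot["great"]))  # 0.0: no notion of similarity
print(cosine(one_hot["good"], one_hot["good"]))   # 1.0
```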

Complex Word Embeddings
- A neural network learns the embedding of one word from another, e.g. predicting a target word ("day") from a source word ("great").
- Continuous bag of words (CBOW): predict a word such as "day" from context words such as "great" and "good", or vice versa.
- Popular pre-trained word embeddings: Word2Vec and GloVe [4].
https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa
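
As an illustration (not part of the paper), dense pre-trained vectors can be queried with a library such as gensim; the vector file path below is a hypothetical placeholder for any word2vec-format file.

```python
# Query pre-trained word embeddings with gensim's KeyedVectors.
from gensim.models import KeyedVectors

# "word_vectors.txt" is a placeholder for a local word2vec-format vector file.
vectors = KeyedVectors.load_word2vec_format("word_vectors.txt", binary=False)

print(vectors.similarity("good", "great"))    # now a high value, unlike the one-hot case
print(vectors.most_similar("great", topn=3))  # nearest neighbours in embedding space
```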

Vectorize Sentences with Word Embeddings
- Convert each word into a vector using pre-trained word embeddings (Word2Vec, GloVe).
- Represent a sentence as a combination of its word vectors, e.g. their average.
- These sentence vectors feed the answer selection model, which "forms" a question from an answer by computing q' = Ma, starting with a unigram bag-of-words representation (described in the following slides).

Formal Model Definition
Given a vector representation of a question q and an answer a, the probability that a is the correct answer to q is
    p(correct | q, a) = σ(qᵀMa + b)
where:
- q and a are d-dimensional vectors;
- the matrix M and the bias b are learned parameters;
- qᵀMa collapses the two vectors into a single number;
- the sigmoid σ squashes the result into (0, 1) to give the final probability.
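
A minimal numpy sketch of this bilinear form; the dimensionality and the random initialisation are illustrative assumptions, not the paper's actual settings.

```python
# p(correct | q, a) = sigmoid(q^T M a + b)
import numpy as np

d = 50                                   # sentence embedding dimension (assumed)
rng = np.random.default_rng(0)
M = rng.normal(0.0, 0.01, size=(d, d))   # learned transformation matrix
b = 0.0                                  # learned scalar bias

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def match_probability(q, a):
    # q^T M a collapses the two sentence vectors into a single score,
    # and the sigmoid squashes it into (0, 1).
    return sigmoid(q @ M @ a + b)

q = rng.normal(size=d)                   # question sentence vector
a = rng.normal(size=d)                   # candidate answer sentence vector
print(match_probability(q, a))
```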

How It Works, Intuitively
For a candidate answer a, we "generate" a question q' = Ma, then measure the similarity of q' to q by their dot product.

How It Works, Intuitively: Example
For the answer a = "Harry Potter is a fantasy novel series written by British author J.K. Rowling", the model might generate a question q' along the lines of "Who wrote the fantasy novel series Harry Potter?", and we then compare its similarity to the actual question q = "Who wrote Harry Potter?".

Loss Function
The model is trained by minimizing the cross entropy over question-answer pairs,
    L = −Σ [y log p + (1 − y) log(1 − p)] + regularization,
where the regularization term is based on the Frobenius norm ‖·‖_F of the parameters and is used to prevent overfitting.
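
A sketch of this objective under stated assumptions: binary labels y, the model's output probabilities p, and a regularisation weight `lam` that the slide does not specify.

```python
# Cross entropy over QA pairs plus a squared-Frobenius-norm regulariser on M.
import numpy as np

def training_loss(probs, labels, M, lam=0.01):
    probs = np.clip(probs, 1e-9, 1 - 1e-9)                 # numerical safety
    cross_entropy = -np.sum(labels * np.log(probs)
                            + (1 - labels) * np.log(1 - probs))
    frobenius_sq = np.sum(M ** 2)                          # squared Frobenius norm of M
    return cross_entropy + lam * frobenius_sq              # regulariser curbs overfitting
```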

Representation of Sentences, Type 1: Bag-of-Words (Unigram)
The first implementation of the answer selection model: sum the word embeddings of each word in the sentence, then normalize by the sentence length.
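
A sketch of this unigram model, assuming a hypothetical `embeddings` dict that maps words to fixed-size numpy vectors.

```python
# Bag-of-words sentence vector: sum of word embeddings, normalised by length.
import numpy as np

def bow_sentence_vector(sentence, embeddings, dim=50):
    words = [w for w in sentence.lower().split() if w in embeddings]
    if not words:                        # no known words: return the zero vector
        return np.zeros(dim)
    return np.sum([embeddings[w] for w in words], axis=0) / len(words)
```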

Representation of Sentences, Type 2: CNN (Bigram)
[Figure: a one-dimensional convolutional sentence model applied to the input sentence "Who wrote Harry Potter": an input layer of word vectors, convolution filters with tanh activation, and an average pooling layer producing the sentence vector S.]

Architecture of the One-Dimensional Convolutional Neural Network
[Figure: 1-D CNN break-down for "Who wrote Harry Potter". An input layer of word vectors (N x 1 x D) is convolved bigram by bigram through left and right filters T_L and T_R with a tanh nonlinearity, giving feature vectors c1, c2, c3 (N x 1 x D); an average pooling layer (1 x D) sums them into the sentence vector S.]
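
A sketch of the bigram convolution shown in the figure: each adjacent word pair passes through tanh(T_L w_i + T_R w_{i+1} + b), and the resulting feature vectors are average-pooled. The parameter names follow the figure labels; the shapes and random values are illustrative assumptions.

```python
# Bigram 1-D convolution with tanh activation and average pooling.
import numpy as np

def bigram_cnn_sentence_vector(word_vectors, T_L, T_R, b):
    """word_vectors: list of d-dimensional numpy arrays, one per word
    (the sentence is assumed to have at least two words)."""
    convolved = [np.tanh(T_L @ left + T_R @ right + b)
                 for left, right in zip(word_vectors, word_vectors[1:])]
    return np.mean(convolved, axis=0)    # average pooling over bigram features

d = 50
rng = np.random.default_rng(0)
T_L, T_R, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)
sentence = [rng.normal(size=d) for _ in "Who wrote Harry Potter".split()]
print(bigram_cnn_sentence_vector(sentence, T_L, T_R, b).shape)  # (50,)
```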

Unigram vs Bigram Sentence Modelling
An N-gram is an N-token sequence of words: a 2-gram (more commonly called a bigram) is a two-word sequence such as "please turn", "turn your", or "your homework", and a 3-gram (trigram) is a three-word sequence such as "please turn your" or "turn your homework".
Sentence 1: "Jim loves Joe" — unigrams: (Jim), (loves), (Joe); bigrams: (Jim, loves), (loves, Joe).
Sentence 2: "Joe loves Jim" — unigrams: (Joe), (loves), (Jim); bigrams: (Joe, loves), (loves, Jim).
Advantage of bigrams over unigrams: over the unigram vocabulary {(Jim), (loves), (Joe)} both sentences map to [1, 1, 1] and look identical, but over the bigram vocabulary {(Jim, loves), (loves, Joe), (Joe, loves), (loves, Jim)} they become [1, 1, 0, 0] and [0, 0, 1, 1], which are clearly not similar.
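
A small sketch reproducing the count vectors above, using the unigram and bigram vocabularies listed on this slide.

```python
# Count n-gram occurrences of two sentences over fixed vocabularies.
def ngram_counts(tokens, vocabulary, n):
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return [grams.count(v) for v in vocabulary]

s1, s2 = ["Jim", "loves", "Joe"], ["Joe", "loves", "Jim"]
uni_vocab = [("Jim",), ("loves",), ("Joe",)]
bi_vocab = [("Jim", "loves"), ("loves", "Joe"), ("Joe", "loves"), ("loves", "Jim")]

print(ngram_counts(s1, uni_vocab, 1), ngram_counts(s2, uni_vocab, 1))  # [1,1,1] [1,1,1]
print(ngram_counts(s1, bi_vocab, 2), ngram_counts(s2, bi_vocab, 2))    # [1,1,0,0] [0,0,1,1]
```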

The Big Picture
[Figure: the word embeddings of the question pass through a BOW or CNN sentence model to produce the question sentence embedding q; the word embeddings of the answer pass through a BOW or CNN sentence model to produce the answer sentence embedding a; combining q and a yields the output probability of Q/A similarity.]

Experimental Methods
- Question and answer sentence embeddings: unigram + BOW and bigram + CNN.
- Word embeddings: the Collobert and Weston neural language model.
- Initial weights: Gaussian (μ = 0, std = 0.01).
- Hyper-parameter optimization: grid search on the MAP score over the dev data.
- Optimization: adaptive gradient (AdaGrad) on the cross-entropy loss; L-BFGS is used for the count-based logistic regression variants.
- Data: the TREC TRAIN-ALL, TRAIN, dev, and test sets, evaluated with trec_eval.
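
A minimal sketch of the initialisation and AdaGrad update described here; the gradient below is a placeholder, not the model's actual gradient, and the learning rate is an assumed value.

```python
# Gaussian initialisation (mean 0, std 0.01) and one AdaGrad parameter update.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(0.0, 0.01, size=(50, 50))   # learned matrix, initial weights
grad_history = np.zeros_like(M)            # running sum of squared gradients
learning_rate, eps = 0.1, 1e-8             # assumed values, not from the paper

def adagrad_update(params, grad, history):
    history += grad ** 2                   # accumulate squared gradients
    params -= learning_rate * grad / (np.sqrt(history) + eps)
    return params, history

grad = rng.normal(size=M.shape)            # placeholder gradient for illustration
M, grad_history = adagrad_update(M, grad, grad_history)
```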

Unigram+Count and Bigram+Count
The count-based variants add a logistic regression layer (trained with L-BFGS and an L2 regularizer of 0.01) over co-occurrence features. Example:
Q: Who is the author of the book Harry Potter?
A1: Harry Potter is a series of fantasy novels written by British author J. K. Rowling.
A2: The novels chronicle the lives of a young wizard, Harry Potter, and his friends Hermione Granger and Ron Weasley.
Predictors: the Q/A word co-occurrence count, the co-occurrence count weighted by IDF, and the model's output probability for the Q/A pair.
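
A hedged sklearn sketch of this count-based reranker: logistic regression over the three predictors, fit with the L-BFGS solver. The toy feature values and the mapping from the 0.01 regulariser to sklearn's C parameter are assumptions, not the authors' code.

```python
# Logistic regression over (co-occurrence count, IDF-weighted count, model probability).
import numpy as np
from sklearn.linear_model import LogisticRegression

def qa_features(question_tokens, answer_tokens, idf, model_probability):
    shared = set(question_tokens) & set(answer_tokens)      # co-occurring words
    count = len(shared)
    weighted = sum(idf.get(w, 0.0) for w in shared)         # IDF-weighted count
    return [count, weighted, model_probability]

# Toy training matrix: one row per QA pair, label 1 if the answer is correct.
X = np.array([[5, 3.2, 0.9], [1, 0.4, 0.2], [4, 2.9, 0.8], [0, 0.0, 0.1]])
y = np.array([1, 0, 1, 0])

clf = LogisticRegression(solver="lbfgs", C=100.0)  # C chosen as a rough stand-in for the 0.01 L2 penalty
clf.fit(X, y)
print(clf.predict_proba(X)[:, 1])                  # probability that each answer is correct
```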

Results
- The bigram model outperformed the unigram model.
- Including the IDF-weighted word count feature improved results.
- Training on the noisier but larger TRAIN-ALL dataset also improved results.
- Evaluation metrics: mean average precision (MAP) and mean reciprocal rank (MRR), comparing baselines, previously published models, and the proposed model.

Discussion
- The proposed model is simpler than the best-performing published models.
- It can be applied directly across languages, unlike some of the published models.
- Performance ordering: Unigram < Unigram+Count < Bigram < Bigram+Count.
- The authors give examples in which their model correctly identified the best answer while other models, based on co-occurrence word counts, failed.

Discussion (continued)
Problems with the model: in some failure cases the judgement requires a good understanding of the sentence and some background knowledge, for example about the relation between a lead singer and their group.

Questions? Thank you!