Yangqiu Song Lane Department of CSEE West Virginia University

Presentation transcript:

Text Classification without Supervision: Incorporating World Knowledge and Domain Adaptation. Yangqiu Song, Lane Department of CSEE, West Virginia University. Much of the work was done at UIUC.

Collaborators Dan Roth Haixun Wang Shusen Wang Weizhu Chen

Text Categorization Traditional machine learning approach: Label data Train a classifier Make prediction

Challenges Domain expert annotation Large scale problems Diverse domains and tasks: topics, languages, … Short and noisy texts: tweets, queries, …

Reduce Labeling Efforts A more general way? Many diverse and fast changing domains Search engine Social media … Domain specific task: entertainment or sports? Semi-supervised learning Transfer learning Zero-shot learning

Our Solution Knowledge enabled learning Millions of entities and concepts Billions of relationships Labels carry a lot of information! Traditional models treat labels as “numbers or IDs”

Example: Knowledge Enabled Text Classification Dong Nguyen announced that he would be removing his hit game Flappy Bird from both the iOS and Android app stores, saying that the success of the game is something he never wanted. Some fans of the game took it personally, replying that they would either kill Nguyen or kill themselves if he followed through with his decision. Pick a label: Class1 or Class2 ? Mobile Game or Sports

Dataless Text Categorization: Classification on the Fly Mobile Game or Sports? (Good) Label names Map labels/documents to the same space Compute document and label similarities Documents World knowledge Choose labels M.-W. Chang, L.-A. Ratinov, D. Roth, V. Srikumar: Importance of Semantic Representation: Dataless Classification. AAAI 2008. Y. Song, D. Roth: On dataless hierarchical text classification. (AAAI). 2014.
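The three steps on this slide (map labels and documents to the same space, compute similarities, choose a label) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' system: the `CONCEPTS` table is a hypothetical stand-in for a knowledge-based representation, and every weight in it is invented.

```python
import math
from collections import Counter

# Hypothetical word -> concept weights. A real system would derive these
# from world knowledge such as Wikipedia; the values here are made up.
CONCEPTS = {
    "game": {"Video game": 0.6, "Sport": 0.4},
    "mobile": {"Smartphone": 0.7, "Video game": 0.3},
    "ios": {"Smartphone": 0.9, "Video game": 0.1},
    "android": {"Smartphone": 0.9, "Video game": 0.1},
    "app": {"Smartphone": 0.6, "Video game": 0.4},
    "sports": {"Sport": 1.0},
}

def embed(text):
    """Map text to a sparse concept vector by summing its words' concept vectors."""
    vec = Counter()
    for word in text.lower().split():
        for concept, weight in CONCEPTS.get(word, {}).items():
            vec[concept] += weight
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(document, label_names):
    """Choose the label whose name is most similar to the document."""
    doc_vec = embed(document)
    return max(label_names, key=lambda name: cosine(doc_vec, embed(name)))

doc = "hit game removed from ios and android app stores"
predicted = classify(doc, ["mobile game", "sports"])
```

No labeled training documents are involved: the only supervision is the label names themselves, which is why the quality of the shared representation matters so much.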

Challenges of Using Knowledge Scalability; Domain adaptation; Open domain classes Knowledge specification; Disambiguation Representation Inference Learning Show some interesting examples Data vs. knowledge representation Compare different representations

Outline of the Talk Dataless Text Classification: Classify Documents on the Fly Label names Map labels/documents to the same space Compute document and label similarities Documents World knowledge Choose labels

Difficulty of Text Representation Polysemy and Synonym cat company fruit tree Meaning Typicality scores Ambiguity Variability Basic level concepts Language cat feline kitty moggy apple Rosch, E. et al. Basic objects in natural categories. Cognitive Psychology. 1976. Rosch, E. Principles of categorization. In Rosch, E., and Lloyd, B., eds., Cognition and Categorization. 1978.

Typicality of Entities bird

Basic Level Concepts What do we usually call it? pug pet dog mammal animal

We use the right level of concepts to describe things! pug bulldog

Probase: A Probabilistic Knowledge Base Web Document Cleaning Information Extraction Semantic Cleaning Knowledge Integration 1.68 billion documents Hearst patterns “Animals such as dogs and cats.” Freebase Wikipedia Mutually exclusive concepts M. A. Hearst. Automatic acquisition of hyponyms from large text corpora. Int. Conf. on Comp. Ling. (COLING). 1992. W. Wu, et al. Probase: A probabilistic taxonomy for text understanding. In ACM SIG on Management of Data (SIGMOD). 2012. (Data released http://probase.msra.cn)
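To make the Hearst-pattern step concrete, here is a minimal sketch that extracts is-a pairs from the slide's example sentence. It handles only one pattern ("X such as A, B and C") and single-word terms, which is an assumption made for brevity; the actual Probase extraction pipeline is far more elaborate, and the function name `extract_isa` is hypothetical.

```python
import re

# Separators between listed entities; the longest alternative comes first.
SEPARATOR = r"(?:, and |, | and | or )"

# One Hearst pattern: "<concept> such as <entity>(<separator><entity>)*".
# Single-word concepts and entities only (a simplifying assumption).
PATTERN = re.compile(r"(\w+) such as (\w+(?:" + SEPARATOR + r"\w+)*)")

def extract_isa(sentence):
    """Return (entity, concept) pairs found by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        concept = match.group(1).lower()
        entities = re.split(SEPARATOR, match.group(2))
        pairs.extend((entity.lower(), concept) for entity in entities)
    return pairs

pairs = extract_isa("Animals such as dogs and cats.")
```

Counting how often each pair is extracted across a large corpus yields the frequencies from which probabilistic scores can later be estimated.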

Concept Distribution city country disease magazine bank … local school Java tool big bank BI product …

Typicality Animal Dog
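Typicality can be illustrated with simple relative frequencies over is-a counts. The sketch below assumes typicality is estimated as P(entity | concept) and P(concept | entity) from co-occurrence counts; the `COUNTS` table and both function names are hypothetical, and the numbers are invented.

```python
# Toy is-a counts n(entity, concept), invented for illustration.
COUNTS = {
    ("dog", "animal"): 80, ("cat", "animal"): 70, ("penguin", "animal"): 10,
    ("robin", "bird"): 60, ("penguin", "bird"): 15,
    ("dog", "pet"): 90, ("cat", "pet"): 85,
}

def typicality_of_entity(entity, concept):
    """P(entity | concept): how typical the entity is as a member of the concept."""
    total = sum(n for (_, c), n in COUNTS.items() if c == concept)
    return COUNTS.get((entity, concept), 0) / total if total else 0.0

def typicality_of_concept(concept, entity):
    """P(concept | entity): how typical the concept is as a description of the entity."""
    total = sum(n for (e, _), n in COUNTS.items() if e == entity)
    return COUNTS.get((entity, concept), 0) / total if total else 0.0

robin_as_bird = typicality_of_entity("robin", "bird")
penguin_as_bird = typicality_of_entity("penguin", "bird")
```

With these toy counts a robin comes out as a more typical bird than a penguin, which mirrors the intuition behind the typicality slides.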

Basic Level Concepts Robin Penguin

Concepts of Multiple Entities Obama’s real-estate policy president, politician investment, property, asset, plan president, politician, investment, property, asset, plan Explicit Semantic Analysis (ESA) + w1 w2 wn = E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. J. of Art. Intell. Res. (JAIR). 2009.

Multiple Related Entities apple adobe software company, brand, fruit brand, software company software company, brand, fruit software company, brand Intersection instead of union!

Probabilistic Conceptualization P(concept | related entities) P(fruit | adobe, apple) = 0 P(adobe | fruit) = 0 Basic Level Concept Typicality Song et al., Int. Joint Conf. on Artif. Intell. (IJCAI). 2011.
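The zero-probability argument on this slide can be reproduced with a small naive Bayes style computation, P(c | e1..en) ∝ P(c) ∏ P(ei | c). Treating the entities as conditionally independent given the concept is an assumption of this sketch, and the `COUNTS` table is invented; it is meant to show why multiplication behaves like an intersection, not to reproduce the published model exactly.

```python
# Toy is-a counts n(entity, concept), invented for illustration.
COUNTS = {
    ("apple", "fruit"): 50, ("pear", "fruit"): 30,
    ("apple", "software company"): 40, ("adobe", "software company"): 35,
    ("apple", "brand"): 20, ("adobe", "brand"): 10,
}

def conceptualize(entities):
    """Return a normalized P(concept | entities) under a naive Bayes assumption."""
    concepts = {c for _, c in COUNTS}
    grand_total = sum(COUNTS.values())
    scores = {}
    for concept in concepts:
        concept_total = sum(n for (_, c), n in COUNTS.items() if c == concept)
        score = concept_total / grand_total  # prior P(c)
        for entity in entities:
            # Likelihood P(e | c); a single zero removes the concept entirely.
            score *= COUNTS.get((entity, concept), 0) / concept_total
        scores[concept] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()} if norm else scores

posterior = conceptualize(["apple", "adobe"])
best = max(posterior, key=posterior.get)
```

Because P(adobe | fruit) is zero in the toy counts, "fruit" drops out for the pair, while it would still be a strong candidate for "apple" alone.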

Given “China, India, Russia, Brazil”

Given “China, India, Japan, Singapore”

Outline of the Talk Dataless Text Classification: Classify Documents on the Fly Label names Map labels/documents to the same space Compute document and label similarities Documents World knowledge Choose labels

Generic Short Text Conceptualization P(concept | short text) 1. Grounding to knowledge base 2. Clustering entities 3. Inside clusters: intersection 4. Between clusters: union
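The four steps listed here can be sketched with plain set operations. Note the substitution: the talk uses a Markov random field, whereas this sketch uses a greedy one-pass clustering purely for illustration. The `KB` table is a hypothetical stand-in for grounding to a knowledge base, and its contents are invented.

```python
# Toy entity -> concept sets, standing in for knowledge-base grounding.
KB = {
    "japan": {"country", "market"},
    "u.k.": {"country", "market"},
    "apple": {"fruit", "company"},
    "pear": {"fruit", "tree"},
    "bbc": {"media", "company"},
}

def conceptualize(entities):
    """Greedy sketch: cluster entities, intersect inside clusters, union across."""
    clusters = []  # each cluster is (member list, shared concept set)
    for entity in entities:
        concepts = KB.get(entity.lower(), set())
        if not concepts:
            continue  # step 1: skip terms that cannot be grounded
        for i, (members, shared) in enumerate(clusters):
            if shared & concepts:
                # step 3: inside a cluster, keep only the common concepts
                clusters[i] = (members + [entity], shared & concepts)
                break
        else:
            # step 2: no compatible cluster, so start a new one
            clusters.append(([entity], set(concepts)))
    # step 4: between clusters, take the union of their concepts
    union = set().union(*(shared for _, shared in clusters))
    return clusters, union

clusters, union = conceptualize(["Japan", "U.K.", "apple", "pear", "BBC"])
```

The greedy pass depends on input order, which is one reason a joint model is preferable in practice; here "apple" is resolved to "fruit" only because "pear" arrives before "BBC".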

Markov Random Field Model Parameter estimation: concept distribution Entity clique: intersection Entity type: instance or attribute To realize general text conceptualization, we built a Markov random field model. First, we use an EM-like algorithm to detect the type of each entity. For example, "population" can be an instance of the concept "demographic data", but it can also be an attribute of the concept "country". If it co-occurs with a country such as the United States, we annotate "population" as an attribute. Then we build entity cliques using a clustering algorithm. Inside each clique, we build the energy function of the Markov random field using the intersection rule. Across cliques, we perform parameter estimation, which is similar to averaging the concepts of the different cliques.

Given “U.S.”, “Japan”, “U.K.”; “apple”, “pear”; “BBC”, “New York Times”

Tweet Clustering Better access the right level of concepts Song et al., Int. Joint Conf. on Artif. Intell. (IJCAI). 2011.

Web Search Relevance Evaluation data: 300K Web queries 19M query-URL pairs Historical data: 8M URLs 8B query-URL clicks Song et al., Int. Conf. on Inf. and Knowl. Man. (CIKM). 2014.

Domain Adaptation World knowledge bases: general purpose, information bias Domain dependent tasks: e.g., classification/clustering of entertainment vs. sports; knowledge about science/technology is useless

Domain Adaptation for Corpus Hyper-parameter estimation: domain adaptation Complexity: Parameter estimation: concept distribution Entity clique: intersection Entity type: instance or attribute

Domain Adaptation Results Song et al., Int. Joint Conf. on Artif. Intell. (IJCAI). 2015.

Similarity and Relatedness Similarity: a specific type of relatedness; synonyms, hyponyms/hypernyms, and siblings are highly similar (doctor vs. surgeon, bike vs. bicycle) Relatedness: topically related or based on any other semantic relation (heart vs. surgeon, tire vs. car) In the following, we focus on Wikipedia! The methodologies apply Entity relatedness Domain adaptation

Dataless Text Classification: Classify Documents on the Fly Label names Map labels/documents to the same space Compute document and label similarities Documents World knowledge Choose labels

Classification in the Same Semantic Space Mobile Game or Sports? Explicit Semantic Analysis (ESA) + w1 w2 wn = E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. J. of Art. Intell. Res. (JAIR). 2009.

Classification of 20 Newsgroups Documents: Cosine Similarity L1: 6 classes L2: 20 classes OHLDA: Same hierarchy Word2vec Trained on wiki Skipgram V.Ha-Thuc, and J.-M. Renders, Large-scale hierarchical text classification without labelled data. In WSDM 2011. Blei et al., Latent Dirichlet Allocation. J. of Mach. Learn. Res. (JMLR). 2003. Mikolov et al. Efficient Estimation of Word Representations in Vector Space. NIPS. 2013.

Two Factors in Dataless Classification Length of document Number of labels 2-newsgroups 20-newsgroups 103 RCV1 Topics 611 Wiki Cates Random Guess Balanced binary classification Multi-class classification

Similarity Cosine Champaign Police Make Arrest Armed Robbery Cases Two Arrested UI Campus … 1 1 Text 1 Text 2

Representation Densification Vector x Vector y Cosine 1.0 Average Max matching 0.7 Hungarian matching
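The idea behind densification is that two short texts may share no sparse dimension at all, so plain cosine is zero, yet their dimensions can still be matched through dense embeddings. The sketch below implements an unweighted average max matching, made symmetric by averaging both directions. Ignoring the sparse weights is a simplification assumed here, and the 2-d vectors in `EMB` are invented rather than taken from a trained model.

```python
import math

# Hypothetical dense embeddings for sparse dimensions (toy unit vectors).
EMB = {
    "police": (1.0, 0.0),
    "arrest": (0.8, 0.6),
    "robbery": (0.6, 0.8),
    "campus": (0.0, 1.0),
}

def dense_cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def max_matching(x, y):
    """Average, over dimensions of x, of the best-matching dimension of y."""
    return sum(max(dense_cosine(EMB[a], EMB[b]) for b in y) for a in x) / len(x)

def densified_similarity(x, y):
    """Symmetric average max matching between two lists of active dimensions."""
    return 0.5 * (max_matching(x, y) + max_matching(y, x))

text1 = ["police", "arrest"]
text2 = ["robbery", "campus"]
shared = set(text1) & set(text2)          # empty, so sparse cosine would be 0
similarity = densified_similarity(text1, text2)
```

Hungarian matching, also listed on the slide, would replace the per-dimension maximum with a one-to-one assignment between the two sets of dimensions.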

rec.autos vs. sci.electronics (1/16 document: 13 words per text) Song and Roth. North Amer. Chap. Assoc. Comp. Ling. (NAACL). 2015.

Dataless Text Classification: Classify Documents on the Fly Label names Map labels/documents to the same space Compute document and label similarities Documents World knowledge Choose labels

Classification of 20 Newsgroups Documents Cosine Similarity Supervised Classification Blei et al., Latent Dirichlet Allocation. J. of Mach. Learn. Res. (JMLR). 2003. Mikolov et al. Efficient Estimation of Word Representations in Vector Space. Adv. Neur. Info. Proc. Sys. (NIPS). 2013.

Bootstrapping with Unlabeled Data Application of world knowledge of label meaning Initialize N documents for each label Pure similarity based classifications Train a classifier to label N more documents Continue to label more data until no unlabeled document exists Domain adaptation Mobile games Sports
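The bootstrapping loop can be sketched as follows. Two substitutions are made for brevity and should be read as assumptions: a bag-of-words nearest-centroid model stands in for the trained classifier, and word overlap with the label name stands in for the knowledge-based similarity used for initialization. The documents are invented.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector of a text."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def bootstrap(docs, label_names, n=1):
    """Label every document, n per label per round, starting from label names only."""
    centroids = {name: bow(name) for name in label_names}  # round 0: label names
    assigned, unlabeled = {}, list(docs)
    while unlabeled:
        for name in label_names:
            # Take the n unlabeled documents this label is most confident about.
            ranked = sorted(unlabeled,
                            key=lambda d: cosine(bow(d), centroids[name]),
                            reverse=True)
            for d in ranked[:n]:
                assigned[d] = name
                unlabeled.remove(d)
        # "Retrain": rebuild each centroid from the label name plus its documents.
        for name in label_names:
            centroids[name] = bow(name)
            for d, label in assigned.items():
                if label == name:
                    centroids[name] += bow(d)
    return assigned

docs = [
    "new mobile game tops the app store",
    "sports fans cheer the football team",
    "the app store removed the puzzle app",
    "football team wins the cup final",
]
labels = bootstrap(docs, ["mobile game", "sports"])
```

The last two documents share no word with either label name, so pure similarity cannot separate them; they are labeled correctly only after the first round has adapted the model to the corpus vocabulary.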

Classification of 20 Newsgroups Documents Bootstrapped: 0.84 Pure Similarity: 0.68 Song and Roth. Assoc. Adv. Artif. Intell. (AAAI). 2014

Hierarchical Classification: Considering Label Dependency Top-down classification Bottom-up classification (flat classification) Root A B … N 1 2 … M 1 2 … 1 2 …
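The two strategies differ only in where the decision is made: top-down picks the best child at every level, while bottom-up picks the best leaf directly and then reads off its ancestors. The sketch below assumes a toy hierarchy and uses keyword overlap as a stand-in for semantic similarity; `TREE`, `KEYWORDS`, and all of their contents are hypothetical.

```python
# Toy label hierarchy (hypothetical): parent -> children.
TREE = {
    "root": ["recreation", "science"],
    "recreation": ["autos", "hockey"],
    "science": ["electronics", "space"],
}
PARENT = {child: parent for parent, children in TREE.items() for child in children}

# Hypothetical keyword sets standing in for semantic representations of labels.
KEYWORDS = {
    "recreation": {"game", "team", "car", "race"},
    "science": {"circuit", "orbit", "research"},
    "autos": {"car", "engine", "race"},
    "hockey": {"game", "team", "ice"},
    "electronics": {"circuit", "voltage", "engine"},
    "space": {"orbit", "rocket"},
}

def similarity(doc_words, label):
    """Keyword overlap, used here in place of a semantic similarity."""
    return len(doc_words & KEYWORDS[label])

def top_down(doc_words):
    """Greedily descend from the root, choosing the best child at each level."""
    path, node = [], "root"
    while node in TREE:
        node = max(TREE[node], key=lambda child: similarity(doc_words, child))
        path.append(node)
    return path

def bottom_up(doc_words):
    """Choose the best leaf among all leaves, then recover its ancestors."""
    leaves = [node for node in PARENT if node not in TREE]
    node = max(leaves, key=lambda leaf: similarity(doc_words, leaf))
    path = []
    while node != "root":
        path.append(node)
        node = PARENT[node]
    return path[::-1]

doc_words = {"car", "engine", "race"}
td_path = top_down(doc_words)
bu_path = bottom_up(doc_words)
```

The two agree on this document, but a wrong choice near the root cannot be undone in top-down classification, whereas bottom-up has to discriminate among all leaves at once.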

Top-down vs. Bottom-up RCV1 804,414 documents 82 categories in 4 levels 103 nodes in hierarchy 3.24 labels/document Song and Roth. Assoc. Adv. Artif. Intell. (AAAI). 2014

Dataless Text Classification: Classify Documents on the Fly Label names Map labels/documents to the same space Compute document and label similarities Documents World knowledge Choose labels

Comparison of learning settings (table; cell values Yes/No). Columns: Labeled data in training; Unlabeled data in training; Label names in training; I.I.D. between training and testing. Rows: Supervised learning; Unsupervised learning; Semi-supervised learning; Transfer learning; Zero-shot learning; Dataless Classification (pure similarity); Dataless Classification (bootstrapping).

Conclusions Dataless classification: reduce labeling work for thousands of documents Compared semantic representations using world knowledge: Probabilistic conceptualization (PC), Explicit semantic analysis (ESA), Word embedding (word2vec), Topic model (LDA), Combination of ESA and word2vec Unified PC and ESA: Markov random field model Domain adaptation: Hyper-parameter estimation; Bootstrapping (refining the classifier) Advertisement: Using knowledge as structured information instead of flat features! Session 7B, DM835 Thank You!

Correlation with Human Annotation of IS-A Relationships The Web Gigaword corpus Combining Heterogeneous Models for Measuring Relational Similarity. A. Zhila, W. Yih, C. Meek, G. Zweig & T. Mikolov. In NAACL-HLT-13.