Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies Bhavana Dalvi ¶*, Aditya Mishra †, and William W. Cohen * ¶ Allen Institute.

Presentation transcript:

Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies

Bhavana Dalvi ¶*, Aditya Mishra †, and William W. Cohen *
¶ Allen Institute for Artificial Intelligence
* School of Computer Science, Carnegie Mellon University
† Department of Computer Science & Software Engineering, Seattle University

Motivation

- In an entity classification task, topic or concept hierarchies are often incomplete. This can lead to semantic drift of known classes or topics.
- Our previous work on Exploratory Learning (Dalvi et al. ECML 2013) extends the semi-supervised EM algorithm by dynamically adding new classes when appropriate. In this paper, we present Exploratory Learning techniques for hierarchical semi-supervised learning tasks.
- We focus on the entity classification task, where each entity is represented by either text context or table co-occurrence features. Given a few seed examples per Knowledge Base (KB) category, the task is to classify unlabeled entities into KB categories.
- KB categories are arranged in an ontology. There are subset and disjointness constraints defined between these classes. Further, the class hierarchy can be incomplete.
- Our proposed method (OptDAC) can learn new examples of existing classes, as well as extend the class hierarchy, in a single unified framework.

Datasets

Dataset        #Entities   #Features   #(Entity, label) pairs
Text-Small     2.5K        3.4M        7.2K
Text-Medium    12.9K       6.7M        42.2K
Table-Small    4.3K        0.96M       12.2K
Table-Medium   33.4K       2.2M        126.K

Statistic                    Small Ontology   Medium Ontology
#Levels in the hierarchy     3                4
#Classes                     11               39
#Classes per level           1, 3, 7          1, 4, 24, 10

- An example text-pattern feature for the entity “Pittsburgh” is (“lives in ARG”, 1000), indicating that Pittsburgh appeared in position ARG of the text context “lives in ARG” 1000 times in sentences from the ClueWeb09 dataset.
- An example table-context feature for the entity “Pittsburgh” is (“clueweb09-en ::2:1”, 1), indicating that Pittsburgh appeared once in HTML table 2, column 1 of the ClueWeb09 document with id “clueweb09-en ”.

This dataset is made publicly available at [...]hical_ExploratoryLearning_WSDM2016/index.html

Method: OptDAC

[Figure: Optimal Label Assignment given Class Constraints. Labels: subset constraint, mutex constraint, score of label assignment, subset constraint penalty, mutex constraint penalty.]

Exploratory EM

When are new classes created?
- Near uniform?
- Test: the best assignment using the mixed integer program should pick C_new.

Experimental Results

OptDAC reduces semantic drift of seeded classes.

[Figure: Comparison of macro-averaged seeded-class F1 by level, for the Small Ontology and the Medium Ontology.]
The significance marker denotes statistically significant improvements (0.05 significance level) w.r.t. FLAT-ExploreEM.

[Figure: Evaluation of extended class hierarchies.]

[Figure: OptDAC with varying amount of training data. Panels: Text-Small, Table-Small.]

[Table: Runtime of Flat vs. OptDAC methods on different datasets. Columns: average runtime in seconds and average runtime as a multiple of Flat semi-supervised EM, for FLAT and OptDAC, each with semi-supervised EM and Exploratory EM. Rows: Text-Small, Table-Small, Text-Medium, Table-Medium.]

Conclusions

- In this paper, we propose the Hierarchical Exploratory EM approach, which takes an incomplete class ontology as input, along with a few seed examples of each class, to populate new instances of seeded classes and extend the ontology with newly discovered classes.
- Our proposed hierarchical exploratory EM method, named OptDAC-ExploreEM, performs better than flat classification and hierarchical semi-supervised EM methods at all levels of the hierarchy, especially further down the hierarchy.
- Experiments show that OptDAC-ExploreEM outperforms its semi-supervised variant by 13% on average in terms of seed-class F1 scores. It also outperforms both previously proposed exploratory learning approaches, FLAT-ExploreEM and DAC-ExploreEM, in terms of seed-class F1 by 10% and 7% on average, respectively.
- In the future, we would like to apply our method to datasets with non-tree-structured class hierarchies.

Acknowledgements: This work is supported in part by a Google PhD fellowship in Information Extraction, and NSF grant No. IIS NSFCOHEN.
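The poster describes choosing an optimal label assignment for each entity under subset and mutex constraints, solved with a mixed integer program. The sketch below is only an illustration of that idea, not the authors' formulation: it swaps the mixed integer program for exhaustive enumeration over a tiny hierarchy, and it assumes the objective is the sum of the scores of the assigned classes minus a fixed penalty per violated constraint. The function name `best_assignment`, the unit penalty, and the class names and scores are all hypothetical.

```python
from itertools import product


def best_assignment(scores, subset, mutex, penalty=1.0):
    """Exhaustively search binary label vectors for the best constrained score.

    scores:  dict mapping class name -> score for one entity
             (assumed here to behave like P(class | entity)).
    subset:  list of (child, parent) pairs; assigning child without parent
             counts as one violation.
    mutex:   list of (a, b) pairs; assigning both counts as one violation.
    penalty: assumed fixed cost per violated constraint (hypothetical).
    """
    classes = sorted(scores)
    best_labels, best_value = frozenset(), float("-inf")
    # Enumerate all 2^n assignments; feasible only for very small hierarchies.
    for bits in product((0, 1), repeat=len(classes)):
        y = dict(zip(classes, bits))
        value = sum(scores[c] for c in classes if y[c])
        value -= penalty * sum(1 for child, parent in subset
                               if y[child] and not y[parent])
        value -= penalty * sum(1 for a, b in mutex if y[a] and y[b])
        if value > best_value:
            best_labels = frozenset(c for c in classes if y[c])
            best_value = value
    return best_labels, best_value


# Hypothetical three-level hierarchy: Root > {Location, Person}, Location > {City}.
scores = {"Root": 1.0, "Location": 0.7, "Person": 0.2, "City": 0.6}
subset = [("Location", "Root"), ("Person", "Root"), ("City", "Location")]
mutex = [("Location", "Person")]

labels, value = best_assignment(scores, subset, mutex)
print(sorted(labels))  # prints ['City', 'Location', 'Root']
```

In this toy example, "Person" has a positive score but is left out because assigning it together with "Location" would cost more in mutex penalty than it gains. Enumeration grows as 2^n in the number of classes, so a real hierarchy would need a solver-based approach such as the mixed integer program the poster mentions.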