Mapping Between Taxonomies Elena Eneva 11 Dec 2001 Advanced IR Seminar.



Mapping Between Taxonomies
- Taxonomies are formal systems of orderly classification of knowledge, designed for a specific purpose
- Companies organize information in various ways (e.g. one taxonomy for marketing, another for product development)

Approach
[Diagram, built up over several slides: two taxonomies, one by country (German, French) and one by industry (Textile, Automobile), with example documents a, b, c]

Datasets
Two classification schemes:
- Reuters 2001 ( docs)
  - Topics (127)
  - Industry categories (871)
  - Regions (376)
- Hoovers-255 and Hoovers-28 (4286 docs)
  - Industry categories (28)
  - Industry categories (255)

Learning
- Two separate methods of learning for the documents:
  - Old document category -> new document category
  - Document contents -> new category
- Combined methods:
  - Weighted average based on confidence
  - Final result determined by a decision tree
  - One combined learner that uses both the old category and the contents as features
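The "one combined learner" idea can be illustrated by how its input might be built. The sketch below is an assumption about the feature representation, not the representation used in the talk: it takes bag-of-words counts from the document text and adds the old category as one extra feature. The function name `combined_features` and the `OLD_CATEGORY=` prefix are hypothetical.

```python
from collections import Counter


def combined_features(text, old_category):
    """Build one feature vector from document contents plus the old label.

    Words become lowercase bag-of-words counts; the old category is added
    as a single indicator feature (hypothetical naming scheme).
    """
    features = Counter(text.lower().split())
    features["OLD_CATEGORY=" + old_category] = 1
    return features


# Example document filed under "German" in the by-country taxonomy.
example = combined_features("Car maker expands car plant", "German")
```

Any learner that accepts sparse feature counts could then be trained on such vectors to predict the new category.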

Simple Learners
- Simple decision tree (C4.5): learns probabilities of new categories based on one kind of feature:
  - Old categories (doesn't know about documents/words)
  - Word-based classification (doesn't know about old categories)
- Naïve Bayes (rainbow)
  - Old categories (doesn't know about documents/words)
  - Word-based classification (doesn't know about old categories)
- Support Vector Machine (SVM-Light)
  - Word-based classification (doesn't know about old categories), linear kernel [results will be reported in the final paper]
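A learner that sees only old categories can be sketched as a table of co-occurrence counts. This is a simple count-based stand-in, not C4.5 or rainbow: it estimates P(new category | old category) from documents labeled in both taxonomies and returns the most frequent new category with its estimated probability as a confidence. The function names and the toy training pairs are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_category_mapper(pairs):
    """Count how often each old category co-occurs with each new category.

    pairs: iterable of (old_category, new_category), one per labeled document.
    """
    counts = defaultdict(Counter)
    for old, new in pairs:
        counts[old][new] += 1
    return counts


def predict_new_category(counts, old):
    """Return (most likely new category, estimated P(new | old)).

    Returns (None, 0.0) for an old category never seen in training.
    """
    seen = counts.get(old)
    if not seen:
        return None, 0.0
    new, n = seen.most_common(1)[0]
    return new, n / sum(seen.values())


# Toy data: documents labeled by country (old) and by industry (new).
training = [
    ("German", "Automobile"),
    ("German", "Automobile"),
    ("German", "Textile"),
    ("French", "Textile"),
]
mapper = train_category_mapper(training)
german_guess = predict_new_category(mapper, "German")
```

The confidence value is what a combination scheme would consume alongside the output of a word-based learner.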

Learning
- Using the document content
- Using the document labels
[Diagram labels: documents a, b, c; learners DT, NB, SVM]

Combined Learners
- Weighted average
- Voting scheme
- Combination decision tree: takes the outputs and confidences of two of the simple learners and predicts the new category
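The first two combination schemes can be sketched in a few lines. This is a minimal illustration under assumptions, not the implementation from the talk: each simple learner is assumed to output a dictionary mapping category to confidence, the weights are supplied by the caller, and the confidence values shown are made up for the example.

```python
from collections import Counter, defaultdict


def weighted_average(predictions, weights=None):
    """Combine per-learner confidence dicts by a weighted average.

    predictions: list of dicts mapping category -> confidence.
    weights: one non-negative weight per learner (equal weights if omitted).
    Returns (best category, its combined confidence).
    """
    weights = weights or [1.0] * len(predictions)
    total = sum(weights)
    combined = defaultdict(float)
    for dist, w in zip(predictions, weights):
        for category, confidence in dist.items():
            combined[category] += w * confidence / total
    best = max(combined, key=combined.get)
    return best, combined[best]


def majority_vote(labels):
    """Return the label predicted by the most learners."""
    return Counter(labels).most_common(1)[0][0]


# Made-up outputs of a category-based and a word-based learner.
category_based = {"Automobile": 0.6, "Textile": 0.4}
word_based = {"Automobile": 0.3, "Textile": 0.7}

equal = weighted_average([category_based, word_based])
trust_labels = weighted_average([category_based, word_based], weights=[3, 1])
vote = majority_vote(["Textile", "Automobile", "Textile"])
```

With equal weights the word-based learner's stronger preference wins; weighting the category-based learner three to one flips the decision, which is the kind of trade-off a combination decision tree would have to learn from data instead.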

Learning
- Using both the content and the label
- Combining the two outputs
[Diagram labels: documents a, b, c; DT; DT, NB, SVM; voting; 3rd classifier]

Results Words Only  5-fold cross validation

Results Categories Only  5-fold cross validation

Results Combination  5-fold cross validation

Results
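The 5-fold cross validation used for these results can be sketched as follows. This is a generic illustration, not the evaluation script behind the slides: documents are assumed to be addressed by index, and folds are assigned round-robin without shuffling. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for each of k folds.

    Items are assigned to folds round-robin, so fold sizes differ by at most 1.
    Each item appears in exactly one test fold.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 10 documents, 5 folds, each fold holds out 2 documents.
splits = list(k_fold_indices(10, k=5))
```

A learner would be trained on each `train` list and scored on the matching `test` list, with the five scores averaged.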

Remarks
- Hierarchy (old classes) is usually ignored
- Shown that it helps
- Learners are not the issue
- Better way of understanding
- Old label (or hierarchy path) is metadata

Remaining Work
- SVM results (running even as we speak)
- Repeat experiments on Reuters-2001
- Internal hierarchies
- Missing labels
- Less correlated types of classes
- Results in standard evaluation format

Future Work
- Try with a web dataset (Google and Yahoo! hierarchies)
- Hierarchies of more levels
- Metadata (for non-text sources)

Related Literature
- A Study of Approaches to Hypertext Categorization, Y. Yang, S. Slattery, R. Ghani, Journal of Intelligent Information Systems, Volume 18, Number 2, March 2002 (to appear).
- Learning Mappings between Data Schemas, A. Doan, P. Domingos, and A. Levy, Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, 2000, Austin, TX.

Questions and Suggestions The end.

Taxonomies
- Formal systems of orderly classification of knowledge, which are designed for a specific purpose
- Change of purpose, change of taxonomies
- Businesses often need and keep the information in several structures
- Important to be able to automatically map between taxonomies

Useful Mappings
- Companies organizing information in various ways (e.g. one for marketing, another for product development)
- Personal online bookmark classification
- Search engines (e.g. Google, Yahoo)
- EU Committee for Standardization: “detailed overview of the existing taxonomies officially used in the EU, in order to derive general concepts such as: information organisation, properties, multilinguality, keywords, etc. and, last but not least, the mapping between.”