Acclimatizing Taxonomic Semantics for Hierarchical Content Categorization. Lei Tang, Jianping Zhang and Huan Liu.


Taxonomies and Hierarchical Models
- Web pages can be organized into a tree-structured taxonomy (Yahoo!, Google Directory).
- Parental control: web filters block children's access to undesirable web sites.
  - Parents want accurate content categorization at different granularities.
  - Service providers appreciate the decision path behind a blocking/non-blocking decision, for fine-tuning.
- Hierarchical model: exploit the taxonomy in the classification strategy or loss function (one such strategy is sketched below).
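The slide does not spell out the classification strategy; a minimal sketch of one common choice, top-down routing with a local classifier at each internal node, is given below. The Node class and the classifier.predict() interface are hypothetical, not the authors' code.

```python
class Node:
    def __init__(self, label, children=None, classifier=None):
        self.label = label
        self.children = children or []   # an empty list marks a leaf category
        self.classifier = classifier     # local model picking a child index

def classify(doc, node):
    """Route a document from the root to a leaf, recording the decision path."""
    path = [node.label]
    while node.children:
        # The local classifier chooses which child subtree the document enters.
        node = node.children[node.classifier.predict(doc)]
        path.append(node.label)
    return path
```

The returned path is exactly the kind of decision trail the slide says service providers want for fine-tuning.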

Quality of Taxonomy
- Most hierarchical models use a predefined taxonomy that is typically semantically sound; a librarian is often employed to construct it.
- Is a semantically sound taxonomy always good?
  - Subjectivity can result in different taxonomies.
  - Semantics change for specific data.

A Motivating Example
[Figure: where 'Hurricane' and 'Federal Emergency Management Agency' sit relative to the 'Geography' and 'Politics' categories, normally versus during Katrina.]

A “Bayesian” View
- Predefined taxonomy: stagnant prior knowledge.
- Data: reflects the dynamic change of semantics.
- The two can be inconsistent; combining them yields a data-driven taxonomy.

“Start from Scratch” - Clustering
- Throw away the predefined taxonomy and cluster based on the labeled data.
- Two flavors of hierarchical clustering: divisive (top-down) or agglomerative (bottom-up).
- Usually requires human experts to specify parameters such as the maximum height of the tree or the number of nodes in each branch.
- Such parameters are difficult to specify without looking at the data (see the sketch below).
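As an illustration of this route (not the authors' code), one could agglomeratively cluster per-class centroid vectors with SciPy; note how the cut height t is exactly the kind of parameter a human must guess in advance.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical input: one feature centroid per labeled class (toy numbers here).
rng = np.random.default_rng(0)
centroids = rng.random((10, 50))       # 10 classes, 50 features

# Agglomerative (bottom-up) clustering; divisive (top-down) is the other option.
tree = linkage(centroids, method="average", metric="cosine")

# The cut height below must be picked by hand, which is the slide's objection.
groups = fcluster(tree, t=0.8, criterion="distance")
print(groups)
```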

Optimal Hierarchy
- Optimal hierarchy: the hierarchy under which the data are most likely. How can this likelihood be estimated?
- A hierarchical model's performance and the likelihood are positively related, so the hierarchical model's performance statistics on a validation set can gauge the likelihood.
- A brute-force approach that enumerates all taxonomies is infeasible.
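The slide's formula did not survive transcription; from the surrounding text, the objective is presumably of this form (a reconstruction, with D the labeled data, V the validation set, and f_H a hierarchical classifier trained under hierarchy H):

```latex
H^{*} = \arg\max_{H} P(\mathcal{D} \mid H)
      \approx \arg\max_{H} \mathrm{Acc}_{V}(f_{H})
```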

Constrained Optimal Hierarchy
- The predefined taxonomy can help.
- Assumption: the optimal hierarchy lies in the neighborhood of the predefined taxonomy H0.
- Constrained optimal hierarchy H' for H0: H' results from a series of elementary operations that adjust H0 until no likelihood increase is observed.

Elementary Operations
- Three operations on a hierarchy: 'Promote', 'Demote', and 'Merge' (all the leaf nodes remain unchanged).
[Figure: four example hierarchies H1-H4 illustrating the operations.]
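A sketch of the three operations on a simple dict-based tree follows; this is my illustration, not the authors' implementation, and it assumes internal nodes carry a 'children' list while the set of leaf categories never changes.

```python
def promote(grandparent, parent, node):
    """Move `node` one level up: from under `parent` to under `grandparent`."""
    parent["children"].remove(node)
    grandparent["children"].append(node)

def demote(parent, sibling, node):
    """Move `node` one level down, under one of its current siblings."""
    parent["children"].remove(node)
    sibling.setdefault("children", []).append(node)

def merge(parent, a, b):
    """Replace siblings `a` and `b` with a new internal node covering both."""
    parent["children"].remove(a)
    parent["children"].remove(b)
    parent["children"].append({"label": a["label"] + "+" + b["label"],
                               "children": [a, b]})
```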

Search in Hierarchy Space
- Given a predefined taxonomy, find its best constrained optimal hierarchy by searching in the hierarchy space.
[Figure: the space of hierarchies reachable from H0, e.g. H01-H04, H11-H13, H21-H24, H31-H33.]

Finding the Best COH: Greedy Search
- Follow the track with the largest likelihood increase at each step to search for the best hierarchy.
[Figure: the same hierarchy space, with the greedy search path from H0 highlighted.]

Framework (a wrapper approach)
Given: H0, training data T, validation data V.
1. Generate neighbor hierarchies for H0.
2. For each neighbor hierarchy, train hierarchical classification models on T.
3. Evaluate the hierarchical classifiers on V.
4. Pick the best neighbor hierarchy as the new H0.
5. Repeat from step 1 until no improvement (see the sketch below).
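Put together, the wrapper loop might look like this sketch; neighbors, train and evaluate are placeholders for the pieces described on the other slides, not real APIs.

```python
def acclimatize(h0, train_data, val_data, neighbors, train, evaluate):
    """Greedy hill-climbing over the hierarchy space (steps 1-5 above)."""
    best_h = h0
    best_score = evaluate(train(best_h, train_data), val_data)
    improved = True
    while improved:                                # step 5: loop until stuck
        improved = False
        for h in neighbors(best_h):                # step 1: candidates
            model = train(h, train_data)           # step 2: train on T
            score = evaluate(model, val_data)      # step 3: evaluate on V
            if score > best_score:                 # step 4: keep the best
                best_h, best_score, improved = h, score, True
    return best_h
```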

Hierarchy Neighbors
- Elementary operations can be applied to any node in the tree, so the set of neighbors of a hierarchy can be huge.
- Most operations are repeated across evaluations.
[Figure: two neighboring hierarchies H1 and H2 that share most of their structure.]

Finding Neighbors
- Check nodes one by one rather than all nodes at the same time in each search step.
- 'Merge' and 'Demote' consider only the node most similar to the current one.
- Nodes at higher levels affect classification more.
- Top-down traversal: generate neighbors by applying all possible elementary operations to the shallowest nodes first (sketched below).
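A sketch of this pruning, reusing the dict-based tree from the elementary-operations example; the similarity function is a placeholder of my own (e.g. cosine similarity between class centroids), not something the slides specify.

```python
from collections import deque

def shallowest_first(root):
    """Yield (node, parent) pairs breadth-first: influential levels come first."""
    queue = deque([(root, None)])
    while queue:
        node, parent = queue.popleft()
        yield node, parent
        for child in node.get("children", []):
            queue.append((child, node))

def candidate_ops(node, parent, similarity):
    """Propose operations for one node, pairing 'Merge'/'Demote' only with the
    sibling most similar to it, per the slide's heuristic."""
    if parent is None:
        return []                                  # the root itself never moves
    siblings = [s for s in parent["children"] if s is not node]
    if not siblings:
        return [("promote", node)]
    closest = max(siblings, key=lambda s: similarity(node, s))
    return [("promote", node),
            ("merge", node, closest),
            ("demote", node, closest)]
```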

Further Consideration
- Two types of top-down traversal:
  1. Use only the 'Promote' operation to generate neighbors.
  2. Use only the 'Demote' and 'Merge' operations to generate neighbors.
- Repeat the two-traversal procedure until no improvement (sketched below).
- If a node is improperly placed under a parent, we need to 'promote' it first.
[Figure: a small taxonomy with Root, Geography, Hurricane and Politics, where a misplaced node must be promoted before it can be re-attached correctly.]
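The alternating procedure might be driven by a loop like the following sketch; sweep(h, ops) stands in for one greedy top-down traversal restricted to the given operations and is a hypothetical helper, not a named function from the paper.

```python
def two_pass_search(h0, sweep):
    """Repeat the two-traversal procedure until a full round changes nothing."""
    while True:
        h1 = sweep(h0, ops=("promote",))           # pass 1: fix misplacements
        h2 = sweep(h1, ops=("demote", "merge"))    # pass 2: refine groupings
        if h2 == h0:
            return h2
        h0 = h2
```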

Experiment Setting
- 10-fold cross-validation.
- Multinomial naïve Bayes classifier.
- Information gain used for feature selection (a generic sketch follows).
- Due to the scarcity of documents in each class, we use the training data to validate the likelihood of a hierarchy.
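For reference, information-gain-style feature selection plus multinomial naive Bayes can be assembled from off-the-shelf scikit-learn parts; this is a generic sketch with toy data, not the authors' pipeline (mutual information between a term and the class plays the role of information gain).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the web-page documents and class labels (not the real data).
docs = ["hurricane relief agency", "storm flood damage",
        "keyboard piano lessons", "guitar music lessons"]
labels = [0, 0, 1, 1]

# Keep the k highest-scoring terms, then fit a multinomial naive Bayes model.
model = make_pipeline(
    CountVectorizer(),
    SelectKBest(mutual_info_classif, k=5),
    MultinomialNB(),
)
model.fit(docs, labels)
print(model.predict(["piano music"]))
```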

Data Sets
Data: Soc and Kids, human-labeled web pages with a predefined taxonomy.

             Soc   Kids
Classes       69    244
Nodes         83    299
Height         4      5
Instances      -      -
Vocabulary     -      -

Results on Soc

Results on Kids

Over-fitting?
- Since we optimize the hierarchy based only on the training data, it is possible to over-fit.

Robust Method
- Instead of multiple traversals (iterations), perform the two-traversal procedure only once.

Conclusions
- A semantically sound taxonomy does not necessarily lead to good classification performance.
- Given a predefined taxonomy, we can acclimatize it into a data-driven taxonomy for more accurate classification.
- The taxonomy generated by our method outperforms both the human-constructed taxonomy and the taxonomy generated by “starting from scratch”.

Future Work
- This is initial work on combining “noisy” prior knowledge with data.
- How can an efficient filter model be implemented that finds a good taxonomy by exploiting the predefined one?
- Feature selection could alleviate the differences between taxonomies. How can taxonomy information be used for feature selection?