Hierarchical Classification
Rongcheng Lin, Computer Science Department
Contents
- Motivation, Definition & Problem
- Review of SVM
- Hierarchical Classification
  - Path-based Approaches
  - Regularization-based Approaches
Motivation
- Classes in the real world are structured, and often hierarchically related:
  - gene function prediction
  - document categorization
  - image search
  - ...
- Hierarchies or taxonomies offer a clear advantage in supporting tasks like browsing, searching, or visualization:
  - International Patent Classification scheme
  - Yahoo! Web catalogs
- Prior knowledge about class relationships can improve classification performance, especially for tasks with a large number of classes.
- Real-world classification systems have complex hierarchical structure.
Definition and Problem
Automatically categorize data into pre-defined topic hierarchies or taxonomies.
- Supervised learning
- Structured output
DAG and Tree Structure
Definition and Problem
Problem and solution: how do we incorporate the inter-class relationship (the hierarchy) into classification?
- Redefine the problem: lower-level categories are more detailed, while upper-level categories are more general.
- Redefine the margin: different classification mistakes are of different severity.
- Redefine the loss function: the hierarchy indicates a specific dependency among topics.
Review: Binary SVM
Binary classification with a separating hyperplane $w^T x + b = 0$: points with $w^T x + b > 0$ fall on one side, points with $w^T x + b < 0$ on the other. The decision function is $f(x) = \mathrm{sign}(w^T x + b)$, learned by maximizing the margin under a loss function $L(f(x), y)$.
Review: Binary SVM
General form: $J(w) = R(w) + \sum_{i=1}^{n} L(w, x_i, y_i)$, where $R(w)$ is a regularizer and $L$ is the per-example loss.
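As a concrete instance of this general form, here is a minimal numpy sketch with $R(w)$ as L2 regularization and $L$ as the hinge loss, i.e., a linear SVM objective (names and shapes are illustrative, not from the slides):

```python
import numpy as np

# Minimal sketch of J(w) = R(w) + sum_i L(w, x_i, y_i) for a linear SVM.
def svm_objective(w, b, X, y, lam):
    """X: (n, d) features; y: labels in {-1, +1}; lam: regularization weight."""
    margins = y * (X @ w + b)                 # y_i (w^T x_i + b)
    hinge = np.maximum(0.0, 1.0 - margins)    # L(w, x_i, y_i)
    return lam * np.dot(w, w) + hinge.sum()   # R(w) + sum of per-example losses
```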
Review: Multiclass SVM
Two standard formulations: 1) one-vs-rest; 2) Crammer & Singer (a single joint formulation with pairwise margin constraints between the true class and every other class).
Review: Multiclass SVM
Dedicated loss function. Margin: $\gamma_i(w) = w_{y_i}^T x_i - w_k^T x_i$ for $k \neq y_i$.
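A minimal numpy sketch of this margin in a Crammer & Singer style loss, penalizing the worst violating class per example (the function name and array shapes are assumptions for illustration):

```python
import numpy as np

# scores: (n_samples, n_classes) matrix of w_k^T x_i; y: true class indices.
def multiclass_hinge_loss(scores, y):
    n = scores.shape[0]
    true_scores = scores[np.arange(n), y]              # w_{y_i}^T x_i
    margins = scores - true_scores[:, None] + 1.0      # 1 - gamma_i per class k
    margins[np.arange(n), y] = 0.0                     # no penalty for the true class
    return np.maximum(margins, 0.0).max(axis=1).sum()  # worst violation over k != y_i
```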
Review: Hinge Loss Function
$V(z) = (1 - z)_+$: the more you violate the margin, the higher the penalty.
Loss Functions
- hinge loss: $V(z) = (1 - z)_+$
- square loss: $V(z) = (1 - z)^2$
- $q$-hinge loss: $V(z) = (1 - z)_+^q$ for $q > 1$
- $\psi$-loss: $V(z) = 1 - \mathrm{sign}(z)$ if $z \geq 1$ or $z < 0$, and $2(1 - z)$ otherwise
- logistic loss: $V(z) = \log(1 + e^{-z})$
- $\rho$-hinge loss: $V(z) = (\rho - z)_+$
- sigmoid loss: $V(z) = 1 - \tanh(cz)$
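Most of these definitions translate directly into numpy one-liners ($\psi$-loss omitted for brevity; $z$ is the margin score $y \cdot f(x)$):

```python
import numpy as np

# Direct translations of the losses listed above.
hinge     = lambda z: np.maximum(0.0, 1.0 - z)
square    = lambda z: (1.0 - z) ** 2
q_hinge   = lambda z, q=2.0: np.maximum(0.0, 1.0 - z) ** q
logistic  = lambda z: np.log1p(np.exp(-z))
rho_hinge = lambda z, rho=1.0: np.maximum(0.0, rho - z)
sigmoid   = lambda z, c=1.0: 1.0 - np.tanh(c * z)

z = np.linspace(-2, 2, 5)
print(hinge(z))  # [3. 2. 1. 0. 0.]
```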
Hierarchical Classifiers
Path-based approaches:
- Large Margin Hierarchical Classification
- Hierarchical Document Categorization with Support Vector Machines
- On Large Margin Hierarchical Classification with Multiple Paths
Regularization-based approaches:
- Tree-Guided Group Lasso for Multi-Task Regression
- Hierarchical Multitask Structured Output Learning for Large-Scale Segmentation
Tree Distance
A given hierarchy induces a metric over the set of classes: the tree distance (or tree-induced error) $\gamma(y, \hat{y})$ is defined as the number of edges along the (unique) path from $y$ to $\hat{y}$. Example: for two leaves separated by four edges, $\gamma(y, \hat{y}) = 4$.
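A minimal sketch of this tree distance, assuming the hierarchy is given as a child-to-parent dictionary (the example taxonomy is made up for illustration):

```python
# Hierarchy as a child -> parent map; the root maps to None.
def path_to_root(node, parent):
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(y, y_hat, parent):
    """Number of edges on the unique path from y to y_hat."""
    a, b = path_to_root(y, parent), path_to_root(y_hat, parent)
    common = len(set(a) & set(b))  # nodes shared from the LCA up to the root
    return (len(a) - common) + (len(b) - common)

parent = {"root": None, "sports": "root", "politics": "root",
          "soccer": "sports", "tennis": "sports"}
print(tree_distance("soccer", "politics", parent))  # 3
```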
Tree Distance
[Figure: a tree with numbered nodes; the path from $\hat{y}$ to $y$ passes through nodes 4, 1, and 3.] A weighted variant sums per-node costs along the path: $D(y, \hat{y}) = f_4 C_4 + f_1 C_1 + f_3 C_3$.
Loss Functions
[Figure: the zero-one loss (flat penalty of 1) versus the hierarchical hinge loss (penalty scaled up to the tree distance $D(\hat{y}, y)$), both plotted against the score difference $f_y(x) - f_{\hat{y}}(x)$.]
Path-based Approaches
Path-based approaches try to find the most likely path from the root. Only the parameters of the mis-classified nodes in the tree need to be updated.
Large Margin Hierarchical Classifier
The required margin grows with the tree distance: $f_y(x) - f_{\hat{y}}(x) \geq \sqrt{\gamma(y, \hat{y})}$. Note: $y$ is the correct label and $\hat{y} \neq y$.
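A hedged sketch of the resulting per-example loss: the largest violation of the distance-scaled margin over all wrong labels (the interface is an assumption; the paper trains this with an online structured algorithm rather than this naive loop):

```python
import numpy as np

# scores[k] plays the role of f_k(x); gamma[k] is the tree distance gamma(y, k).
def hierarchical_hinge(scores, y, gamma):
    loss = 0.0
    for k in range(len(scores)):
        if k == y:
            continue
        violation = np.sqrt(gamma[k]) - (scores[y] - scores[k])
        loss = max(loss, violation)
    return max(loss, 0.0)
```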
Training Algorithm
HSVM
Hierarchical SVM rescales the margin by the tree-based loss: the score gap $f_{y_i}(x) - f_y(x)$ between the correct label $y_i$ and a wrong label $y$ is required to exceed a margin that scales with $\Delta(y_i, y)$ rather than a constant 1.
Regularization-based Approaches
$K$ individual classification tasks. Use an additional regularization term $R_{\mathrm{MTL}}(w_1, \ldots, w_K)$ that penalizes disagreement between the individual models.
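A minimal sketch of one possible $R_{\mathrm{MTL}}$ (a generic choice for illustration, not any specific paper's term): penalize each task's deviation from the mean model.

```python
import numpy as np

def r_mtl(W):
    """W: (K, d) array, one weight vector per task."""
    mean_w = W.mean(axis=0)
    return np.sum((W - mean_w) ** 2)  # squared disagreement with the task mean
```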
Multitask Learning
Induction of multiple tasks is performed simultaneously to capture their intrinsic relatedness.
L1-Norm, L2-Norm
Penalize model complexity to avoid overfitting. The L1 norm gives sparser estimates than the L2 norm.
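A quick runnable illustration with scikit-learn on synthetic data where only the first feature matters: the L1 penalty (Lasso) drives irrelevant coefficients to exactly zero, while the L2 penalty (Ridge) only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=100)  # only feature 0 matters

print(np.round(Lasso(alpha=0.1).fit(X, y).coef_, 2))  # mostly exact zeros
print(np.round(Ridge(alpha=0.1).fit(X, y).coef_, 2))  # small but nonzero
```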
Group Lasso and Sparse Group Lasso
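For reference, minimal numpy sketches of the two penalties using their standard definitions (group weights and the $\alpha$ mixing convention vary across papers):

```python
import numpy as np

# Group lasso: sum of L2 norms over pre-defined coefficient groups.
def group_lasso(w, groups, lam):
    return lam * sum(np.linalg.norm(w[g]) for g in groups)

# Sparse group lasso: adds an L1 term for within-group sparsity.
def sparse_group_lasso(w, groups, lam, alpha):
    return (1 - alpha) * group_lasso(w, groups, lam) + alpha * lam * np.abs(w).sum()

w = np.array([0.0, 0.0, 0.5, 1.2, -0.3])
groups = [np.array([0, 1]), np.array([2, 3, 4])]  # two feature groups
print(group_lasso(w, groups, lam=0.1))
```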
HMTL: Hierarchical Multitask Learning
$\gamma$ determines the contribution of regularization toward the origin vs. toward the parent node's parameters (i.e., the strength of coupling between a node and its parent).
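A hedged sketch of such a node-wise regularizer; the exact form in the paper may differ, but $\gamma$ interpolates between shrinkage toward the origin and shrinkage toward the parent:

```python
import numpy as np

def hmtl_penalty(w, w_parent, gamma, lam):
    """gamma = 0: plain L2 toward the origin; gamma = 1: couple fully to the parent."""
    return lam * ((1 - gamma) * np.sum(w ** 2)
                  + gamma * np.sum((w - w_parent) ** 2))
```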
Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity
Note: $\beta$ has the same meaning as the $w$ used previously. The hierarchy defines the groups: each leaf node is a class, and each inner node is a group of classes.
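A hedged sketch of a tree-guided group penalty in this spirit: every tree node defines a group of classes, and each feature's coefficient block over that group is penalized by its L2 norm (the uniform node weights here are placeholders; the paper derives them from the tree structure):

```python
import numpy as np

def tree_guided_penalty(B, tree_groups, node_weights, lam):
    """B: (n_features, n_classes) coefficient matrix beta."""
    total = 0.0
    for group, weight in zip(tree_groups, node_weights):
        # L2 norm of each feature's coefficients restricted to this group of classes
        total += weight * np.linalg.norm(B[:, group], axis=1).sum()
    return lam * total

B = np.ones((3, 4))
tree_groups = [[0], [1], [2], [3], [0, 1], [2, 3], [0, 1, 2, 3]]  # leaves + inner nodes
node_weights = [1.0] * len(tree_groups)                           # placeholder weights
print(tree_guided_penalty(B, tree_groups, node_weights, lam=0.1))
```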
Advantages and Drawbacks
- If you assume the children are good: Tree-Guided Group Lasso.
- If you assume the parent is good: HMTL.
- If you assume neither is good: path-based approaches.
It depends!