Hierarchical Classification
Rongcheng Lin, Computer Science Department
Contents
- Motivation, Definition & Problem
- Review of SVM
- Hierarchical Classification
  - Path-based Approaches
  - Regularization-based Approaches
Motivation
- Classes in the real world are structured, and often hierarchically related:
  - gene function prediction
  - document categorization
  - image search
  - ...
- Hierarchies or taxonomies offer a clear advantage in supporting tasks like browsing, searching, or visualization:
  - International Patent Classification scheme
  - Yahoo! Web catalogs
- Prior knowledge about class relationships can improve classification performance, especially for tasks with a large number of classes.
- Real-world classification systems have complex hierarchical structure.
Definition and Problem
Automatically categorize data into pre-defined topic hierarchies or taxonomies.
- Supervised learning
- Structured output
DAG and Tree Structure
Definition and Problem
Problem and solution: how do we incorporate the inter-class relationship (the hierarchy) into classification?
- Redefine the problem: lower-level categories are more detailed, while upper-level categories are more general.
- Redefine the margin: different classification mistakes are of different severity.
- Redefine the loss function: the hierarchy indicates a specific dependency among topics.
Review: Binary SVM
Binary classification with a separating hyperplane $w^T x + b = 0$: points with $w^T x + b > 0$ fall on one side, points with $w^T x + b < 0$ on the other. The decision function is $f(x) = \mathrm{sign}(w^T x + b)$, learned by maximizing the margin under a loss function $L(f(x), y)$.
Review: Binary SVM
General form: $J(w) = R(w) + \sum_{i=1}^{n} L(w, x_i, y_i)$, where $R(w)$ is a regularizer and $L$ is the per-example loss.
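As a concrete instance of this general form, here is a minimal numpy sketch with $R(w)$ as L2 regularization and $L$ as the hinge loss, i.e., a linear SVM objective (names and shapes are illustrative, not from the slides):

```python
import numpy as np

# Minimal sketch of J(w) = R(w) + sum_i L(w, x_i, y_i) for a linear SVM.
def svm_objective(w, b, X, y, lam):
    """X: (n, d) features; y: labels in {-1, +1}; lam: regularization weight."""
    margins = y * (X @ w + b)                 # y_i (w^T x_i + b)
    hinge = np.maximum(0.0, 1.0 - margins)    # L(w, x_i, y_i)
    return lam * np.dot(w, w) + hinge.sum()   # R(w) + sum of per-example losses
```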
Review: Multiclass SVM
Two standard formulations: 1) one-vs-rest; 2) Crammer & Singer (a single joint formulation with pairwise margin constraints between the true class and every other class).
Review: Multiclass SVM
Dedicated loss function. Margin: $\gamma_i(w) = w_{y_i}^T x_i - w_k^T x_i$ for $k \neq y_i$.
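A minimal numpy sketch of this margin in a Crammer & Singer style loss, penalizing the worst violating class per example (the function name and array shapes are assumptions for illustration):

```python
import numpy as np

# scores: (n_samples, n_classes) matrix of w_k^T x_i; y: true class indices.
def multiclass_hinge_loss(scores, y):
    n = scores.shape[0]
    true_scores = scores[np.arange(n), y]              # w_{y_i}^T x_i
    margins = scores - true_scores[:, None] + 1.0      # 1 - gamma_i per class k
    margins[np.arange(n), y] = 0.0                     # no penalty for the true class
    return np.maximum(margins, 0.0).max(axis=1).sum()  # worst violation over k != y_i
```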
Review: Hinge Loss Function
$V(z) = (1 - z)_+$: the more you violate the margin, the higher the penalty.
Loss Functions
- hinge loss: $V(z) = (1 - z)_+$
- square loss: $V(z) = (1 - z)^2$
- $q$-hinge loss: $V(z) = (1 - z)_+^q$ for $q > 1$
- $\psi$-loss: $V(z) = 1 - \mathrm{sign}(z)$ if $z \geq 1$ or $z < 0$, and $2(1 - z)$ otherwise
- logistic loss: $V(z) = \log(1 + e^{-z})$
- $\rho$-hinge loss: $V(z) = (\rho - z)_+$
- sigmoid loss: $V(z) = 1 - \tanh(cz)$
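Most of these definitions translate directly into numpy one-liners ($\psi$-loss omitted for brevity; $z$ is the margin score $y \cdot f(x)$):

```python
import numpy as np

# Direct translations of the losses listed above.
hinge     = lambda z: np.maximum(0.0, 1.0 - z)
square    = lambda z: (1.0 - z) ** 2
q_hinge   = lambda z, q=2.0: np.maximum(0.0, 1.0 - z) ** q
logistic  = lambda z: np.log1p(np.exp(-z))
rho_hinge = lambda z, rho=1.0: np.maximum(0.0, rho - z)
sigmoid   = lambda z, c=1.0: 1.0 - np.tanh(c * z)

z = np.linspace(-2, 2, 5)
print(hinge(z))  # [3. 2. 1. 0. 0.]
```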
Hierarchical Classifiers
Path-based approaches:
- Large Margin Hierarchical Classification
- Hierarchical Document Categorization with Support Vector Machines
- On Large Margin Hierarchical Classification with Multiple Paths
Regularization-based approaches:
- Tree-Guided Group Lasso for Multi-Task Regression
- Hierarchical Multitask Structured Output Learning for Large-Scale Segmentation
Tree Distance
A given hierarchy induces a metric over the set of classes: the tree distance (or tree-induced error) $\gamma(y, \hat{y})$ is defined as the number of edges along the (unique) path from $y$ to $\hat{y}$. Example: for two leaves separated by four edges, $\gamma(y, \hat{y}) = 4$.
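A minimal sketch of this tree distance, assuming the hierarchy is given as a child-to-parent dictionary (the example taxonomy is made up for illustration):

```python
# Hierarchy as a child -> parent map; the root maps to None.
def path_to_root(node, parent):
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(y, y_hat, parent):
    """Number of edges on the unique path from y to y_hat."""
    a, b = path_to_root(y, parent), path_to_root(y_hat, parent)
    common = len(set(a) & set(b))  # nodes shared from the LCA up to the root
    return (len(a) - common) + (len(b) - common)

parent = {"root": None, "sports": "root", "politics": "root",
          "soccer": "sports", "tennis": "sports"}
print(tree_distance("soccer", "politics", parent))  # 3
```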
Tree Distance
[Figure: a tree with numbered nodes; the path from $\hat{y}$ to $y$ passes through nodes 4, 1, and 3.] A weighted variant sums per-node costs along the path: $D(y, \hat{y}) = f_4 C_4 + f_1 C_1 + f_3 C_3$.
Loss Functions
[Figure: the zero-one loss (flat penalty of 1) versus the hierarchical hinge loss (penalty scaled up to the tree distance $D(\hat{y}, y)$), both plotted against the score difference $f_y(x) - f_{\hat{y}}(x)$.]
Path-based Approaches
Path-based approaches try to find the most likely path from the root. Only the parameters of the mis-classified nodes in the tree need to be updated.
Large Margin Hierarchical Classifier
The required margin grows with the tree distance: $f_y(x) - f_{\hat{y}}(x) \geq \sqrt{\gamma(y, \hat{y})}$. Note: $y$ is the correct label and $\hat{y} \neq y$.
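A hedged sketch of the resulting per-example loss: the largest violation of the distance-scaled margin over all wrong labels (the interface is an assumption; the paper trains this with an online structured algorithm rather than this naive loop):

```python
import numpy as np

# scores[k] plays the role of f_k(x); gamma[k] is the tree distance gamma(y, k).
def hierarchical_hinge(scores, y, gamma):
    loss = 0.0
    for k in range(len(scores)):
        if k == y:
            continue
        violation = np.sqrt(gamma[k]) - (scores[y] - scores[k])
        loss = max(loss, violation)
    return max(loss, 0.0)
```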
Training Algorithm
HSVM
Hierarchical SVM rescales the margin by the tree-based loss: the score gap $f_{y_i}(x) - f_y(x)$ between the correct label $y_i$ and a wrong label $y$ is required to exceed a margin that scales with $\Delta(y_i, y)$ rather than a constant 1.
Regularization-based Approaches
$K$ individual classification tasks. Use an additional regularization term $R_{\mathrm{MTL}}(w_1, \ldots, w_K)$ that penalizes disagreement between the individual models.
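A minimal sketch of one possible $R_{\mathrm{MTL}}$ (a generic choice for illustration, not any specific paper's term): penalize each task's deviation from the mean model.

```python
import numpy as np

def r_mtl(W):
    """W: (K, d) array, one weight vector per task."""
    mean_w = W.mean(axis=0)
    return np.sum((W - mean_w) ** 2)  # squared disagreement with the task mean
```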
Multitask Learning
Induction of multiple tasks is performed simultaneously to capture their intrinsic relatedness.
L1-Norm, L2-Norm
Penalize model complexity to avoid overfitting. The L1 norm gives sparser estimates than the L2 norm.
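A quick runnable illustration with scikit-learn on synthetic data where only the first feature matters: the L1 penalty (Lasso) drives irrelevant coefficients to exactly zero, while the L2 penalty (Ridge) only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=100)  # only feature 0 matters

print(np.round(Lasso(alpha=0.1).fit(X, y).coef_, 2))  # mostly exact zeros
print(np.round(Ridge(alpha=0.1).fit(X, y).coef_, 2))  # small but nonzero
```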
Group Lasso and Sparse Group Lasso
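For reference, minimal numpy sketches of the two penalties using their standard definitions (group weights and the $\alpha$ mixing convention vary across papers):

```python
import numpy as np

# Group lasso: sum of L2 norms over pre-defined coefficient groups.
def group_lasso(w, groups, lam):
    return lam * sum(np.linalg.norm(w[g]) for g in groups)

# Sparse group lasso: adds an L1 term for within-group sparsity.
def sparse_group_lasso(w, groups, lam, alpha):
    return (1 - alpha) * group_lasso(w, groups, lam) + alpha * lam * np.abs(w).sum()

w = np.array([0.0, 0.0, 0.5, 1.2, -0.3])
groups = [np.array([0, 1]), np.array([2, 3, 4])]  # two feature groups
print(group_lasso(w, groups, lam=0.1))
```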
HMTL: Hierarchical Multitask Learning
$\gamma$ determines the contribution of regularization toward the origin vs. toward the parent node's parameters (i.e., the strength of coupling between a node and its parent).
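A hedged sketch of such a node-wise regularizer; the exact form in the paper may differ, but $\gamma$ interpolates between shrinkage toward the origin and shrinkage toward the parent:

```python
import numpy as np

def hmtl_penalty(w, w_parent, gamma, lam):
    """gamma = 0: plain L2 toward the origin; gamma = 1: couple fully to the parent."""
    return lam * ((1 - gamma) * np.sum(w ** 2)
                  + gamma * np.sum((w - w_parent) ** 2))
```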
Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity
Note: $\beta$ has the same meaning as the $w$ used previously. The hierarchy defines the groups: each leaf node is a class, and each inner node is a group of classes.
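A hedged sketch of a tree-guided group penalty in this spirit: every tree node defines a group of classes, and each feature's coefficient block over that group is penalized by its L2 norm (the uniform node weights here are placeholders; the paper derives them from the tree structure):

```python
import numpy as np

def tree_guided_penalty(B, tree_groups, node_weights, lam):
    """B: (n_features, n_classes) coefficient matrix beta."""
    total = 0.0
    for group, weight in zip(tree_groups, node_weights):
        # L2 norm of each feature's coefficients restricted to this group of classes
        total += weight * np.linalg.norm(B[:, group], axis=1).sum()
    return lam * total

B = np.ones((3, 4))
tree_groups = [[0], [1], [2], [3], [0, 1], [2, 3], [0, 1, 2, 3]]  # leaves + inner nodes
node_weights = [1.0] * len(tree_groups)                           # placeholder weights
print(tree_guided_penalty(B, tree_groups, node_weights, lam=0.1))
```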
Advantages and Drawbacks
- If you assume the children are good: Tree-Guided Group Lasso.
- If you assume the parent is good: HMTL.
- If you assume neither is good: path-based approaches.
It depends!