Hierarchical Classification

Presentation on theme: "Hierarchical Classification" — Presentation transcript:

1 Hierarchical Classification
Rongcheng Lin, Computer Science Department

2 Contents
Motivation, Definition & Problem
Review of SVM
Hierarchical Classification: Path-based Approaches and Regularization-based Approaches

3 Motivation
Classes in the real world are structured, often hierarchically related: gene function prediction, document categorization, image search. Hierarchies or taxonomies offer a clear advantage in supporting tasks like browsing, searching, or visualization (e.g., the International Patent Classification scheme, Yahoo! web catalogs). Prior knowledge about class relationships can improve classification performance, especially for tasks with a large number of classes. Real-world classification systems have complex hierarchical structure.


6 Definition and Problem
Automatically categorize data into pre-defined topic hierarchies or taxonomies: supervised learning with structured output.

7 DAG and Tree Structure

8 Definition and Problem
Automatically categorize data into pre-defined topic hierarchies or taxonomies: supervised learning with structured output. What is the problem, and what is the solution?

9 Definition and Problem
Incorporate the inter-class relationships (the hierarchy) into classification:
Redefine the problem: lower-level categories are more detailed, while upper-level categories are more general.
Redefine the margin: different classification mistakes are of different severity.
Redefine the loss function: the hierarchy indicates one specific dependency among topics.


12 Review: Binary SVM
Binary classification: the hyperplane $w^T x + b = 0$ separates the input space into $w^T x + b > 0$ on one side and $w^T x + b < 0$ on the other. The classifier is $f(x) = \mathrm{sign}(w^T x + b)$; its quality is measured by the margin and a loss function $L(f(x), y)$.
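As a quick illustration (not part of the slides), here is a minimal sketch of this decision rule, with made-up weights:

```python
import numpy as np

# Minimal sketch of the binary SVM decision rule f(x) = sign(w^T x + b).
# The weight vector and bias below are illustrative, not learned values.
w = np.array([2.0, -1.0])   # hypothetical weight vector
b = 0.5                     # hypothetical bias

def predict(x):
    # Points with w^T x + b > 0 lie on the positive side of the
    # hyperplane w^T x + b = 0; the rest lie on the negative side.
    return np.sign(w @ x + b)

print(predict(np.array([1.0, 0.0])))   # 1.0
print(predict(np.array([-1.0, 2.0])))  # -1.0
```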

13 Review: Binary SVM
General form: $J(w) = R(w) + \sum_{i=1}^{n} L(w, x_i, y_i)$, i.e., a regularizer plus the total loss over the training examples.
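A hedged sketch of this general form, instantiating $R(w)$ as an L2 penalty and $L$ as the hinge loss; the toy data and the regularization weight `lam` are assumptions for illustration:

```python
import numpy as np

# Sketch of J(w) = R(w) + sum_i L(w, x_i, y_i) with an L2 regularizer
# and the hinge loss.
def objective(w, b, X, y, lam=0.1):
    regularizer = lam * np.dot(w, w)            # R(w) = lam * ||w||^2
    margins = y * (X @ w + b)                   # y_i * (w^T x_i + b)
    losses = np.maximum(0.0, 1.0 - margins)     # hinge loss per example
    return regularizer + losses.sum()

X = np.array([[1.0, 2.0], [-1.0, -1.5], [0.5, -0.5]])
y = np.array([1, -1, 1])
print(objective(np.array([1.0, 0.5]), 0.0, X, y))
```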

14 Review: Multiclass SVM
1) one-vs-rest 2) Crammer & Singer (a single joint optimization over all classes)
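Both strategies are exposed by scikit-learn's LinearSVC, so a short, illustrative comparison on the standard iris data looks like this:

```python
# one-vs-rest trains K binary machines; the Crammer & Singer variant
# solves one joint optimization over all K classes.
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

ovr = LinearSVC(multi_class="ovr").fit(X, y)
cs = LinearSVC(multi_class="crammer_singer").fit(X, y)

print("one-vs-rest accuracy:     ", ovr.score(X, y))
print("Crammer & Singer accuracy:", cs.score(X, y))
```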

15 Review: Multiclass SVM
Dedicated Loss Function

16 Review: Multiclass SVM
Dedicated Loss Function. Margin: $\gamma_i(w) = w_{y_i}^T x_i - w_k^T x_i$ for $k \neq y_i$.
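A small sketch (with toy per-class weights) of this multiclass margin, taking the worst competing class:

```python
import numpy as np

# Sketch of the multiclass margin: the true class's score minus the
# best competing class's score.  W is a made-up (K x d) weight matrix.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # K=3 classes, d=2

def margin(x, y_true):
    scores = W @ x                        # w_k^T x for every class k
    others = np.delete(scores, y_true)    # scores with k != y_true
    return scores[y_true] - others.max()  # gamma_i(w)

print(margin(np.array([2.0, 0.5]), 0))   # positive => correctly classified
```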

17 Review: Hinge Loss Function
The more you violate the margin, the higher the penalty.

18 Loss Function ℎ𝑖𝑛𝑔𝑒 𝑙𝑜𝑠𝑠 𝑉 𝑧 = 1−𝑧 + 𝑠𝑞𝑢𝑎𝑟𝑒 𝑙𝑜𝑠𝑠 𝑉 𝑧 = 1−𝑧 2
ℎ𝑖𝑛𝑔𝑒 𝑙𝑜𝑠𝑠 𝑉 𝑧 = 1−𝑧 + 𝑠𝑞𝑢𝑎𝑟𝑒 𝑙𝑜𝑠𝑠 𝑉 𝑧 = 1−𝑧 2 𝑉 𝑧 = 1−𝑧 + 𝑞 𝑓𝑜𝑟 𝑞>1 Ψ−𝑙𝑜𝑠𝑠 𝑉 𝑧 = 1−𝑠𝑖𝑔𝑛 𝑧 𝑖𝑓 𝑧≥0 𝑜𝑟 𝑧<0 2 1−𝑧 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 𝑙𝑜𝑔𝑖𝑠𝑡𝑖𝑐 𝑙𝑜𝑠𝑠 𝑉 𝑧 = log 1+ 𝑒 −𝑧 𝜌−ℎ𝑖𝑛𝑔𝑒 𝑙𝑜𝑠𝑠 𝑉 𝑧 = 𝜌−𝑧 + 𝑠𝑖𝑔𝑚𝑜𝑖𝑑 𝑙𝑜𝑠𝑠 𝑉 𝑧 =1 −tanh⁡(𝑐𝑧)
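The sketch below evaluates these surrogate losses at a few margin values; the parameter choices ($q = 2$, $\rho = 1$, $c = 1$) are illustrative assumptions:

```python
import numpy as np

# Each function takes the margin z = y * f(x) and returns the penalty V(z).
def hinge(z):               return max(0.0, 1.0 - z)
def square(z):              return (1.0 - z) ** 2
def q_hinge(z, q=2.0):      return max(0.0, 1.0 - z) ** q
def logistic(z):            return np.log1p(np.exp(-z))
def rho_hinge(z, rho=1.0):  return max(0.0, rho - z)
def sigmoid_loss(z, c=1.0): return 1.0 - np.tanh(c * z)
def psi_loss(z):
    # 1 - sign(z) outside the interval [0, 1), steeper line 2(1-z) inside it.
    return 1.0 - np.sign(z) if (z >= 1.0 or z < 0.0) else 2.0 * (1.0 - z)

for z in (-0.5, 0.25, 2.0):
    print(z, hinge(z), square(z), logistic(z), psi_loss(z))
```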

19 Hierarchical Classifiers
Path-based Approaches:
Large Margin Hierarchical Classification
Hierarchical Document Categorization with Support Vector Machines
On Large Margin Hierarchical Classification with Multiple Paths
Regularization-based Approaches:
Tree-Guided Group Lasso for Multi-task Regression
Hierarchical Multitask Structured Output Learning for Large-Scale Segmentation

20 Tree Distance
A given hierarchy induces a metric over the set of classes: the tree distance (or tree-induced error) $\gamma(y, \hat{y})$ is defined as the number of edges along the (unique) path from $y$ to $\hat{y}$. In the example figure, $\gamma(y, \hat{y}) = 4$.

22 Tree Distance
[Figure: a tree with nodes 1–9; the path from $y$ to $\hat{y}$ passes through edges 4, 1, and 3.]
A weighted tree distance multiplies each edge on the path by an edge-specific factor: $D(y, \hat{y}) = f_4 C_4 + f_1 C_1 + f_3 c_3$.
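A minimal sketch of the unweighted tree distance, assuming a toy hierarchy given by parent pointers; the weighted variant would sum $f_e C_e$ over the same path edges instead of counting them:

```python
# Tree distance: the number of edges on the unique path between two
# classes.  The toy hierarchy below (node -> parent) is made up.
parent = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}  # root = 1

def path_to_root(v):
    path = [v]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tree_distance(y, y_hat):
    a, b = path_to_root(y), path_to_root(y_hat)
    common = set(a) & set(b)
    lca = next(v for v in a if v in common)  # lowest common ancestor
    return a.index(lca) + b.index(lca)       # edges up from each side

print(tree_distance(4, 6))  # 4 -> 2 -> 1 -> 3 -> 6: 4 edges
```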

23 Loss Functions
[Figure: the zero-one loss and the hierarchical hinge loss plotted against the margin $f_y(x) - f_{\hat{y}}(x)$; the hierarchical hinge is scaled by the tree distance $D(\hat{y}, y)$.]

24 Path-based Approaches
Path-based approaches try to find the most likely path from the root. Only the parameters of the misclassified nodes in the tree need to be updated.
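A sketch of this greedy top-down scheme, with a hypothetical two-level taxonomy and random per-node linear scorers:

```python
import numpy as np

# Starting at the root, follow the child whose scorer ranks the input
# highest until a leaf is reached.  The taxonomy and weights are made up.
children = {"root": ["sports", "science"],
            "sports": ["soccer", "tennis"],
            "science": ["physics", "biology"]}
w = {c: np.random.randn(4) for cs in children.values() for c in cs}

def predict_path(x):
    node, path = "root", []
    while node in children:
        node = max(children[node], key=lambda c: w[c] @ x)  # best child
        path.append(node)
    return path  # e.g. ['science', 'physics']

print(predict_path(np.random.randn(4)))
```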

25 Large Margin Hierarchical Classifier
Require the margin $f_y(x) - f_{\hat{y}}(x)$ to grow with the tree distance $\gamma(y, \hat{y})$ (note: $y$ is the correct label and $\hat{y} \neq y$).
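A sketch of the resulting tree-distance-scaled hinge loss, assuming the required margin is $\sqrt{\gamma(y, \hat{y})}$ (the choice used in the Large Margin Hierarchical Classification paper); scores and distances below are made up:

```python
import numpy as np

# The correct class y must beat every other label y_hat by a margin
# that grows with the tree distance gamma(y, y_hat).
def hierarchical_hinge(scores, y, gamma_row):
    # scores[k] = f_k(x); gamma_row[k] = gamma(y, k).
    return max(max(0.0, np.sqrt(gamma_row[k]) - (scores[y] - scores[k]))
               for k in range(len(scores)) if k != y)

scores = np.array([2.0, 1.5, -0.5])    # f_k(x) for K=3 classes
gamma_row = np.array([0.0, 2.0, 4.0])  # gamma(y, y) = 0
print(hierarchical_hinge(scores, 0, gamma_row))  # ~0.914
```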

26 Training Algorithm

27 HSVM

28 HSVM
[Figure: the hinge penalty as a function of the margin $f_{y_i}(x) - f_y(x)$, with the unit margin requirement rescaled by the tree-induced loss $\Delta(y_i, y)$.]

29 HSVM
[Figure: the rescaling of the unit margin by $\Delta(y_i, y)$.]

30 Regularization-based Approaches
K individual classification tasks. Use an additional regularization term $R_{MTL}(w_1, \ldots, w_T)$ to penalize disagreement between the individual models.
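One concrete, illustrative choice of $R_{MTL}$ pulls every task vector toward the mean model; hierarchical variants swap in the parent node's vector as the reference instead:

```python
import numpy as np

# Sketch of a multitask regularizer that penalizes disagreement between
# the per-task models w_1, ..., w_T by shrinking them toward their mean.
def r_mtl(W, lam=1.0):
    # W: (T x d) matrix, one row per task.
    w_bar = W.mean(axis=0)
    return lam * np.sum((W - w_bar) ** 2)

W = np.random.randn(4, 10)   # T=4 tasks, d=10 features
print(r_mtl(W))
```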

31 Multitask Learning
Multiple tasks are learned simultaneously to capture their intrinsic relatedness.

33 L1-Norm, L2-Norm
Penalize model complexity to avoid overfitting. The L1 norm gives sparser estimates than the L2 norm.
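A quick scikit-learn comparison illustrating this difference on synthetic data with only two truly relevant features:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)   # L2 penalty
print("L1 exact zeros:", np.sum(lasso.coef_ == 0))  # most coefficients
print("L2 exact zeros:", np.sum(ridge.coef_ == 0))  # typically none
```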

34 Group Lasso and Sparse Group Lasso
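A sketch of the two penalties on a toy coefficient vector (the group partition and weights are illustrative assumptions):

```python
import numpy as np

# Group lasso sums L2 norms over coefficient groups, zeroing out whole
# groups; sparse group lasso adds an L1 term for within-group sparsity.
def group_lasso(beta, groups, lam=1.0):
    return lam * sum(np.linalg.norm(beta[g]) for g in groups)

def sparse_group_lasso(beta, groups, lam1=1.0, lam2=1.0):
    return group_lasso(beta, groups, lam1) + lam2 * np.abs(beta).sum()

beta = np.array([0.5, -1.0, 0.0, 0.0, 2.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
print(group_lasso(beta, groups), sparse_group_lasso(beta, groups))
```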

35 HMTL: Hierarchical Multitask Learning
$\gamma$ determines the contribution of regularization from the origin vs. the parent node's parameters (i.e., the strength of coupling between the node and its parent).
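A sketch of an HMTL-style penalty with this $\gamma$ trade-off, assuming a toy tree given by parent pointers (the quadratic form is an illustrative choice):

```python
import numpy as np

# Each node's weights are shrunk partly toward its parent's weights and
# partly toward the origin; gamma trades off the two contributions.
def hmtl_penalty(W, parent, gamma=0.5, lam=1.0):
    # W: dict node -> weight vector; parent: dict node -> parent node.
    total = 0.0
    for v, w_v in W.items():
        anchor = W[parent[v]] if v in parent else np.zeros_like(w_v)
        total += gamma * np.sum((w_v - anchor) ** 2) \
               + (1.0 - gamma) * np.sum(w_v ** 2)
    return lam * total

W = {"root": np.ones(3), "left": np.array([1.0, 2.0, 0.0])}
print(hmtl_penalty(W, parent={"left": "root"}))
```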

36 HMTL

37 Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity
Original approach vs. new approach. Note: $\beta$ has the same meaning as the $w$ used previously.

38 Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity
Each leaf node is a class; each inner node is a group of classes.
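A sketch of the resulting penalty, one weighted L2 norm per tree node over the coefficients of the classes in that node's subtree (tree, groups, and weights are made up):

```python
import numpy as np

# node -> leaf classes (coefficient columns) in its subtree
tree_groups = {"root": [0, 1, 2], "inner": [0, 1],
               "leaf0": [0], "leaf1": [1], "leaf2": [2]}
node_weight = {"root": 0.2, "inner": 0.4,
               "leaf0": 0.4, "leaf1": 0.4, "leaf2": 0.8}

def tree_group_lasso(B, lam=1.0):
    # B: (d x K) coefficient matrix, one column per leaf class.
    return lam * sum(node_weight[v] * np.linalg.norm(B[:, g])
                     for v, g in tree_groups.items())

print(tree_group_lasso(np.random.default_rng(0).normal(size=(5, 3))))
```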

40 Tree-Guided Group Lasso


45 Advantages and Drawbacks
Assume the children's models are good: Tree-Guided Group Lasso. Assume the parent's model is good: HMTL. Assume neither is good: path-based approaches. It depends!

