Learning Representations of Data
J. Saketha Nath, IIT Bombay. Collaborators: Pratik Jawanpuria, Arun Iyer, Sunita Sarawagi, Ganesh Ramakrishnan.
Outline
- Introduction to Representation Learning
- Summary of Research
- Case Study: Class-ratio estimation
- Concluding remarks
Introduction to Representation Learning
Representation Learning: Illustration
Training: labeled pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Inference: unlabeled points $x_1', x_2', \ldots, x_n'$ with unknown labels.
Representation Learning: Illustration
Training: learn a feature map $\phi$ and train on the representations $\phi(x_1), \ldots, \phi(x_n)$ with labels $y_1, \ldots, y_n$. Inference: apply the same map to obtain $\phi(x_1'), \ldots, \phi(x_n')$ and predict the unknown labels.
Representation Learning: Examples
The map $\phi$ can be learnt in many ways, e.g., Principal Component Analysis or Deep Learning (a long list).
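As a concrete instance of the training/inference split above, here is a minimal PCA sketch in pure NumPy (the data and dimensions are illustrative): a map $\phi$ is learnt from training data only, then the same map is reused at inference.

```python
import numpy as np

def pca_fit(X, d):
    """Learn a d-dimensional linear representation from training data X."""
    mu = X.mean(axis=0)                       # center of the training data
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:d].T                              # top-d principal directions
    return mu, W

def pca_transform(X, mu, W):
    """Apply the learnt map phi(x) = W^T (x - mu)."""
    return (X - mu) @ W

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # correlated features
X_test = rng.normal(size=(10, 5))

mu, W = pca_fit(X_train, d=2)
Z_train = pca_transform(X_train, mu, W)   # representations used for training
Z_test = pca_transform(X_test, mu, W)     # same map reused at inference
```

Note that the test points are never used when fitting $\phi$; they are only passed through the learnt map, exactly as in the illustration.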
Kernel Learning: Illustration
The representations $\phi_k(x_1), \ldots, \phi_k(x_n)$ and $\phi_k(x_1'), \ldots, \phi_k(x_n')$ enter training and inference only through the kernel matrices $K$ and $K'$, with entries $K_{ij} = k(x_i, x_j) = \langle \phi_k(x_i), \phi_k(x_j) \rangle$.
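A short sketch of how the kernel matrices $K$ and $K'$ are formed in practice, using an RBF kernel (the kernel choice and bandwidth here are illustrative assumptions, not from the talk):

```python
import numpy as np

def rbf_gram(X1, X2, gamma=1.0):
    """Gram matrix with entries K[i, j] = exp(-gamma * ||x1_i - x2_j||^2)."""
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negatives

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))    # training points x_i
Xp = rng.normal(size=(4, 3))   # inference points x_i'

K = rbf_gram(X, X)     # K_ij  = k(x_i, x_j), used at training
Kp = rbf_gram(Xp, X)   # K'_ij = k(x_i', x_j), used at inference
```

All downstream computation can work with $K$ and $K'$ alone; the (possibly infinite-dimensional) features $\phi_k(x)$ are never materialized.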
Kernel Learning: Broad set-ups
- Multi-modal Data [NIPS'09, JMLR'11]
- Multi-task Learning [SDM'11, ICML'12]
- Interpretable Rule Learning [ICML'11, JMLR'15]
Case Study: Class Ratio Estimation via Kernel Learning
Class Ratio Estimation
Labeled data: $(x_1, y_1), \ldots, (x_n, y_n)$. Unlabeled data: $x_1', \ldots, x_n'$ with unknown labels.
Goal: estimate what fraction of the unlabeled points comes from each class.
Model the unlabeled marginal as a mixture over the $c$ classes:
$p^U(x) = \sum_{i=1}^{c} p^U(i)\, p^U_{X|Y}(x|i)$
Replacing the unlabeled class-conditionals with the labeled ones:
$p^U(x) = \sum_{i=1}^{c} p^U(i)\, p^L_{X|Y}(x|i)$
Assumption: $p^L_{X|Y} = p^U_{X|Y}$, i.e., the class-conditional distributions are the same in the labeled and unlabeled data.
Estimate the class ratios $\theta$ by matching distributions:
$\min_{\theta \in \Delta_c} \left\| p^U - \sum_{i=1}^{c} \theta_i\, p^L_{X|Y=i} \right\|^2$
The data distributions are represented using the kernel: $p^U$ by the empirical mean embedding $\frac{1}{n_u} \sum_{i=1}^{n_u} \phi_k(x_i')$, and $p^L_{X|Y=i}$ by $\frac{1}{n_i} \sum_{j:\, y_j = i} \phi_k(x_j)$.
The resulting empirical objective is:
$\min_{\theta \in \Delta_c} \left\| \frac{1}{n_u} \sum_{i=1}^{n_u} \phi_k(x_i') - \sum_{i=1}^{c} \theta_i\, \frac{1}{n_i} \sum_{j:\, y_j = i} \phi_k(x_j) \right\|_{\mathcal{H}_k}^2$
This raises the kernel learning question: which $k$ is best?
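For two classes and a linear kernel ($\phi_k(x) = x$), the objective above is a quadratic in a single scalar $\theta$, so the minimizer over $[0, 1]$ has a closed form. A minimal sketch under those simplifying assumptions (the function name and the synthetic Gaussian data are illustrative, not from the talk):

```python
import numpy as np

def estimate_class_ratio(X_pos, X_neg, X_unlab):
    """Binary class-ratio estimate by matching mean embeddings, phi(x) = x.

    Minimizes ||mu_U - theta*mu_pos - (1-theta)*mu_neg||^2 over theta in [0, 1].
    """
    mu_p = X_pos.mean(axis=0)      # labeled class mean embeddings
    mu_n = X_neg.mean(axis=0)
    mu_u = X_unlab.mean(axis=0)    # embedding of the unlabeled sample
    d = mu_p - mu_n
    theta = float(np.dot(mu_u - mu_n, d) / np.dot(d, d))  # unconstrained minimizer
    return min(max(theta, 0.0), 1.0)                      # project onto [0, 1]

rng = np.random.default_rng(0)
X_pos = rng.normal(0.0, 1.0, size=(500, 2))
X_neg = rng.normal(8.0, 1.0, size=(500, 2))
# Unlabeled mixture: 30% positive, 70% negative.
X_unlab = np.vstack([rng.normal(0.0, 1.0, size=(300, 2)),
                     rng.normal(8.0, 1.0, size=(700, 2))])
theta_hat = estimate_class_ratio(X_pos, X_neg, X_unlab)
```

With a general kernel and $c > 2$ classes the same idea becomes a small quadratic program over the simplex $\Delta_c$, expressible entirely in terms of kernel matrices.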
Statistical Consistency
Theorem: Let $\theta$ and $\theta^*$ be the estimated and true class ratios, let $A_k$ be the matrix whose $i$-th column is $\frac{1}{n_i} \sum_{j:\, y_j = i} \phi_k(x_j) - \frac{1}{n_c} \sum_{j:\, y_j = c} \phi_k(x_j)$, and let $R_k = \max_{x \in \mathcal{X}} \|\phi_k(x)\|$. Then, with probability $1 - \delta$, we have:
$\|\theta - \theta^*\|_2^2 \le \frac{R_k^2\, (c^2 + 1) \left( \frac{1}{n_u} + \sum_{i=1}^{c} \frac{2}{n_i} \right) \left( 1 + \sqrt{\log \frac{2}{\delta}} \right)^2}{\mathrm{mineig}(A_k^\top A_k)}$
Please refer to ICML'14 and KDD'16 for details.
Kernel Learning
Given: base kernels $k_1, k_2, \ldots, k_m$.
Goal: find $k_\eta = \sum_{j=1}^{m} \eta_j k_j$, with $\eta_j \ge 0$, such that $\eta$ either
- minimizes the bound on $\|\theta - \theta^*\|$: here $\mathrm{mineig}(A_{k_\eta}^\top A_{k_\eta}) = \mathrm{mineig}\left( \sum_{j=1}^{m} \eta_j\, A_{k_j}^\top A_{k_j} \right)$ is concave in $\eta$ and $R_{k_\eta}^2 = \sum_{j=1}^{m} \eta_j R_{k_j}^2$ is linear in $\eta$, so the problem is convex; or
- minimizes the empirical average of $\|\theta - \theta^*\|$.
Both are posed as SDPs and solved using a cutting-planes algorithm.
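The convexity claim rests on a standard fact: the smallest eigenvalue of an affine combination of symmetric matrices is a concave function of the combination weights (it is a minimum of linear functions $v^\top M(\eta) v$ over unit vectors $v$). A small numerical check, with random PSD matrices standing in for the $A_{k_j}^\top A_{k_j}$:

```python
import numpy as np

def mineig(M):
    return float(np.linalg.eigvalsh(M)[0])   # smallest eigenvalue

rng = np.random.default_rng(0)
B1 = rng.normal(size=(4, 4)); G1 = B1.T @ B1   # stand-in for A_{k_1}^T A_{k_1}
B2 = rng.normal(size=(4, 4)); G2 = B2.T @ B2   # stand-in for A_{k_2}^T A_{k_2}

def f(eta):
    """mineig of the conic combination eta*G1 + (1 - eta)*G2."""
    return mineig(eta * G1 + (1.0 - eta) * G2)

# Concavity along the segment: f at a midpoint dominates the chord.
for a, b in [(0.0, 1.0), (0.2, 0.8), (0.1, 0.5)]:
    assert f((a + b) / 2) >= (f(a) + f(b)) / 2 - 1e-9
```

Because $\mathrm{mineig}$ is concave (and $R_{k_\eta}^2$ linear) in $\eta$, pushing the denominator up and the numerator down is a convex problem over the weights.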
Please refer to ICML'14 for details.
Simulation results
[Figure: estimation error as the negative-class proportion in the unlabeled set U is varied; the class proportion in the labeled set L is fixed at [0.5, 0.5].]
Concluding remarks
- Representation learning improves generalization over expert-crafted representations.
- Learning theory and optimization both need to be leveraged.
- The interplay between kernel learning and deep learning needs investigation.
Summary of Research
[Diagram: data feeds kernel learning over base kernels $k_j^i$, which feeds inference; the three research threads are Multi-modal Data [NIPS'09, JMLR'11], Multi-task Learning [SDM'11, ICML'12], and Rule Ensemble Learning [ICML'11, JMLR'15].]