Learning Representations of Data



Presentation on theme: "Learning Representations of Data" - Presentation transcript:

1 Learning Representations of Data
J. Saketha Nath, IIT Bombay. Collaborators: Pratik Jawanpuria, Arun Iyer, Sunita Sarawagi, Ganesh Ramakrishnan.

2 Outline
Introduction to Representation Learning
Summary of Research
Case Study: Class-Ratio Estimation
Concluding Remarks

3 Introduction to Representation Learning

4 Representation Learning: Illustration
Training: labeled pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$. Inference: unlabeled examples $x'_1, x'_2, \ldots, x'_m$ with unknown labels.

5 Representation Learning: Illustration
Training: learn from the mapped pairs $(\phi(x_1), y_1), (\phi(x_2), y_2), \ldots, (\phi(x_m), y_m)$. Inference: predict on the mapped inputs $\phi(x'_1), \phi(x'_2), \ldots, \phi(x'_m)$.

6 Representation Learning: Examples
Examples of learned feature maps $\phi$: Principal Component Analysis, Deep Learning (a long list!). Training and inference proceed on $(\phi(x_i), y_i)$ and $\phi(x'_i)$ as before.
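As a concrete instance of the first example above, PCA learns a linear map $\phi$ from the data alone. A minimal sketch (the function name and toy data are illustrative, not from the talk):

```python
import numpy as np

def pca_representation(X, d):
    """Learn a d-dimensional linear representation phi via PCA.

    X: (m, n) data matrix, one example per row.
    Returns the map phi and the (n, d) projection matrix."""
    mu = X.mean(axis=0)
    Xc = X - mu                              # centre the data
    # Right singular vectors of the centred data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                             # keep the top-d directions
    phi = lambda x: (x - mu) @ W             # the learned representation
    return phi, W

# Toy data: 3-D points lying close to a 2-D plane.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))
X = Z @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(100, 3))
phi, W = pca_representation(X, d=2)
print(phi(X).shape)  # (100, 2): the learned 2-D representation
```

A downstream classifier is then trained on $(\phi(x_i), y_i)$ exactly as the slide's pipeline suggests.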

8 Kernel Learning: Illustration
Training and inference operate on implicit features $\phi_k(x_i)$ and $\phi_k(x'_i)$, accessed only through kernel evaluations: $k_{ij} = k(x_i, x_j) = \langle \phi_k(x_i), \phi_k(x_j) \rangle$, giving a Gram matrix $[k_{ij}]$ on the training data and $[k'_{ij}]$ on the inference data.
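The implicit feature map above can be made concrete with, for example, a Gaussian kernel: $\phi_k$ lives in an RKHS and is never formed explicitly; only the Gram entries $k_{ij}$ are computed. A small sketch (function name and data are illustrative):

```python
import numpy as np

def gaussian_kernel_matrix(X, Xp, gamma=1.0):
    """Gram matrix of k(x, x') = exp(-gamma * ||x - x'||^2).

    This k corresponds to an implicit feature map phi_k with
    k(x, x') = <phi_k(x), phi_k(x')> in an RKHS."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Xp ** 2, axis=1)[None, :]
          - 2.0 * X @ Xp.T)                  # pairwise squared distances
    return np.exp(-gamma * sq)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = gaussian_kernel_matrix(X, X)
# K[i, j] = k(x_i, x_j); the diagonal is 1 since ||x_i - x_i|| = 0.
print(K)
```

Any learner that touches the data only through inner products can then be run on $K$ in place of explicit features.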

9 Kernel Learning: Broad set-ups
Multi-modal Data [NIPS’09, JMLR’11] Multi-task Learning [SDM’11, ICML’12] Interpretable Rule Learning [ICML’11, JMLR’15]

10 Case Study: Class-Ratio Estimation
Kernel Learning

11 Class Ratio Estimation
Labeled: $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$. Unlabeled: $x'_1, x'_2, \ldots, x'_m$ with unknown labels.

12 Class Ratio Estimation
Labeled: $(x_1, y_1), \ldots, (x_m, y_m)$. Unlabeled: $x'_1, \ldots, x'_m$. Question: what fraction of the unlabeled sample comes from each class?

13 Class Ratio Estimation
𝑓 𝑋 π‘ˆ π‘₯ = 𝑖=1 𝑐 𝑓 π‘Œ π‘ˆ (𝑖) 𝑓 𝑋/π‘Œ π‘ˆ (π‘₯/𝑖)

14 Class Ratio Estimation
𝑓 𝑋 π‘ˆ π‘₯ = 𝑖=1 𝑐 𝑓 π‘Œ π‘ˆ (𝑖) 𝑓 𝑋/π‘Œ 𝐿 (π‘₯/𝑖) Assumption: 𝑓 𝑋/π‘Œ 𝐿 = 𝑓 𝑋/π‘Œ π‘ˆ

15 Class Ratio Estimation
min πœƒβˆˆ Ξ” 𝑐 𝑓 𝑋 π‘ˆ βˆ’ 𝑖=1 𝑐 πœƒ 𝑖 𝑓 𝑋/π‘Œ=𝑖 𝐿 2

16 Class Ratio Estimation
1 π‘š 𝑒 𝑖=1 π‘š 𝑒 πœ™ π‘˜ ( π‘₯ 𝑖 β€² ) 1 π‘š 𝑖 𝑗: 𝑦 𝑗 =𝑖 πœ™ π‘˜ ( π‘₯ 𝑗 ) min πœƒβˆˆ Ξ” 𝑐 𝑓 𝑋 π‘ˆ βˆ’ 𝑖=1 𝑐 πœƒ 𝑖 𝑓 𝑋/π‘Œ=𝑖 𝐿 2 Representation of data distribution using kernel

17 Class Ratio Estimation
min πœƒβˆˆ Ξ” 𝑐 π‘š 𝑒 𝑖=1 π‘š 𝑒 πœ™ π‘˜ ( π‘₯ 𝑖 β€² ) βˆ’ 𝑖=1 𝑐 πœƒ 𝑖 1 π‘š 𝑖 𝑗: 𝑦 𝑗 =𝑖 πœ™ π‘˜ ( π‘₯ 𝑗 ) 𝐻 π‘˜ 2

18 Class Ratio Estimation
min πœƒβˆˆ Ξ” 𝑐 π‘š 𝑒 𝑖=1 π‘š 𝑒 πœ™ π‘˜ ( π‘₯ 𝑖 β€² ) βˆ’ 𝑖=1 𝑐 πœƒ 𝑖 1 π‘š 𝑖 𝑗: 𝑦 𝑗 =𝑖 πœ™ π‘˜ ( π‘₯ 𝑗 ) 𝐻 π‘˜ 2 Kernel Learning: Which π‘˜ is best?

19 Statistical Consistency
Theorem: Let πœƒ , πœƒ βˆ— be the estimated and true class ratios, let 𝐴 π‘˜ be a matrix with 𝑖 π‘‘β„Ž column as 1 π‘š 𝑖 𝑗: 𝑦 𝑗 =𝑖 πœ™ π‘˜ ( π‘₯ 𝑗 ) βˆ’ 1 π‘š 𝑐 𝑗: 𝑦 𝑗 =𝑐 πœ™ π‘˜ ( π‘₯ 𝑗 ) , and let 𝑅 π‘˜ = max π‘₯βˆˆπ’³ πœ™ π‘˜ (π‘₯) , then with probability 1βˆ’π›Ώ, we have: πœƒ βˆ’ πœƒ βˆ— 2 2 ≀ 𝑅 π‘˜ 2 𝑐 2 +1 π‘š 𝑒 + 𝑖=1 𝑐 2 π‘š 𝑖 1+ π‘™π‘œπ‘” 2 𝛿 2 π‘šπ‘–π‘›π‘’π‘–π‘”( 𝐴 π‘˜ 𝑇 𝐴 π‘˜ ) Please refer ICML’14, KDD’16 for details

20 Kernel Learning
Given: base kernels $k_1, k_2, \ldots, k_n$. Goal: find $k_\lambda = \sum_{i=1}^{n} \lambda_i k_i$, $\lambda_i \ge 0$, such that $\lambda$ minimizes the bound on $\| \hat{\theta} - \theta^* \|$:
$\mathrm{mineig}\!\left( A_{k_\lambda}^\top A_{k_\lambda} \right) = \mathrm{mineig}\!\left( \sum_{i=1}^{n} \lambda_i A_{k_i}^\top A_{k_i} \right)$, which is concave in $\lambda$: CONVEX!
$R_{k_\lambda}^2 = \sum_{i=1}^{n} \lambda_i R_{k_i}^2$, which is linear in $\lambda$: CONVEX!
Alternatively, $\lambda$ minimizes the empirical average of $\| \hat{\theta} - \theta^* \|$. Posed as an SDP, solved using a cutting-planes algorithm.

29 Simulation Results
[Plot: estimation error as the negative-class proportion in the unlabeled set U is varied; the class proportions in the labeled set L are fixed at [0.5, 0.5].]

30 Concluding Remarks
01 Representation learning improves generalization over expert-crafted representations.
02 Learning theory and optimization both need to be leveraged.
03 The interplay between kernel learning and deep learning needs investigation.

31 Summary of Research

32 Rule Ensemble Learning
[Diagram: a kernel-learning pipeline from DATA to INFERENCE, combining a grid of base kernels $k_j^i$ into a learned kernel $k$; instantiated for Multi-modal Data (NIPS'09, JMLR'11), Multi-task Learning (SDM'11, ICML'12), and Rule Ensemble Learning (ICML'11, JMLR'15).]

