Learning Representations of Data



Presentation on theme: "Learning Representations of Data" - Presentation transcript:

1 Learning Representations of Data
J. Saketha Nath, IIT Bombay. Collaborators: Pratik Jawanpuria, Arun Iyer, Sunita Sarawagi, Ganesh Ramakrishnan.

2 Outline
Introduction to Representation Learning
Summary of Research
Case Study: Class-Ratio Estimation
Concluding Remarks

3 Introduction to Representation Learning

4 Representation Learning: Illustration
Training: labeled pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$. Inference: unlabeled examples $x'_1, x'_2, \ldots, x'_m$ with unknown labels.

5 Representation Learning: Illustration
Training: learn from the mapped pairs $(\phi(x_1), y_1), (\phi(x_2), y_2), \ldots, (\phi(x_m), y_m)$. Inference: predict on the mapped inputs $\phi(x'_1), \phi(x'_2), \ldots, \phi(x'_m)$.

6 Representation Learning: Examples
Examples of learned feature maps $\phi$: Principal Component Analysis, Deep Learning (a long list!). Training and inference proceed on $(\phi(x_i), y_i)$ and $\phi(x'_i)$ as before.
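As a concrete instance of the first example above, PCA learns a linear map $\phi$ from the data alone. A minimal sketch (the function name and toy data are illustrative, not from the talk):

```python
import numpy as np

def pca_representation(X, d):
    """Learn a d-dimensional linear representation phi via PCA.

    X: (m, n) data matrix, one example per row.
    Returns the map phi and the (n, d) projection matrix."""
    mu = X.mean(axis=0)
    Xc = X - mu                              # centre the data
    # Right singular vectors of the centred data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                             # keep the top-d directions
    phi = lambda x: (x - mu) @ W             # the learned representation
    return phi, W

# Toy data: 3-D points lying close to a 2-D plane.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))
X = Z @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(100, 3))
phi, W = pca_representation(X, d=2)
print(phi(X).shape)  # (100, 2): the learned 2-D representation
```

A downstream classifier is then trained on $(\phi(x_i), y_i)$ exactly as the slide's pipeline suggests.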

8 Kernel Learning: Illustration
Training and inference operate on implicit features $\phi_k(x_i)$ and $\phi_k(x'_i)$, accessed only through kernel evaluations: $k_{ij} = k(x_i, x_j) = \langle \phi_k(x_i), \phi_k(x_j) \rangle$, giving a Gram matrix $[k_{ij}]$ on the training data and $[k'_{ij}]$ on the inference data.
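The implicit feature map above can be made concrete with, for example, a Gaussian kernel: $\phi_k$ lives in an RKHS and is never formed explicitly; only the Gram entries $k_{ij}$ are computed. A small sketch (function name and data are illustrative):

```python
import numpy as np

def gaussian_kernel_matrix(X, Xp, gamma=1.0):
    """Gram matrix of k(x, x') = exp(-gamma * ||x - x'||^2).

    This k corresponds to an implicit feature map phi_k with
    k(x, x') = <phi_k(x), phi_k(x')> in an RKHS."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Xp ** 2, axis=1)[None, :]
          - 2.0 * X @ Xp.T)                  # pairwise squared distances
    return np.exp(-gamma * sq)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = gaussian_kernel_matrix(X, X)
# K[i, j] = k(x_i, x_j); the diagonal is 1 since ||x_i - x_i|| = 0.
print(K)
```

Any learner that touches the data only through inner products can then be run on $K$ in place of explicit features.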

9 Kernel Learning: Broad set-ups
Multi-modal Data [NIPS’09, JMLR’11] Multi-task Learning [SDM’11, ICML’12] Interpretable Rule Learning [ICML’11, JMLR’15]

10 Case Study: Class-Ratio Estimation
Kernel Learning

11 Class Ratio Estimation
Labeled: $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$. Unlabeled: $x'_1, x'_2, \ldots, x'_m$ with unknown labels.

12 Class Ratio Estimation
Labeled: $(x_1, y_1), \ldots, (x_m, y_m)$. Unlabeled: $x'_1, \ldots, x'_m$. Question: what fraction of the unlabeled sample comes from each class?

13 Class Ratio Estimation
𝑓 𝑋 π‘ˆ π‘₯ = 𝑖=1 𝑐 𝑓 π‘Œ π‘ˆ (𝑖) 𝑓 𝑋/π‘Œ π‘ˆ (π‘₯/𝑖)

14 Class Ratio Estimation
𝑓 𝑋 π‘ˆ π‘₯ = 𝑖=1 𝑐 𝑓 π‘Œ π‘ˆ (𝑖) 𝑓 𝑋/π‘Œ 𝐿 (π‘₯/𝑖) Assumption: 𝑓 𝑋/π‘Œ 𝐿 = 𝑓 𝑋/π‘Œ π‘ˆ

15 Class Ratio Estimation
min πœƒβˆˆ Ξ” 𝑐 𝑓 𝑋 π‘ˆ βˆ’ 𝑖=1 𝑐 πœƒ 𝑖 𝑓 𝑋/π‘Œ=𝑖 𝐿 2

16 Class Ratio Estimation
1 π‘š 𝑒 𝑖=1 π‘š 𝑒 πœ™ π‘˜ ( π‘₯ 𝑖 β€² ) 1 π‘š 𝑖 𝑗: 𝑦 𝑗 =𝑖 πœ™ π‘˜ ( π‘₯ 𝑗 ) min πœƒβˆˆ Ξ” 𝑐 𝑓 𝑋 π‘ˆ βˆ’ 𝑖=1 𝑐 πœƒ 𝑖 𝑓 𝑋/π‘Œ=𝑖 𝐿 2 Representation of data distribution using kernel

17 Class Ratio Estimation
min πœƒβˆˆ Ξ” 𝑐 π‘š 𝑒 𝑖=1 π‘š 𝑒 πœ™ π‘˜ ( π‘₯ 𝑖 β€² ) βˆ’ 𝑖=1 𝑐 πœƒ 𝑖 1 π‘š 𝑖 𝑗: 𝑦 𝑗 =𝑖 πœ™ π‘˜ ( π‘₯ 𝑗 ) 𝐻 π‘˜ 2

18 Class Ratio Estimation
min πœƒβˆˆ Ξ” 𝑐 π‘š 𝑒 𝑖=1 π‘š 𝑒 πœ™ π‘˜ ( π‘₯ 𝑖 β€² ) βˆ’ 𝑖=1 𝑐 πœƒ 𝑖 1 π‘š 𝑖 𝑗: 𝑦 𝑗 =𝑖 πœ™ π‘˜ ( π‘₯ 𝑗 ) 𝐻 π‘˜ 2 Kernel Learning: Which π‘˜ is best?

19 Statistical Consistency
Theorem: Let πœƒ , πœƒ βˆ— be the estimated and true class ratios, let 𝐴 π‘˜ be a matrix with 𝑖 π‘‘β„Ž column as 1 π‘š 𝑖 𝑗: 𝑦 𝑗 =𝑖 πœ™ π‘˜ ( π‘₯ 𝑗 ) βˆ’ 1 π‘š 𝑐 𝑗: 𝑦 𝑗 =𝑐 πœ™ π‘˜ ( π‘₯ 𝑗 ) , and let 𝑅 π‘˜ = max π‘₯βˆˆπ’³ πœ™ π‘˜ (π‘₯) , then with probability 1βˆ’π›Ώ, we have: πœƒ βˆ’ πœƒ βˆ— 2 2 ≀ 𝑅 π‘˜ 2 𝑐 2 +1 π‘š 𝑒 + 𝑖=1 𝑐 2 π‘š 𝑖 1+ π‘™π‘œπ‘” 2 𝛿 2 π‘šπ‘–π‘›π‘’π‘–π‘”( 𝐴 π‘˜ 𝑇 𝐴 π‘˜ ) Please refer ICML’14, KDD’16 for details

20 Kernel Learning
Given: base kernels $k_1, k_2, \ldots, k_n$. Goal: find $k_\lambda = \sum_{i=1}^{n} \lambda_i k_i$, $\lambda_i \ge 0$, such that $\lambda$ minimizes the bound on $\| \hat{\theta} - \theta^* \|$:
$\mathrm{mineig}\!\left( A_{k_\lambda}^\top A_{k_\lambda} \right) = \mathrm{mineig}\!\left( \sum_{i=1}^{n} \lambda_i A_{k_i}^\top A_{k_i} \right)$, which is concave in $\lambda$: CONVEX!
$R_{k_\lambda}^2 = \sum_{i=1}^{n} \lambda_i R_{k_i}^2$, which is linear in $\lambda$: CONVEX!
Alternatively, $\lambda$ minimizes the empirical average of $\| \hat{\theta} - \theta^* \|$. Posed as an SDP, solved using a cutting-planes algorithm.

29 Simulation Results
[Plot: estimation error as the negative-class proportion in the unlabeled set U is varied; the class proportions in the labeled set L are fixed at [0.5, 0.5].]

30 Concluding Remarks
01 Representation learning improves generalization over expert-crafted representations.
02 Learning theory and optimization both need to be leveraged.
03 The interplay between kernel learning and deep learning needs investigation.

31 Summary of Research

32 Rule Ensemble Learning
[Diagram: a kernel-learning pipeline from DATA to INFERENCE, combining a grid of base kernels $k_j^i$ into a learned kernel $k$; instantiated for Multi-modal Data (NIPS'09, JMLR'11), Multi-task Learning (SDM'11, ICML'12), and Rule Ensemble Learning (ICML'11, JMLR'15).]

