
1 Machine Learning & Data Mining CS/CNS/EE 155 – Lecture 14: Embeddings

2 Announcements
– Kaggle Miniproject is closed; report due Thursday
– Public Leaderboard: how well you think you did
– Private Leaderboard now viewable: how well you actually did


4 Last Week
– Dimensionality Reduction
– Clustering
– Latent Factor Models: learn a low-dimensional representation of the data

5 This Lecture
– Embeddings: an alternative form of dimensionality reduction
– Locally Linear Embeddings
– Markov Embeddings

6 Embedding
Learn a representation U
– Each column u corresponds to a data point
Semantics are encoded via d(u, u'), the distance between points
http://www.sciencemag.org/content/290/5500/2323.full.pdf

7 Locally Linear Embedding
Given: training data {x_i} (unsupervised learning)
Learn U such that local linearity is preserved
– Lower dimensional than x
– "Manifold Learning": any neighborhood looks like a linear plane
(Figure: the x's on the manifold are mapped to the u's.)
https://www.cs.nyu.edu/~roweis/lle/

8 Locally Linear Embedding
Create B(i), the set of the B nearest neighbors of x_i
– Assumption: the neighborhood B(i) is approximately linear
– x_i can be written as a convex combination of the x_j in B(i)
(Figure: a point x_i and its neighborhood B(i).)
https://www.cs.nyu.edu/~roweis/lle/
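To make the neighborhood step concrete, here is a minimal sketch (not from the slides) of computing B(i) with plain Euclidean distances; the function name and the brute-force search are illustrative assumptions, and a real implementation would typically use a k-d tree or similar.

```python
import numpy as np

def nearest_neighbors(X, i, B):
    """Return the indices of the B nearest neighbors B(i) of X[i].
    X: (N, D) data matrix with one point per row. Brute-force sketch."""
    d = np.linalg.norm(X - X[i], axis=1)  # Euclidean distance from x_i to every point
    d[i] = np.inf                         # exclude x_i itself
    return np.argsort(d)[:B]              # indices of the B closest points
```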

9 Locally Linear Embedding
Given neighbors B(i), solve the local linear approximation W (following the cited LLE formulation):
W = argmin_W Σ_i || x_i − Σ_{j∈B(i)} W_ij x_j ||^2   subject to   Σ_{j∈B(i)} W_ij = 1  (W_ij = 0 for j ∉ B(i))
https://www.cs.nyu.edu/~roweis/lle/

10 Locally Linear Embedding
Given neighbors B(i), solve the local linear approximation W (objective above).
Every x_i is approximated as a convex combination of its neighbors.
– How to solve?

11 Lagrange Multipliers
Solutions tend to be at corners!
http://en.wikipedia.org/wiki/Lagrange_multiplier

12 Solving Locally Linear Approximation
Lagrangian (for each point i, with the sum-to-one constraint on its weights):
L(W_i, λ) = || x_i − Σ_{j∈B(i)} W_ij x_j ||^2 + λ (1 − Σ_{j∈B(i)} W_ij)
Setting the gradient to zero yields a small linear system per point.
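A minimal numpy sketch of the resulting closed-form solution, assuming the standard LLE treatment where the Lagrange multiplier enforces only the sum-to-one constraint (nonnegativity is not enforced here); the function name `local_weights` and the regularization constant are illustrative, not from the lecture.

```python
import numpy as np

def local_weights(X, i, neighbors, reg=1e-3):
    """Weights reconstructing X[i] from its neighbors, subject to sum(w) = 1.
    Solves the small linear system obtained from the Lagrangian."""
    Z = X[neighbors] - X[i]                            # neighbors centered on x_i
    C = Z @ Z.T                                        # local Gram matrix (B x B)
    C += reg * np.trace(C) * np.eye(len(neighbors))    # regularize when C is (near-)singular
    w = np.linalg.solve(C, np.ones(len(neighbors)))    # C w ∝ 1 (Lagrange condition)
    return w / w.sum()                                 # rescale so the weights sum to 1
```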

13 Locally Linear Approximation
The weights W are invariant to:
– Rotation
– Scaling
– Translation (translation invariance is exactly what the sum-to-one constraint buys)

14 Story So Far: Locally Linear Embeddings
Locally Linear Approximation: given neighbors B(i), solve the local linear approximation W (objective on slide 9); the solution is obtained via Lagrange multipliers (slide 12).
https://www.cs.nyu.edu/~roweis/lle/

15 Recall: Locally Linear Embedding
Given the data x, learn U such that local linearity is preserved
– Lower dimensional than x
– "Manifold Learning"
(Figure: the x's are mapped to the u's.)
https://www.cs.nyu.edu/~roweis/lle/

16 Dimensionality Reduction
Find a low-dimensional U that preserves the approximate local linearity.
Given the local approximation W, learn the lower-dimensional representation:
U = argmin_U Σ_i || u_i − Σ_j W_ij u_j ||^2   (with a normalization constraint that rules out the trivial solution U = 0)
Each neighborhood is represented by the row W_i,* (the x's map to the u's).
https://www.cs.nyu.edu/~roweis/lle/

17 Given the local approximation W, learn the lower-dimensional representation. Rewrite the objective as:
Σ_i || u_i − Σ_j W_ij u_j ||^2 = tr( U (I − W)^T (I − W) U^T ) = tr( U M U^T ),  where  M = (I − W)^T (I − W)
M is symmetric positive semidefinite.
https://www.cs.nyu.edu/~roweis/lle/

18 Given the local approximation W, learn the lower-dimensional representation. Suppose K = 1.
By the min-max theorem:
– u = principal eigenvector of M^+ (the pseudoinverse of M)
http://en.wikipedia.org/wiki/Min-max_theorem

19 Recap: Principal Component Analysis
Each column of V is an eigenvector; each λ is an eigenvalue (λ_1 ≥ λ_2 ≥ …).
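As a quick refresher, a minimal sketch of PCA via an eigendecomposition of the covariance matrix (variable and function names are illustrative assumptions):

```python
import numpy as np

def pca_eig(X, K):
    """Top-K principal components of X via eigendecomposition.
    Returns (V, lams): columns of V are eigenvectors, lams the eigenvalues, largest first."""
    Xc = X - X.mean(axis=0)                 # center the data
    C = Xc.T @ Xc / len(X)                  # covariance matrix
    lams, V = np.linalg.eigh(C)             # eigh returns eigenvalues in ascending order
    order = np.argsort(lams)[::-1][:K]      # reorder so that lambda_1 >= lambda_2 >= ...
    return V[:, order], lams[order]
```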

20 Given the local approximation W, learn the lower-dimensional representation.
K = 1:
– u = principal eigenvector of M^+
– equivalently, u = smallest non-trivial eigenvector of M (corresponds to the smallest non-zero eigenvalue)
General K:
– U = top K principal eigenvectors of M^+
– equivalently, U = bottom K non-trivial eigenvectors of M (corresponds to the bottom K non-zero eigenvalues)
http://en.wikipedia.org/wiki/Min-max_theorem
https://www.cs.nyu.edu/~roweis/lle/
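A minimal sketch of this embedding step, assuming the full (N x N) weight matrix W has already been assembled from the per-point weights (rows summing to 1, zeros outside each neighborhood); skipping the trivial constant eigenvector follows the standard LLE recipe.

```python
import numpy as np

def lle_embedding(W, K):
    """K-dimensional embedding from the local approximation W.
    Returns an (N, K) array whose rows are the embedded points u_i."""
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)            # symmetric positive semidefinite
    vals, vecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue ~ 0); keep the next K.
    return vecs[:, 1:K + 1]
```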

21 Recap: Locally Linear Embedding
– Generate the nearest neighbors B(i) of each x_i
– Compute the local linear approximation W
– Compute the low-dimensional embedding U
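For reference, scikit-learn ships an off-the-shelf implementation of this pipeline; a minimal usage sketch, assuming X is an (N, D) data matrix:

```python
from sklearn.manifold import LocallyLinearEmbedding

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)  # B = 12 neighbors, K = 2
U = lle.fit_transform(X)  # X: your (N, D) data matrix; one embedded point per row
```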

22 Results for Different Neighborhood Sizes
(Figure: true distribution, 2000 samples, and embeddings for B = 3, B = 6, B = 9, B = 12.)
https://www.cs.nyu.edu/~roweis/lle/gallery.html

23 Embeddings vs Latent Factor Models
Both define a low-dimensional representation.
– Embeddings preserve distance: d(u, u') = ||u − u'||
– Latent factor models preserve the inner product: ⟨u, u'⟩
– Relationship: ||u − u'||^2 = ||u||^2 + ||u'||^2 − 2⟨u, u'⟩

24 Visualization Semantics
Latent Factor Model:
– Similarity measured via dot product
– Rotational semantics
– Can interpret axes
– Can only visualize 2 axes at a time
Embedding:
– Similarity measured via distance
– Clustering/locality semantics
– Cannot interpret axes
– Can visualize many clusters simultaneously

25 Latent Markov Embeddings

26 Latent Markov Embeddings
Locally Linear Embedding is conventional unsupervised learning
– Given raw features x_i
– I.e., find a low-dimensional U that preserves approximate local linearity
Latent Markov Embedding is a feature-learning problem
– E.g., learn a low-dimensional U that captures user-generated feedback

27 Playlist Embedding
Users generate song playlists; treat them as training data.
Can we learn a probabilistic model of playlists?

28 Probabilistic Markov Modeling
Training set: a set of songs S and a set of playlists, where each playlist p = ⟨p^[1], p^[2], …⟩ is a sequence of songs from S.
Goal: learn a probabilistic Markov model of playlists, P(p) = Π_j P(p^[j] | p^[j−1]).
What is the form of P?
http://www.cs.cornell.edu/People/tj/publications/chen_etal_12a.pdf

29 First Try: Probability Tables
(Table: a full transition matrix P(s | s') with a row for each previous song, including s_start, and a column for each next song s_1 … s_7, ….)
#Parameters = O(|S|^2) !!!

30 Second Try: Hidden Markov Models
– Transitions between the K hidden states: #Parameters = O(K^2)
– Emissions P(s|z): #Parameters = O(|S|K)
– Total = O(K^2) + O(|S|K)
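To put rough numbers on this comparison (illustrative figures, not from the slides): with |S| = 10,000 songs, a full probability table needs about |S|^2 = 10^8 parameters, while an HMM with K = 50 hidden states needs about K^2 + |S|·K = 2,500 + 500,000 ≈ 5 × 10^5.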

31 Problem with Hidden Markov Models
– Need to reliably estimate P(s|z)
– Lots of "missing values" in this training set

32 Latent Markov Embedding
"Log-Radial" function (my own terminology):
P(s | s') ∝ exp( −||u_s − v_{s'}||^2 )
– u_s: entry point of song s
– v_s: exit point of song s
http://www.cs.cornell.edu/People/tj/publications/chen_etal_12a.pdf

33 Log-Radial Functions
(Figure: rings centered at the exit point v_{s'}, with entry points u_s and u_{s''}.) Each ring around v_{s'} defines an equivalence class of transition probabilities.
2K parameters per song, 2|S|K parameters total.
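A minimal numpy sketch of evaluating these transition probabilities under the dual point model; the function name and the numerical-stability shift are illustrative assumptions, not from the lecture.

```python
import numpy as np

def transition_probs(U, V, s_prev):
    """P(s | s_prev) for every song s, with P(s | s') proportional to exp(-||u_s - v_{s'}||^2).
    U: (|S|, K) entry points, V: (|S|, K) exit points."""
    d2 = np.sum((U - V[s_prev]) ** 2, axis=1)  # squared distance from the exit point of s'
    logits = -d2 - np.max(-d2)                 # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()                         # normalize over all songs
```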

34 Learning Problem
Given the songs S and the training playlists (each a sequence of songs, as defined on slide 28), the learning goal is to find the embeddings that maximize the likelihood of the training playlists:
max_{U,V} Π_p Π_j P(p^[j] | p^[j−1])
http://www.cs.cornell.edu/People/tj/publications/chen_etal_12a.pdf

35 Minimize the Negative Log-Likelihood
Solve using gradient descent
– Homework question: derive the gradient formula
– Random initialization
The normalization constant is hard to compute:
– Approximation heuristics; see the paper
http://www.cs.cornell.edu/People/tj/publications/chen_etal_12a.pdf
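A minimal sketch of the exact negative log-likelihood under the dual point model, mainly to make the role of the normalization constant Z concrete; in practice the paper replaces the exact Z with approximation heuristics, and the gradient itself is left as the homework question. The function name and data layout are illustrative assumptions.

```python
import numpy as np

def neg_log_likelihood(U, V, playlists):
    """Exact NLL of the playlists (list of song-index sequences) under the dual point model."""
    nll = 0.0
    for p in playlists:
        for prev, cur in zip(p[:-1], p[1:]):
            d2 = np.sum((U - V[prev]) ** 2, axis=1)  # squared distances to every entry point
            log_z = np.logaddexp.reduce(-d2)         # log of the normalization constant Z(prev)
            nll += d2[cur] + log_z                   # -log P(cur | prev)
    return nll
```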

36 Simpler Version
Dual point model: P(s | s') ∝ exp( −||u_s − v_{s'}||^2 )
Single point model: P(s | s') ∝ exp( −||u_s − u_{s'}||^2 )
– Transitions are (almost) symmetric
– Exact same form of training problem

37 Visualization in 2D
Simpler version: the single point model, which is easier to visualize.
http://www.cs.cornell.edu/People/tj/publications/chen_etal_12a.pdf

38 Sampling New Playlists
Given a partial playlist, generate the next song p^[j+1] by sampling according to the transition distribution P(· | p^[j]) (dual point model or single point model).
http://www.cs.cornell.edu/People/tj/publications/chen_etal_12a.pdf
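A minimal sampling sketch for the dual point model, reusing the `transition_probs` helper from the slide 33 example (both names are illustrative assumptions):

```python
import numpy as np

def sample_next_song(U, V, playlist, rng=None):
    """Sample p^[j+1] given the last song of a partial playlist."""
    rng = rng if rng is not None else np.random.default_rng()
    probs = transition_probs(U, V, playlist[-1])  # P(· | p^[j])
    return int(rng.choice(len(probs), p=probs))
```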

39 Demo
http://jimi.ithaca.edu/~dturnbull/research/lme/lmeDemo.html

40 What About New Songs?
Suppose we've trained U. What if we add a new song s'?
– No playlists created by users yet…
– Only options: u_{s'} = 0 or u_{s'} = random
Both are terrible!

41 Song & Tag Embedding
Songs are usually added with tags
– E.g., indie rock, country
– Treat them as features or attributes of songs
How can we leverage tags to generate a reasonable embedding of new songs?
– Learn an embedding of the tags as well!
http://www.cs.cornell.edu/People/tj/publications/moore_etal_12a.pdf

42 Learning Objective
Training data: songs, playlists (as before), plus tags for each song.
Learning objective: the same playlist-likelihood term as before, plus a term encouraging each song embedding to stay close to the average of its tag embeddings:
u_s ≈ (1/|T_s|) Σ_{t∈T_s} A_t, where T_s are the tags of song s and A_t is the embedding of tag t.
Solve using gradient descent.
http://www.cs.cornell.edu/People/tj/publications/moore_etal_12a.pdf
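A minimal sketch of the "song embedding ≈ average of tag embeddings" idea; the matrix A, the tag indices, and the function name are illustrative assumptions.

```python
import numpy as np

def song_embedding_from_tags(A, tag_indices):
    """u_s approximated as the mean of the tag embeddings A_t for the song's tags.
    A: (num_tags, K) tag embedding matrix; tag_indices: indices of the song's tags."""
    return A[np.asarray(tag_indices)].mean(axis=0)
```

This is also the fallback used for new songs on slide 44: a song that has tags but no playlist data is embedded at the average of its tag embeddings rather than at 0 or a random point.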

43 Visualization in 2D
http://www.cs.cornell.edu/People/tj/publications/moore_etal_12a.pdf

44 Revisited: What About New Songs?
No user has yet added s' to a playlist
– So there is no evidence from the playlist training data (s' does not appear in any training playlist)
Assume the new song has been tagged with T_{s'}
– Then u_{s'} = average of A_t for tags t in T_{s'}
– This is the implication of the learning objective above.

45 Recap: Embeddings
– Learn a low-dimensional representation U of the items
– Capture semantics using the distance between items u, u'
– Can be easier to visualize than latent factor models

46 Next Lecture
Recent applications of latent factor models:
– Low-rank spatial model for basketball play prediction
– Low-rank tensor model for collaborative clustering
Miniproject 1 report due Thursday.

