The MusicExplorer Project: Mapping the World of Music
Michael Kuhn, Distributed Computing Group (DISCO), ETH Zurich


„Today, I would like to listen to something cheerful.“
„Something like Lenny Kravitz would be great.“
„Who can help me to discover my collection?“

„On my shelf, AC/DC is next to ZZ Top...“

„play random songs that match my mood“

What's the Talk About?
- Basic idea: map of music
- Constructing the map (MDS, PLSA)
- Using the map (YouTube, Android)

Map of Music: What is it good for?

Advantages of a Map
- Similar songs are close to each other
- Quickly find nearest neighbors
- Span (and play) volumes
- Create smooth playlists by interpolation
- Visualize a collection
- Low memory footprint
  – Well suited for the mobile domain
Conclusion: a convenient basis to build music software
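To illustrate "smooth playlists by interpolation", here is a minimal sketch. It assumes each song already has coordinates on the map; the function names `nearest_song` and `smooth_playlist` and the toy 2D coordinates are made up for this example and are not taken from the MusicExplorer code.

```python
import math

def nearest_song(point, songs, exclude=()):
    """Return the name of the song closest to `point`, skipping names in `exclude`."""
    best, best_dist = None, math.inf
    for name, coords in songs.items():
        if name in exclude:
            continue
        d = math.dist(point, coords)
        if d < best_dist:
            best, best_dist = name, d
    return best

def smooth_playlist(start, end, songs, steps):
    """Walk the straight line from `start` to `end` and pick the nearest unused song at each step."""
    a, b = songs[start], songs[end]
    playlist = [start]
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # interpolation parameter in (0, 1)
        point = [x + t * (y - x) for x, y in zip(a, b)]
        song = nearest_song(point, songs, exclude=set(playlist) | {end})
        if song is not None:
            playlist.append(song)
    playlist.append(end)
    return playlist

# Hypothetical toy map (2D for readability; the talk uses 10 or 32 dimensions).
songs = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.1),
    "c": (2.0, -0.1),
    "d": (3.0, 0.0),
    "far": (1.5, 5.0),
}
playlist = smooth_playlist("a", "d", songs, steps=2)
print(playlist)
```

The linear scan is only for clarity; a real collection would probably use a spatial index for the nearest-neighbor lookup.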

How to Construct a Map of Music?

Similar or different???

Music Similarity
- Audio analysis
- Usage data

From Usage Data to Similarity
- Collaborative filtering
- Folksonomies (tags)

Method 1: Collaborative Filtering and MDS

Basic Idea
[Figure: two songs at unknown distance d = ?; the distance is derived from item-to-item collaborative filtering]

Item-to-Item Collaborative Filtering
"People who listen to this song also listen to..." [Linden et al., 2003]

Pairwise Similarity
- Users who listen to A also listen to B
  – Top-50 listened songs per user
  – Normalization (cosine similarity)
$$\mathrm{sim}(A, B) = \frac{\#\,\text{common users (co-occurrences)}}{\sqrt{\text{occurrences of song } A \cdot \text{occurrences of song } B}}$$
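The cosine-normalized co-occurrence count can be sketched in a few lines. This is an illustration only: the input format (a dict mapping each user to a list of top songs) and the name `song_similarities` are assumptions, and the data is invented.

```python
import math
from collections import Counter
from itertools import combinations

def song_similarities(top_songs_per_user):
    """Cosine similarity between songs, treating each song as a binary vector over users."""
    occurrences = Counter()     # number of users listening to each song
    co_occurrences = Counter()  # number of users listening to both songs of a pair
    for songs in top_songs_per_user.values():
        unique = sorted(set(songs))
        occurrences.update(unique)
        co_occurrences.update(combinations(unique, 2))
    return {
        (a, b): common / math.sqrt(occurrences[a] * occurrences[b])
        for (a, b), common in co_occurrences.items()
    }

# Hypothetical listening data (the talk uses the top-50 songs per user).
listening = {
    "u1": ["A", "B"],
    "u2": ["A", "B", "C"],
    "u3": ["A", "C"],
    "u4": ["B"],
}
sims = song_similarities(listening)
for pair, value in sorted(sims.items()):
    print(pair, round(value, 3))
```

A similarity like this still has to be turned into a distance before embedding; how the talk does that conversion is not stated on the slide.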

Graph Embedding
[Figure: a graph with nodes A, B, C, D, E and its embedding]

Classical Multidimensional Scaling (MDS)
- Well-known for dimensionality reduction
  – First described by Young and Householder, 1938
- Principal Component Analysis (PCA):
  – Project on the hyperplane that maximizes variance.
  – Computed by solving an eigenvalue problem.
- Basic idea of MDS:
  – Assume that the exact positions y_1, ..., y_N in a high-dimensional space are given.
  – It can be shown that, knowing only the distances d(y_i, y_j) between points, we can calculate the same result as applying PCA to y_1, ..., y_N.
- Problem: Complexity O(n^2 log n)
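A minimal NumPy sketch of classical MDS, assuming a full Euclidean distance matrix is available. It uses the standard double-centering construction; the function name `classical_mds` and the random test points are illustrative, not part of the talk.

```python
import numpy as np

def classical_mds(D, dim):
    """Embed points in `dim` dimensions from their pairwise distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix of the centered points
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]  # keep the `dim` largest
    lam = np.clip(eigvals[order], 0.0, None) # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)

# Check on random points: the embedding should reproduce all distances.
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 3))
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
X = classical_mds(D, dim=3)
D_rec = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.abs(D - D_rec).max())
```

The recovered coordinates match the originals only up to rotation, reflection and translation, which is why the check compares distances rather than positions.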

Landmark MDS [de Silva and Tenenbaum, 1999]
- Select k landmarks and embed them using MDS
- For the remaining points:
  – Place according to distances from landmarks
- Complexity: O(k n log n)
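The two steps can be sketched as follows, assuming Euclidean distances and using the distance-based triangulation commonly given for Landmark MDS. The function name `landmark_mds`, the helper `pairwise`, and the choice of the first 8 points as landmarks are assumptions for this example.

```python
import numpy as np

def landmark_mds(D_landmarks, D_to_landmarks, dim):
    """Embed all points from their distances to k landmarks.

    D_landmarks:    (k, k) distances among the landmarks
    D_to_landmarks: (n, k) distances from every point to each landmark
    """
    sq = np.asarray(D_landmarks, dtype=float) ** 2
    k = sq.shape[0]
    J = np.eye(k) - np.ones((k, k)) / k
    B = -0.5 * J @ sq @ J                      # classical MDS on the landmarks
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]
    lam, V = eigvals[order], eigvecs[:, order] # assumes the top `dim` eigenvalues are > 0
    pseudo_inv = V.T / np.sqrt(lam)[:, None]   # (dim, k) map from distances to coordinates
    mean_sq = sq.mean(axis=0)                  # mean squared distance to each landmark
    delta = np.asarray(D_to_landmarks, dtype=float) ** 2
    return -0.5 * (delta - mean_sq) @ pseudo_inv.T

def pairwise(P, Q):
    """Euclidean distances between the rows of P and the rows of Q."""
    return np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)

rng = np.random.default_rng(1)
Y = rng.normal(size=(40, 3))
landmarks = Y[:8]                              # hypothetical landmark choice
X = landmark_mds(pairwise(landmarks, landmarks), pairwise(Y, landmarks), dim=3)
print(np.abs(pairwise(X, X) - pairwise(Y, Y)).max())
```

Only the k x k landmark matrix goes through an eigendecomposition; every other point needs just its k distances to the landmarks, which is where the saving over classical MDS comes from.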

Iterative Embedding
- Assumption: some links erroneously shortcut certain paths
- Idea: Use embedding as estimator for distance
  – Shortcut edges get stretched
  – Remove edges with worst stretch and re-embed
- Example: Kleinberg graph (20x20 grid with random edges)
[Figure panels: original embedding (spring embedder); after 6 rounds; after 12 rounds; after 30 rounds]

Evaluation: Dimensionality
- 10 dimensions give a reasonable quality
- Example neighborhoods in 10D space

Method 2: Social Tags and PLSA

Similarity from Social Tagging

Probabilistic Latent Semantic Analysis (PLSA) [Hofmann, 1999]
- Text: documents, latent semantic classes, words
- Music: songs, latent music style classes, tags

PLSA: Interpretation as Space [Hofmann, 1999]
- The latent class probabilities of a document (song), (P(z_1|d), ..., P(z_K|d)), can be seen as a vector that defines a point in space
- K small: dimensionality reduction

PLSA Space
[Figure: axes labeled latent class 1, latent class 2, latent class 3]
- Probabilities sum to 1: K-1 dimensional hyperplane
- Similar documents (songs) are close to each other
- Music space: 32 dimensions
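As a rough illustration of how songs end up as points in this space, here is a small PLSA fit by plain EM on a song-by-tag count matrix. It is a sketch under assumptions: the function name `plsa`, the toy counts, the iteration count and the small constant added to avoid division by zero are all choices made for this example, not details from the talk.

```python
import numpy as np

def plsa(counts, n_classes, n_iter=50, seed=0):
    """Fit PLSA by EM. Returns P(z|song) with shape (songs, K) and P(tag|z) with shape (K, tags)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_songs, n_tags = counts.shape
    p_z_d = rng.random((n_songs, n_classes))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_classes, n_tags))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        # E-step: P(z | song, tag) is proportional to P(z | song) * P(tag | z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]        # (songs, K, tags)
        posterior = joint / (joint.sum(axis=1, keepdims=True) + eps)
        weighted = counts[:, None, :] * posterior
        # M-step: re-estimate both distributions from the expected counts
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

# Hypothetical counts: rows are songs, columns are tags.
counts = np.array([
    [5, 4, 0, 0],
    [4, 6, 1, 0],
    [0, 1, 5, 5],
    [0, 0, 6, 4],
])
song_points, tag_dists = plsa(counts, n_classes=2)
print(song_points.round(3))
print(song_points.sum(axis=1))
```

Each row of `song_points` is a probability vector, so all songs lie on the K-1 dimensional hyperplane mentioned on the slide. EM depends on the random start, so the exact coordinates vary with the seed.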

LMDS vs. PLSA Space
- Advantages of LMDS:
  – Same accuracy at lower dimensionality (10 vs. 32)
- Advantages of PLSA:
  – Natural meaning of tags
  – Assignment of tags to songs (probabilistic)
- Current sizes (approx.):
  – LMDS: 600K tracks
  – PLSA: 1.1M tracks

Using the Map

Visualization? High-dimensional!

From 10D to 2D
- Identify relevant tags
- Find centroids of these tags in 10D
- Apply Principal Component Analysis (PCA) to these centroids
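The three steps can be sketched with NumPy as below. The helper names (`tag_centroids`, `fit_pca_2d`, `project`), the random 10D positions and the tag assignment are invented for this illustration; how the talk selects the "relevant" tags is not specified and is not modeled here.

```python
import numpy as np

def tag_centroids(positions, song_tags, tags):
    """Mean 10D position of the songs carrying each tag (every tag must occur at least once)."""
    rows = []
    for tag in tags:
        members = [i for i, assigned in enumerate(song_tags) if tag in assigned]
        rows.append(positions[members].mean(axis=0))
    return np.array(rows)

def fit_pca_2d(centroids):
    """Return the centroid mean and the two principal axes of the centroids."""
    mean = centroids.mean(axis=0)
    _, _, vt = np.linalg.svd(centroids - mean, full_matrices=False)
    return mean, vt[:2]

def project(points, mean, axes):
    """Project 10D points onto the 2D plane spanned by `axes`."""
    return (points - mean) @ axes.T

# Hypothetical data: 30 songs in 10D, each carrying one of four tags.
rng = np.random.default_rng(2)
positions = rng.normal(size=(30, 10))
tags = ["rock", "jazz", "pop", "metal"]
song_tags = [{tags[i % len(tags)]} for i in range(len(positions))]

centroids = tag_centroids(positions, song_tags, tags)
mean, axes = fit_pca_2d(centroids)
xy = project(positions, mean, axes)  # 2D coordinates for every song
print(xy[:3])
```

Fitting PCA on the tag centroids rather than on all songs means the 2D axes are chosen to spread the tags apart, and the songs are then placed with the same projection.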

What people have chosen during the researcher‘s night in Zurich

YouJuke – The YouTube Jukebox

- YouTube as media source
- Music map to create smart playlist

Reaching YouJuke
apps.facebook.com/youjuke

μseek: The map of music on Android
„play random songs that match my mood“
„On my shelf, John Lennon is next to the Beatles...“

An Intelligent iPod-Shuffle

Realization
- After only a few skips, we know pretty well which songs match the user's mood

Work in Progress: Who is Dancing?
[Figure labels: AC/DC, Beatles, Prodigy]

Browsing Covers
„On my shelf, AC/DC is next to ZZ Top...“

Video

Internet connection only required at first startup!

Thanks to:
  – Lukas Bossard
  – Mihai Calin
  – Olga Goussevskaia
  – Michael Lorenzi
  – Roger Wattenhofer
  – Samuel Welten
URLs:
– – – apps.facebook.com/youjuke – (Michael Kuhn)
Questions?