Label Embedding Trees for Large Multi-class Tasks. Samy Bengio, Jason Weston, David Grangier. Presented by Zhengming Xing.

Outline: Introduction; Label Trees; Label Embeddings; Experimental results.

Introduction Main idea: propose a fast, memory-efficient multi-class classifier for large datasets, based on a tree-structured method. The problem is large scale along three axes: the number of examples, the feature dimension, and the number of classes.

Introduction Label tree: a tree $T = (N, E, F, L)$ with indexed nodes $N = \{0, 1, \dots, n\}$, edges $E$ connecting each parent to its children, label predictors $F = \{f_1, \dots, f_n\}$ (one per node), and label sets $L = \{\ell_0, \dots, \ell_n\}$ (one per node). The root label set $\ell_0$ contains all K classes, where K is the number of classes, and each child's label set is a subset of its parent's. Disjoint tree: any two nodes at the same depth cannot share any labels.

Introduction Classifying an example: start at the root and repeatedly move to the child whose label predictor scores highest, i.e., at node $s$ follow $c = \arg\max_{c \in \mathrm{children}(s)} f_c(x)$; stop at a leaf, whose label set contains a single class, and predict that class.
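A minimal sketch of the data structure and the greedy descent in Python; the `Node` class and the linear predictors $f_c(x) = w_c \cdot x$ are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

class Node:
    """A label-tree node: a label set, a linear predictor, and children."""
    def __init__(self, label_set, weights=None, children=()):
        self.label_set = set(label_set)  # classes reachable below this node
        self.weights = weights           # parameters of f_c (None at the root)
        self.children = list(children)

    def score(self, x):
        # Hypothetical linear label predictor f_c(x) = w_c . x
        return float(np.dot(self.weights, x))

def classify(root, x):
    """Greedy descent: at each node, follow the highest-scoring child."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.score(x))
    # At a leaf the label set contains exactly one class
    return next(iter(node.label_set))
```

Applied to a disjoint tree whose leaves hold single labels, this returns one of the K classes after only depth-times-branching predictor evaluations, instead of K.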

Label Trees Tree loss: $R(f) = \mathbb{E}_{(x,y)}\big[\max_{j=1,\dots,D(x)} I\big(y \notin \ell_{s_j(x)}\big)\big]$, where $I$ is the indicator function, $s_j(x)$ is the node visited at depth $j$ while classifying $x$, and $D(x)$ is the depth in the tree of the final prediction for $x$. The loss is 1 exactly when some node along the predicted path does not contain the true label, i.e., when the final prediction is wrong.

Label tree Learning with a fixed label tree: N, E, and L are chosen in advance; the goal is to minimize the tree loss over the variables F, given training data $\{(x_i, y_i)\}_{i=1}^m$. Two relaxations make this tractable. Relaxation 1: upper-bound the max over the predicted path by a sum over nodes. Relaxation 2: replace the indicator function with the hinge loss, yielding a surrogate that gradient methods can optimize.
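In symbols, the two relaxations rest on the following standard bounds (written generically here; the paper's per-node margin terms may differ):

```latex
\max_{j=1,\dots,D(x)} I\big(y \notin \ell_{s_j(x)}\big)
  \;\le\; \sum_{j=1}^{D(x)} I\big(y \notin \ell_{s_j(x)}\big)
  \qquad \text{(Relaxation 1: max bounded by sum)}

I(z \le 0) \;\le\; \max(0,\, 1 - z)
  \qquad \text{(Relaxation 2: indicator bounded by hinge)}
```

Minimizing the hinge surrogate therefore minimizes an upper bound on the empirical tree loss.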

Label tree Learning the label-tree structure (disjoint tree). Basic idea: group labels that are likely to be confused at test time into the same label set. Build a class-confusion matrix from a pretrained classifier on validation data, symmetrize it, treat the result A as an affinity matrix, and apply steps similar to spectral clustering to partition the labels into the children's label sets.
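A minimal sketch of this step in Python, assuming scikit-learn is available and `confusion` is the k-by-k confusion matrix of a pretrained classifier evaluated on validation data:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def split_labels(confusion, n_children):
    """Partition class labels into child label sets by spectral clustering
    on the symmetrized confusion matrix, used as an affinity matrix."""
    A = 0.5 * (confusion + confusion.T)  # A_ij = how confusable classes i and j are
    np.fill_diagonal(A, 0.0)             # self-confusion carries no grouping signal
    clustering = SpectralClustering(n_clusters=n_children,
                                    affinity="precomputed",
                                    random_state=0)
    assignment = clustering.fit_predict(A)
    return [np.flatnonzero(assignment == c) for c in range(n_children)]
```

Applied recursively to each child's sub-matrix, this produces a disjoint label tree in which confusable classes stay together until deep in the tree, where more capacity is spent separating them.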

Label embeddings The one-hot vector $e(y)$ is a k-dimensional vector with a 1 in the y-th position and 0 otherwise. Embed label $y$ as $V e(y)$ (the y-th column of the $d_e \times k$ matrix $V$) and input $x$ as $W x$, and predict the label whose embedding is most similar to the embedded input: $f(x) = \arg\max_y (W x)^\top V e(y)$. Problem: how to learn W and V; define an objective and solve for them.
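A small numpy sketch of the scoring rule; the toy dimensions and the dot-product similarity are illustrative assumptions:

```python
import numpy as np

d, k, de = 20, 5, 3                  # input dim, number of classes, embedding dim
rng = np.random.default_rng(0)
W = rng.normal(size=(de, d))         # input embedding: maps x into R^de
V = rng.normal(size=(de, k))         # label embedding: column y is V e(y)

x = rng.normal(size=d)
scores = (W @ x) @ V                 # (W x)^T V e(y) for every label y at once
y_hat = int(np.argmax(scores))       # predicted class
```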

Label embeddings Method 1 (sequential): the same two steps as Algorithm 2. Step 1, learn V: embed the labels using the confusion affinity matrix, so that classes that are often confused receive nearby embeddings. Step 2, learn W: with V fixed, minimize a loss that maps each training input $x_i$ close to its label's embedding $V e(y_i)$.
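One plausible instantiation of the two steps, shown as a sketch: Laplacian eigenmaps on the affinity matrix A for V, then ridge regression for W. The paper's exact objectives may differ.

```python
import numpy as np
from scipy.linalg import eigh

def learn_V(A, de):
    """Embed k labels into R^de so that confusable classes (large A_ij)
    land close together: Laplacian eigenmaps on the affinity matrix A."""
    D = np.diag(A.sum(axis=1))
    L = D - A                        # graph Laplacian
    # Generalized eigenproblem L v = lambda D v; drop the trivial constant vector
    vals, vecs = eigh(L, D)
    return vecs[:, 1:de + 1].T       # V has shape (de, k)

def learn_W(X, y, V, lam=1e-3):
    """Ridge regression mapping each x_i near its label embedding V e(y_i)."""
    T = V[:, y].T                    # regression targets, shape (m, de)
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)
    return W.T                       # shape (de, d), so scores are (W @ x) @ V
```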

Label embedding tree Combine all the methods discussed above. Method 2 (joint): learn W and V together by minimizing a single objective that ranks the correct label above the incorrect ones, instead of learning them in two separate steps.
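A minimal stochastic-gradient sketch of joint training with a pairwise hinge (ranking) loss; the learning rate, negative-sampling scheme, and margin of 1 are illustrative choices, not the paper's exact recipe:

```python
import numpy as np

def joint_train(X, y, k, de, epochs=10, lr=0.01, seed=0):
    """Jointly learn W and V by SGD on the pairwise hinge loss
    max(0, 1 - s(x, y) + s(x, j)), where s(x, y) = (W x)^T V e(y)."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    W = 0.01 * rng.normal(size=(de, d))
    V = 0.01 * rng.normal(size=(de, k))
    for _ in range(epochs):
        for i in rng.permutation(m):
            x, yi = X[i], y[i]
            z = W @ x                          # embedded input
            j = rng.integers(k)                # sample one negative label
            if j == yi:
                continue
            if 1.0 - z @ V[:, yi] + z @ V[:, j] > 0:
                # Violated pair: push the true label up, the negative down
                W += lr * np.outer(V[:, yi] - V[:, j], x)
                V[:, yi] += lr * z
                V[:, j] -= lr * z
    return W, V
```

The same embedded representation can then be combined with the tree: label predictors operate on W x, which is why the paper calls the combination a label embedding tree.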

Experiment Dataset

Experiment