
Graphical Multi-Task Learning
Dan Sheldon, Cornell University
NIPS SISO Workshop, 12/12/2008

Multi-Task Learning (MTL)
Separate but related learning tasks --- solve them jointly to achieve better performance.
E.g., in a document collection, learn classifiers to predict category, relevance to query 1, relevance to query 2, etc.
Neural nets [Caruana 1997]: shared hidden layers.
Generative models / hierarchical Bayes: shared hyper-parameters.

Task Relationships
Most previous work: a pool of related tasks.
This work: leverage known structural information.
Graph structure on tasks; discriminative setting; regularized kernel methods.

Motivating Application
Predict presence/absence of the Tree Swallow (a migratory bird) at locations in NY.
Observations: x_i – date, time, location, habitat, etc.; y_i – saw a Tree Swallow?
Significant change throughout the year. How to model?
[Figure: percent positive observations by month]

Separate Tasks?
Split training examples by month and train 12 separate models.
OK if there is lots of training data.
[Diagram: separate models for Jan, Feb, Mar, …, Dec]

Single Task?
Use all training examples to learn a single classifier.
Include date as a feature to learn about month-to-month heterogeneity.
[Diagram: one model spanning Jan, Feb, Mar, …, Dec]

Symmetric MTL?
[Diagram: Jan, Feb, Mar, …, Dec treated as an unstructured pool of tasks]
Ignores known problem structure: January is very weakly related to July.

Graphical MTL
Use a priori knowledge about the structure of relationships, in the form of a graph.
[Diagram: Jan – Feb – Mar – … – Dec connected in a cycle]

Marketing in a Social Network
[Diagram: per-user prediction tasks for users such as Alice and Bob]
Rather than symmetric task relationships, prefer to leverage the network structure (known a priori).

Idea
Use regularization to penalize differences between tasks that are directly connected.
Penalize by the squared difference ||f_t – f_{t-1}||^2.
[Diagram: task functions f_1, f_2, f_3, …, f_12 connected in a cycle]

Illustration
Regularized learning: trade off empirical risk vs. complexity. Penalize squared distance from the origin.

Illustration
Graphical MTL: trade off empirical risk vs. task differences. Penalize the sum of squared edge lengths. [Evgeniou, Micchelli and Pontil JMLR 2006]

Illustration
Also add edges to the origin: task-specific regularization.
Note: the multi-task (edge) regularization alone is translation invariant.
[Figure labels: empirical risk, multi-task regularization, task-specific regularization]
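As a sketch of what these illustrations describe, the overall objective can be written as follows; the notation (loss ℓ, weights λ and μ, edge set E) is mine, in the spirit of Evgeniou, Micchelli and Pontil (2006), not copied from the slides:

    \min_{f_1, \dots, f_T} \;\; \sum_{t=1}^{T} \sum_{i} \ell\bigl(f_t(x_i^t),\, y_i^t\bigr)
      \;+\; \lambda \sum_{(s,t) \in E} \lVert f_s - f_t \rVert^2
      \;+\; \mu \sum_{t=1}^{T} \lVert f_t \rVert^2

The first term is the empirical risk, the second penalizes squared edge lengths between connected tasks (and is invariant to translating all tasks together), and the third is the task-specific regularization contributed by the edges to the origin.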

Related Work
Multi-task learning: lots! Caruana 1997, Baxter 2000, Ben-David and Schuller 2003, Ando and Zhang 2004.
Multi-task kernels: Evgeniou, Micchelli and Pontil 2006 – general framework; focus on the linear, symmetric case (all experiments); propose graph regularization and nonlinear kernels.
Task networks: Kato, Kashima, Sugiyama and Asai 2007 – second-order cone programming.

This Work
Builds on Evgeniou, Micchelli and Pontil.
Main contribution: practical development of graphical multi-task kernels, focused on the nonlinear case.
Task-specific regularization; new treatment of nonlinear kernels; application.

Technical Insights
Key technical insight: the problem can be reduced to a single-task problem by learning one function f(x, t) over input/task pairs and modifying the kernel: the multi-task kernel is the product of a task kernel and a base kernel.
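In symbols, the product form is presumably the standard one (the slide's equations are images and did not survive transcription):

    K\bigl((x, s), (x', t)\bigr) \;=\; K_{\mathrm{task}}(s, t)\, K_{\mathrm{base}}(x, x')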

Technical Insights
Multi-task kernel: construct the task kernel K from the graph Laplacian L of the task graph; the base kernel over inputs is chosen as usual.
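A minimal Python sketch of one plausible construction, assuming the task kernel is the inverse of the Laplacian plus a ridge term (the exact parameterization on the slide is not in the transcript):

    import numpy as np

    def cycle_laplacian(num_tasks):
        """Graph Laplacian of a cycle over the tasks (e.g., the 12 months)."""
        A = np.zeros((num_tasks, num_tasks))
        for t in range(num_tasks):
            A[t, (t + 1) % num_tasks] = 1.0
            A[(t + 1) % num_tasks, t] = 1.0
        return np.diag(A.sum(axis=1)) - A

    def task_kernel(L, alpha):
        """Task kernel (L + alpha*I)^{-1}: the Laplacian term penalizes
        differences between tasks joined by an edge, and alpha*I supplies the
        task-specific regularization ("edges to the origin") and makes the
        matrix invertible. alpha is a hypothetical name for the trade-off
        parameter; the talk reports alpha = 2^-8 worked well."""
        return np.linalg.inv(L + alpha * np.eye(L.shape[0]))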

Proof Sketch
1. Define the task-specific function as the function obtained by supplying the task ID: f_t(x) = f(x, t).
2. Claim: each f_t lies in the reproducing-kernel Hilbert space of the base kernel, hence task-specific functions are comparable via inner products ⟨f_s, f_t⟩. (Relies on the product kernel.)
3. Claim: the regularizer ||f||^2 is a weighted sum of inner products between task-specific functions: Σ_{s,t} (K_task^{-1})_{s,t} ⟨f_s, f_t⟩.
4. The graph Laplacian gives the desired weights: building K_task^{-1} from L turns this sum into the sum of squared differences between tasks joined by an edge.
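The weight claim in step 4 rests on the standard Laplacian quadratic-form identity (stated here for completeness, with unit edge weights w_{st} and my own notation):

    \sum_{s,t} L_{st}\, \langle f_s, f_t \rangle
      \;=\; \tfrac{1}{2} \sum_{s,t} w_{st}\, \lVert f_s - f_t \rVert^2
      \;=\; \sum_{(s,t) \in E} \lVert f_s - f_t \rVert^2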

One more thing…
Normalize the task kernel to have unit diagonal.
Reason: preserves the scaling of K when choosing α; all entries lie in [0, 1].
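A minimal sketch of that normalization, assuming the standard diagonal rescaling K_{st} / sqrt(K_{ss} K_{tt}):

    import numpy as np

    def normalize_unit_diagonal(K):
        """Rescale a kernel matrix so every diagonal entry equals 1."""
        d = np.sqrt(np.diag(K))
        return K / np.outer(d, d)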

Results
Bird prediction task: > 5% improvement.
Details: SVM with RBF kernels; G = cycle; grid search for C and γ; α = 2^-8 (robust to many choices).
[Chart: AUC for pooled, separate, and multi-task models]
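A hedged end-to-end sketch of how these pieces could fit together; scikit-learn's precomputed-kernel SVM stands in for whatever solver was actually used, the helpers come from the sketches above, and X, tasks, and y are hypothetical variables (features, month indices, and presence/absence labels):

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.svm import SVC

    def multitask_gram(X, tasks, K_task, gamma):
        """Gram matrix of the product kernel K_task(s, t) * K_rbf(x, x')."""
        return K_task[np.ix_(tasks, tasks)] * rbf_kernel(X, X, gamma=gamma)

    K_task = normalize_unit_diagonal(task_kernel(cycle_laplacian(12), alpha=2**-8))
    K_train = multitask_gram(X, tasks, K_task, gamma=0.1)   # gamma chosen by grid search
    clf = SVC(C=1.0, kernel="precomputed").fit(K_train, y)  # C chosen by grid search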

Sensitivity to C and gamma
[Figure panels: Pooled; α = …; α = 2^-6]

Extensions
Learn edge weights: detect periods of stability vs. change.
Applications: social networks; the bird problem with spatial regions and many species.
Faster training using graph structure.
[Figure: percent positive observations by month]

Thanks!