Estimating variable structure and dependence in multi-task learning via gradients By: Justin Guinney, Qiang Wu and Sayan Mukherjee Presented by: John Paisley.


Outline
- General problem
- Review of the single-task solution
- Extension to multi-task
- Experiments

General problem
- We have a small number of high-dimensional data points x, each with a corresponding response variable y (fully supervised).
- We want to simultaneously build a classification or regression function and learn which features are important, as well as the correlations between features (so that if two features are important in the same way, we know it).
- Xuejun presented the authors' single-task solution; this paper extends it to the multi-task setting.

Single-Task Solution (classification)
By a first-order Taylor expansion, f(x_i) \approx f(x_j) + \vec{f}(x_j) \cdot (x_i - x_j) whenever x_i is close to x_j, so the classification function and its gradient are estimated jointly. Seek to minimize the expected error

E(f, \vec{f}) = \frac{1}{n^2} \sum_{i,j=1}^{n} w_{i,j} \, \phi\big( y_i ( f(x_j) + \vec{f}(x_j) \cdot (x_i - x_j) ) \big),

where w_{i,j} is a weight function that decays with the distance \|x_i - x_j\| (e.g. the Gaussian w_{i,j} = \exp(-\|x_i - x_j\|^2 / 2s^2)) and \phi is a convex loss function (e.g. the hinge loss). To solve this, regularize both f and \vec{f} in an RKHS:

\min_{f, \vec{f}} \; E(f, \vec{f}) + \lambda_1 \|f\|_K^2 + \lambda_2 \|\vec{f}\|_K^2.
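To make the objective concrete, here is a minimal NumPy sketch of the empirical risk above, assuming a Gaussian weight function and the hinge loss; the function and variable names are mine, not the paper's.

```python
import numpy as np

def hinge(t):
    """Convex loss phi(t); here the hinge loss (1 - t)_+ used for classification."""
    return np.maximum(0.0, 1.0 - t)

def empirical_risk_classification(X, y, f_vals, grads, s=1.0):
    """Weighted empirical risk for gradient learning in classification.

    X      : (n, d) array of inputs
    y      : (n,)   labels in {-1, +1}
    f_vals : (n,)   candidate function values f(x_j)
    grads  : (n, d) candidate gradients at each x_j
    s      : bandwidth of the Gaussian weight w_ij
    """
    n = len(y)
    diff = X[:, None, :] - X[None, :, :]                  # diff[i, j] = x_i - x_j
    w = np.exp(-np.sum(diff**2, axis=2) / (2.0 * s**2))   # Gaussian weights
    # First-order Taylor expansion of f around x_j, evaluated at x_i
    taylor = f_vals[None, :] + np.einsum('ijk,jk->ij', diff, grads)
    return np.sum(w * hinge(y[:, None] * taylor)) / n**2
```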

Single-Task (regression)
In regression the observed response can stand in for the function value: substitute y_j for f(x_j) at each input and only learn the gradient,

E(\vec{f}) = \frac{1}{n^2} \sum_{i,j=1}^{n} w_{i,j} \big( y_i - y_j - \vec{f}(x_j) \cdot (x_i - x_j) \big)^2 + \lambda \|\vec{f}\|_K^2.
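The classification sketch adapts directly: under the same assumptions, the observed responses replace the candidate function values, and only the gradient field remains a free variable.

```python
import numpy as np

def empirical_risk_regression(X, y, grads, s=1.0):
    """Regression variant: the observed y_j replaces f(x_j), so only the
    gradient field `grads` is learned."""
    n = len(y)
    diff = X[:, None, :] - X[None, :, :]                  # diff[i, j] = x_i - x_j
    w = np.exp(-np.sum(diff**2, axis=2) / (2.0 * s**2))
    # y_i should match y_j plus the first-order correction at x_j
    pred = y[None, :] + np.einsum('ijk,jk->ij', diff, grads)
    return np.sum(w * (y[:, None] - pred)**2) / n**2
```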

Single-Task (solution and value of interest)
By the representer theorem, the minimizer has the form

\vec{f}(x) = \sum_{i=1}^{n} c_i K(x, x_i), \qquad c_i \in \mathbb{R}^d.

The gradient outer product (GOP) is the matrix carrying all of the feature information,

\Gamma = \mathbb{E}\big[ \nabla f(X) \, \nabla f(X)^\top \big],

which is approximated empirically as

\hat{\Gamma} = \frac{1}{n} \sum_{i=1}^{n} \vec{f}(x_i) \, \vec{f}(x_i)^\top.

(The slide compared the form of this approximation across this paper, Xuejun's paper, and a Matlab check; those expressions were figures and are not recoverable from the transcript.)
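A small sketch of the empirical GOP under the representer-theorem parameterization above, assuming a Gaussian kernel for illustration; names are mine.

```python
import numpy as np

def gaussian_kernel(X, s=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 s^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
    return np.exp(-sq / (2.0 * s**2))

def estimate_gop(C, K):
    """Empirical gradient outer product from representer coefficients.

    C : (n, d) coefficients, one row c_i per training point
    K : (n, n) kernel matrix on the training inputs

    The gradient estimate at x_j is f_vec(x_j) = sum_i K[j, i] c_i,
    i.e. the rows of K @ C, and the GOP averages their outer products.
    """
    G = K @ C                  # G[j] = estimated gradient at x_j
    return G.T @ G / len(G)    # (1/n) sum_j outer(G[j], G[j])
```

In matrix form this estimate is (1/n) C^T K^2 C, since the rows of KC are the gradient estimates at the training points; whether that is the expression behind the slide's comparison is my guess, not the paper's statement.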

GOP
This matrix is central to the paper because it carries all of the information about the importance of each feature. The diagonal can be used to rank each feature's importance, and the off-diagonal entries tell how features are correlated (so if two features are important in the same way, only one needs to be selected; see the sketch below).

My confusion: I take this to mean that [the slide's equation is lost from the transcript], which would resolve the previous page. However, constructing a discrete Gaussian kernel in Matlab, this isn't true (and it makes no sense to me why it should be true).
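To make the ranking rule concrete, a short sketch of how one might read importance and redundancy off an estimated GOP; the normalization to a correlation-like matrix is my own illustration, not a step taken from the paper.

```python
import numpy as np

def rank_features(gop):
    """Read feature structure off an estimated GOP.

    Returns the feature indices sorted by importance (the GOP diagonal,
    i.e. the average squared partial derivative) and a normalized,
    correlation-like matrix; an off-diagonal entry near +/-1 suggests
    two features matter in the same way, so one may be redundant.
    """
    importance = np.diag(gop)
    order = np.argsort(importance)[::-1]
    scale = np.sqrt(np.clip(importance, 1e-12, None))
    corr = gop / np.outer(scale, scale)
    return order, corr
```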

Extension to multi-task
A very natural extension: assume a shared base function plus a task-specific correction, f_t = f_0 + g_t for tasks t = 1, ..., T (and likewise \vec{f}_t = \vec{f}_0 + \vec{g}_t for the gradients). Both the classification and the regression objectives become the single-task ones summed over tasks, with RKHS regularization applied once to the shared part and separately to each correction:

\min \; \sum_{t=1}^{T} E_t(f_t, \vec{f}_t) + \lambda_0 \big( \|f_0\|_K^2 + \|\vec{f}_0\|_K^2 \big) + \frac{\lambda_1}{T} \sum_{t=1}^{T} \big( \|g_t\|_K^2 + \|\vec{g}_t\|_K^2 \big).
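A structural sketch of how the multi-task objective composes, under the simplifying assumption that every task shares the same training inputs (hence one kernel matrix K); in the paper each task has its own sample, so this only shows how the penalties decompose.

```python
import numpy as np

def rkhs_norm_sq(C, K):
    """||f_vec||_K^2 for f_vec(x) = sum_i c_i K(x, x_i): trace(C^T K C)."""
    return np.trace(C.T @ K @ C)

def multitask_objective(task_risks, C0, C_tasks, K, lam0, lam1):
    """Sum of per-task empirical risks plus the two RKHS penalties.

    task_risks : list of per-task empirical risk values (e.g. from the
                 classification or regression sketches above, evaluated
                 with gradient coefficients C0 + C_t for task t)
    C0         : (n, d) coefficients of the shared base gradient
    C_tasks    : list of (n, d) coefficients, one correction per task
    """
    T = len(C_tasks)
    penalty = (lam0 * rkhs_norm_sq(C0, K)
               + lam1 * sum(rkhs_norm_sq(Ct, K) for Ct in C_tasks) / T)
    return sum(task_risks) + penalty
```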

Experiments
(The results slides were figures and are not reproduced in this transcript.)