Robust Optimization and Applications in Machine Learning.

Presentation transcript:

Robust Optimization and Applications in Machine Learning

Part 4: Sparsity in Unsupervised Learning

Unsupervised learning

Sparse PCA: outline

Principal Component Analysis

PCA for visualization
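Only the slide title survives in this transcript; as an illustration of the idea, here is a minimal NumPy sketch that projects data onto the two leading eigenvectors of the sample covariance to obtain 2-D coordinates for plotting (the function name and data are my own, not from the slides):

    import numpy as np

    def pca_2d(X):
        """Project the rows of X onto the two leading principal components."""
        Xc = X - X.mean(axis=0)                   # center the data
        Sigma = np.cov(Xc, rowvar=False)          # sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
        V = eigvecs[:, ::-1][:, :2]               # two leading eigenvectors
        return Xc @ V                             # 2-D coordinates for plotting

    # Usage with random data standing in for a real dataset
    X = np.random.randn(200, 10)
    Y = pca_2d(X)                                 # shape (200, 2)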

First principal component
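The standard variational characterization, presumably what this slide states (with Sigma the covariance matrix):

    \max_x \; x^\top \Sigma x \quad \text{subject to} \quad \|x\|_2 = 1,

whose maximizer is the leading eigenvector of \Sigma and whose optimal value is \lambda_{\max}(\Sigma).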

Sparse PCA: outline

Why sparse factors?

PCA: rank-one case

Sparse PCA: rank-one case
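A common way to pose the rank-one sparse PCA problem, which I assume is the formulation on this slide, adds a cardinality constraint to the variance-maximization problem:

    \max_x \; x^\top \Sigma x \quad \text{subject to} \quad \|x\|_2 = 1, \;\; \mathbf{card}(x) \le k,

where card(x) counts the nonzero entries of x. The constraint makes the problem non-convex and NP-hard in general, which motivates the relaxation below.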

Sparse PCA: outline

SDP relaxation
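The semidefinite relaxation used in the DSPCA line of work (d'Aspremont, El Ghaoui, Jordan, Lanckriet), which this slide presumably presents, lifts x to the matrix X = x x^\top and replaces the rank and cardinality constraints by convex ones:

    \max_X \; \mathbf{Tr}(\Sigma X) \quad \text{subject to} \quad \mathbf{Tr}(X) = 1, \;\; \mathbf{1}^\top |X| \mathbf{1} \le k, \;\; X \succeq 0,

where |X| denotes the entrywise absolute value. The l1-type constraint is a convex surrogate for the cardinality constraint, via card(x) <= k  =>  ||x||_1 <= sqrt(k) ||x||_2.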

Dual problem

Sparsity and robustness
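A likely reading of this slide, consistent with the robust-optimization theme of the tutorial (the identity below is standard, though I am inferring that it is the one shown): the l1-penalized variance problem is a worst-case variance problem under componentwise-bounded uncertainty in the covariance,

    \max_{\|x\|_2 = 1} \; \min_{|U_{ij}| \le \rho} \; x^\top (\Sigma + U)\, x
      \;=\; \max_{\|x\|_2 = 1} \; x^\top \Sigma x - \rho \|x\|_1^2,

since the worst-case perturbation gives min over U of x^T U x = -rho ||x||_1^2. Sparsity of the optimal x thus emerges from robustness to uncertainty in Sigma.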

Sparse PCA decomposition?

Sparse PCA: outline

First-order algorithm

Sparse PCA: outline

PITPROPS data

PITPROPS data: numerical results

Financial example

Covariance matrix

Second factor

Gene expression data

Clustering of gene expression data

Conclusions on sparse PCA

Part 4: Sparsity in Unsupervised Learning

Sparse Gaussian networks: outline

Gaussian network problem

Correlation-based approach

Approach based on the precision matrix

Example

Relevance network vs. graphical model

Can we check this?

Sparse inverse covariance and conditional independence
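The standard fact behind this slide (assumed): for a Gaussian vector x with covariance Sigma, zeros in the precision matrix encode conditional independence,

    (\Sigma^{-1})_{ij} = 0 \;\Longleftrightarrow\; x_i \perp x_j \mid \{x_k : k \ne i, j\},

so a sparse estimate of the inverse covariance directly yields the structure of the Gaussian graphical model.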

Related work

Maximum-likelihood estimation
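With S the sample covariance and X the precision (inverse covariance) matrix, the Gaussian maximum-likelihood problem is, up to constants (standard, and presumably the slide's formulation):

    \max_{X \succ 0} \; \log\det X - \mathbf{Tr}(S X),

whose solution is X = S^{-1} whenever S is invertible; when the number of samples is small relative to the dimension, S is singular or ill-conditioned, which is the difficulty taken up on the next slide.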

Problems with ordinary MLE

MLE with cardinality penalty

Convex relaxation
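Replacing the cardinality penalty by an l1 penalty gives the convex (concave-maximization) program of Banerjee, El Ghaoui and d'Aspremont, which I take to be the problem on this slide:

    \max_{X \succ 0} \; \log\det X - \mathbf{Tr}(S X) - \lambda \sum_{i,j} |X_{ij}|,

a penalized maximum-likelihood estimate of the precision matrix; larger lambda yields a sparser estimated graph.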

Link with robustness

Properties of estimate

Algorithms: challenges

First- vs. second-order algorithms

Black- vs. grey-box first-order algorithms

Algorithms: problem structure

Nesterov’s smooth minimization algorithm

Nesterov’s method

Putting the problem in Nesterov’s format

Making the problem smooth

Optimal scheme for smooth minimization
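As a generic illustration of what "optimal scheme" means here, the following sketch implements the classical accelerated gradient updates, which attain Nesterov's O(1/k^2) rate for smooth convex minimization (an illustrative implementation with made-up example data, not the tutorial's exact scheme):

    import numpy as np

    def accelerated_gradient(grad, x0, L, iters=500):
        """Minimize a smooth convex f, given grad f and a Lipschitz constant L."""
        x = y = np.asarray(x0, dtype=float)
        t = 1.0
        for _ in range(iters):
            x_next = y - grad(y) / L                          # gradient step at the extrapolated point
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0
            y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum / extrapolation step
            x, t = x_next, t_next
        return x

    # Usage: minimize the smooth quadratic 0.5 * ||A z - b||^2
    A = np.random.randn(30, 10)
    b = np.random.randn(30)
    grad = lambda z: A.T @ (A @ z - b)
    L = np.linalg.norm(A, 2) ** 2                             # Lipschitz constant of the gradient
    z_opt = accelerated_gradient(grad, np.zeros(10), L)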

Application to our problem

Dual block-coordinate descent

Properties of dual block-coordinate descent

Link with LASSO
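The connection established in the sparse inverse covariance literature (Banerjee et al.), which I take to be what this slide summarizes: each block-coordinate update of the dual reduces to a lasso-type problem. Writing W for the current covariance estimate, W_{11} for the block with one variable removed and s_{12} for the corresponding column of S, the column update solves

    \min_{\beta} \; \tfrac{1}{2}\, \beta^\top W_{11}\, \beta - s_{12}^\top \beta + \lambda \|\beta\|_1,

i.e. a lasso in which W_{11} plays the role of the Gram matrix, so efficient lasso solvers can be reused inside the block-coordinate scheme.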

Example

Inverse covariance estimates

Average error on zeros

Computing time

Classification error

Recovering structure

Part 4: summary

References