Towards Scalable Support Vector Machines Using Squashing. Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth. Information and Computer Science, University of California.


Towards Scalable Support Vector Machines Using Squashing. Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth, Information and Computer Science, University of California. Advisor: Dr. Hsu. Reporter: Hung Ching-Wen

Outline 1. Motivation 2. Objective 3. Introduction 4. SVM 5. Squashing for SVM 6. Experiments 7. Conclusion

Motivation SVMs provide classification models with a strong theoretical foundation and excellent empirical performance. Their major drawback, however, is the necessity to solve a large-scale quadratic programming problem during training.

Objective This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.

Introduction The applicability of SVMs to large datasets is limited because of their high computational cost. Speed-up training algorithms: chunking, Osuna's decomposition method, and SMO. These accelerate training, but do not scale well with the size of the training data.

Introduction Approaches for reducing the computational cost: sampling, boosting, and squashing (DuMouchel et al., Madigan et al.). The authors propose squash-SMO to address the high computational cost of SVMs.

SVM Training data: D = {(xi, yi): i = 1, …, N}, where xi is a vector and yi ∈ {+1, −1}. A linear SVM uses the separating classifier y = ⟨w, x⟩ + b, where w is the normal vector and b is the intercept of the hyperplane.
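As a minimal sketch of this linear decision rule (the weight vector w and intercept b below are hypothetical values chosen only for illustration, not learned parameters):

```python
import numpy as np

# Hypothetical parameters of a learned hyperplane, for illustration only
w = np.array([2.0, -1.0])  # normal vector of the separating hyperplane
b = 0.5                    # intercept

def classify(x):
    # the label is the sign of the decision function <w, x> + b
    return 1 if np.dot(w, x) + b >= 0 else -1
```

Any point on the positive side of the hyperplane ⟨w, x⟩ + b = 0 is labeled +1, any point on the negative side −1.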

SVM (non-separable)

SVM (a prior on w)

Squashing for SVM (1). Select a probabilistic model P((X, Y) ∣ θ). (2). Our objective is to find the maximum likelihood estimate θML.

Squashing for SVM (3). The training data D = {(xi, yi): i = 1, …, N} can be grouped into Nc groups. (Xc, Yc)sq: the squashed data point placed at cluster c. βc: the weight of that point.
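The grouping step can be sketched with a small k-means pass: the cluster centers play the role of the squashed points and the cluster sizes play the role of the weights βc. The toy data, the choice Nc = 10, and the use of plain k-means are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training inputs; the squashed data points are cluster representatives
# and beta_c (here: the cluster size) is the weight of each one.
X = rng.normal(size=(200, 2))
Nc = 10  # number of squashed points -- an illustrative choice

# A minimal k-means pass to place the Nc squashed points
centers = X[rng.choice(len(X), Nc, replace=False)]
for _ in range(20):
    # assign every point to its nearest center ...
    labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    # ... and move each center to the mean of its assigned points
    for c in range(Nc):
        if np.any(labels == c):
            centers[c] = X[labels == c].mean(axis=0)

# final assignment and per-cluster weights
labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
weights = np.bincount(labels, minlength=Nc)  # beta_c: points per cluster
```

The 200 original points are thus summarized by Nc weighted pseudo-points, which is what makes the subsequent SVM training cheap.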

Squashing for SVM If we take the prior on w to be P(w) ∝ exp(−∥w∥²).

Squashing for SVM (4). The optimization model for the squashed data:
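One way to sketch a weighted optimization of this kind is subgradient descent on a hinge loss in which each squashed point is weighted by βc. The pseudo-points, weights, and hyperparameters below are made up for illustration, and this gradient solver is a stand-in, not the paper's SMO-based optimizer:

```python
import numpy as np

# Hypothetical squashed data: four pseudo-points (Xs, ys) with weights beta
# standing in for the cluster sizes; all values are made up for illustration.
Xs = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -2.0], [-2.5, -1.5]])
ys = np.array([1.0, 1.0, -1.0, -1.0])
beta = np.array([60.0, 40.0, 55.0, 45.0])

w = np.zeros(2)
b = 0.0
C, lr = 0.1, 0.01
for _ in range(500):
    margins = ys * (Xs @ w + b)
    active = margins < 1  # pseudo-points violating the margin
    # subgradient of ||w||^2 / 2 + C * sum_c beta_c * hinge(y_c, <w, x_c> + b)
    gw = w - C * ((beta * ys)[active] @ Xs[active])
    gb = -C * (beta * ys)[active].sum()
    w -= lr * gw
    b -= lr * gb
```

Because each pseudo-point's loss is multiplied by βc, solving this small problem approximates solving the SVM on all N original points.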

Squashing for SVM Important design issues for the squashing algorithm: (1). the choice of the number and location of the squashing points; (2). sampling the values of w from the prior p(w); (3). b can be obtained from the optimization model; (4). with w and b fixed, we evaluate the likelihood of each training point, and repeat the selection procedure L times.
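Steps (2) and (4) can be sketched by drawing L classifiers from the prior and recording every training point's likelihood profile under them; points with similar profiles can then be grouped into the same squashed point. The Gaussian prior, the logistic-style likelihood, and the toy data below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy labelled data, assumed only for illustration
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

L = 5  # number of (w, b) draws from the prior
ws = rng.normal(size=(L, 2))  # w ~ N(0, I), an illustrative stand-in for p(w)
bs = rng.normal(size=L)

# Likelihood profile of every training point under each sampled classifier;
# a logistic-style loss is used here as a smooth stand-in for the SVM likelihood.
margins = y[:, None] * (X @ ws.T + bs[None, :])
profiles = np.log1p(np.exp(-margins))  # shape (N, L); row i = profile of point i
```

Points whose rows in `profiles` are close behave similarly under classifiers drawn from the prior, so merging them into one weighted pseudo-point loses little likelihood information.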

EXPERIMENTS Experiment datasets: synthetic data, the UCI machine learning repository, and the UCI KDD repository.

EXPERIMENTS Evaluated methods: full-SMO, srs-SMO (simple random sampling), squash-SMO, and boost-SMO. Runs: results averaged over 100 runs. Performance measures: misclassification rate, learning time, and memory.

EXPERIMENTS (Results on Synthetic data) (Wf, bf): estimated by full-SMO. (Ws, bs): estimated from the squashed or sampled data.

EXPERIMENTS (Results on Synthetic data)

EXPERIMENTS (Results on Benchmark data)

EXPERIMENTS (Results on Benchmark data)

EXPERIMENTS (Results on Benchmark data)

Conclusion 1. We describe how the use of squashing makes SVM training applicable to large datasets. 2. Comparisons with full-SMO show that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory requirements. 3. srs-SMO has a higher misclassification rate. 4. squash-SMO and boost-SMO can tune parameters by cross-validation, which is infeasible for full-SMO.

Conclusion 5. Although the performance of squash-SMO and boost-SMO is similar on the benchmark problems, 6. squash-SMO can offer better interpretability of the model and can be expected to run faster on datasets that do not reside in memory.

Opinion It is a good idea that the authors describe how the use of squashing makes SVM training applicable to large datasets. We could also change the prior distribution of w according to the nature of the data, e.g., an exponential distribution, a log-normal distribution, or a nonparametric approach.