Privacy-Preserving Support Vector Machines via Random Kernels
Olvi Mangasarian, UW Madison & UCSD La Jolla
Edward Wild, UW Madison
November 14, 2015


Vertically Partitioned Data vs. Horizontally Partitioned Data
[Figure: the m x n data matrix A, with examples 1..m as rows and features 1..n as columns; vertical partitioning splits A into column blocks A.1, A.2, A.3 (one feature block per entity), while horizontal partitioning splits A into row blocks A1, A2, A3 (one example block per entity).]

Problem Statement
Entities with related data wish to learn a classifier based on all the data, but are unwilling to reveal their data to each other.
- If each entity holds a different set of features for all examples, the data is said to be vertically partitioned.
- If each entity holds a different set of examples with all features, the data is said to be horizontally partitioned.
Our approach: a privacy-preserving support vector machine (PPSVM) using random kernels that
- provides accurate classification, and
- does not reveal private information.

Outline
- Support vector machines (SVMs)
- Reduced and random kernel SVMs
- Privacy-preserving SVM for vertically partitioned data
- Privacy-preserving SVM for horizontally partitioned data
- Summary

Support Vector Machines
- A point $x \in R^n$ is classified by the nonlinear separating surface $K(x', A')u = \gamma$, where $A$ contains all the data points (the $+$ points form $A_+$ and the $-$ points form $A_-$), $e$ is a vector of ones, and the SVM is defined by the parameters $u$ and the threshold $\gamma$.
- Bounding surfaces: $K(x', A')u = \gamma + 1$ and $K(x', A')u = \gamma - 1$, with constraints $K(A_+, A')u \ge e\gamma + e$ and $K(A_-, A')u \le e\gamma - e$.
- A slack variable $y \ge 0$ allows points to lie on the wrong side of the bounding surfaces.
- Minimize $e's$ (equal to $\|u\|_1$ at the solution) to reduce overfitting; minimize $e'y$ (the hinge loss, or plus function $\max\{\cdot, 0\}$) to fit the data.
- Linear kernel: $(K(A, B))_{ij} = (AB)_{ij} = A_i B_{\cdot j} = K(A_i, B_{\cdot j})$.
- Gaussian kernel with parameter $\mu$: $(K(A, B))_{ij} = \exp(-\mu \|A_i' - B_{\cdot j}\|^2)$.
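As a concrete illustration of the two kernels defined above, here is a minimal NumPy sketch; the matrices and the parameter mu are illustrative placeholders, not code from the talk.

```python
import numpy as np

def linear_kernel(A, B):
    # (K(A, B'))_ij = A_i . B_j, i.e. the matrix product A B'
    return A @ B.T

def gaussian_kernel(A, B, mu):
    # (K(A, B'))_ij = exp(-mu * ||A_i - B_j||^2)
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2 * A @ B.T)
    return np.exp(-mu * sq_dists)

# Example: 100 data points in R^10 against a 30-row second kernel argument
A = np.random.randn(100, 10)
B = np.random.randn(30, 10)
K_lin = linear_kernel(A, B)          # shape (100, 30)
K_rbf = gaussian_kernel(A, B, mu=0.5)
```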

Support Vector Machine, Reduced Support Vector Machine, Random Reduced Support Vector Machine
- L&M, 2001: replace the kernel matrix $K(A, A')$ with $K(A, \bar{A}')$, where $\bar{A}'$ consists of a randomly selected subset of the rows of $A$.
- M&T, 2006: replace the kernel matrix $K(A, A')$ with $K(A, B')$, where $B$ is a completely random matrix.
Using the random kernel $K(A, B')$ is the key result for generating a simple and accurate privacy-preserving SVM.
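Continuing the NumPy sketch above, the reduced and random kernels only change the second kernel argument; the 10% row count below mirrors the experimental setting on the next slide and is an assumption for illustration.

```python
# Reduced kernel (L&M, 2001): second argument is a random subset of the rows of A
idx = np.random.choice(A.shape[0], size=max(1, A.shape[0] // 10), replace=False)
A_bar = A[idx, :]
K_reduced = gaussian_kernel(A, A_bar, mu=0.5)   # m x (m/10) instead of m x m

# Random kernel (M&T, 2006): second argument is a completely random matrix B
B_rand = np.random.randn(max(1, A.shape[0] // 10), A.shape[1])
K_random = gaussian_kernel(A, B_rand, mu=0.5)
```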

Error of Random Kernels is Comparable to Full Kernels: Linear Kernels
[Figure: scatter plot of full kernel AA' error vs. random kernel AB' error; each point represents one of 7 datasets from the UCI repository. B is a random matrix with the same number of columns as A and 10% as many rows, so dim(AB') << dim(AA'). The diagonal marks equal error for random and full kernels.]

Error of Random Kernels is Comparable to Full Kernels: Gaussian Kernels
[Figure: scatter plot of full kernel K(A, A') error vs. random kernel K(A, B') error on the same 7 UCI datasets.]

Vertically Partitioned Data: Each entity holds different features for the same examples
[Figure: the data matrix split column-wise into blocks A.1, A.2, A.3, one feature block per entity.]

Serial Secure Computation of the Linear Kernel $AA'$ (Yu, Vaidya & Jiang, 2006)
The product is accumulated serially: entity 1 passes on its block product $A_{\cdot 1}A_{\cdot 1}'$ masked by a random matrix $R_1$, entity 2 adds its block to obtain $(A_{\cdot 1}A_{\cdot 1}' + R_1) + A_{\cdot 2}A_{\cdot 2}'$, and entity 3 adds its block to obtain $((A_{\cdot 1}A_{\cdot 1}' + R_1) + A_{\cdot 2}A_{\cdot 2}') + A_{\cdot 3}A_{\cdot 3}'$.
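Purely as a schematic of the serial accumulation reconstructed above (the actual Yu-Vaidya-Jiang protocol is more involved), the following sketch shows the hand-off; the masking matrix R1 and all names and sizes are illustrative assumptions.

```python
import numpy as np

m, n = 100, 30
A = np.random.randn(m, n)
# Vertical partition: each entity j privately holds one column block A_.j
A_blocks = np.array_split(A, 3, axis=1)

# Entity 1 masks its block product with a random matrix R1 before passing it on
R1 = np.random.randn(m, m)
running = A_blocks[0] @ A_blocks[0].T + R1
# Entities 2 and 3 each add their own block product in turn
running = running + A_blocks[1] @ A_blocks[1].T
running = running + A_blocks[2] @ A_blocks[2].T
# The accumulated sum equals A A' + R1; removing R1 (known only to entity 1) recovers AA'
assert np.allclose(running - R1, A @ A.T)
```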

Our Parallel Secure Computation of the Random Linear Kernel $AB'$
Each entity $j$ computes its own block product $A_{\cdot j}B_{\cdot j}'$ in parallel, and the blocks combine to give $AB' = A_{\cdot 1}B_{\cdot 1}' + A_{\cdot 2}B_{\cdot 2}' + A_{\cdot 3}B_{\cdot 3}'$.
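Continuing the same sketch, the random-kernel computation needs no serial hand-off: each entity computes its block product locally, and the published blocks simply add up to AB' (variable names are illustrative).

```python
# Each entity j picks its own random matrix B_.j matching the columns of its block
k = m // 10                                    # 10% as many rows as A
B_blocks = [np.random.randn(k, Aj.shape[1]) for Aj in A_blocks]

# Each entity computes A_.j B_.j' locally and publishes only this product
local_products = [Aj @ Bj.T for Aj, Bj in zip(A_blocks, B_blocks)]

# The random linear kernel is the sum of the published blocks: AB' = sum_j A_.j B_.j'
K_AB = sum(local_products)
B_full = np.hstack(B_blocks)                   # B = [B_.1, B_.2, B_.3]
assert np.allclose(K_AB, A @ B_full.T)
```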

Privacy-Preserving SVMs for Vertically Partitioned Data via Random Kernels
- Each of $q$ entities privately owns a block of data $A_{\cdot 1}, \ldots, A_{\cdot q}$ that it is unwilling to share with the others.
- Each entity $j$ picks its own random matrix $B_{\cdot j}$ and distributes $K(A_{\cdot j}, B_{\cdot j}')$ to the other $q - 1$ entities.
- $K(A, B') = K(A_{\cdot 1}, B_{\cdot 1}') \oplus \cdots \oplus K(A_{\cdot q}, B_{\cdot q}')$, where $\oplus$ is $+$ for the linear kernel and the Hadamard (element-wise) product for the Gaussian kernel.
- A new point $x = (x_1', \ldots, x_q')'$ can be distributed amongst the entities by similarly computing $K(x', B') = K(x_1', B_{\cdot 1}') \oplus \cdots \oplus K(x_q', B_{\cdot q}')$.
- Recovering $A_{\cdot j}$ from $K(A_{\cdot j}, B_{\cdot j}')$ without knowing $B_{\cdot j}$ is essentially impossible.
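A minimal sketch of the combination rule, continuing the sketches above and reusing the gaussian_kernel helper defined earlier; the Hadamard product works for the Gaussian kernel because squared distances add across feature blocks.

```python
# Linear kernel: the entity blocks combine by addition (as above)
K_linear = sum(Aj @ Bj.T for Aj, Bj in zip(A_blocks, B_blocks))

# Gaussian kernel: the entity blocks combine by the Hadamard (element-wise) product,
# since exp(-mu * sum_j ||.||^2) = prod_j exp(-mu * ||.||^2) over the feature blocks
mu = 0.5
K_gauss = np.ones((m, k))
for Aj, Bj in zip(A_blocks, B_blocks):
    K_gauss = K_gauss * gaussian_kernel(Aj, Bj, mu)

assert np.allclose(K_gauss, gaussian_kernel(A, B_full, mu))
```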

Results for PPSVM on Vertically Partitioned Data
- Compare classifiers that share feature data with classifiers that do not share, on seven datasets from the UCI repository.
- Simulate situations in which each entity holds only a subset of the features:
  - in the first situation, the features are divided evenly between 5 entities;
  - in the second situation, each entity receives about 3 features.

Error Rate of Sharing Data Generally Better than not Sharing: Linear Kernels
[Figure: scatter plot of error rate without sharing data vs. error rate with sharing data; the 7 UCI datasets are represented by two points each.]

Error Rate of Sharing Data Generally Better than not Sharing: Nonlinear Kernels
[Figure: scatter plot of error rate without sharing data vs. error rate with sharing data for the nonlinear kernel.]

Horizontally Partitioned Data: Each entity holds different examples with the same features
[Figure: the data matrix split row-wise into blocks A1, A2, A3, one example block per entity.]

Privacy-Preserving SVMs for Horizontally Partitioned Data via Random Kernels
- Each of $q$ entities privately owns a block of data $A_1, \ldots, A_q$ that it is unwilling to share with the other $q - 1$ entities.
- The entities all agree on the same random basis matrix $B$ and each distributes $K(A_j, B')$ to all entities.
- The blocks stack row-wise to give the full kernel: $K(A, B') = \begin{pmatrix} K(A_1, B') \\ \vdots \\ K(A_q, B') \end{pmatrix}$.
- $A_j$ cannot be recovered uniquely from $K(A_j, B')$.
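A minimal sketch of the horizontal case for the linear kernel, with illustrative sizes: every entity knows the same B, publishes only its kernel block, and the blocks stack row-wise.

```python
import numpy as np

n = 30                                                  # features shared by all entities
A_parts = [np.random.randn(20, n) for _ in range(3)]    # each entity holds ~20 examples
B = np.random.randn(10, n)                              # random matrix agreed on by all

# Each entity publishes only its kernel block; for the linear kernel K(A_j, B') = A_j B'
K_parts = [Aj @ B.T for Aj in A_parts]

# Stacking the blocks row-wise reproduces K(A, B') for the full data A = [A_1; A_2; A_3]
K_AB = np.vstack(K_parts)
A_full = np.vstack(A_parts)
assert np.allclose(K_AB, A_full @ B.T)
```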

Privacy Preservation: Infinite Number of Solutions for $A_i$ Given $A_i B'$
- Given the published matrix $A_i B'$, equivalently $P_i = B A_i'$, consider an attempt to solve for row $r$ of $A_i$, $1 \le r \le m_i$, from the equation $B A_{ir}' = P_{ir}$ with unknown $A_{ir}' \in R^n$.
- Every square submatrix of the random matrix $B$ is nonsingular, and $B$ has fewer rows than columns, so the system is underdetermined.
- Hence there are a huge number of distinct solutions $A_i$ to $B A_i' = P_i$; if each entity has 20 points in $R^{30}$, the number of such solutions is already enormous.
- Furthermore, each of the infinite number of matrices in the affine hull of these solutions is also a solution.
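A small numerical illustration (not from the slides) of the non-uniqueness argument: with B having fewer rows than columns, any vector in the null space of B can be added to one reconstruction of a private row and still match the published product; sizes and names are illustrative.

```python
import numpy as np
from scipy.linalg import null_space

k, n = 10, 30                        # B has fewer rows than the data has features
B = np.random.randn(k, n)
a_true = np.random.randn(n)          # one private row A_ir of entity i's data
p = B @ a_true                       # all an adversary ever sees: B A_ir' = P_ir

# One particular solution, plus anything in the (n - k)-dimensional null space of B,
# is an equally valid reconstruction of the private row
a_hat = np.linalg.lstsq(B, p, rcond=None)[0]
N = null_space(B)
a_other = a_hat + N @ np.random.randn(N.shape[1])

assert np.allclose(B @ a_other, p)   # indistinguishable from a_true given only B and p
```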

Results for PPSVM on Horizontally Partitioned Data
- Compare classifiers that share examples with classifiers that do not share, on seven datasets from the UCI repository.
- Simulate a situation in which each entity holds only a subset of about 25 examples.

Error Rate of Sharing Data is Better than not Sharing: Linear Kernels
[Figure: scatter plot of error rate without sharing data vs. error rate with sharing data.]

Error Rate of Sharing Data is Better than not Sharing: Gaussian Kernels
[Figure: scatter plot of error rate without sharing data vs. error rate with sharing data.]

Summary
- A privacy-preserving SVM for vertically or horizontally partitioned data, based on the random kernel $K(A, B')$.
- The classifier is learned from all the data without revealing any privately held data.
- Classification accuracy is better than that of an SVM trained without sharing, and comparable to that of an SVM where all data is shared.