Proximal Support Vector Machine for Spatial Data Using P-trees


Proximal Support Vector Machine for Spatial Data Using P-trees [1]
Fei Pan, Baoying Wang, Dongmei Ren, Xin Hu, William Perrizo
[1] Patents are pending on P-tree technology by North Dakota State University

OUTLINE
Introduction
Brief Review of SVM
Review of P-trees and EIN-rings
Proximal Support Vector Machine
Performance Analysis
Conclusion

Introduction In this research paper, we develop an efficient proximal support vector machine (SVM) for spatial data using P-trees. The central idea is to fit a binary class boundary using piecewise linear segments.

Brief Review of SVM In very simple terms, an SVM corresponds to a linear method (a perceptron) in a very high dimensional feature space that is nonlinearly related to the input space. By using kernels, a nonlinear class boundary is transformed into a linear boundary in a high dimensional feature space, where linear methods apply; the resulting nonlinear classifier in the original input space is thereby obtained implicitly. Support Vector Machines (SVMs) are strongly competing with Neural Networks as tools for solving pattern recognition problems. They are based on some beautifully simple ideas and provide a clear intuition of what learning from examples is all about. More importantly, they also show high performance in practical applications.

More About SVM The goal of a support vector machine classifier is to find the particular hyperplane in the high dimensional feature space for which the separation margin between the two classes is maximized. Given a data set containing data belonging to two or more different classes, either linearly separable or non-separable, the problem is to find the optimal separating hyperplane (decision boundary) that separates the data according to their class labels.
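The slides contain no code; as a quick illustration only, here is a minimal scikit-learn sketch of fitting a maximum-margin linear separator on toy two-class data. The data and parameters are made up for illustration and are not from the paper.

# Minimal illustration (not from the original slides): fit a linear
# maximum-margin separator with scikit-learn on a toy two-class set.
import numpy as np
from sklearn.svm import SVC

# Toy 2-D training data: two loosely separated clusters.
X = np.array([[1.0, 1.2], [1.5, 0.8], [0.9, 1.0],   # class 0
              [3.0, 3.1], [3.4, 2.8], [2.9, 3.3]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)   # linear kernel -> separating hyperplane
clf.fit(X, y)

print("support vectors:", clf.support_vectors_)
print("hyperplane normal w:", clf.coef_, "bias b:", clf.intercept_)
print("prediction for (2, 2):", clf.predict([[2.0, 2.0]]))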

More About SVM Recently there has been an explosion of interest in SVMs, which have empirically been shown to give good classification performance on a wide variety of problems. However, the training of SVMs is extremely slow for large-scale data sets. This lack of scalability is a problem for SVMs.

Our Approach Our approach is a geometric method with well-tuned accuracy and efficiency obtained by using P-trees and EIN-rings (Equal Interval Neighborhood rings). Outliers in the training data are first identified and eliminated. The method is local (proximal), i.e., no training phase is required. Preliminary tests show that the method has promise for both speed and accuracy.

Current practice: data is stored as horizontal structures (sets of horizontal records) and scanned vertically. P-trees take the opposite approach: vertically project each attribute, then vertically project each bit position of each attribute, and compress each bit slice into a basic P-tree; the vertical data are then processed horizontally by ANDing basic P-trees.

Example: a relation R(A1, A2, A3, A4) with 3-bit attributes is split into bit slices R11, R12, R13, ..., R41, R42, R43 (first index = attribute, second = bit position). The 1-dimensional P-tree P11 of slice R11 is built by recording the truth of the predicate "pure 1" recursively on halves until purity is reached: is the whole slice pure-1? If not, is the 1st half pure-1? The 2nd half? Each half of each half? A branch ends as soon as its segment is pure (all 1s, or pure0, all 0s). E.g., to count occurrences of the value combination 111 000 001 100, AND the basic P-trees and their complements, P11^P12^P13^P'21^P'22^P'23^P'31^P'32^P33^P41^P'42^P'43; the root count of the result is the answer.
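As an illustration of the construction just described (not the authors' implementation), here is a minimal Python sketch of building a 1-dimensional pure-1 P-tree by recursive halving, using plain nested tuples in place of the compressed structure.

# Hedged sketch: build a 1-dimensional "pure-1" P-tree over a bit slice by
# recursively halving until purity is reached.
def build_p1tree(bits):
    """Return 1 or 0 for a pure segment, else ('mixed', left_subtree, right_subtree)."""
    if all(b == 1 for b in bits):
        return 1                       # pure-1 segment: branch ends here
    if all(b == 0 for b in bits):
        return 0                       # pure-0 segment: branch ends here
    mid = len(bits) // 2
    return ("mixed", build_p1tree(bits[:mid]), build_p1tree(bits[mid:]))

def root_count(node, length):
    """Count of 1-bits represented by a (sub)tree covering `length` positions."""
    if node == 1:
        return length
    if node == 0:
        return 0
    _, left, right = node
    half = length // 2
    return root_count(left, half) + root_count(right, length - half)

# Example bit slice (any 0/1 list works).
slice_r11 = [1, 0, 1, 0, 1, 1, 1, 1]
tree = build_p1tree(slice_r11)
print(tree, root_count(tree, len(slice_r11)))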

When is Horizontal Processing of Vertical Structures a good idea? They are NOT for record-based workloads (e.g., SQL), where the result is a set of records: converting horizontal records to vertical trees and then having to reconstruct horizontal result records may mean excessive post-processing. They ARE for data mining workloads, where the result is often just a bit (Yes/No, T/F) or a count, so no reconstructive post-processing is needed.

2-Dimensional Pure1-trees A node is 1 iff that quadrant is purely 1-bits. E.g., take a bit file (say, the high-order bit of the RED band of a 2-D image): 1111110011111000111111001111111011110000111100001111000001110000. Viewed in spatial raster order this is an 8x8 bit matrix; run-length compressing it into a quadrant tree using Peano order gives the 2-dimensional pure-1 tree.

Counts are needed in data mining. Predicate-trees are very compressed and can still produce counts quickly. Count-trees are an alternative: each interior node stores the count of 1s in its quadrant. [Figure: an example count-tree with root count 55 at level 3, quadrant counts such as 16, 15, 16 at level 2, and so on down to individual bits; quadrant ids are written in Peano order, e.g., quadrant (7, 1) = (111, 001).]

Logical Operations on Ptrees (used to get counts of any pattern) The P-tree AND operation is faster than a bit-by-bit AND because there are shortcuts: any pure-0 operand node makes the corresponding result node pure-0, and for any pure-1 operand node the subtree of the other operand is simply copied to the result. In the example, only quadrant 2 needs to be loaded to AND Ptree1, Ptree2, etc. The more operands there are in the AND, the greater the benefit from this shortcut (more pure-0 nodes).
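Continuing the toy nested-tuple representation from the sketch above (again, not the authors' code), the pure-0 / pure-1 shortcuts in the AND operation look roughly like this:

# Hedged sketch: AND two pure-1 trees using the pure-0 / pure-1 shortcuts.
def ptree_and(a, b):
    if a == 0 or b == 0:
        return 0          # pure-0 shortcut: result quadrant is pure-0
    if a == 1:
        return b          # pure-1 shortcut: copy the other operand's subtree
    if b == 1:
        return a
    _, a_left, a_right = a
    _, b_left, b_right = b
    left = ptree_and(a_left, b_left)
    right = ptree_and(a_right, b_right)
    if left == 0 and right == 0:      # re-collapse if the result is pure
        return 0
    if left == 1 and right == 1:
        return 1
    return ("mixed", left, right)

t1 = ("mixed", 1, ("mixed", 0, 1))               # bits 1 1 0 1
t2 = ("mixed", ("mixed", 1, 0), 1)               # bits 1 0 1 1
print(ptree_and(t1, t2))                         # bits 1 0 0 1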

Hilbert Ordering? In 2 dimensions, Peano ordering is 2x2-recursive Z-ordering (raster ordering within each quadrant); Hilbert ordering is 4x4-recursive "tuning fork" ordering (H-trees have fanout 16). Hilbert order has somewhat better continuity characteristics, but a much less usable coordinate-to-quadrant translator.
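To make the coordinate-to-quadrant remark concrete: in Peano/Z order the translator is simple bit interleaving of the row and column coordinates, something the Hilbert curve does not allow as directly. A small illustrative Python sketch (not from the slides):

# Hedged illustration: the Peano / Z-order coordinate-to-quadrant translator
# is just bit interleaving of the two coordinates.
def peano_index(row, col, bits):
    """Interleave the bits of (row, col) into a single Z-order index."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 2) | (((row >> i) & 1) << 1) | ((col >> i) & 1)
    return z

# Each pair of bits of the index selects one of the 4 sub-quadrants at a level.
print(peano_index(5, 3, 3))    # row=101, col=011 -> index 100111 (binary) = 39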

3-Dimensional Ptrees

Generalizing Peano compression to any table with numeric attributes: start from the unsorted relation (the same relation can be shown with its values in binary). Raster sorting orders by attribute first and bit position second; Peano sorting orders by bit position first and attribute second.

Generalized Peano sorting can make a big difference in classification speed. [Chart: classification speed improvement for a sample-based classifier on 5 UCI Machine Learning Repository data sets (adult, spam, mushroom, function, crop), comparing time in seconds for unsorted, generalized-raster, and generalized-Peano orderings.]
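A hedged sketch of the two sort keys discussed above (my interpretation, not the authors' code): a raster key concatenates attribute values (attribute first, bit position second), while a generalized Peano key interleaves bits across attributes (bit position first, attribute second).

# Hedged sketch: raster vs. generalized Peano sort keys for numeric tuples.
def peano_key(values, bits):
    key = 0
    for i in range(bits - 1, -1, -1):        # high-order bit position first
        for v in values:                     # then cycle through the attributes
            key = (key << 1) | ((v >> i) & 1)
    return key

def raster_key(values, bits):
    key = 0
    for v in values:                         # attribute first
        key = (key << bits) | (v & ((1 << bits) - 1))
    return key

rows = [(5, 1), (4, 6), (5, 2), (0, 7)]
print(sorted(rows, key=lambda r: peano_key(r, 3)))
print(sorted(rows, key=lambda r: raster_key(r, 3)))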

Range predicate tree: Px>v For identifying/counting tuples satisfying a given range predicate. Let v = bm…bi…b0. Then Px>v = Pm opm (Pm-1 opm-1 ( … opk+1 Pk )), where 1) opi is AND (∧) if bi = 1 and OR (∨) otherwise, and 2) k is the rightmost bit position with value 0. For example, Px>101 = P2 ∧ P1.

Pxv v=bm…bi…b0 Pxv = P’m opm … P’i opi P’i-1 … opk+1P’k 1) opi is  if bi=0, opi is  otherwise 2) k is rightmost bit position of v with “0” For example: Px  101 = (P’2  P’1) Given a data set with d attributes, X = (An, An-1 … A0), and the binary representation of jth attribute Aj as (bj,mbj,m-1...bj,i… bj,0.)

Equal Interval Neighborhood Rings (EIN-rings), using the L∞ distance. [Diagram: concentric 1st, 2nd, and 3rd EIN-rings around a center point c.] The EIN-ring of a data point c with radius r and fixed interval Δ is defined as the neighborhood ring R(c, r, r+Δ) = {x ∈ X | r < |c − x| ≤ r + Δ}, where |c − x| is the distance between x and c.
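A minimal sketch of the EIN-ring membership test under the L∞ (max-coordinate) distance, assuming that metric as discussed above; illustrative only, not the authors' code.

# Hedged sketch: membership in the ring R(c, r, r+delta) under L-infinity distance.
import numpy as np

def in_ein_ring(X, c, r, delta):
    """Boolean mask of points x with r < ||c - x||_inf <= r + delta."""
    d = np.max(np.abs(X - c), axis=1)        # L-infinity distance to the center
    return (d > r) & (d <= r + delta)

X = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0], [3.5, 0.0]])
print(in_ein_ring(X, c=np.array([0.0, 0.0]), r=1.0, delta=1.5))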

EIN-ring Based Neighborhood Search Using Range Predicate Trees For a center x = (x1, x2) and ring R(x, r, r+Δ), build the range predicate tree P for the outer box, X1 ∈ (x1 − r − Δ, x1 + r + Δ] and X2 ∈ (x2 − r − Δ, x2 + r + Δ], and the range predicate tree P' for the inner box, X1 ∈ (x1 − r, x1 + r] and X2 ∈ (x2 − r, x2 + r]. ANDing P with the complement of P' yields exactly the points in the ring: inside the outer box but outside the inner one.

Proximal Support Vector Machine (P-SVM) In this paper we propose a proximal Support Vector Machine using P-trees (P-SVM). The overall algorithm of P-SVM is as follows. (1) Finding region components: partition the training data into region components (proximities) using EIN-rings, according to the class labels. (2) Finding support vectors: calculate the EIN-ring membership of each data point in its region component g, then find support vector pairs based on these memberships. (3) Fitting the boundary: if the training space has d feature dimensions, calculate the d nearest boundary sentries of a test sample; these determine a local boundary hyperplane segment, and the class label of the test sample is then determined by its location relative to that hyperplane.

Step 1: Find region components using EIN-rings (defined above) Definition: given a training data set X with C classes, region components are groups of training data points x that have more than half classmates among their ξ neighbors, where classmates are data points with the same class label as x and ξ is a threshold on the number of neighbors. Algorithm: 1. Set the number of neighbors within the Hawaiian ring, NBRx, to a fixed number ξ, e.g., 4, 6, or 8; if NBRx < ξ, adjust r until NBRx ≥ ξ. 2. Check the neighbors of x and assign x to the same group as its classmates within the neighborhood; if none of its classmates within the neighborhood is marked yet, mark x as starting a new group. If the number of x's classmates is less than ξ/2, treat x as an outlier. 3. Merge groups that have the same class label and are reachable from each other within ξ neighbors. Outliers are assumed to be eliminated during a data cleaning process.
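A rough sketch of Step 1 (an approximation of the grouping idea, not the authors' implementation): for each training point, look at its ξ nearest neighbors under the L∞ distance, flag it as an outlier if fewer than half of them are classmates, and otherwise merge it with its classmate neighbors via union-find.

# Hedged sketch: approximate region-component grouping by classmate neighbors.
import numpy as np

def region_components(X, y, xi=6):
    n = len(X)
    parent = list(range(n))                       # union-find over points
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    def union(i, j):
        parent[find(i)] = find(j)

    outliers = set()
    for i in range(n):
        d = np.max(np.abs(X - X[i]), axis=1)      # L-infinity distances
        nbrs = np.argsort(d)[1:xi + 1]            # xi nearest neighbors (skip self)
        mates = [j for j in nbrs if y[j] == y[i]]
        if len(mates) <= xi / 2:                  # too few classmates: outlier
            outliers.add(i)
            continue
        for j in mates:                           # join classmate neighbors
            union(i, j)

    return np.array([find(i) if i not in outliers else -1 for i in range(n)])

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [3, 3]], float)
y = np.array([0, 0, 0, 1, 1, 1, 0])
print(region_components(X, y, xi=3))              # -1 marks outliers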

Step 2: finding support vector pairs; Step 3: fit the boundary hyperplane [Diagram: two region components with their EIN-rings, the support vector pairs between them, the boundary sentries, and the fitted boundary hyperplane segment.] The EIN-ring membership of a data point x in region component g, Mxg, is defined as the normalized summation of the weighted tuple-P-tree root counts within the EIN-rings: Mxg = (1/Ng) · Σr wr · NBRxg,r, summing over the EIN-rings r = 1, …, m, where Ng is the number of data points in region component g, wr is the weight of the EIN-ring R(x, r−1, r), and NBRxg,r is the number of neighbors in group g within the EIN-ring R(x, 0, r). The EIN-ring membership pair of x in region component g, HMPxg, is defined as HMPxg = (Mxg, Mxg'), where g' is the neighboring region component of x. A pair of candidate support vectors xi, xj ∈ X, i ≠ j, with xi ∈ g and xj ∈ g', is a support vector pair SVP(xi, xj) iff d(xi, xj) ≤ d(xk, xl) for all xk ∈ g, xl ∈ g'. In other words, xi and xj lie on the correct side of the boundary and are the nearest neighbors drawn from the two different region components. The local boundary hyperplane is H(x) = w·x + w0.
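A heavily hedged sketch of Steps 2-3. The slides do not define the boundary sentries precisely, so this illustration assumes a sentry is the midpoint of a support vector pair and takes the hyperplane normal w along the direction between the paired support vectors; both choices are my assumptions, not the paper's stated method.

# Hedged sketch of Steps 2-3 (not the authors' code; see assumptions above).
import numpy as np

def support_vector_pair(G, Gp):
    """Nearest pair (one point from each region component) under L-inf distance."""
    best, best_d = None, np.inf
    for xi in G:
        for xj in Gp:
            d = np.max(np.abs(xi - xj))
            if d < best_d:
                best, best_d = (xi, xj), d
    return best

def classify(test_x, G, Gp, label_g, label_gp):
    xi, xj = support_vector_pair(G, Gp)
    sentry = (xi + xj) / 2.0                  # assumed boundary sentry (midpoint)
    w = xi - xj                               # assumed hyperplane normal
    w0 = -np.dot(w, sentry)
    h = np.dot(w, test_x) + w0                # H(x) = w.x + w0
    return label_g if h > 0 else label_gp     # side of the local hyperplane

G  = [np.array([0.0, 0.0]), np.array([1.0, 0.5])]   # component g  (class 0)
Gp = [np.array([3.0, 3.0]), np.array([2.5, 2.0])]   # component g' (class 1)
print(classify(np.array([1.2, 1.0]), G, Gp, 0, 1))
print(classify(np.array([2.4, 2.2]), G, Gp, 0, 1))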

PRELIMINARY EVALUATION Experiments were performed on a 1 GHz Pentium PC with 1 GB of main memory, running Debian Linux. The original image size is 1320x1320. For experimental purposes we form 16x16, 32x32, 64x64, 128x128, 256x256, and 512x512 images by choosing pixels that are uniformly distributed in the original image. The data set is an aerial (TIFF) image and yield map from Oaks, North Dakota.

PRELIMINARY EVALUATION The following table shows the testing correctness of P-SVM and C-SVM averaged over 30 runs; in each run, we randomly select 10% of the data set as test data and use the rest as training data.

Dataset (n = # of pixels)    Testing correctness, P-SVM    Testing correctness, C-SVM
n = 16x16                    86.4%                         84.9%
n = 32x32                    89.0%                         85.2%
n = 64x64                    90.3%                         95.4%
n = 128x128                  92.0%                         90.5%
n = 256x256                  94.1%                         91.1%
n = 512x512

The testing correctness of P-SVM and C-SVM on these data sets is almost identical, indicating that P-SVM has accuracy comparable to C-SVM. P-SVM correctness appears to be comparable to that of standard SVM.

PRELIMINARY EVALUATION Speed experiments were also performed. The average CPU run time of 30 runs on the five different sizes of data is shown in the figure above. We see that P-SVM is faster than C-SVM on all five data set sizes, and as the data set size increases, the run time of the P-SVM method increases at a much lower rate than that of C-SVM. The experimental results show that the P-SVM method is more scalable for large spatial data sets. P-SVM speed appears to be superior to standard SVM.

CONCLUSION In this paper, we propose an efficient P-tree based proximal Support Vector Machine (P-SVM), which appears to improve speed without sacrificing accuracy. In the future, more extensive experiments and the combination of P-SVM with KNN will be explored.

THANKS