Performance Improvement for Bayesian Classification on Spatial Data with P-Trees Amal S. Perera Masum H. Serazi William Perrizo Dept. of Computer Science.


1 Performance Improvement for Bayesian Classification on Spatial Data with P-Trees Amal S. Perera Masum H. Serazi William Perrizo Dept. of Computer Science North Dakota State University Fargo, ND 58105 These notes contain NDSU confidential and Proprietary material. Patents pending on the P-tree technology

2 Outline
–Introduction
–P-Tree
–P-Tree Algebra
–Bayesian Classifier
–Calculating Probabilities using P-Trees
–Band-based vs. Bit-based approach
–Sample Data
–Classification Accuracy
–Classification Time
–Conclusion

3 Introduction Classification is a form of data analysis and data mining that can be used to extract models describing important data classes or to predict future data trends. Some data classification techniques are:
–Decision Tree Induction
–Bayesian
–Neural Networks
–K-Nearest Neighbor
–Case Based Reasoning
–Genetic Algorithm
–Rough sets
–Fuzzy logic techniques
A Bayesian classifier is a statistical classifier that uses Bayes' theorem to predict class membership as the conditional probability that a given data sample falls into a particular class.

4 Introduction Cont.. The P-Tree data structure allows us to compute the Bayesian probability values efficiently, without resorting to the naïve Bayesian assumption. Bayesian classification with P-Trees has been used successfully in precision agriculture, to predict yield from remotely sensed images, and in genomics (2-yeast hybrid classification), where it placed in the ACM KDD Cup 2002 competition. http://www.biostata.wisc.edu/~craven/kddcup/winners.html To completely eliminate the naïve assumption, a bit-based Bayesian classification is used instead of a band-based approach.

5 P-Tree Most spatial data comes in a band format called BSQ. Each BSQ band is divided into several files, one for each bit position of the data values. This format is called 'bit Sequential', or bSQ. Each bSQ bit file, B_ij (the file constructed from the j-th bits of the i-th band), is organized into a tree structure called a Peano Tree (P-Tree). P-Trees represent tabular data in a lossless, compressed, bit-by-bit, recursive, datamining-ready arrangement.

6 A bSQ file, its raster spatial file and P-Tree
Terms illustrated: Peano or Z-ordering; Pure (Pure-1/Pure-0) quadrant; Root Count; Level; Fan-out; QID (Quadrant ID).
bSQ file (64 bits): 1111110011111000111111001111111011111111111111111111111101111111
Raster spatial file (8 x 8):
11111100
11111000
11111100
11111110
11111111
11111111
11111111
01111111
[Figure: the P-Tree for this raster. Root count 55; level-1 counts 16, 8, 15, 16; level-2 counts 3, 0, 4, 1 and 4, 4, 3, 4; leaf bits 1110, 0010, 1101.]
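The construction on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `build_ptree` and the `(count, children)` tuple layout are assumptions made here, and the raster is the slide's 64-bit bSQ string read in row-major order.

```python
# The 8x8 raster from the slide, one string per row (row-major order).
RASTER = [
    "11111100",
    "11111000",
    "11111100",
    "11111110",
    "11111111",
    "11111111",
    "11111111",
    "01111111",
]

def build_ptree(rows, top=0, left=0, size=None):
    """Return (count, children) for the square quadrant at (top, left).

    children is None for single bits and for pure-0 / pure-1 quadrants;
    this node layout is a hypothetical choice made for the sketch.
    """
    if size is None:
        size = len(rows)
    if size == 1:
        return (int(rows[top][left]), None)
    half = size // 2
    # Peano (Z) order: upper-left, upper-right, lower-left, lower-right.
    kids = [
        build_ptree(rows, top + dr, left + dc, half)
        for dr in (0, half)
        for dc in (0, half)
    ]
    count = sum(k[0] for k in kids)
    if count == 0 or count == size * size:
        return (count, None)  # pure quadrant: the count alone describes it
    return (count, kids)

root = build_ptree(RASTER)
root_count = root[0]
level1_counts = [child[0] for child in root[1]]
print(root_count, level1_counts)
```

The root count and the four level-1 counts printed here can be compared against the tree drawn on the next slide.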

7 P-Tree Algebra
Logical operators:
–And
–Or
–Complement
–Other (XOR, etc.)
Applying these operators, we calculate value P-Trees, interval P-Trees, and slice P-Trees.
Ptree (children listed in Peano order, leaf bits after "->"):
55
  16
  8
    3 -> 1110
    0
    4
    1 -> 0010
  15
    4
    4
    3 -> 1101
    4
  16
Complement:
9
  0
  8
    1 -> 0001
    4
    0
    3 -> 1101
  1
    0
    0
    1 -> 0010
    0
  0
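As a rough sketch of the Complement operator, the count tree above can be written as nested tuples and complemented by replacing each count with the quadrant area minus the count. The tuple layout and the helper names `leaf` and `complement` are assumptions made for illustration, not the P-tree library's API.

```python
def leaf(bits):
    """A 2x2 quadrant given by its four bits in Peano order."""
    return (bits.count("1"), [(int(b), None) for b in bits])

# The P-tree from the slide as (count, children); children is None for
# pure quadrants and single bits.
PTREE = (55, [
    (16, None),
    (8, [leaf("1110"), (0, None), (4, None), leaf("0010")]),
    (15, [(4, None), (4, None), leaf("1101"), (4, None)]),
    (16, None),
])

def complement(node, area):
    """Complement a count tree: every count becomes area - count."""
    count, kids = node
    new_kids = None if kids is None else [complement(k, area // 4) for k in kids]
    return (area - count, new_kids)

comp = complement(PTREE, 64)           # the raster has 8 x 8 = 64 pixels
comp_level1 = [k[0] for k in comp[1]]
print(comp[0], comp_level1)
```

Because the structure of the tree is unchanged, the complement costs one pass over the stored nodes rather than a pass over the pixels.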

8 P-Tree Algebra Cont.. Basic P-Trees can be combined using logical operations to produce P-Trees for the original values at any level of bit precision. Using 8-bit precision for values, $P_{b,11010011}$, which counts the number of occurrences of 11010011 in each quadrant, can be constructed from the basic P-Trees as:
$P_{b,11010011} = P_{b,1} \wedge P_{b,2} \wedge P'_{b,3} \wedge P_{b,4} \wedge P'_{b,5} \wedge P'_{b,6} \wedge P_{b,7} \wedge P_{b,8}$
Here ' indicates the COMPLEMENT operation and $\wedge$ the AND operation, which is simply the pixel-wise AND of the bits.
Similarly, any data set in the relational format can be represented as P-Trees. For any combination of values $(v_1, v_2, \ldots, v_n)$, where $v_i$ is from band $i$, the quadrant-wise count of occurrences of this combination of values is given by:
$P(v_1, v_2, \ldots, v_n) = P_{1,v_1} \wedge P_{2,v_2} \wedge \cdots \wedge P_{n,v_n}$
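The value P-tree formula can be illustrated with uncompressed bit vectors standing in for the basic P-trees. This is a simplification assumed for the sketch: each bit plane is held in a Python integer with one bit per pixel, and the function names and pixel values are hypothetical.

```python
def bit_planes(values, nbits):
    """planes[j] has its k-th bit set when bit j of values[k] is 1.

    Bit j = 0 is the most significant bit, matching b1..b8 on the slide.
    """
    planes = [0] * nbits
    for k, v in enumerate(values):
        for j in range(nbits):
            if (v >> (nbits - 1 - j)) & 1:
                planes[j] |= 1 << k
    return planes

def value_mask(planes, value, n_pixels):
    """AND each plane, or its complement where the value has a 0 bit."""
    nbits = len(planes)
    full = (1 << n_pixels) - 1
    mask = full
    for j in range(nbits):
        want_one = (value >> (nbits - 1 - j)) & 1
        mask &= planes[j] if want_one else (~planes[j] & full)
    return mask

# Hypothetical band with five pixels; three of them hold 11010011.
pixels = [0b11010011, 0b11010011, 0b11010010, 0b01010011, 0b11010011]
planes = bit_planes(pixels, 8)
mask = value_mask(planes, 0b11010011, len(pixels))
count = bin(mask).count("1")
print(count)
```

The number of set bits in `mask` plays the role of the root count of the value P-tree.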

9 Bayesian Classifier Based on Bayes' Theorem:
$\Pr(C_i \mid X) = \dfrac{\Pr(X \mid C_i)\,\Pr(C_i)}{\Pr(X)}$
–$\Pr(C_i \mid X)$ is the posterior probability
–$\Pr(C_i)$ is the prior probability
–The conditional probabilities $\Pr(X \mid C_i)$ can be found from the training data
–Classify X with the maximum $\Pr(C_i \mid X)$
–Since $\Pr(X)$ is constant for all classes, it is enough to maximize $\Pr(X \mid C_i)\,\Pr(C_i)$

10 Calculating Probabilities Pr(X|C_i)
Using the naïve assumption:
$\Pr(X \mid C_i) = \Pr(X_1 \mid C_i) \times \Pr(X_2 \mid C_i) \times \cdots \times \Pr(X_n \mid C_i) \times \Pr(X_C \mid C_i)$
Scan the data and calculate $\Pr(X \mid C_i)$ for the given X.
Using P-Trees, with RC denoting the root count:
$\Pr(X \mid C_i) = \dfrac{\#\text{ training samples in } C_i \text{ having pattern } X}{\#\text{ samples in class } C_i} = \dfrac{RC[P_1(X_1) \wedge P_2(X_2) \wedge \cdots \wedge P_n(X_n) \wedge P_C(C_i)]}{RC[P_C(C_i)]}$
Problem: if $RC[P_1(X_1) \wedge P_2(X_2) \wedge \cdots \wedge P_n(X_n) \wedge P_C(C_i)] = 0$ for all i, the unclassified pattern does not exist in the training set.
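The difference between the two estimates can be seen on a toy example. The sketch below counts matching samples directly in a list, which stands in for the root counts; the training data and function names are hypothetical, and the naïve estimate multiplies only the per-attribute terms.

```python
from fractions import Fraction

# Hypothetical training samples: ((attribute values), class label).
TRAIN = [
    ((1, 0), "A"), ((1, 0), "A"), ((0, 1), "A"), ((0, 1), "A"),
    ((1, 1), "B"), ((0, 0), "B"),
]

def joint_likelihood(x, c):
    """Pr(X|C) as (# samples in C with pattern X) / (# samples in C)."""
    in_class = [s for s, label in TRAIN if label == c]
    match = sum(1 for s in in_class if s == x)
    return Fraction(match, len(in_class))

def naive_likelihood(x, c):
    """Pr(X|C) as the product of per-attribute conditional probabilities."""
    in_class = [s for s, label in TRAIN if label == c]
    p = Fraction(1)
    for i, xi in enumerate(x):
        p *= Fraction(sum(1 for s in in_class if s[i] == xi), len(in_class))
    return p

print(joint_likelihood((1, 0), "A"), naive_likelihood((1, 0), "A"))
print(joint_likelihood((1, 1), "A"), naive_likelihood((1, 1), "A"))
```

In this toy set the two attributes are perfectly anti-correlated within class A, so the naïve product assigns probability to the pattern (1, 1), which never occurs in that class. The joint count is exact but becomes zero for unseen patterns, which is the problem raised at the end of this slide.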

11 Band-based P-tree Approach
When all RC = 0 for the given pattern:
–Reduce the restrictiveness of the pattern by removing the attribute with the least information gain (IG)
–Calculate (assuming attribute 2 has the least IG):
$\Pr(X \mid C_i) = \dfrac{RC[P_1(X_1) \wedge P_3(X_3) \wedge \cdots \wedge P_n(X_n) \wedge P_C(C_i)]}{RC[P_C(C_i)]}$
Calculation of information gain using P-trees:
–One-time calculation for the entire training data
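A minimal sketch of this fallback, assuming the usual entropy-based definition of information gain and plain list scans in place of P-tree root counts. The training data, the function names, and the tie-breaking behaviour are all hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical training set: (attribute values, class label).
TRAIN = [
    ((0, 0, 1), "lo"), ((0, 1, 1), "lo"),
    ((1, 0, 0), "hi"), ((1, 0, 0), "hi"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(attr):
    """Entropy of the class labels minus the entropy after splitting on attr."""
    labels = [c for _, c in TRAIN]
    gain = entropy(labels)
    for v in {s[attr] for s, _ in TRAIN}:
        subset = [c for s, c in TRAIN if s[attr] == v]
        gain -= len(subset) / len(TRAIN) * entropy(subset)
    return gain

# Computed once for the whole training set.
GAINS = {a: info_gain(a) for a in range(3)}

def classify_band_based(x):
    """Drop the least-IG attribute until some training sample matches x."""
    attrs = list(range(len(x)))
    while attrs:
        counts = Counter(c for s, c in TRAIN if all(s[a] == x[a] for a in attrs))
        if counts:
            # Pr(X|Ci) * Pr(Ci) = (count / |Ci|) * (|Ci| / N) = count / N,
            # so the class with the largest joint count wins.
            return counts.most_common(1)[0][0], attrs
        attrs.remove(min(attrs, key=GAINS.get))
    # Every attribute was dropped: fall back to the most frequent class.
    return Counter(c for _, c in TRAIN).most_common(1)[0][0], attrs

label, used = classify_band_based((1, 1, 0))
print(label, used)
```

The pattern (1, 1, 0) is absent from the toy training set, so the attribute at index 1, which has the lowest gain here, is dropped before a match is found.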

12 Bit-based Approach Search for similar patterns by removing the least significant bits in the attribute space. The order of the bits to be removed is selected by calculating the information gain (IG).
E.g., calculate the Bayesian conditional probability value for the pattern [G,R] = [10,01] in 2-attribute space. Assume the IG for the 1st significant bit of R is less than that of G, and the IG for the 2nd significant bit of G is less than that of R.
–Initially, search for the pattern [10,01] (a).
–If not found, search for [1_,01], considering the IG for the 2nd significant bit. The search space increases (b).
–If not found, search for [1_,0_], considering the IG for the 2nd significant bit. The search space increases (c).
–If not found, search for [1_,_ _], considering the IG for the 1st significant bit. The search space increases (d).
[Figure: four panels (a) to (d), each a 4 x 4 grid over G and R values 00, 01, 10, 11, showing the search region growing at each step.]
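The relaxation steps (a) to (d) can be sketched with per-attribute bit masks. The removal order below is hard-coded from the IG ranking assumed on the slide rather than computed, and the training samples and function names are hypothetical.

```python
NBITS = 2
# Removal order from the slide's assumed IG ranking, as (attribute index,
# bit position) with bit 1 the most significant and attributes [G, R]:
# G's 2nd bit, then R's 2nd bit, then R's 1st bit.
REMOVAL_ORDER = [(0, 2), (1, 2), (1, 1)]

def matches(sample, pattern, masks):
    """True when sample agrees with pattern on every bit still in the masks."""
    return all((s & m) == (p & m) for s, p, m in zip(sample, pattern, masks))

def relaxed_counts(train, pattern):
    """Relax the pattern bit by bit until some training sample matches.

    Returns (per-class counts, masks in force when the match was found).
    """
    masks = [(1 << NBITS) - 1] * len(pattern)
    for step in [None] + REMOVAL_ORDER:
        if step is not None:
            attr, bit = step
            masks[attr] &= ~(1 << (NBITS - bit))  # stop comparing this bit
        counts = {}
        for sample, label in train:
            if matches(sample, pattern, masks):
                counts[label] = counts.get(label, 0) + 1
        if counts:
            return counts, list(masks)
    return {}, list(masks)

# Hypothetical training samples as ((G, R), class label).
train = [((0b11, 0b00), "A"), ((0b11, 0b00), "A"), ((0b00, 0b01), "B")]
counts, masks = relaxed_counts(train, (0b10, 0b01))
print(counts, [format(m, "02b") for m in masks])
```

With these samples the search stops at step (c): only the most significant bit of each attribute is still compared, which corresponds to the pattern [1_,0_].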

13 Experiments The experimental data was extracted from two sets of aerial photographs of the Best Management Plot (BMP) of the Oakes Irrigation Test Area (OITA) near Oakes, North Dakota.
–The images were taken in 1997 and 1998. Each image contains 3 bands: red, green and blue reflectance values.
–Three other files contain synchronized soil moisture, nitrate and yield values.

14 Classification Accuracy The accuracy of the proposed bit-based approach is compared with the band-based approach and with KNN using Euclidean distance. Our approach outperforms the others.
[Figure: Classification accuracy for '97 data. x-axis: Training Data Size (pixels), 1K, 4K, 16K, 65K, 260K; accuracy scale 0 to 90; series: Band-Ptree, KNN-Euc., Bit.]

15 Classification Accuracy Cont.. The accuracy of the approach was also compared to an existing Bayesian belief network classifier. The classifier is J Cheng's Bayesian Belief Network, available at http://www.cs.ualberta.ca/~jcheng/
–This classifier was the winning entry for the KDD Cup 2001 data mining competition.
The developer claims that the classifier can perform with or without domain knowledge. For the comparison, smaller training data sets ranging from 4K to 16K pixels were used, because the implementation could not handle larger data sets. The belief network was built without using any domain knowledge to make it comparable with the P-Tree approach.
Accuracy:
Training Size (pixels)    Bit-Ptree Based    Bayesian Belief
4000                      66 %               26 %
16000                     67 %               51 %

16 Classification Time The P-Tree approach requires no build time (it is a lazy classifier). In most lazy classifiers, the classification time per tuple varies with the number of items in the training set, because the training data has to be scanned. The P-Tree approach does not require a traditional data scan. The data in the figure was collected using 5 significant bits and a threshold probability of 0.85. The time is given for scalability comparisons.
[Figure: Variation of Classification Time with Training Size for the bit-P-tree algorithm. x-axis: Training sample size (pixels).]

17 Conclusion The naïve assumption reduces the accuracy of the classification in this particular application domain. Our approach increases the accuracy of a P-Tree Bayesian classifier by completely eliminating the naïve assumption.
–The new approach has better accuracy than the existing P-Tree based Bayesian classifier.
–It was also shown to be better than a Bayesian belief network implementation and a Euclidean distance based KNN approach.
It has the same computational cost with respect to the use of P-Tree operations as the previous P-tree approach, and is scalable with respect to the size of the data set.

