Distance and Similarity Measures: What does "close" Mean?

Distance and Similarity Measures: What does "close" Mean? Separation Boundaries

Distance Metric
Measures the dissimilarity between two data points. A metric is a function, d, of two points X and Y, such that:
d(X, Y) is positive definite: if X ≠ Y, d(X, Y) > 0; if X = Y, d(X, Y) = 0
d(X, Y) is symmetric: d(X, Y) = d(Y, X)
d(X, Y) satisfies the triangle inequality: d(X, Y) + d(Y, Z) ≥ d(X, Z)
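As a concrete illustration (not from the slides), here is a minimal Python sketch that spot-checks the three axioms on a finite sample of points; the helper name is_metric_on_sample is hypothetical, and passing the check is necessary but of course not sufficient for d to be a metric.

```python
import itertools
import math
import random

def is_metric_on_sample(d, points, tol=1e-12):
    """Spot-check the metric axioms on a finite sample of points."""
    for x, y in itertools.product(points, repeat=2):
        if x == y and d(x, y) > tol:              # d(X, X) = 0
            return False
        if x != y and not d(x, y) > 0:            # d(X, Y) > 0 when X != Y
            return False
        if abs(d(x, y) - d(y, x)) > tol:          # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:     # triangle inequality
            return False
    return True

euclidean = lambda x, y: math.dist(x, y)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(20)]
print(is_metric_on_sample(euclidean, pts))        # True (Euclidean distance is a metric)
```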

Standard Distance Metrics
Minkowski distance or Lp distance: dp(X, Y) = (Σ i=1 to n |xi − yi|^p)^(1/p)
Manhattan distance (p = 1): d1(X, Y) = Σ i=1 to n |xi − yi|
Euclidean distance (p = 2): d2(X, Y) = √(Σ i=1 to n (xi − yi)^2)
Max distance (p = ∞): d∞(X, Y) = max i |xi − yi|
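A minimal Python sketch of these four distances (the helper names are mine, not from the slides); the sample pair anticipates the 3-4-5 example on the next slide.

```python
def minkowski(X, Y, p):
    """L_p distance: (sum_i |x_i - y_i|^p)^(1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(X, Y)) ** (1.0 / p)

def manhattan(X, Y):          # p = 1
    return sum(abs(x - y) for x, y in zip(X, Y))

def euclidean(X, Y):          # p = 2
    return minkowski(X, Y, 2)

def max_dist(X, Y):           # p = infinity
    return max(abs(x - y) for x, y in zip(X, Y))

# X and Y differ by 4 in one coordinate and 3 in the other (a 3-4-5 right triangle).
X, Y = (0, 0), (4, 3)
print(manhattan(X, Y))        # 7
print(euclidean(X, Y))        # 5.0
print(max_dist(X, Y))         # 4
```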

An Example
A two-dimensional space: X and Y are the endpoints of the hypotenuse of a right triangle with corner Z, where XZ = 4 and ZY = 3.
Manhattan: d1(X, Y) = XZ + ZY = 4 + 3 = 7
Euclidean: d2(X, Y) = XY = 5
Max: d∞(X, Y) = max(XZ, ZY) = XZ = 4
In general d1 ≥ d2 ≥ d∞; for any positive integer p, dp(X, Y) ≥ dp+1(X, Y).

HOBbit Similarity
Higher Order Bit (HOBbit) similarity: HOBbitS(A, B) = the number of consecutive matching bits of A and B counted from the most significant bit, i.e., max{ s : 0 ≤ s ≤ m and ai = bi for all i ≤ s }
A, B: two scalars (integers); ai, bi: i-th bit of A and B (left to right); m: number of bits
Example (m = 8), bit positions 1 2 3 4 5 6 7 8:
x1: 0 1 1 0 1 0 0 1    y1: 0 1 1 1 1 1 0 1    HOBbitS(x1, y1) = 3
x2: 0 1 0 1 1 1 0 1    y2: 0 1 0 1 0 0 0 0    HOBbitS(x2, y2) = 4

HOBbit Distance (High Order Bifurcation bit)
HOBbit distance between two scalar values A and B: dv(A, B) = m − HOBbitS(A, B)
HOBbit distance between two points X and Y: dh(X, Y) = max over dimensions i of dv(xi, yi)
Example (x1, x2, y1, y2 from the previous slide): HOBbitS(x1, y1) = 3 and HOBbitS(x2, y2) = 4, so dv(x1, y1) = 8 − 3 = 5 and dv(x2, y2) = 8 − 4 = 4.
In our example (considering 2-dimensional data X = (x1, x2), Y = (y1, y2)): dh(X, Y) = max(5, 4) = 5
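A small Python sketch of HOBbit similarity and distance as defined above (function names are mine); it reproduces the slide's numbers for x1, y1, x2, y2.

```python
def hobbit_similarity(a, b, m=8):
    """Number of matching most-significant bits of a and b in an m-bit encoding."""
    s = 0
    for i in range(m - 1, -1, -1):            # scan bits left (MSB) to right (LSB)
        if (a >> i) & 1 == (b >> i) & 1:
            s += 1
        else:
            break
    return s

def hobbit_scalar_distance(a, b, m=8):
    """d_v(A, B) = m - HOBbitS(A, B)."""
    return m - hobbit_similarity(a, b, m)

def hobbit_distance(X, Y, m=8):
    """d_h(X, Y) = max over dimensions of d_v(x_i, y_i)."""
    return max(hobbit_scalar_distance(x, y, m) for x, y in zip(X, Y))

x1, y1 = 0b01101001, 0b01111101
x2, y2 = 0b01011101, 0b01010000
print(hobbit_similarity(x1, y1))              # 3
print(hobbit_similarity(x2, y2))              # 4
print(hobbit_distance((x1, x2), (y1, y2)))    # max(8-3, 8-4) = 5
```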

HOBbit Distance Is a Metric
HOBbit distance is positive definite: if X = Y, dh(X, Y) = 0; if X ≠ Y, dh(X, Y) > 0
HOBbit distance is symmetric
HOBbit distance satisfies the triangle inequality

Neighborhood of a Point
The neighborhood of a target point, T, is the set of points, S, such that X ∈ S if and only if d(T, X) ≤ r. If X is a point on the boundary, d(T, X) = r.
[Figure: neighborhoods of radius r (diameter 2r) around T under the Manhattan, Euclidean, Max, and HOBbit distances.]

Decision Boundary
The decision boundary between points A and B is the locus of points X satisfying d(A, X) = d(B, X).
The decision boundary for the Max (and HOBbit) distance is perpendicular to the axis that gives the max distance between A and B.
[Figure: decision boundaries between A and B for the Manhattan, Euclidean, and Max distances, shown for the cases where line AB makes an angle greater than 45° and less than 45° with the axis; the Euclidean boundary is the perpendicular bisector of AB.]

Minkowski Metrics
Lp-metrics (aka Minkowski metrics): dp(X, Y) = (Σ i=1 to n wi |xi − yi|^p)^(1/p) (weights wi assumed = 1)
[Figure: unit-disk boundaries for p = 1 (Manhattan), p = 2 (Euclidean), p = 3, 4, …, p = ∞ (chessboard), and p = ½, ⅓, ¼, …]
dmax ≡ max i |xi − yi| ≡ d∞ ≡ lim p→∞ dp(X, Y).
Proof (sketch): lim p→∞ (Σ i=1 to n ai^p)^(1/p) = max(ai) ≡ b. For p large enough, the other ai^p << b^p (since (ai/b)^p → 0 when ai < b), so Σ i=1 to n ai^p ≈ k·b^p (k = multiplicity of b in the sum), hence (Σ i=1 to n ai^p)^(1/p) ≈ k^(1/p)·b, and k^(1/p) → 1.

P>1 Lp metrics (Lq distance from x = (x1, x2) to y = (0, 0)):
x = (.5, .5):    q=2: .7071067812   q=4: .5946035575   q=9: .5400298694    q=100: .503477775     MAX: .5
x = (.71, .71):  q=2: 1.0           q=3: .8908987181   q=7: .7807091822    q=100: .7120250978    MAX: .7071067812
x = (.99, .99):  q=2: 1.4000714267  q=8: 1.0796026553  q=100: .9968859946  q=1000: .9906864536   MAX: .99
x = (1, 1):      q=2: 1.4142135624  q=9: 1.0800597389  q=100: 1.0069555501 q=1000: 1.0006933875  MAX: 1
x = (3, 3):      q=2: 4.2426406871  q=3: 3.7797631497  q=8: 3.271523198    q=100: 3.0208666502   MAX: 3
x = (.9, .1):    q=2: .9055385138   q=9: .9000000003   q=100: .9           q=1000: .9            MAX: .9
x = (90, 45):    q=6: 90.232863532  q=9: 90.019514317  q=100: 90           MAX: 90
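The first row above can be reproduced with a few lines of Python (a sketch, not code from the slides), which also shows the convergence to the Max distance:

```python
def lp(X, Y, p):
    """d_p(X, Y) = (sum_i |x_i - y_i|^p)^(1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(X, Y)) ** (1.0 / p)

X, Y = (0.5, 0.5), (0.0, 0.0)
for p in (2, 4, 9, 100):
    print(p, lp(X, Y, p))     # ~0.70710678, ~0.59460356, ~0.54002987, ~0.50347778
print(max(abs(x - y) for x, y in zip(X, Y)))   # 0.5, the L-infinity (Max) limit
```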

P<1 Lp metrics: d1/p(X, Y) = (Σ i=1 to n |xi − yi|^(1/p))^p, i.e., the Lq formula with exponent q = 1/p < 1.
For p = 0 (the limit as p → 0), Lp doesn't exist (does not converge).
Lq distance from x = (x1, x2) to y = (0, 0):
x = (.1, .1):  q=2: .141421356  q=1: .2  q=.8: .238   q=.4: .566    q=.2: 3.2    q=.1: 102     q=.04: 3355443       q=.02: 112589990684263  q=.01: 1.2676 E+29
x = (.5, .5):  q=2: .7071       q=1: 1   q=.8: 1.19   q=.4: 2.83    q=.2: 16     q=.1: 512     q=.04: 16777216      q=.02: 5.63 E+14        q=.01: 6.34 E+29
x = (.9, .1):  q=2: .906        q=1: 1   q=.8: 1.098  q=.4: 2.1445  q=.2: 10.82  q=.1: 326.27  q=.04: 10312196.962  q=.02: 341871052443154  q=.01: 3.8 E+29
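A short sketch (my illustration) reproducing the first row above and showing the blow-up as the exponent approaches 0; note that for q < 1 the Lq formula is not a true metric, since the triangle inequality fails.

```python
def lq(X, Y, q):
    """The L_q formula with exponent q; for q < 1 it blows up as q -> 0."""
    return sum(abs(x - y) ** q for x, y in zip(X, Y)) ** (1.0 / q)

X, Y = (0.1, 0.1), (0.0, 0.0)
for q in (1, 0.8, 0.4, 0.2, 0.1):
    print(q, lq(X, Y, q))     # prints ~0.2, 0.238, 0.566, 3.2, 102.4 -- cf. the table above
```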

Other Interesting Metrics
Canberra metric: dc(X, Y) = Σ i=1 to n |xi − yi| / (xi + yi)   (a normalized Manhattan distance)
Square Cord metric: dsc(X, Y) = Σ i=1 to n (√xi − √yi)^2   (already discussed as Lp with p = 1/2)
Squared Chi-squared metric: dchi(X, Y) = Σ i=1 to n (xi − yi)^2 / (xi + yi)
Scalar Product metric: dsp(X, Y) = X • Y = Σ i=1 to n xi · yi
Hyperbolic metrics (which map infinite space 1-1 onto a sphere)
Which are rotationally invariant? Translation invariant? Other?
Some notes on distance functions can be found at http://www.cs.ndsu.NoDak.edu/~datasurg/distance_similarity.pdf
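A minimal Python sketch of these measures (names are mine; the Square Cord formula follows the reconstruction above, Σ(√xi − √yi)²):

```python
from math import sqrt

def canberra(X, Y):
    return sum(abs(x - y) / (x + y) for x, y in zip(X, Y) if x + y != 0)

def square_cord(X, Y):
    return sum((sqrt(x) - sqrt(y)) ** 2 for x, y in zip(X, Y))

def squared_chi_squared(X, Y):
    return sum((x - y) ** 2 / (x + y) for x, y in zip(X, Y) if x + y != 0)

def scalar_product(X, Y):
    # A similarity rather than a dissimilarity: larger means more alike.
    return sum(x * y for x, y in zip(X, Y))

X, Y = (1.0, 2.0, 3.0), (2.0, 2.0, 1.0)
print(canberra(X, Y))             # 1/3 + 0 + 2/4 = ~0.8333
print(square_cord(X, Y))          # (1 - sqrt(2))^2 + 0 + (sqrt(3) - 1)^2 = ~0.7075
print(squared_chi_squared(X, Y))  # 1/3 + 0 + 4/4 = ~1.3333
print(scalar_product(X, Y))       # 2 + 4 + 3 = 9
```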

LDS - Local DisSimilarity Measure
Dissimilarity measures don't have to be full distance metrics. A metric is a function, d, of two points X and Y, such that:
d(X, Y) is positive definite: if X ≠ Y, d(X, Y) > 0; if X = Y, d(X, Y) = 0
d(X, Y) is symmetric: d(X, Y) = d(Y, X)
d(X, Y) satisfies the triangle inequality: d(X, Y) + d(Y, Z) ≥ d(X, Z)
They can instead be dissimilarity measures satisfying only definiteness and symmetry, or they can be only what we will call "local dissimilarities" (LDS), satisfying positive definiteness alone. The fact that an LDS need not be symmetric means that y can be the nearest neighbor of x even though x is not the nearest neighbor of y. This is the case, for instance, in Kriging (in which dissimilarity can vary with angle as well as with center).
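A toy illustration (my example, not from the slides) of how asymmetry lets y be the nearest neighbor of x while x is not the nearest neighbor of y:

```python
def lds(x, y):
    """A toy local dissimilarity: positive definite (0 iff x == y) but NOT
    symmetric -- moving 'down' is penalized twice as heavily as moving 'up'."""
    return (y - x) if y >= x else 2.0 * (x - y)

def nearest_neighbor(x, candidates):
    return min(candidates, key=lambda y: lds(x, y))

print(nearest_neighbor(0.0, [1.0, 1.9]))  # 1.0 -> 1.0 is the nearest neighbor of 0.0
print(nearest_neighbor(1.0, [0.0, 1.9]))  # 1.9 -> but 0.0 is not the nearest neighbor of 1.0
```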

In this Bioinformatics Data Warehouse (BDW) model, we can define an LDS usefully as follows (actually used in the DataSURG winning submission to the ACM KDD-Cup competition in 2002: http://www.cs.ndsu.nodak.edu/~datasurg/kddcup02). In this competition we were given a training set of data on the yeast genes (the yeast slice of the GeneOrgDimTbl) with ~3000 genes and ~1700 attributes (1700 basic P-trees). We were to classify unclassified gene samples as Y/N (what the Yes or No meant is not important here). The approach was to find all nearest training neighbors of an unclassified sample, g, and let them vote. The definition of nearest came from an LDS: for each unclassified gene, g, define fg: T → NonNegReals by fg(x) = Σ {i : gi = 1} wi·xi, where the wi are weights. This assumes that a training point should not be considered close unless it agrees with g where g is 1 (where g is 0 it doesn't matter). This definition solves the curse of dimensionality (for sparse data anyway). It also worked! Possible improvements include figuring some of the g = 0 attributes into the LDS definition (e.g., lethality, stop codon type, duplicity, …) by simply including Not-lethal, for instance, as an attribute along with lethal, etc. Also, one can step out from the sample more finely. One could use horizontal ANDs of P-trees for the initial nearest-neighbor set, but then, if that set is small, use vertical scans from there onward (one can do vertical scans with vertical data! How?).
[Figure: the BDW star schema, showing the GeneOrgDimTbl fact cube and its dimension tables: GeneDimTbl (GO-id, MIPS-link, EMBL-link, Medline-link), OrgDimTbl (Name, Species, Vertebrate), ExperimentDimTbl, and ExpGeneDimTbl, plus a unipartite symmetric Interaction fact cube (e.g., a ProteinProteinInteractionGraph or GeneAttributeSimilarityGraph).]
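A minimal sketch of the LDS fg defined above, with hypothetical names; plain Python integers stand in for the vertical bit columns (real P-trees add compression on top), and the AND simply selects the attributes where both g and x are 1.

```python
def f_g(g_bits: int, x_bits: int, weights: dict, n_attrs: int) -> float:
    """f_g(x) = sum over attribute positions i where g_i = 1 of w_i * x_i.
    Attributes are 0/1; g_bits and x_bits hold them as bit vectors, so
    g_bits & x_bits selects exactly the positions where both are 1."""
    both = g_bits & x_bits
    return sum(weights[i] for i in range(n_attrs) if (both >> i) & 1)

# Hypothetical toy data: 8 binary attributes per gene, unit weights.
n = 8
w = {i: 1.0 for i in range(n)}
g  = 0b10110010                     # unclassified sample
x1 = 0b10110110                     # agrees with g on all four of g's 1-bits
x2 = 0b01001101                     # agrees with g on none of them
print(f_g(g, x1, w, n), f_g(g, x2, w, n))   # 4.0 0.0
```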

Correlation in Business Intelligence
Business Intelligence (BI) problems are very interesting data mining problems. Typical BI training data sets are large and extremely unbalanced among classes. The Business Intelligence problem is a classification problem: given a history of customer events (purchases or rentals) and satisfaction levels (ratings), one classifies potential future customer events as to their most likely satisfaction level. This information is then used to suggest to customers what item they might like to purchase or rent next. Clearly this is the central need for any Internet-based company, or any company seeking to do targeted advertising (all companies?). The classification training set, or Business Intelligence Data Set (BIDS), usually consists of the past history of customer satisfaction rating events (the rating of a product purchased or rented; in the Netflix case, a 1-5 rating of a rented movie). Thus, at its simplest, the training set looks something like BIDS(customerID, productID, rating), where the features are customerID and productID and the class label is rating. Very often there are other features, such as DateOfEvent, etc. Of course there are many other single-entity features that might be useful to the classification process, such as Customer features (name, address, ethnicity, age, …) and Product features (type, DateOfCreation, Creator, color, weight, …).

Correlation in Business Intelligence
Together these form a Data Warehouse in the Star Model, in which the central fact is BIDS and the dimension star points are the Customer and Product feature files. We will keep it simple and work only with BIDS. In the more complex case, the dimension files are usually joined into the central fact file to form the training set.
It is assumed that the data is in vertical format. Horizontal format is the standard in the industry and is ubiquitous. However, it is the authors' contention (proved valid by the success of the method) that vertical formatting is superior for many data mining applications, whereas for standard data processing, horizontal formatting is still best. Horizontal data formatting simply means that the data on an entity type is collected into horizontal records (of fields), one for each entity instance (e.g., employee records, one for each employee). In order to process horizontal data, scans down the collection (file) of these horizontal records are required. Of course, these horizontal files can be indexed, providing efficient alternate entry points, but scans are usually still necessary (full inversion of the file would eliminate the need for scans, but that is extremely expensive to build and maintain, and impossible for large, volatile horizontal files).

Correlation in Business Intelligence
Vertical data formatting simply means that the data on an entity is collected into vertical slices, by feature or by bit position of a feature (or by some other vertical slicing of a coding scheme applied to a feature or features of the entity, such as value slicing, in which each individual value in a feature column is bitmapped).
In this BIDS classifier, variant forms of Nearest Neighbor Vote based classification were combined with a vertical data structure (the Predicate Tree, or P-tree) for efficient processing. With horizontal records, even small data sets would require a large number of training-database scans to arrive at an acceptable solution. The use of a vertical data structure (P-tree) provides acceptable computation times (as opposed to scanning horizontal record data sets).
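A minimal sketch of bit-position slicing (my illustration; uncompressed bit vectors stand in for P-trees, which add a compressed tree structure on top of each slice):

```python
def vertical_bit_slices(column, m):
    """Slice one integer feature column into m vertical bit columns.
    slices[j][r] is bit j (0 = most significant) of row r's value."""
    return [[(v >> (m - 1 - j)) & 1 for v in column] for j in range(m)]

ratings = [5, 3, 4, 1, 5]                  # one horizontal column, 3-bit values
slices = vertical_bit_slices(ratings, m=3)
# slices == [[1, 0, 1, 0, 1],   bit 0 (MSB)
#            [0, 1, 0, 0, 0],   bit 1
#            [1, 1, 0, 1, 1]]   bit 2 (LSB)

# A predicate count without scanning horizontal records:
# how many ratings are >= 4?  For 3-bit values that is exactly "MSB = 1".
print(sum(slices[0]))                      # 3
```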

Correlation in Business Intelligence: BACKGROUND
Similarity and Relevance Analysis. Classification models that consider all attributes equally, such as classical K-Nearest-Neighbor classification (KNN), work well when all attributes are similar in their relevance to the classification task [14]. This is, however, often not the case. The problem is particularly pronounced in situations with a large number of attributes, some of which are known to be irrelevant. Many solutions have been proposed that weight dimensions according to their relevance to the classification problem. The weighting can be derived as part of the algorithm [5]. In an alternative strategy, the attribute dimensions are scaled using an evolutionary algorithm to optimize the classification accuracy of a separate algorithm, such as KNN [21]. In an effort to reduce the number of classification voters and to reduce the dimension of the vector space over which these class votes are calculated, judicious customer (also called "user") and item (also called "movie") selection proved extremely important. That is to say, if one lets all neighboring users vote over all neighboring movies, the resulting predicted class (rating) turns out to be grossly in error. It is also important to note that the vertical representation facilitates efficient attribute relevance analysis in situations with a large number of attributes: attribute relevance for a particular attribute with respect to the class attribute can be computed by reading only those two data columns, compared to reading all the long rows of data in a horizontal approach.
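The slide does not name a specific relevance measure; as one illustration (an assumption on my part, not necessarily what was used), information gain of a binary attribute with respect to a binary class can be computed from just those two vertical columns:

```python
from math import log2

def entropy(bits):
    p = sum(bits) / len(bits)
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def info_gain(attr_col, class_col):
    """Relevance of one binary attribute to the class, computed from only
    the two vertical columns (no horizontal row scans needed)."""
    gain = entropy(class_col)
    for v in (0, 1):
        subset = [c for a, c in zip(attr_col, class_col) if a == v]
        if subset:
            gain -= len(subset) / len(class_col) * entropy(subset)
    return gain

attr  = [1, 1, 0, 0, 1, 0]
label = [1, 1, 0, 0, 1, 1]
print(round(info_gain(attr, label), 3))   # 0.459
```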

Correlation in Business Intelligence: A Correlation Based Nearest Neighbor Pruning
This correlation, for each pair of data points in a data set, can be computed rapidly using vertical P-trees [2]. For example, for each given pair of users (u, v), this correlation (also called the Perpendicular Correlation), PC(u, v), is
[ √{ (v − u − vbar + ubar) ∘ (v − u − vbar + ubar) / (a + n·b) } ]²
where ∘ is the inner product over the ratings of users u and v across the n selected movies, and a and b are tunable parameters. This approach avoids the requirement of computing all Euclidean distances between the sample and all the other points in the training data in arriving at the nearest-neighbor set. But more importantly, it produces a select set of neighbors which, when allowed to vote for the most likely rating, produces very low error. Later in this paper, an explanation of the motivation for this correlation measure will be given, and a pictorial description of it will also be provided.
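A minimal sketch under one reading of the formula above (ubar and vbar taken as each user's mean rating over the n selected movies, and a and b left as the tunable parameters; none of these choices are confirmed by the slide):

```python
def perpendicular_correlation(u, v, a=1.0, b=1.0):
    """PC(u, v) = (v - u - vbar + ubar) . (v - u - vbar + ubar) / (a + n*b),
    i.e. the squared bracket in the slide's formula, over the n selected
    movies co-rated by users u and v (equal-length rating lists)."""
    n = len(u)
    ubar = sum(u) / n                     # assumed: u's mean rating over the n movies
    vbar = sum(v) / n
    diff = [vi - ui - vbar + ubar for ui, vi in zip(u, v)]
    return sum(d * d for d in diff) / (a + n * b)

u = [5, 3, 4, 4]
v = [4, 2, 3, 3]                          # u shifted down by 1 everywhere
w = [1, 5, 2, 4]
print(perpendicular_correlation(u, v))    # 0.0 -- parallel raters look identical
print(perpendicular_correlation(u, w))    # 4.0 -- disagreeing raters are far apart
```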

Correlation in Business Intelligence: REFERENCES
1. Abidin, T., Perrizo, W., SMART-TV: A Fast and Scalable Nearest Neighbor Based Classifier for Data Mining, ACM Symposium on Applied Computing, 2006.
2. Abidin, T., Perera, A., Serazi, M., Perrizo, W., Vertical Set Square Distance, CATA, 2005.
3. Ding, Q., Khan, M., Roy, A., Perrizo, W., The P-tree Algebra, Proceedings of the ACM Symposium on Applied Computing, pp. 426-431, 2002.
4. Bandyopadhyay, S., Murthy, C.A., Pattern Classification Using Genetic Algorithms, Pattern Recognition Letters, V16, 1995.
5. Cost, S., Salzberg, S., A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features, Machine Learning, 1993.
6. DataSURG, P-tree Application Programming Interface Documentation, http://midas.cs.ndsu.nodak.edu/~datasurg/ptree/
7. Ding, Q., Ding, Q., Perrizo, W., "ARM on RSI Using P-trees," Pacific-Asia KDD Conference, pp. 66-79, Taipei, May 2002.
8. Duch, W., Grudziński, K., Diercksen, G., Neural Minimal Distance Methods, World Congress on Computational Intelligence, IJCNN'98.
9. Goldberg, D.E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison Wesley, 1989.
10. Guerra-Salcedo, C., Whitley, D., Feature Selection Mechanisms for Ensemble Creation, AAAI Workshop, 1999.
11. Jain, A., Zongker, D., Feature Selection: Evaluation, Application, and Small Sample Performance, IEEE TPAMI, V19:2, 1997.
12. Khan, M., Ding, Q., Perrizo, W., k-NN Classification on Spatial Data Streams Using P-trees, Springer LNAI, 2002.
13. Khan, M., Ding, Q., Perrizo, W., K-Nearest Neighbor Classification, PAKDD, pp. 517-528, 2002.
14. Krishnaiah, P.R., Kanal, L.N., Handbook of Statistics 2, North Holland, 1982.
15. Kuncheva, L.I., Jain, L.C., Designing Classifier Fusion Systems by Genetic Algorithms, IEEE Transactions on Evolutionary Computation, 2000.
16. Lane, T., ACM Knowledge Discovery and Data Mining Cup 2006, http://www.kdd2006.com/kddcup.html
17. Martin-Bautista, M.J., Vila, M.A., A Survey of Genetic Feature Selection in Mining Issues, Congress on Evolutionary Computation, 1999.
18. Perera, A., Perrizo, W., et al., Vertical Set Square Distance Based Clustering, Intelligent and Adaptive Systems and Software Engineering, 2004.
19. Perera, A., Perrizo, W., et al., P-tree Classification of Yeast Gene Deletion Data, SIGKDD Explorations, V4:2, 2002.
20. Perera, A., Perrizo, W., Vertical K-Median Clustering, Conference on Computers and Their Applications, 2006.
21. Punch, W.F., et al., Further Research on Feature Selection and Classification Using Genetic Algorithms, Conference on Genetic Algorithms, 1993.
22. Rahal, I., Perrizo, W., An Optimized Approach for KNN Text Categorization Using P-trees, ACM Symposium on Applied Computing, 2004.
23. Raymer, M.L., et al., Dimensionality Reduction Using Genetic Algorithms, IEEE Transactions on Evolutionary Computation, Vol. 4, pp. 164-171, 2000.
24. Serazi, M., Perera, A., Perrizo, W., et al., DataMIME, ACM SIGMOD, Paris, France, June 2004.
25. Vafaie, H., De Jong, K., Robust Feature Selection Algorithms, Tools with AI, 1993.
26. Fisher, R.A., Multiple Measurements in Taxonomic Problems, Annals of Eugenics 7, pp. 179-188, 1936.