Download presentation
Presentation is loading. Please wait.
Published byMarilyn Franklin Modified over 9 years ago
1
Lazy Learners K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures
2
Classification and Prediction Eager Learners: when given a set of training tuples, will construct a generalization model before receiving new tuples to classify Classification by decision tree induction Rule-based classification Classification by back propagation Support Vector Machines (SVM) Associative classification
3
Lazy vs. Eager Learning Lazy vs. eager learning Lazy learning (e.g., instance-based learning): Simply stores training data (or only minor processing) and waits until it is given a test tuple Eager learning (the above discussed methods): Given a set of training set, constructs a classification model before receiving new (e.g., test) data to classify Lazy: less time in training but more time in predicting
4
Lazy Learner: Instance-Based Methods Typical approaches k-nearest neighbor approach Instances represented as points in a Euclidean space.
5
The k-Nearest Neighbor Algorithm All instances correspond to points in the n-D space The nearest neighbor are defined in terms of Euclidean distance, dist(X 1, X 2 ) Target function could be discrete- or real- valued For discrete-valued, k-NN returns the most common value among the k training examples nearest to x q. _ + _ xqxq + __ + _ _ +
6
The k-Nearest Neighbor Algorithm k-NN for real-valued prediction for a given unknown tuple Returns the mean values of the k nearest neighbors Distance-weighted nearest neighbor algorithm Weight the contribution of each of the k neighbors according to their distance to the query x q Give greater weight to closer neighbors Robust to noisy data by averaging k-nearest neighbors
7
The k-Nearest Neighbor Algorithm How can I determine the value of k, the number of neighbors? In general, the larger the number of training tuples is, the larger the value of k is Nearest-neighbor classifiers can be extremely slow when classifying test tuples O(n) By simple presorting and arranging the stored tuples into search tree, the number of comparisons can be reduced to O(logN)
8
The k-Nearest Neighbor Algorithm Example: K=5
12
Fuzzy Set Approaches Rule-based systems for classification have the disadvantage that they involve sharp cutoffs for continuous attributes For example: IF (years_employed>2) AND (income>50K) THEN credit_card=approved What if a customer has 10 years employed and income is 49K?
13
Fuzzy Set Approaches Instead, we can discretize income into categories such as {low,medium,high}, and then apply fuzzy logic to allow “fuzzy” threshold for each category
14
Fuzzy Set Approaches Fuzzy theory is also known as possibility theory, it was proposed by Lotif Zadeh in 1965 Unlike the notion of traditional “crisp” sets where an element either belongs to a set S, in fuzzy theory, elements can belong to more than one fuzzy set
15
Fuzzy Set Approaches For example, the income value $49K belongs to both the medium and high fuzzy sets: M medium ($49K)=0.15 and M high ($49K)=0.96
16
Fuzzy Set Approaches Another example for temperature
17
Classifier Accuracy Measures classes(Real) buy computer = yes (Real) buy computer = no total (Predict) buy computer = yes 69544127366 (Predict) buy computer = no 4625882634 total7000 (Buy Computer) 3000 (Does not buy Computer) 10000
18
Classifier Accuracy Measures Alternative accuracy measures (e.g., for cancer diagnosis) sensitivity = t-pos/pos specificity = t-neg/neg precision = t-pos/(t-pos + f-pos) accuracy =
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.