Weighted kNN, clustering, “early” trees and Bayesian


Weighted kNN, clustering, “early” trees and Bayesian
Peter Fox
Data Analytics – ITWS-4600/ITWS-6600/MATP-4450
Group 2, Module 6, February 12, 2018

Plot tools/tips
http://statmethods.net/advgraphs/layout.html
http://flowingdata.com/2014/02/27/how-to-read-histograms-and-use-them-in-r/
pairs, gpairs, scatterplot.matrix, clustergram, etc.
data() # precip, presidents, iris, swiss, sunspot.month (!), environmental, ethanol, ionosphere
More script fragments in R are available on the web site (http://aquarius.tw.rpi.edu/html/DA)

And use a contingency table (Lab5b_nbayes1.R)

> data(Titanic)
> mdl <- naiveBayes(Survived ~ ., data = Titanic)
> mdl

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

A-priori probabilities:
Survived
      No      Yes
0.676965 0.323035

Conditional probabilities:
        Class
Survived        1st        2nd        3rd       Crew
     No  0.08187919 0.11208054 0.35436242 0.45167785
     Yes 0.28551336 0.16596343 0.25035162 0.29817159

        Sex
Survived       Male     Female
     No  0.91543624 0.08456376
     Yes 0.51617440 0.48382560

        Age
Survived      Child      Adult
     No  0.03489933 0.96510067
     Yes 0.08016878 0.91983122

HouseVotes84: http://www.ugrad.stat.ubc.ca/R/library/mlbench/html/HouseVotes84.html

require(mlbench)
data(HouseVotes84)
model <- naiveBayes(Class ~ ., data = HouseVotes84)
predict(model, HouseVotes84[1:10,-1])
predict(model, HouseVotes84[1:10,-1], type = "raw")
pred <- predict(model, HouseVotes84[,-1])
table(pred, HouseVotes84$Class)

Exercise for you

> data(HairEyeColor)
> mosaicplot(HairEyeColor)
> margin.table(HairEyeColor, 3)
Sex
  Male Female
   279    313
> margin.table(HairEyeColor, c(1, 3))
       Sex
Hair    Male Female
  Black   56     52
  Brown  143    143
  Red     34     37
  Blond   46     81

How would you construct a naïve Bayes classifier and test it?
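
A minimal sketch of one possible answer (not the only one), following the Titanic pattern above and assuming the e1071 package is loaded for naiveBayes():

library(e1071)
m <- naiveBayes(Sex ~ ., data = HairEyeColor)        # predict Sex from Hair and Eye
# expand the contingency table to one row per person so we can check predictions
cases <- as.data.frame(HairEyeColor)
cases <- cases[rep(seq_len(nrow(cases)), cases$Freq), c("Hair", "Eye", "Sex")]
pred <- predict(m, cases)
table(pred, cases$Sex)   # resubstitution table; a held-out split would be a fairer test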

Cars?

Linear regression? Or?

Ionosphere: group2/lab2_kknn2.R

require(kknn)
data(ionosphere)
ionosphere.learn <- ionosphere[1:200,]
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
# vary kernel
(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
  kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))
table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
# alter distance
(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
  kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))
table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)

Results

ionosphere.learn <- ionosphere[1:200,]   # convenience sampling!
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
      b   g
  b  19   8
  g   2 122

(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
+   kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15,
    distance = 1, kernel = c("triangular", "rectangular", "epanechnikov", "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2

table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
      b   g
  b  25   4
  g   2 120

(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
+   kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15,
    distance = 2, kernel = c("triangular", "rectangular", "epanechnikov", "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2

table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)
      b   g
  b  20   5
  g   7 119

However… there is more

Naïve Bayes – what is it?

Example: testing for a specific item of knowledge that 1% of the population has been informed of (don't ask how). An imperfect test:
99% of knowledgeable people test positive
99% of ignorant people test negative
If a person tests positive – what is the probability that they know the fact?

Naïve approach…

We have 10,000 representative people: 100 know the fact/item, 9,900 do not. We test them all:
99 knowing people test as knowing (1 does not)
9,801 not-knowing people test as not knowing
But 99 not-knowing people test as knowing
So among those testing positive (knowing), a person is equally likely to know or not: 99 vs 99 = 50%

Tree diagram

10,000 people
  1% know (100 people)
    99% test as knowing (99)
    1% test as not knowing (1)
  99% do not know (9,900 people)
    1% test as knowing (99)
    99% test as not knowing (9,801)

Relation between probabilities

For outcomes x and y there are probabilities p(x) and p(y) that each happened.
If there is a connection, then the joint probability that both happen is p(x,y), and the probability that x happens given that y happens is p(x|y), or vice versa. Then:
p(x|y)*p(y) = p(x,y) = p(y|x)*p(x)
So p(y|x) = p(x|y)*p(y)/p(x)   (Bayes' Law)
E.g. p(know|+ve) = p(+ve|know)*p(know)/p(+ve) = (.99*.01)/(.99*.01 + .01*.99) = 0.5
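
As a quick sanity check, the same arithmetic can be done directly in R (a minimal sketch; the variable names are just for illustration):

p_know <- 0.01                # prior: 1% know the fact
p_pos_given_know <- 0.99      # knowledgeable people testing positive
p_pos_given_not  <- 0.01      # ignorant people testing positive
p_pos <- p_pos_given_know * p_know + p_pos_given_not * (1 - p_know)   # total probability
p_know_given_pos <- p_pos_given_know * p_know / p_pos                 # Bayes' Law
p_know_given_pos              # 0.5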

How do you use it?

If the population contains x, what is the chance that y is true?
p(SPAM|word) = p(word|SPAM)*p(SPAM)/p(word)
Base this on data:
p(spam) – the proportion of messages that are spam versus not
p(word|spam) – the prevalence of spam containing the 'word'
p(word|!spam) – the prevalence of non-spam containing the 'word'
(and then p(word) = p(word|spam)*p(spam) + p(word|!spam)*(1 - p(spam)))
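
A minimal sketch of how those counts turn into an estimate; the tiny message data frame and its column names are hypothetical, purely for illustration:

# Hypothetical data: one row per message, is_spam and has_word flags
msgs <- data.frame(
  is_spam  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  has_word = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)
p_spam         <- mean(msgs$is_spam)                  # p(spam)
p_word_spam    <- mean(msgs$has_word[msgs$is_spam])   # p(word|spam)
p_word_notspam <- mean(msgs$has_word[!msgs$is_spam])  # p(word|!spam)
p_word <- p_word_spam * p_spam + p_word_notspam * (1 - p_spam)
p_word_spam * p_spam / p_word                         # p(spam|word) by Bayes' Law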

Or..

What is the probability that you are in one class (i) over another class (j), given another factor (X)?
Invoke Bayes: maximize p(X|Ci)p(Ci)/p(X) over the classes Ci
(p(X) is ~constant across classes, and the p(Ci) are taken as equal if not known)
So, assuming conditional independence of the attributes x1,…,xn given the class:
p(X|Ci) = p(x1|Ci) * p(x2|Ci) * … * p(xn|Ci)
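
In symbols, the rule just described is the standard naïve Bayes decision rule (written here in LaTeX for reference):

\hat{C} = \arg\max_{C_i} \; P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)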

P(xk|Ci) is estimated from the training samples
Categorical: estimate P(xk|Ci) as the percentage of samples of class i with value xk. Training involves counting the percentage of occurrence of each possible value for each class.
Numeric: the actual form of the density function is generally not known, so a "normal" density (i.e. distribution) is often assumed.
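
A minimal sketch of both estimates, using data that appears elsewhere in this module (Titanic for the categorical case, iris for the numeric case); the numbers should match what the packaged classifiers report on those slides:

# Categorical: P(xk|Ci) as the percentage of class-i samples with value xk
tab <- margin.table(Titanic, c(4, 1))   # Survived x Class counts
prop.table(tab, margin = 1)             # rows sum to 1; compare the Class block of mdl above

# Numeric: assume a normal density, so estimate a mean and sd per class
aggregate(Petal.Length ~ Species, data = iris, FUN = mean)
aggregate(Petal.Length ~ Species, data = iris, FUN = sd)
# these (mean, sd) pairs are what classifier$tables$Petal.Length holds on the iris slide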

Digging into iris

classifier <- naiveBayes(iris[,1:4], iris[,5])
table(predict(classifier, iris[,-5]), iris[,5], dnn=list('predicted','actual'))
classifier$apriori
classifier$tables$Petal.Length
plot(function(x) dnorm(x, 1.462, 0.1736640), 0, 8, col="red",
     main="Petal length distribution for the 3 different species")
curve(dnorm(x, 4.260, 0.4699110), add=TRUE, col="blue")
curve(dnorm(x, 5.552, 0.5518947), add=TRUE, col="green")

Bayes

> cl <- kmeans(iris[,1:4], 3)
> table(cl$cluster, iris[,5])
    setosa versicolor virginica
  2      0          2        36
  1      0         48        14
  3     50          0         0

> m <- naiveBayes(iris[,1:4], iris[,5])
> table(predict(m, iris[,1:4]), iris[,5])
             setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47

pairs(iris[1:4], main="Iris Data (red=setosa, green=versicolor, blue=virginica)",
      pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])

Ex: Classification – Bayes

Retrieve the abalone.csv dataset: predicting the age of abalone from physical measurements.
Perform naïve Bayes classification to get predictors for Age (Rings). Interpret. Discuss in lab or on the LMS.
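
One possible starting point (a sketch, not the required lab solution); it assumes abalone.csv uses the usual UCI column layout with a Rings column, and that the path is adjusted to wherever you saved the file:

library(e1071)
set.seed(1)                                         # for reproducibility
abalone <- read.csv("abalone.csv")                  # adjust the path/filename as needed
abalone$Rings <- as.factor(abalone$Rings)           # treat Rings (age proxy) as the class
n <- nrow(abalone)
train <- sample(seq_len(n), size = floor(0.7 * n))  # random 70/30 split, not convenience sampling
m <- naiveBayes(Rings ~ ., data = abalone[train, ])
pred <- predict(m, abalone[-train, ])
table(pred, abalone[-train, "Rings"])               # where do the predictions land?
# Interpretation hint: with so many Rings levels the table is sparse; binning Rings
# into a few age groups before classifying is a common simplification.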

Using a contingency table

> data(Titanic)
> mdl <- naiveBayes(Survived ~ ., data = Titanic)
> mdl

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

A-priori probabilities:
Survived
      No      Yes
0.676965 0.323035

Conditional probabilities:
        Class
Survived        1st        2nd        3rd       Crew
     No  0.08187919 0.11208054 0.35436242 0.45167785
     Yes 0.28551336 0.16596343 0.25035162 0.29817159

        Sex
Survived       Male     Female
     No  0.91543624 0.08456376
     Yes 0.51617440 0.48382560

        Age
Survived      Child      Adult
     No  0.03489933 0.96510067
     Yes 0.08016878 0.91983122

Using a contingency table

> predict(mdl, as.data.frame(Titanic)[,1:3])
 [1] Yes No  No  No  Yes Yes Yes Yes No  No  No  No  Yes Yes Yes Yes Yes No  No  No  Yes Yes Yes Yes No
[26] No  No  No  Yes Yes Yes Yes
Levels: No Yes

At this point…

You may realize that the inter-relations among classification and clustering methods, at an absolute and relative level (i.e. hierarchical -> trees…), are COMPLEX…
Trees are interesting from a decision perspective: if this or that, then this…
Beyond just distance measures: from clustering (kmeans) to probabilities (Bayesian).
And there are so many ways to visualize them…