1 Peter Fox Data Analytics – ITWS-4600/ITWS-6600 Week 5a, February 23, 2016 Weighted kNN, clustering, “early” trees and Bayesian.

Presentation transcript:


Plot tools/tips in R: pairs, gpairs, scatterplot.matrix, clustergram, etc.
data() # built-in datasets: precip, presidents, iris, swiss, sunspot.month (!), environmental, ethanol, ionosphere
More script fragments in R are available on the web site ( )

Weighted kNN…
require(kknn)
data(iris)
m <- dim(iris)[1]
val <- sample(1:m, size = round(m/3), replace = FALSE, prob = rep(1/m, m))
iris.learn <- iris[-val,]
iris.valid <- iris[val,]
iris.kknn <- kknn(Species ~ ., iris.learn, iris.valid, distance = 1, kernel = "triangular")
summary(iris.kknn)
fit <- fitted(iris.kknn)
table(iris.valid$Species, fit)
pcol <- as.character(as.numeric(iris.valid$Species))
pairs(iris.valid[1:4], pch = pcol, col = c("green3", "red")[(iris.valid$Species != fit)+1])

summary
Call:
kknn(formula = Species ~ ., train = iris.learn, test = iris.valid, distance = 1, kernel = "triangular")
Response: "nominal"
(Fitted-values table: one row per validation case, giving the fitted class and the class probabilities prob.setosa, prob.versicolor, prob.virginica; numeric values not preserved.)

table
(Confusion matrix of actual Species vs. fit, rows and columns setosa, versicolor, virginica; counts not preserved.)

Look at Lab5b_kknn1_2016.R
pcol <- as.character(as.numeric(iris.valid$Species))
pairs(iris.valid[1:4], pch = pcol, col = c("green3", "red")[(iris.valid$Species != fit)+1])

Ctrees? We want a means to make decisions, an "if this, then this, otherwise that" approach == tree methods, or branching. Conditional Inference – what is that? Instead of hard-coding a rule such as: if (This1 .and. This2 .and. This3 .and. …), the splits are chosen by statistical (significance) tests on the data.

Conditional Inference Tree
> require(party) # don’t get me started!
> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num ...
 $ Sepal.Width : num ...
 $ Petal.Length: num ...
 $ Petal.Width : num ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: ...
> iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=iris)

Ctree
> print(iris_ctree)

  Conditional inference tree with 4 terminal nodes

Response:  Species
Inputs:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations:  150

1) Petal.Length <= 1.9; criterion = 1, statistic = ...
  2)* weights = 50
1) Petal.Length > 1.9
  3) Petal.Width <= 1.7; criterion = 1, statistic = ...
    4) Petal.Length <= 4.8; criterion = 0.999, statistic = ...
      5)* weights = 46
    4) Petal.Length > 4.8
      6)* weights = 8
  3) Petal.Width > 1.7
    7)* weights = 46

plot(iris_ctree)   # Lab5b_ctree2_2016.R
> plot(iris_ctree, type="simple") # try this

Beyond plot: pairs
pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species", pch = 21,
      bg = c("red", "green3", "blue")[unclass(iris$Species)])
Try Lab5b_pairs1_2016.R - USJudgeRatings

But the means for branching do not have to be threshold based (~ distance). They can be cluster based: I am more similar to you if I possess these attributes (in this range). Thus: trees + clusters = hierarchical clustering. In R: hclust (and others) in the stats package.

Try hclust for iris
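For instance, a minimal sketch (using only the four numeric columns and hclust's default Euclidean distance with complete linkage; cutting into three clusters for comparison is an assumption):

d_iris <- dist(iris[, 1:4])                    # pairwise distances on the measurements
hc_iris <- hclust(d_iris)                      # hierarchical clustering (complete linkage)
plot(hc_iris, labels = as.character(iris$Species), cex = 0.5)
table(cutree(hc_iris, k = 3), iris$Species)    # cut into 3 clusters and compare to species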

gpairs(iris) # from the gpairs package
Try Lab5b_gpairs1_2016.R

Better scatterplots
install.packages("car")
require(car)
scatterplotMatrix(iris)
Try Lab5b_spm_2016.R

splom(iris) # default
Try Lab5b_splom_2016.R

splom extra!
require(lattice)
super.sym <- trellis.par.get("superpose.symbol")
splom(~iris[1:4], groups = Species, data = iris,
      panel = panel.superpose,
      key = list(title = "Three Varieties of Iris", columns = 3,
                 points = list(pch = super.sym$pch[1:3], col = super.sym$col[1:3]),
                 text = list(c("Setosa", "Versicolor", "Virginica"))))
splom(~iris[1:3] | Species, data = iris, layout = c(2,2), pscales = 0,
      varnames = c("Sepal\nLength", "Sepal\nWidth", "Petal\nLength"),
      page = function(...) {
        ltext(x = seq(.6, .8, length.out = 4),
              y = seq(.9, .6, length.out = 4),
              labels = c("Three", "Varieties", "of", "Iris"), cex = 2)
      })
parallelplot(~iris[1:4] | Species, iris)
parallelplot(~iris[1:4], iris, groups = Species,
             horizontal.axis = FALSE, scales = list(x = list(rot = 90)))


Shift the dataset…

Hierarchical clustering
> d <- dist(as.matrix(mtcars))
> hc <- hclust(d)
> plot(hc)

Swiss - pairs
pairs(~ Fertility + Education + Catholic, data = swiss,
      subset = Education < 20, main = "Swiss data, Education < 20")

ctree
require(party)
swiss_ctree <- ctree(Fertility ~ Agriculture + Education + Catholic, data = swiss)
plot(swiss_ctree)

Hierarchical clustering
> dswiss <- dist(as.matrix(swiss))
> hs <- hclust(dswiss)
> plot(hs)

scatterplotMatrix

require(lattice); splom(swiss)


And use a contingency table
> data(Titanic)
> require(e1071) # provides naiveBayes
> mdl <- naiveBayes(Survived ~ ., data = Titanic)
> mdl

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

(A-priori probabilities for Survived (No, Yes), and conditional probabilities for Class (1st, 2nd, 3rd, Crew), Sex (Male, Female) and Age (Child, Adult) given Survived; numeric values not preserved.)

Try Lab5b_nbayes1_2016.R

ench/html/HouseVotes84.html
require(mlbench)
require(e1071) # provides naiveBayes
data(HouseVotes84)
model <- naiveBayes(Class ~ ., data = HouseVotes84)
predict(model, HouseVotes84[1:10,-1])
predict(model, HouseVotes84[1:10,-1], type = "raw")
pred <- predict(model, HouseVotes84[,-1])
table(pred, HouseVotes84$Class)

Exercise for you
> data(HairEyeColor)
> mosaicplot(HairEyeColor)
> margin.table(HairEyeColor, 3)       # totals by Sex (Male, Female)
> margin.table(HairEyeColor, c(1,3))  # Hair (Black, Brown, Red, Blond) by Sex
How would you construct a naïve Bayes classifier and test it? (One possible sketch follows.)
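A minimal sketch, assuming the e1071 package and simple resubstitution testing (predicting Sex from Hair and Eye; a proper train/validation split is left to you):

library(e1071)
hec <- as.data.frame(HairEyeColor)                    # columns Hair, Eye, Sex, Freq
cases <- hec[rep(seq_len(nrow(hec)), hec$Freq), 1:3]  # expand counts to one row per person
hec.mdl <- naiveBayes(Sex ~ Hair + Eye, data = cases)
table(predict(hec.mdl, cases), cases$Sex)             # resubstitution confusion matrix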

Cars?

Linear regression? Or?
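One way to explore that question on the mtcars data is to fit a linear model and a conditional-inference regression tree on the same predictors and compare; a sketch (the formula mpg ~ wt + hp is an assumption, not from the slides):

require(party)
cars.lm <- lm(mpg ~ wt + hp, data = mtcars)        # linear regression
summary(cars.lm)
cars.tree <- ctree(mpg ~ wt + hp, data = mtcars)   # regression tree on the same variables
plot(cars.tree)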

Ionosphere: Lab5b_kknn2_2016.R
require(kknn)
data(ionosphere)
ionosphere.learn <- ionosphere[1:200,]
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
# vary kernel
(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))
table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
# alter distance
(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))
table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)

Results
ionosphere.learn <- ionosphere[1:200,] # convenience sampling!!!!
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
    b   g
  b 19   8
  g (counts not preserved)

(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
+   kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15, distance = 1,
    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2

table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
    b   g
  b 25   4
  g (counts not preserved)

(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
+   kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15, distance = 2,
    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2

table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)
    b   g
  b 20   5
  g (counts not preserved)

However… there is more

Naïve Bayes – what is it? Example: testing for a specific item of knowledge that 1% of the population has been informed of (don’t ask how). An imperfect test:
– 99% of knowledgeable people test positive
– 99% of ignorant people test negative
If a person tests positive, what is the probability that they actually know the fact?

Naïve approach… We have 10,000 representative people: 100 know the fact/item, 9,900 do not. We test them all:
– 99 knowing people test as knowing
– 9,801 not-knowing people test as not knowing
– But 99 not-knowing people also test as knowing
So someone testing positive (knowing) is equally likely to know or not know = 50%

Tree diagram
10,000 ppl
  1% know (100 ppl)
    99% test to know (99 ppl)
    1% test not to know (1 per)
  99% do not know (9,900 ppl)
    1% test to know (99 ppl)
    99% test not to know (9,801 ppl)

Relation between probabilities
For outcomes x and y there are probabilities p(x) and p(y) that each happened.
If there is a connection, then the joint probability that both happen is p(x,y).
The probability that x happens given that y happened is p(x|y), and vice versa; then:
– p(x|y)*p(y) = p(x,y) = p(y|x)*p(x)
So p(y|x) = p(x|y)*p(y)/p(x) (Bayes’ Law)
E.g. p(know|+ve) = p(+ve|know)*p(know)/p(+ve) = (.99*.01)/(.99*.01 + .01*.99) = 0.5
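The same arithmetic can be checked quickly in R (a sketch; the variable names are just illustrative):

p_know <- 0.01                       # prior: 1% know the fact
p_pos_given_know <- 0.99             # knowledgeable people testing positive
p_pos_given_not  <- 0.01             # ignorant people testing positive
p_pos <- p_pos_given_know * p_know + p_pos_given_not * (1 - p_know)
p_pos_given_know * p_know / p_pos    # posterior p(know | +ve) = 0.5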

How do you use it? If the population contains x, what is the chance that y is true?
p(SPAM|word) = p(word|SPAM)*p(SPAM)/p(word)
Base this on data:
– p(SPAM) counts the proportion of spam versus not
– p(word|SPAM) counts the prevalence of spam containing the ‘word’
– p(word|!SPAM) counts the prevalence of non-spam containing the ‘word’
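For instance, given a hypothetical data frame msgs with one row per message and logical columns is_spam and has_word (these names are assumptions, not from the slides), the pieces could be estimated as:

p_spam      <- mean(msgs$is_spam)                 # p(SPAM)
p_word_spam <- mean(msgs$has_word[msgs$is_spam])  # p(word | SPAM)
p_word      <- mean(msgs$has_word)                # p(word)
p_spam_word <- p_word_spam * p_spam / p_word      # p(SPAM | word) by Bayes’ Law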

Or… What is the probability that you are in one class (i) rather than another class (j), given another factor (X)?
Invoke Bayes: maximize p(X|Ci)p(Ci)/p(X) over the classes i (p(X) is ~constant, and the p(Ci) are taken as equal if not known).
So, assuming the attributes are conditionally independent given the class:
p(X|Ci) = p(x1|Ci) * p(x2|Ci) * … * p(xn|Ci)

P(xk | Ci) is estimated from the training samples:
– Categorical: estimate P(xk | Ci) as the percentage of samples of class i with value xk. Training involves counting the percentage of occurrence of each possible value for each class.
– Numeric: the actual form of the density function is generally not known, so a “normal” density (i.e. Gaussian distribution) is often assumed.
(A short illustration follows.)
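The categorical case can be read directly off the fitted e1071 model from the Titanic slides, and the numeric case off the iris model on the next slide (a sketch; the e1071 package is assumed):

library(e1071)
data(Titanic)
tm <- naiveBayes(Survived ~ ., data = Titanic)
tm$tables$Sex                          # categorical: within-class proportions P(Sex | Survived)
im <- naiveBayes(iris[,1:4], iris[,5])
im$tables$Petal.Length                 # numeric: per-class mean and sd of the assumed Gaussian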

Digging into iris
library(e1071)
classifier <- naiveBayes(iris[,1:4], iris[,5])
table(predict(classifier, iris[,-5]), iris[,5], dnn=list('predicted','actual'))
classifier$apriori
classifier$tables$Petal.Length
pl <- classifier$tables$Petal.Length   # per-class mean (col 1) and sd (col 2) of Petal.Length
plot(function(x) dnorm(x, pl[1,1], pl[1,2]), 0, 8, col="red",
     main="Petal length distribution for the 3 different species")
curve(dnorm(x, pl[2,1], pl[2,2]), add=TRUE, col="blue")
curve(dnorm(x, pl[3,1], pl[3,2]), add=TRUE, col="green")


Bayes
> cl <- kmeans(iris[,1:4], 3)
> table(cl$cluster, iris[,5])
(cluster-by-species counts not preserved)
> m <- naiveBayes(iris[,1:4], iris[,5])
> table(predict(m, iris[,1:4]), iris[,5])
(confusion matrix over setosa, versicolor, virginica; counts not preserved)
pairs(iris[1:4], main="Iris Data (red=setosa, green=versicolor, blue=virginica)",
      pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])

Ex: Classification Bayes
Retrieve the abalone.csv dataset. The task is predicting the age of abalone from physical measurements. Perform naïve Bayes classification to get predictors for Age (Rings). Interpret. Discuss on Friday. (A possible starting point is sketched below.)
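A minimal sketch, assuming abalone.csv is in the working directory with a Rings column plus the physical-measurement columns (the file path and column names are assumptions), and assuming the e1071 package:

library(e1071)
abalone <- read.csv("abalone.csv")
abalone$Rings <- as.factor(abalone$Rings)        # treat the ring count (age proxy) as a class label
aba.mdl <- naiveBayes(Rings ~ ., data = abalone)
aba.mdl$apriori                                  # class frequencies
table(predict(aba.mdl, abalone), abalone$Rings)  # resubstitution confusion matrix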

Using a contingency table
> data(Titanic)
> mdl <- naiveBayes(Survived ~ ., data = Titanic)
> mdl

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

(A-priori probabilities for Survived (No, Yes), and conditional probabilities for Class, Sex and Age given Survived; numeric values not preserved.)

Using a contingency table
> predict(mdl, as.data.frame(Titanic)[,1:3])
 [1] Yes No  No  No  Yes Yes Yes Yes No  No  No  No  Yes Yes Yes Yes Yes No  No  No  Yes Yes Yes Yes No
[26] No  No  No  Yes Yes Yes Yes
Levels: No Yes

At this point… you may realize that the inter-relations among classification and clustering methods, at both an absolute and a relative level (e.g. hierarchical -> trees…), are COMPLEX…
– Trees are interesting from a decision perspective: if this or that, then this….
Beyond just distance measures: from clustering (kmeans) to probabilities (Bayesian).
And there are so many ways to visualize them…