Peter Fox, Data Analytics – ITWS-4963/ITWS-6965, Week 5a, February 24, 2015: Weighted kNN, clustering, trees and Bayesian classification.


Plot tools/tips in R: pairs, gpairs, scatterplot.matrix, clustergram, etc.
data() # built-in datasets: precip, presidents, iris, swiss, sunspot.month (!), environmental, ethanol, ionosphere
More script fragments in R will be available on the web site ( ).

Weighted kNN?

require(kknn)
data(iris)
m <- dim(iris)[1]
val <- sample(1:m, size = round(m/3), replace = FALSE, prob = rep(1/m, m))
iris.learn <- iris[-val,]
iris.valid <- iris[val,]
iris.kknn <- kknn(Species ~ ., iris.learn, iris.valid, distance = 1, kernel = "triangular")
summary(iris.kknn)
fit <- fitted(iris.kknn)
table(iris.valid$Species, fit)
pcol <- as.character(as.numeric(iris.valid$Species))
pairs(iris.valid[1:4], pch = pcol, col = c("green3", "red")[(iris.valid$Species != fit) + 1])  # red marks misclassified points
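A quick extra you can try (a minimal sketch reusing fit and iris.valid from above): summarize the confusion table as a single misclassification rate.

# Misclassification rate on the validation set (sketch; reuses fit and iris.valid from above)
mis <- sum(iris.valid$Species != fit)
mis / nrow(iris.valid)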

Look at Lab5b_wknn_2015.R

Ctree

> iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=iris)
> print(iris_ctree)

 Conditional inference tree with 4 terminal nodes

Response:  Species
Inputs:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations:  150

1) Petal.Length <= 1.9; criterion = 1, statistic = 140.264
  2)*  weights = 50
1) Petal.Length > 1.9
  3) Petal.Width <= 1.7; criterion = 1, statistic = 67.894
    4) Petal.Length <= 4.8; criterion = 0.999, statistic = 13.865
      5)*  weights = 46
    4) Petal.Length > 4.8
      6)*  weights = 8
  3) Petal.Width > 1.7
    7)*  weights = 46

plot(iris_ctree)
> plot(iris_ctree, type="simple") # try this
Try Lab6b_5_2014.R

Swiss - pairs

pairs(~ Fertility + Education + Catholic, data = swiss, subset = Education < 20, main = "Swiss data, Education < 20")

New dataset - ionosphere

require(kknn)
data(ionosphere)
ionosphere.learn <- ionosphere[1:200,]
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
# vary kernel
(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15, kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))
table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
# alter distance
(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15, kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))
table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)

Results

ionosphere.learn <- ionosphere[1:200,]   # convenience sampling!!!!
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)

      b   g
  b  19   8
  g

(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
+   kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15, distance = 1, kernel = c("triangular", "rectangular", "epanechnikov", "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2

table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)

      b   g
  b  25   4
  g

(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
+   kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15, distance = 2, kernel = c("triangular", "rectangular", "epanechnikov", "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2

table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)

      b   g
  b  20   5
  g

However… there is more

Bayes

> cl <- kmeans(iris[,1:4], 3)
> table(cl$cluster, iris[,5])   # cross-tab of the 3 clusters vs. species; exact counts vary with the random start

> require(e1071)   # naiveBayes() is in the e1071 package
> m <- naiveBayes(iris[,1:4], iris[,5])
> table(predict(m, iris[,1:4]), iris[,5])

             setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47

pairs(iris[1:4], main="Iris Data (red=setosa, green=versicolor, blue=virginica)", pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])

Using a contingency table

> data(Titanic)
> mdl <- naiveBayes(Survived ~ ., data = Titanic)
> mdl

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

A-priori probabilities:
Survived
       No       Yes
0.6769650 0.3230350

Conditional probabilities:
        Class
Survived        1st        2nd        3rd       Crew
     No  0.08187919 0.11208054 0.35436242 0.45167785
     Yes 0.28551336 0.16596343 0.25035162 0.29817159

        Sex
Survived       Male     Female
     No  0.91543624 0.08456376
     Yes 0.51617440 0.48382560

        Age
Survived      Child      Adult
     No  0.03489933 0.96510067
     Yes 0.08016878 0.91983122

Using a contingency table

> predict(mdl, as.data.frame(Titanic)[,1:3])
 [1] Yes No  No  No  Yes Yes Yes Yes No  No  No  No  Yes Yes Yes Yes Yes No  No  No  Yes Yes Yes Yes No
[26] No  No  No  Yes Yes Yes Yes
Levels: No Yes

Naïve Bayes – what is it?

Example: testing for a specific item of knowledge that 1% of the population has been informed of (don't ask how). An imperfect test:
– 99% of knowledgeable people test positive
– 99% of ignorant people test negative
If a person tests positive, what is the probability that they actually know the fact?

Naïve approach…

Take 10,000 representative people: 100 know the fact/item, 9,900 do not. We test them all:
– 99 knowing people test as knowing
– 9,801 not-knowing people test as not knowing
– but 99 not-knowing people test as knowing
So among those testing positive (knowing), a person is equally likely to know or not: 50%.

Tree diagram

10,000 people
– 1% know (100 people)
   – 99% test positive (99 people)
   – 1% test negative (1 person)
– 99% do not know (9,900 people)
   – 1% test positive (99 people)
   – 99% test negative (9,801 people)

Relation between probabilities

For outcomes x and y there are probabilities p(x) and p(y) that each happened. If there is a connection, the joint probability (both happen) is p(x,y), and the probability that x happens given that y happens is p(x|y), or vice versa. Then:
– p(x|y)*p(y) = p(x,y) = p(y|x)*p(x)
So p(y|x) = p(x|y)*p(y)/p(x) (Bayes' Law)
E.g. p(know|+ve) = p(+ve|know)*p(know)/p(+ve) = (0.99*0.01)/(0.99*0.01 + 0.01*0.99) = 0.5
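To check this arithmetic in R (a minimal sketch; the variable names here are made up for illustration):

p_know     <- 0.01   # prior: 1% of the population knows
p_pos_know <- 0.99   # knowing people test positive
p_pos_not  <- 0.01   # ignorant people test positive (false positive rate)
p_pos <- p_pos_know * p_know + p_pos_not * (1 - p_know)   # total probability of a positive test
p_pos_know * p_know / p_pos                               # Bayes' Law: p(know | +ve) = 0.5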

How do you use it?

If the population contains x, what is the chance that y is true?
p(SPAM|word) = p(word|SPAM)*p(SPAM)/p(word)
Base this on data:
– p(SPAM) counts the proportion of spam versus not
– p(word|SPAM) counts the prevalence of spam containing the 'word'
– p(word|!SPAM) counts the prevalence of non-spam containing the 'word'
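A toy sketch of the counting above (the vectors spam and has_word are hypothetical, invented purely for illustration):

# Hypothetical corpus: two logical vectors over the same set of messages
spam     <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
has_word <- c(TRUE, TRUE, FALSE, TRUE,  FALSE, TRUE, FALSE, FALSE)

p_spam      <- mean(spam)              # p(SPAM)
p_word_spam <- mean(has_word[spam])    # p(word | SPAM)
p_word      <- mean(has_word)          # p(word)
p_word_spam * p_spam / p_word          # p(SPAM | word) by Bayes' Law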

Or…

What is the probability that you are in one class (i) over another class (j), given another factor (X)? Invoke Bayes: maximize p(X|Ci)p(Ci)/p(X) over the classes (p(X) is effectively constant, and the p(Ci) are taken as equal if not known).
So, assuming conditional independence of the attributes x1, …, xn given the class:
p(X|Ci) = p(x1|Ci) * p(x2|Ci) * … * p(xn|Ci)

P(xk | Ci) is estimated from the training samples:
– Categorical: estimate P(xk | Ci) as the percentage of samples of class i with value xk. Training involves counting the percentage of occurrence of each possible value for each class.
– Numeric: the actual form of the density function is generally not known, so a "normal" density is often assumed.

Digging into iris

require(e1071)   # for naiveBayes()
classifier <- naiveBayes(iris[,1:4], iris[,5])
table(predict(classifier, iris[,-5]), iris[,5], dnn=list('predicted','actual'))
classifier$apriori
classifier$tables$Petal.Length
# Read the per-species means/sds straight out of the classifier's table
# rather than hard-coding them (rows = species; col 1 = mean, col 2 = sd):
pl <- classifier$tables$Petal.Length
plot(function(x) dnorm(x, pl[1,1], pl[1,2]), 0, 8, col="red", main="Petal length distribution for the 3 different species")   # setosa (mean 1.462)
curve(dnorm(x, pl[2,1], pl[2,2]), add=TRUE, col="blue")    # versicolor (mean 4.260)
curve(dnorm(x, pl[3,1], pl[3,2]), add=TRUE, col="green")   # virginica (mean 5.552)
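To cross-check what naiveBayes stores (a sketch using only base R): the entries of classifier$tables$Petal.Length are just the per-species sample mean and standard deviation.

# Per-species mean and sd of Petal.Length, computed directly from the data
aggregate(Petal.Length ~ Species, data = iris, FUN = function(x) c(mean = mean(x), sd = sd(x)))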


Decision tree (example)

> require(party)   # don't get me started!
> str(iris)
'data.frame': 150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=iris)

plot(iris_ctree)
> plot(iris_ctree, type="simple") # try this
Try Lab6b_5_2014.R

Beyond plot: pairs

pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species", pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])

Try Lab6b_2_2014.R - USJudgeRatings

Try hclust for iris

require(gpairs)   # gpairs() lives in the gpairs package
gpairs(iris)
Try Lab6b_3_2014.R

Better scatterplots

install.packages("car")
require(car)
scatterplotMatrix(iris)

Try Lab6b_4_2014.R

splom(iris) # default
Try Lab6b_7_2014.R

splom extra!

require(lattice)
super.sym <- trellis.par.get("superpose.symbol")
splom(~iris[1:4], groups = Species, data = iris,
      panel = panel.superpose,
      key = list(title = "Three Varieties of Iris",
                 columns = 3,
                 points = list(pch = super.sym$pch[1:3], col = super.sym$col[1:3]),
                 text = list(c("Setosa", "Versicolor", "Virginica"))))
splom(~iris[1:3] | Species, data = iris, layout = c(2,2),
      pscales = 0,
      varnames = c("Sepal\nLength", "Sepal\nWidth", "Petal\nLength"),
      page = function(...) {
        ltext(x = seq(.6, .8, length.out = 4),
              y = seq(.9, .6, length.out = 4),
              labels = c("Three", "Varieties", "of", "Iris"),
              cex = 2)
      })
parallelplot(~iris[1:4] | Species, iris)
parallelplot(~iris[1:4], iris, groups = Species, horizontal.axis = FALSE, scales = list(x = list(rot = 90)))

Lab6b_7_2014.R



Using a contingency table

> data(Titanic)
> mdl <- naiveBayes(Survived ~ ., data = Titanic)
> mdl
(same model and printed output as on the earlier "Using a contingency table" slide)

Try Lab6b_9_2014.R

ench/html/HouseVotes84.html

require(mlbench)
data(HouseVotes84)
model <- naiveBayes(Class ~ ., data = HouseVotes84)
predict(model, HouseVotes84[1:10,-1])
predict(model, HouseVotes84[1:10,-1], type = "raw")
pred <- predict(model, HouseVotes84[,-1])
table(pred, HouseVotes84$Class)
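A small extra (a sketch reusing pred and HouseVotes84 from above): turn that confusion table into an overall accuracy figure.

# Overall accuracy of the naive Bayes classifier on its own training data
mean(pred == HouseVotes84$Class)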

Exercise for you

> data(HairEyeColor)
> mosaicplot(HairEyeColor)
> margin.table(HairEyeColor, 3)
Sex
  Male Female
   279    313
> margin.table(HairEyeColor, c(1,3))
       Sex
Hair    Male Female
  Black   56     52
  Brown  143    143
  Red     34     37
  Blond   46     81

How would you construct a naïve Bayes classifier and test it?
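One possible starting point for this exercise (a sketch only; it expands the contingency table to one row per person, and the names hec, people, model, pred are made up here):

# Sketch: predict Sex from Hair and Eye with naive Bayes
require(e1071)
hec <- as.data.frame(HairEyeColor)                      # columns: Hair, Eye, Sex, Freq
people <- hec[rep(seq_len(nrow(hec)), hec$Freq), 1:3]   # one row per person (592 rows)
model <- naiveBayes(Sex ~ Hair + Eye, data = people)
pred <- predict(model, people[, c("Hair", "Eye")])
table(pred, people$Sex)                                 # confusion table to assess the fit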

Hierarchical clustering

> d <- dist(as.matrix(mtcars))
> hc <- hclust(d)
> plot(hc)
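A common next step you can try (a sketch; k = 5 is an arbitrary choice, not from the deck):

# Cut the dendrogram into k groups and see which cars land together
groups <- cutree(hc, k = 5)
table(groups)
rect.hclust(hc, k = 5, border = "red")   # outline the 5 clusters on the existing plot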

ctree

require(party)
swiss_ctree <- ctree(Fertility ~ Agriculture + Education + Catholic, data = swiss)
plot(swiss_ctree)

Hierarchical clustering

> dswiss <- dist(as.matrix(swiss))
> hs <- hclust(dswiss)
> plot(hs)

scatterplotMatrix

require(lattice); splom(swiss)



At this point…

You may realize the inter-relation among classification at an absolute and a relative level (i.e. hierarchical -> trees…)
– Trees are interesting from a decision perspective: if this or that, then this…
We have moved beyond just distance measures (k-means) to probabilities (Bayesian).
So many ways to visualize them…