Multivariate statistics

Multivariate statistics covered in this lecture:

- PCA: principal component analysis
- Correspondence analysis
- Canonical correlation
- Discriminant function analysis: LDA (linear discriminant analysis) and QDA (quadratic discriminant analysis)
- Cluster analysis
- MANOVA

Xuhua Xia
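As a quick illustration of the first method in the list, PCA can be run in base R with prcomp(); the use of the iris measurements here is just an example choice, matching the data set used in the later slides:

```r
# Illustrative PCA with base R's prcomp() on the four iris measurements.
# scale. = TRUE standardizes each variable before extracting components.
pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)                      # proportion of variance per component
scores <- pca$x[, 1:2]            # first two principal components
plot(scores, col = iris$Species)  # quick look at the component scores
```

The first two components typically capture most of the variance, which is why the scatter plot of PC1 vs PC2 is the standard display.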

lda in the MASS package

library(MASS)

# CV=TRUE performs leave-one-out cross-validation; the predicted
# group for each observation is then in fit$class
fit <- lda(Species ~ ., data = iris, CV = TRUE)
ct <- table(iris$Species, fit$class)
prop.table(ct, 1)            # each row sums to 1
prop.table(ct, 2)            # each column sums to 1
diag(prop.table(ct, 1))      # per-species classification accuracy
sum(diag(prop.table(ct)))    # overall classification accuracy

# Refit without cross-validation to obtain the discriminant scores
fit <- lda(Species ~ ., data = iris)
pred <- predict(fit, iris)
nd <- data.frame(LD1 = pred$x[, 1], LD2 = pred$x[, 2], Sp = iris$Species)
library(ggplot2)
ggplot(data = nd) + geom_point(aes(x = LD1, y = LD2, color = Sp))

# Partition plots of the classification regions
install.packages("klaR")
library(klaR)
partimat(Species ~ ., data = iris, method = "lda")

[Figure: scatter plot of the discriminant scores, LD1 (x-axis, roughly -10 to 10) vs LD2 (y-axis, roughly -2 to 3), with points colored by species: setosa, versicolor, virginica.]

qda in the MASS package

library(MASS)

# CV=TRUE performs leave-one-out cross-validation
fit <- qda(Species ~ ., data = iris, CV = TRUE)
# method="mle" estimates the group means and variances by maximum likelihood
fit2 <- qda(Species ~ ., data = iris, CV = TRUE, method = "mle")
ct <- table(iris$Species, fit2$class)
diag(prop.table(ct, 1))      # per-species classification accuracy
sum(diag(prop.table(ct)))    # overall classification accuracy

library(klaR)
partimat(Species ~ ., data = iris, method = "qda")

Cross-validation

- Holdout method with jackknifing.
- K-fold cross-validation: divide the data in each category into K sets. Each set is used once as the test set, with the remaining sets pooled as training data. Every data point appears in a test set exactly once and in a training set K-1 times.
- Leave-one-out cross-validation: K-fold cross-validation taken to its extreme, with K equal to N, the number of data points in the set.
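The K-fold scheme described above can be sketched in base R. The fold count k = 5 and the trivial majority-class "classifier" are placeholders chosen for illustration; in practice one would refit lda() on each training split:

```r
# K-fold cross-validation skeleton in base R (no extra packages).
# Each observation is assigned to exactly one of k folds; in turn,
# each fold serves as the test set while the rest form the training set.
set.seed(1)
k <- 5
n <- nrow(iris)
folds <- sample(rep(1:k, length.out = n))  # balanced random fold labels

accuracy <- numeric(k)
for (i in 1:k) {
  test  <- iris[folds == i, ]
  train <- iris[folds != i, ]
  # Placeholder classifier: predict the majority species of the
  # training set (substitute a real model, e.g. lda(), in practice)
  majority <- names(which.max(table(train$Species)))
  accuracy[i] <- mean(test$Species == majority)
}
mean(accuracy)   # overall cross-validated accuracy
```

Because every point lands in exactly one test fold, the k accuracy values together use each observation exactly once for testing, which is what makes the averaged estimate nearly unbiased.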