New Methods in Ecology Complex statistical tests, and why we should be cautious!

Complex tests: logistic regression, and two multivariate methods, principal components analysis and cluster analysis. Multivariate tests mean you have a single explanatory variable, but multiple response variables.

Logistic Regression

Insects were exposed to a pesticide to determine the effectiveness of the treatment. The response is the number of dead individuals in each batch. [Table columns: Dose, Dead, Batch]

Linear regression of the proportion killed against dose: P(kill) = ax + b. At dose 0 the predicted proportion killed is less than 0 (negative deaths?), and above dose 4 the model predicts more than 100% mortality! [Plot: P(kill) against dose]

We need to ensure the model is bounded by 0 and 1, so we build a new equation. We no longer have impossible predictions, and the model fits better. [Plot: P(kill) against dose]
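The "new equation" is not preserved in this transcript. A common choice for bounding a proportion is the logistic function, which passes the same linear predictor ax + b through 1 / (1 + exp(-z)). The sketch below assumes that form; the coefficients are hypothetical and were chosen only to reproduce the problem described above (a negative prediction at dose 0, more than 100% above dose 4).

```python
import math


def linear_kill(dose, a, b):
    """Straight-line model P(kill) = a*dose + b; it can leave the 0-1 range."""
    return a * dose + b


def logistic_kill(dose, a, b):
    """Logistic model: the linear predictor squeezed into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a * dose + b)))


# Hypothetical coefficients, not taken from the presentation.
A_LINEAR, B_LINEAR = 0.27, -0.1
A_LOGISTIC, B_LOGISTIC = 2.0, -4.0

for dose in range(0, 6):
    straight = linear_kill(dose, A_LINEAR, B_LINEAR)
    bounded = logistic_kill(dose, A_LOGISTIC, B_LOGISTIC)
    print(f"dose {dose}: linear {straight:.2f}, logistic {bounded:.3f}")
```

Over this dose range the straight line strays outside 0 and 1 at both ends, while the logistic curve stays inside.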

We can now look at what proportion would be killed at a particular dosage. [Plot: P(kill) against dose]
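Reading a proportion off the curve, or going the other way and asking which dose gives a target proportion, can be sketched as follows. This assumes the logistic form P(kill) = 1 / (1 + exp(-(ax + b))); the function names and coefficients are hypothetical.

```python
import math


def logistic_kill(dose, a, b):
    """Predicted proportion killed at a given dose."""
    return 1.0 / (1.0 + math.exp(-(a * dose + b)))


def dose_for_kill(p, a, b):
    """Invert the curve: the dose at which a proportion p is predicted to die."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Solve log(p / (1 - p)) = a*dose + b for dose.
    return (math.log(p / (1.0 - p)) - b) / a


# Hypothetical fitted coefficients, not taken from the presentation.
a, b = 2.0, -4.0

proportion_at_3 = logistic_kill(3.0, a, b)
ld50 = dose_for_kill(0.5, a, b)  # dose predicted to kill half; equals -b / a
ld90 = dose_for_kill(0.9, a, b)
```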

Logistic regression issues: implementing and coding the model can be difficult, and it can be tough to work through the equation. Is it easier to design around the issue? For example, use the same number of insects in each batch and use “number dead” as the response variable? [Table columns: Dose, Dead, Batch]
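To give a sense of what "implementing and coding the model" involves, here is a minimal sketch of fitting a logistic curve to dose, dead and batch counts by Newton-Raphson on the binomial likelihood. The data are hypothetical (the original table is not preserved), and in practice a statistics package would do this step for you.

```python
import math


def fit_logistic(doses, dead, batch, iterations=25):
    """Fit P(kill) = 1 / (1 + exp(-(a*dose + b))) to binomial counts.

    Returns (a, b). Uses plain Newton-Raphson steps from a = b = 0.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y, n in zip(doses, dead, batch):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            w = n * p * (1.0 - p)        # binomial variance at this dose
            g_a += x * (y - n * p)       # gradient with respect to the slope
            g_b += y - n * p             # gradient with respect to the intercept
            h_aa += w * x * x
            h_ab += w * x
            h_bb += w
        # Solve the 2x2 system H * step = gradient by hand.
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical experiment: 20 insects per batch at each dose.
doses = [0, 1, 2, 3, 4, 5]
batch = [20, 20, 20, 20, 20, 20]
dead = [1, 3, 8, 14, 18, 20]

a, b = fit_logistic(doses, dead, batch)
expected_dead = [n / (1.0 + math.exp(-(a * x + b))) for x, n in zip(doses, batch)]
```

At the fitted values the expected number of deaths summed over batches matches the observed total, which is a quick sanity check on the fit.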

Multivariate Statistics: single explanatory variable, multiple response variables. Multivariate tests can be useful and insightful. Can be deeply confusing. Very often misused. Difficult to explain the results. Used to mask bad designs, confuse/impress stupid people.

Parrots in Bonaire (Sam Williams): Sam collected a load of data on different aspects of the birds’ biology.

Parrots in Bonaire: what to do with all this? One descriptive variable (nest), multiple response variables. Principal component analysis…

Principal Component Analysis: obtains values for as many principal components as there are response variables. Each PC accounts for some more of the total variation. Each nest has a PC value for each PC. Each response variable has a rotation value for each PC. What do these PC values and rotation values relate to? God knows.
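Mechanically, the rotation values are the weights that combine the centred response variables into each PC, and a nest's PC value is that weighted sum. The sketch below computes both with power iteration and deflation on the covariance matrix. The nest measurements and variable names are hypothetical, the variables are centred but not scaled, and the sign of each component is arbitrary.

```python
import math


def pca(rows, iterations=500):
    """Return (eigenvalues, rotation, scores) for centred data.

    rotation[k] holds the rotation values of PC k+1, one per response variable.
    scores[i][k] is the value of PC k+1 for row (nest) i.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the response variables (m x m).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
           for i in range(m)]
    eigenvalues, rotation = [], []
    for _ in range(m):
        v = [1.0] * m  # arbitrary starting direction
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(m) for j in range(m))
        eigenvalues.append(lam)
        rotation.append(v)
        # Deflate: remove the variation this component already accounts for.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    scores = [[sum(c[j] * v[j] for j in range(m)) for v in rotation]
              for c in centred]
    return eigenvalues, rotation, scores


# Hypothetical nests: clutch size, nest depth (cm), chick mass (g).
nests = [
    [3, 42.0, 310.0],
    [4, 55.0, 335.0],
    [2, 38.0, 290.0],
    [4, 61.0, 350.0],
    [3, 47.0, 300.0],
    [5, 58.0, 362.0],
]

eigenvalues, rotation, scores = pca(nests)
```

Because the data are not scaled here, the variable with the largest numeric spread dominates PC1, which is one reason the components can be hard to interpret.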

Principal Component Output: scree plot. The first few principal components account for much of the variation. [Plot: scree plot by principal component]
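The numbers behind a scree plot are the share of total variation carried by each component. A minimal sketch, assuming you already have the variance (eigenvalue) of each PC; the eigenvalues and the 85% threshold below are hypothetical.

```python
def variance_explained(eigenvalues):
    """Return (proportions, cumulative proportions) of total variance per PC."""
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


def components_needed(eigenvalues, threshold):
    """Smallest number of leading PCs whose cumulative share reaches threshold."""
    _, cumulative = variance_explained(eigenvalues)
    for k, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical PC variances, largest first.
pc_variances = [6.0, 2.0, 1.0, 0.6, 0.4]

proportions, cumulative = variance_explained(pc_variances)
n_keep = components_needed(pc_variances, threshold=0.85)
```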

Principal Component Output: biplot of the first two principal components. Can be used to look for correlations. Some significance tests exist (redundancy analysis). Lots of noise!

Other use of PCA: each nest/individual/replicate has a value for each principal component. We can use these values as a response variable and subject them to other tests. This is called “dimensionality reduction”.

Salmon Genomics and Survival: gene expression data for ~16000 genes, from ~300 fish. Each fish is a replicate, each gene is a response variable.

Salmon Genomics and Survival: 16000 genes is a lot of data, and a lot of variation. Do a PCA on the genes and use the PC values as a response variable. This reduces the dimension of the data: rather than ~16000 response variables, we now have 1 (PC1, or PC2). We can then use this in other tests.

Relating the value of PC1 to the survival of the fish showed a correlation for one stock. [Plot axis: principal component]
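Once each fish has a single PC1 value, relating it to survival is an ordinary one-variable analysis. The transcript does not say which test was used, so the sketch below uses a Pearson correlation purely as a stand-in; the PC1 scores and survival times are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for one stock: PC1 score and days survived per fish.
pc1_scores = [-2.1, -1.4, -0.3, 0.2, 0.9, 1.5, 2.0]
days_survived = [12, 15, 14, 20, 22, 21, 27]

r = pearson_r(pc1_scores, days_survived)
```

Repeating this per stock would show whether the relationship holds for only some of them, as the slide reports.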

Salmon Genomics and Survival: PCA condensed the gene expression data into something usable. The method is insanely complex and computer intensive. We still don’t really know what PC1 is! [Plot: proportion surviving against days, for the Scotch, Chilko and Adams stocks]

Cluster Analysis: like PCA, a multivariate method. Unlike PCA, it looks for patterns within the data. It produces a hierarchical clustering that groups similar individuals together. It is unsupervised, so you then have to decide where the groups lie. Try to relate the grouping to something else?
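The idea of building a hierarchy and then deciding where the groups lie can be sketched with a small agglomerative clustering routine. This assumes single linkage and Euclidean distance, which the presentation does not specify; the points and the choice of two groups are hypothetical.

```python
import math


def hierarchical_clusters(points, k):
    """Single-linkage agglomerative clustering, stopped when k clusters remain.

    Returns (clusters, merges): clusters is a list of lists of point indices,
    merges records each join as (members_a, members_b, distance).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical individuals measured on two variables.
individuals = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
               (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]

# Choosing k is the "decide where the groups lie" step.
groups, merges = hierarchical_clusters(individuals, k=2)
```

The merge distances give the heights you would see on a dendrogram; a large jump between successive merges is one informal cue for where to cut.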

Cluster Analysis

Multivariate Summary: multivariate statistics are useful for data mining. They are often used when data collection was done improperly or you’ve been given data sets. They can indicate how to proceed, but can be very messy. Totally opposite to the a priori “carry out an experiment to test a hypothesis” idea.

Complex stats Summary: can be very useful and insightful if used properly. More complex doesn’t necessarily mean better. Can be difficult to interpret. Remember the golden rule: know how to analyse the type of data you will collect, before you collect it!