Course in Statistics and Data analysis Course B DAY2 September 2009 Stephan Frickenhaus www.awi.de/en/go/bioinformatics.


DAY2
How to import data from Excel
Multivariate data in plots, linear models
ANOVA
Ideas of clustering and modelling

Import of Data from Excel
Copy a rectangular part (or all) of your table in Excel.
Paste it into a text file in the Windows Editor and check the column names.
Save it as "tab1.txt".
In R, change to the directory that contains "tab1.txt" (File -> Change Dir.).
Load the file into a variable V:
V=read.table(file="tab1.txt",header=T)
You may use the column named "day" as row names:
V=read.table(file="tab1.txt",header=T,row.names="day")
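
The import steps can be tried without Excel. The following is a minimal sketch: the file content is a made-up stand-in for a table pasted from Excel, and the columns "size" and "class" are assumptions used only for illustration.

```r
# Hypothetical stand-in for the tab-separated text saved from the editor.
tab_file <- file.path(tempdir(), "tab1.txt")
writeLines(c("day\tsize\tclass",
             "d1\t1.5\ta",
             "d2\t2.0\tb",
             "d3\t2.5\ta"), tab_file)

# Read the table; the first line holds the column names.
V <- read.table(file = tab_file, header = TRUE)

# Same file, with the column "day" used as row names.
V2 <- read.table(file = tab_file, header = TRUE, row.names = "day")

str(V)
rownames(V2)
```

With row names set, a row can be addressed by its day, e.g. V2["d2", ].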

Problems
In case of problems with decimal `,` or `.`: tell R in read.table which character is the decimal point (use dec="," for a decimal comma):
V=read.table(…, dec=".")
If you get a text file with commas, not tabs, separating the columns:
V=read.table(…, sep=",")
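
Both problems can be reproduced with two small files. This is only a sketch; the file names and values are invented for the example.

```r
# Case 1: tab-separated file that uses a decimal comma.
dec_file <- file.path(tempdir(), "tab_dec.txt")
writeLines(c("day\tsize", "d1\t1,5", "d2\t2,25"), dec_file)
A <- read.table(dec_file, header = TRUE, sep = "\t", dec = ",")

# Case 2: comma-separated columns with a decimal point.
csv_file <- file.path(tempdir(), "tab_sep.txt")
writeLines(c("day,size", "d1,1.5", "d2,2.25"), csv_file)
B <- read.table(csv_file, header = TRUE, sep = ",")

# In both cases "size" should arrive as a numeric column.
A$size
B$size
```

If "size" is read as text instead of numbers, a wrong dec or sep setting is a likely cause.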

Saving result tables
Results of an R analysis, e.g., from filtering, are sometimes exported to text files, which can be imported into Excel or R later.
Do this without quotes around each entry:
write.table(V, file="res.txt", quote=F)
Save only the 2 desired columns ("size" and "class"):
write.table(V[,c("size","class")], file="res2.txt")
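
A short sketch of the export, using a made-up data frame V. The arguments sep = "\t" and row.names = FALSE are additions not shown on the slide; they are chosen here so that the header line matches the columns when the file is opened in Excel.

```r
# Hypothetical result table.
V <- data.frame(day   = c("d1", "d2", "d3"),
                size  = c(1.5, 2.0, 2.5),
                class = c("a", "b", "a"))

# Whole table, no quotes around entries.
res_file <- file.path(tempdir(), "res.txt")
write.table(V, file = res_file, quote = FALSE, sep = "\t", row.names = FALSE)

# Only the columns "size" and "class", kept as columns.
res2_file <- file.path(tempdir(), "res2.txt")
write.table(V[, c("size", "class")], file = res2_file,
            quote = FALSE, sep = "\t", row.names = FALSE)

# Read the second file back to check the round trip.
back <- read.table(res2_file, header = TRUE, sep = "\t")
back
```

Selecting columns with V[, c("size", "class")] keeps one row per observation, whereas rbind(V$size, V$class) would turn each column into a row.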

Multivariate Data
Suppose we have measured the diameter and height of diatoms (work with "diatoms.txt").
What is the relation between the two? Is one dependent on the other? What is the strategy of the organism?

Correlation test
Is there a significant correlation?
cor.test(D,H)
This checks whether the observed correlation is significantly different from zero.
We find a negative correlation near -1 (strong).
A small p-value indicates a significant correlation.
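
Since "diatoms.txt" is not reproduced here, the test can be sketched on simulated values. The numbers below are invented and only mimic a negative relation between D and H.

```r
set.seed(1)

# Made-up diameters and heights with a negative trend plus noise.
D <- runif(30, min = 10, max = 30)
H <- 60 - 1.5 * D + rnorm(30, sd = 3)

# Pearson correlation test (the default method of cor.test).
ct <- cor.test(D, H)

ct$estimate   # sample correlation coefficient
ct$p.value    # p-value for the null hypothesis of zero correlation
ct$conf.int   # confidence interval for the correlation
```

The returned object also prints as a readable summary when typed at the prompt.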

We can conclude that these diatoms show a special trend: height increases as diameter decreases. What does this mean? Can we say that this has a compensating function?
It could be that the cell maintains its volume (centric shape, treated as a cylinder).
Volume V = pi*R^2*H = 1/4*pi*D^2*H
At constant volume, H = 4*V/(pi*D^2), so we expect a linear relation between H and 1/D^2.
We need a regression; in R it is done with lm(Y~X). Try ?lm to see how. See "diatoms.R".
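
The constant-volume idea can be checked on simulated data. This is a sketch under stated assumptions: the volume vol and the noise level are arbitrary, and the script is not the course file "diatoms.R".

```r
set.seed(2)

vol <- 5000                      # assumed constant cell volume (arbitrary units)
D   <- runif(30, min = 10, max = 30)

# Cylinder with fixed volume: H = 4*vol / (pi * D^2), plus measurement noise.
H <- 4 * vol / (pi * D^2) + rnorm(30, sd = 0.5)

# Regress H on the transformed predictor X = 1/D^2.
X   <- 1 / D^2
fit <- lm(H ~ X)

coef(fit)        # slope should be close to 4*vol/pi
4 * vol / pi     # value expected under the constant-volume assumption
```

An intercept near zero and a slope near 4*vol/pi would be consistent with cells that keep their volume.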

Linear models
To fit a model to data: suppose we have a sample of measured (y,x1,x2,x3).
The simplest model showing the influence of all 3 x has the form y=a*x1+b*x2+c*x3+d
The coefficients a,b,c,d are obtained from lm(y~x1+x2+x3)
Each coefficient's value may be non-significant, in which case it could as well be set to zero.
summary(lm()) shows these significances.
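
A minimal sketch of such a fit on invented data, where x3 is deliberately given no real effect so that its coefficient has a chance to appear non-significant. All numbers are assumptions for illustration.

```r
set.seed(4)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)

# True model: a = 2, b = -1, c = 0 (no effect of x3), d = 3, plus noise.
y <- 2 * x1 - 1 * x2 + 0 * x3 + 3 + rnorm(n, sd = 0.5)

fit   <- lm(y ~ x1 + x2 + x3)
coefs <- summary(fit)$coefficients

# Estimates and their p-values, one row per coefficient.
coefs[, c("Estimate", "Pr(>|t|)")]
```

The row "(Intercept)" corresponds to d; the rows x1, x2, x3 correspond to a, b, c.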

Check "lm.R"
The data y was created with coefficients 1, 1, 0.5 and a random term runif/3.
We see estimates of these coefficients from the fit under "Estimate".
Now we could write the fitted model as y.fit(x1,x2,x3) = a*x1+b*x2+c*x3+d, with a,b,c,d replaced by their estimates.
Use this to draw a ± error bar around y.fit.
If you want no intercept, use y~x1+x2+x3-1
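
The following sketch imitates the description above; it is not the course script "lm.R". The sample size and the uniform x values are assumptions, and the error band is taken from predict() with a prediction interval, which is one possible reading of the "± error bar".

```r
set.seed(3)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)

# Coefficients 1, 1, 0.5 and a random term runif/3, as described.
y <- 1 * x1 + 1 * x2 + 0.5 * x3 + runif(n) / 3

fit <- lm(y ~ x1 + x2 + x3)
summary(fit)$coefficients        # see the column "Estimate"

# Fitted value with lower/upper prediction bounds at one new point.
new  <- data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5)
pred <- predict(fit, newdata = new, interval = "prediction")
pred

# Same model without an intercept.
fit0 <- lm(y ~ x1 + x2 + x3 - 1)
coef(fit0)
```

Because runif/3 has a mean of 1/6, the intercept of the first fit is expected to be near 0.17 rather than zero.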

Conclusions
Variables x with significant coefficients, i.e., Pr(>|t|) < alpha, are said to have an effect on y.
Sometimes there are relations between the explaining variables, say x1 and x2 are correlated, like x2=2*x1.
Then y=c1*x1+c2*x2 can be reduced to y=(c1+2*c2)*x1
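
What happens with exactly related predictors can be seen in a small simulation. The values c1 = 1 and c2 = 0.5 are assumptions chosen for the example.

```r
set.seed(5)
x1 <- runif(40)
x2 <- 2 * x1                       # x2 is an exact multiple of x1

# True model with c1 = 1 and c2 = 0.5, plus noise.
y <- 1 * x1 + 0.5 * x2 + rnorm(40, sd = 0.1)

# With both predictors, lm cannot separate them: x2 gets NA.
fit_both <- lm(y ~ x1 + x2)
coef(fit_both)

# The reduced model has a single slope near c1 + 2*c2 = 2.
fit_one <- lm(y ~ x1)
coef(fit_one)
```

When the relation is only approximate, lm returns both coefficients, but their standard errors tend to become large.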

ANOVA
With two different treatments we use the t-test to compare means.
The influence of a factor/treatment with more than 2 variants is commonly analysed by ANOVA, i.e., more than two means are compared at the same time.
The null hypothesis is that all sample means come from the same population [the treatment has no effect].

ANOVA
In R, ANOVA works like linear models, but with factors that influence the means.
See the dataset ANOVA.txt
Try aov(y~f.c)
If the p-value is weak, the effect may be unclear because of the other factors.
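
A sketch of a one-way ANOVA on simulated data. The dataset ANOVA.txt is not reproduced here, so the three levels of f.c and the group means are invented.

```r
set.seed(6)

# Factor with 3 levels, 10 observations each.
f.c <- factor(rep(c("0", "1", "2"), each = 10))

# Made-up response: level "1" has a shifted mean.
y <- rnorm(30, mean = c(10, 11.5, 10)[as.integer(f.c)], sd = 1)

fit <- aov(y ~ f.c)
summary(fit)     # F test for the factor; the p-value is in column Pr(>F)
```

The degrees of freedom for f.c equal the number of levels minus one.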

But which means do differ?
f.c has 3 levels.
We are not allowed to simply look at the means of each level; we must test all pairwise comparisons for significance.
This is known as a "post-hoc" test. One is TukeyHSD, which gives a table of pairwise tests of means.
Since the data is used more than once, we are more likely to discover some effect by chance. HSD corrects the p-values for multiple tests.
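
A sketch of the post-hoc step on simulated data; the group means are assumptions, not the values from the course dataset.

```r
set.seed(7)
f.c <- factor(rep(c("0", "1", "2"), each = 10))
y   <- rnorm(30, mean = c(10, 11.5, 10)[as.integer(f.c)], sd = 1)

# TukeyHSD takes the fitted aov object.
tk <- TukeyHSD(aov(y ~ f.c))

# One row per pair of levels: difference, confidence bounds, adjusted p-value.
tk$f.c
```

With 3 levels there are 3 pairwise comparisons, labelled "1-0", "2-0" and "2-1".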

Post-hoc
Almost significant effect when comparing group 1 with group 0 (p adjusted for 3 tests).

A graphical view plot(y~f.c)

Compare with a t-test
So the adjusted p-value 0.06 from HSD is greater than the p-value of the single t-test.
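
The comparison can be sketched on simulated data. As an assumption, the unadjusted test used here is pairwise.t.test with a pooled standard deviation and no correction, so that it uses the same variance estimate as TukeyHSD; a plain t.test on two groups would estimate the variance from those two groups only.

```r
set.seed(8)
f.c <- factor(rep(c("0", "1", "2"), each = 10))
y   <- rnorm(30, mean = c(10, 11, 10)[as.integer(f.c)], sd = 1)

# Unadjusted pairwise t-tests with pooled standard deviation.
pw <- pairwise.t.test(y, f.c, p.adjust.method = "none", pool.sd = TRUE)
p_unadjusted <- pw$p.value["1", "0"]

# Tukey-adjusted p-value for the same pair of groups.
tk <- TukeyHSD(aov(y ~ f.c))
p_adjusted <- tk$f.c["1-0", "p adj"]

c(unadjusted = p_unadjusted, adjusted = p_adjusted)
```

The adjusted value is never smaller than the unadjusted one in this balanced setting, which is the price paid for making several comparisons.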

Ideas of clustering and modeling
Clustering is a way to detect/display groups in data that might point to a factor which affects the sample.
Different ways:
–Mapping: plot multivariate data in a special way to see groups
–Discriminant analysis: use a known factor (e.g., strain) to find a mapping that best separates the known groups
Use the discriminant to classify new data!

PCA
Download the data PCA.txt
See PCA.R to make a PCA for that multivariate data.
PC1 is the rotated data with maximal variance; PC2 has smaller variance.
We could separate / discriminate with this line.
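
A sketch of a PCA with prcomp on simulated data. It is not the course script PCA.R; the two hidden groups and the three variables v1, v2, v3 are invented.

```r
set.seed(9)

# Two hidden groups that differ along a shared direction of v1 and v2.
g <- rep(c(1, 2), each = 25)
a <- rnorm(50) + 4 * (g == 2)
X <- cbind(v1 = a,
           v2 = a + rnorm(50, sd = 0.5),
           v3 = rnorm(50))

# prcomp centers the data and rotates it onto the principal components.
pca <- prcomp(X)

pca$sdev            # standard deviations, largest first (PC1, PC2, PC3)
head(pca$x[, 1:2])  # coordinates of the samples on PC1 and PC2
```

Plotting pca$x[, 1] against pca$x[, 2] and colouring by g would show whether the groups separate along PC1.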

Linear Discriminant
Check LDA.R and LDA.txt to see similar results:
the original 3-class 3D data in a 2D LDA
new data (squares) classified (predicted) according to the LDA
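
A sketch of the same idea with lda() from the MASS package, which is assumed to be installed (it ships with standard R distributions). The 3-class, 3D data and the two new points are simulated, not taken from LDA.txt.

```r
library(MASS)
set.seed(10)

n   <- 30
cls <- factor(rep(c("A", "B", "C"), each = n))

# Made-up class centres in 3D, with unit-variance noise around them.
centers <- rbind(c(0, 0, 0), c(4, 0, 2), c(0, 4, 2))
X <- centers[as.integer(cls), ] + matrix(rnorm(3 * n * 3), ncol = 3)
dimnames(X) <- list(NULL, c("x1", "x2", "x3"))
train <- data.frame(X, cls = cls)

# Fit the discriminant; 3 classes give 2 discriminant axes (LD1, LD2).
fit  <- lda(cls ~ x1 + x2 + x3, data = train)
proj <- predict(fit)$x
head(proj)

# Classify new observations with the fitted discriminant.
new <- data.frame(x1 = c(0.2, 3.8), x2 = c(0.1, -0.3), x3 = c(0.0, 2.1))
new_class <- predict(fit, newdata = new)$class
new_class
```

The matrix proj holds the 2D view of the training data; new points can be projected the same way via predict(fit, newdata = new)$x.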