R for Research: Data Analysis using R, Day 2: Advanced R. Baburao Kamble, University of Nebraska-Lincoln.


Working with RStudio: the main panes are the editor for new R files, the console (command prompt), and a tabbed pane where you select Files, Plots, Packages (for advanced analyses) and Help.

Agenda
- R advanced visualization (ggplot2, lattice)
- Descriptive statistics
- Regression analysis
- Time series data analysis
- Forecasting/prediction
Workshop material:

Advanced Visualization
Goal: give R graphics users enough information to make an informed choice about which graphics package best meets their needs, whether for simple or advanced visualization.

Overview of Lattice Graphics
- One of the graphics systems of R (others include “traditional” base graphics and ggplot2)
- An implementation of the S-PLUS “Trellis” graphics
- Written by Deepayan Sarkar, Fred Hutchinson Cancer Research Center

List of Lattice Graphics Functions
Function                       Description                  Graph type
xyplot                         Scatter plot                 Bivariate
histogram                      Histogram                    Univariate
densityplot                    Density line plot            Univariate
barchart                       Bar chart                    Univariate
bwplot                         Box-and-whisker plot         Bivariate
qqmath                         Normal Q-Q plot              Univariate
dotplot                        Labeled dot plot             Bivariate
cloud                          3D scatter plot              3D
wireframe                      3D surface plot              3D
splom                          Scatter plot matrix          Data frame
parallelplot (formerly parallel)  Parallel coordinates plot  Data frame
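A minimal lattice sketch, assuming the built-in mtcars data set (it is not part of the workshop material). It shows the feature that distinguishes lattice: a conditioning variable after `|` splits the data into one panel per group. This is an illustration, not the contents of Adv_Visualization.R.

```r
# lattice ships with R as a recommended package
library(lattice)

# Fuel economy against weight, one panel per cylinder count.
# "| factor(cyl)" is the conditioning variable that creates the trellis panels.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            type = c("p", "r"),        # points plus a per-panel regression line
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
            layout = c(3, 1))          # 3 columns, 1 row of panels

# Lattice functions return a "trellis" object; nothing is drawn until it is printed
print(p)
```

Because the plot is an object, it can be stored, modified with `update(p, ...)` and printed later, which is convenient inside scripts and functions.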

ggplot

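Graphing in ggplot2
ggplot2 graphics work with layers: start with ggplot() and add layers with +.
library(ggplot2)
# data, xname and yname are placeholders for your data frame and its column names
plotname <- ggplot(data, aes(x = xname, y = yname)) + geom_point()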
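To make the layering idea concrete, here is a small sketch assuming the built-in mtcars data set (chosen for illustration, not taken from the workshop demo). Each `+` adds one layer: points coloured by cylinder count, then a fitted linear trend with its confidence band, then axis labels.

```r
library(ggplot2)

# Base: map columns of the data frame to aesthetics
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # Layer 1: points, coloured by cylinder count (colour applies to this layer only)
  geom_point(aes(colour = factor(cyl))) +
  # Layer 2: a single linear fit across all cars, with a confidence band
  geom_smooth(method = "lm", formula = y ~ x) +
  # Labels are not a layer; they modify the plot's scales and guides
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(p)
```

Putting `colour` inside `geom_point()` rather than in the top-level `aes()` is a deliberate choice: if it were global, `geom_smooth()` would fit one line per cylinder group instead of one overall line.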

ggplot demo: Adv_Visualization.R

Descriptive Statistics
Descriptive statistics quantitatively describe the main features of a collection of information. They show or summarize data in a meaningful way so that, for example, patterns might emerge from the data. Common measures:
- Mean
- Mode
- Median
- Standard deviation
Demo script: DescriptiveStatistics.R
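A short sketch of these measures in R, assuming the built-in mtcars data set. One pitfall is worth showing: base R has `mean()`, `median()` and `sd()`, but `mode()` returns an object's storage mode (e.g. "numeric"), not the most frequent value. The helper `stat_mode()` below is hypothetical, written only for this example.

```r
# Hypothetical helper: the statistical mode (most frequent value or values)
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  # keep every value tied for the highest count
  modes <- names(counts)[counts == max(counts)]
  # table() stores values as names (character), so convert numeric input back
  if (is.numeric(x)) as.numeric(modes) else modes
}

x <- mtcars$cyl   # number of cylinders for 32 cars
summary_stats <- c(
  mean   = mean(x),
  median = median(x),
  sd     = sd(x)
)
print(summary_stats)
print(stat_mode(x))

# summary() gives min, quartiles, median and mean for each column at once
summary(mtcars[, c("mpg", "wt")])
```

Returning all tied values (rather than just the first) keeps the helper honest for multimodal data.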

Linear Regression Analysis
In statistics, regression analysis is a statistical process for estimating the relationships among variables.

Linear Regression Analysis
Regression analysis is used to predict the value of one variable (the dependent variable) from the values of other variables (the independent variables).
- Dependent variable: denoted $Y$
- Independent variables: denoted $X_1, X_2, \ldots, X_k$
If there is only one independent variable, the model is
$Y = \beta_0 + \beta_1 X + \varepsilon$
which is referred to as simple linear regression. We want to estimate $\beta_0$ (the intercept) and $\beta_1$ (the slope) from the data we collect.
Demo script: Regression.R
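A minimal simple-linear-regression sketch, assuming the built-in cars data set (stopping distance `dist` in feet against `speed` in mph); Regression.R may use different data. It fits the model with `lm()` and then recovers the same $\beta_0$ and $\beta_1$ from the least-squares formulas $\hat\beta_1 = \operatorname{cov}(X, Y)/\operatorname{var}(X)$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$.

```r
# Fit dist = beta_0 + beta_1 * speed + error
fit <- lm(dist ~ speed, data = cars)
coef(fit)   # (Intercept) is beta_0, speed is beta_1

# The same estimates from the least-squares formulas
b1 <- cov(cars$speed, cars$dist) / var(cars$speed)
b0 <- mean(cars$dist) - b1 * mean(cars$speed)
c(b0 = b0, b1 = b1)
```

Both approaches give an intercept of about -17.58 and a slope of about 3.93, i.e. each extra mph adds roughly 3.9 ft of stopping distance in this data.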

Interpreting the output
1. Formula
2. Residuals
3. Estimated coefficient
4. Standard error of #3
5. t-value of #3
6. Variable p-value
7. Significance stars
8. Significance legend
9. Residual standard error / degrees of freedom
10. R-squared
11. F-statistic & p-value

Interpreting the output
1. Formula (Call): the regression model formula passed to lm().
2. Residuals: the differences between the actual values of the variable you are predicting and the values predicted by your regression (summarized by their minimum, quartiles and maximum).
3. Estimated coefficient: the estimated value of each coefficient; in simple linear regression, the intercept and the slope.
4. Standard error of #3: a measure of the variability in the estimate of the coefficient.
5. t-value of #3: the estimate divided by its standard error, i.e. how many standard errors the coefficient is away from zero. It is used to calculate the p-value and the significance level.
6. Variable p-value: the probability of obtaining a t-value at least this far from zero if the true coefficient were zero. It is NOT the probability that the variable is irrelevant; smaller values give stronger evidence that the variable is related to the response.
7. Significance stars: shorthand for the significance level, with the number of asterisks set by the p-value; *** for high significance and * for low significance.
8. Significance legend: R prints the cutoffs as 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1, so *** marks p-values between 0 and 0.001, ** between 0.001 and 0.01, * between 0.01 and 0.05, a dot between 0.05 and 0.1, and a blank above 0.1.
9. Residual standard error / degrees of freedom: the residual standard error estimates the standard deviation of the error term. The degrees of freedom are the number of observations in your training sample minus the number of estimated coefficients (the intercept counts as one).
10. R-squared: the proportion of the variance in the response explained by the model, a metric for the goodness of fit. Adjusted R-squared additionally penalizes extra predictors.
11. F-statistic & p-value: an F-test comparing your model with an intercept-only model (no predictors). The F-statistic has two degrees of freedom: the number of predictors (in our case 1) and the residual degrees of freedom from #9.
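The numbered items can also be pulled out of the summary object directly, which is handy for reports. This sketch again assumes the built-in cars data; the element names (`coefficients`, `sigma`, `r.squared`, `fstatistic`) are those of R's `summary.lm` object. It also recomputes the t-values and p-values to show how items 4 to 6 relate.

```r
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

cf <- s$coefficients   # items 3-6: Estimate, Std. Error, t value, Pr(>|t|)
s$sigma                # item 9: residual standard error
df.residual(fit)       # item 9: n minus the number of coefficients
s$r.squared            # item 10 (s$adj.r.squared for the adjusted version)
f <- s$fstatistic      # item 11: value, numdf, dendf

# Item 5: t value = estimate / standard error
t_manual <- cf[, "Estimate"] / cf[, "Std. Error"]

# Item 6: two-sided p-value from the t distribution with the residual df
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit), lower.tail = FALSE)

# Item 11: the F-test p-value is printed but not stored, so compute it
f_p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
```

With a single predictor the F-statistic equals the square of the slope's t-value, so items 6 and 11 give the same p-value.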

Regression Analysis

Checking the validity of the linear model: plot(fit)
- Residuals vs. fitted: look for an even spread around the line y = 0 and no obvious trend.
- Normal Q-Q (quantile-quantile) plot: the residuals are approximately normal if the points fall close to a straight line.
- Scale-Location plot: shows the square root of the absolute standardized residuals against the fitted values. The tallest points are the largest residuals; a roughly flat trend suggests constant variance.
- Cook's distance plot: identifies points that have a lot of influence on the regression line.
- Residuals vs. leverage plot: shows observations with potentially high influence.
- Cook's distance vs. leverage/(1 - leverage) plot.
By default plot(fit) draws only the residuals vs. fitted, Q-Q, Scale-Location and residuals vs. leverage plots; use plot(fit, which = 1:6) to draw all six.
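A sketch of the diagnostics, assuming the built-in cars data. It draws all six plots and then computes the quantities behind them, checking Cook's distance against its definition $D_i = \frac{r_i^2}{p}\,\frac{h_i}{1 - h_i}$, where $r_i$ is the standardized residual, $h_i$ the leverage and $p$ the number of coefficients. The 4/n cutoff at the end is one common rule of thumb, not something from the slides.

```r
fit <- lm(dist ~ speed, data = cars)

# All six diagnostic plots in a 2 x 3 grid
op <- par(mfrow = c(2, 3))
plot(fit, which = 1:6)
par(op)   # restore the previous layout

h  <- hatvalues(fit)        # leverages
r  <- rstandard(fit)        # standardized residuals
cd <- cooks.distance(fit)   # Cook's distances
p  <- length(coef(fit))     # number of estimated coefficients

# Cook's distance from its definition
cd_manual <- r^2 / p * h / (1 - h)

# Observations above the 4/n rule-of-thumb threshold
which(cd > 4 / nobs(fit))
```

Flagged points are candidates for a closer look, not automatic deletions.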

Time Series Examples
Definition: a sequence of measurements over time. Application areas include:
- Biology
- Meteorology
- Finance
- Social science
- Epidemiology
- Medicine
- Speech
- Geophysics
- Seismology
- Robotics
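In R a time series is usually stored as a `ts` object: the values plus their timing (start and frequency). A minimal sketch, assuming the built-in AirPassengers data (monthly airline passengers, January 1949 to December 1960); it strips the series to a plain vector and rebuilds it with `ts()`.

```r
v <- as.numeric(AirPassengers)   # plain numeric vector, timing information lost

# Rebuild the series: monthly data (frequency = 12) starting January 1949
x <- ts(v, start = c(1949, 1), frequency = 12)

frequency(x)   # observations per cycle
start(x)       # year and month of the first observation
end(x)         # year and month of the last observation

# window() subsets by time rather than by position
last_year <- window(x, start = c(1960, 1))
```

Setting the frequency correctly matters: seasonal methods such as STL use it to know how long one seasonal cycle is.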

Seasonal and Trend decomposition using Loess (STL)
STL is a versatile and robust method for decomposing a time series into seasonal, trend and remainder components. STL is an acronym for “Seasonal and Trend decomposition using Loess”, where Loess (locally weighted regression) is a method for estimating nonlinear relationships. The STL method was developed by Cleveland et al. (1990).
Demo scripts: TrendAnalysis.R, TimeSeriesDemo.R
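A minimal STL sketch using base R's `stats::stl()`, assuming the built-in co2 data (monthly Mauna Loa CO2 concentrations); TrendAnalysis.R and TimeSeriesDemo.R may differ. `s.window = "periodic"` forces the same seasonal pattern every year; a numeric value would let it evolve, and `robust = TRUE` downweights outliers.

```r
# Decompose the monthly series into seasonal, trend and remainder
fit_stl <- stl(co2, s.window = "periodic")

comp <- fit_stl$time.series   # a ts matrix with one column per component
head(comp)
plot(fit_stl)                 # data and the three components, stacked

# The components add back up to the original series
recon <- rowSums(comp)
```

The additive reconstruction check is a quick sanity test when experimenting with the window settings.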

HOW? WHY?
Heat map demo: HeatMap.R
How to apply this in a presentation?
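As one possible starting point, here is a heat map using base R's `stats::heatmap()` on the built-in mtcars data. Both the function and the data are assumptions for illustration; HeatMap.R may use another package or data set. Scaling by column puts variables measured in different units on a comparable colour scale, and rows and columns are reordered by hierarchical clustering, with dendrograms drawn on the margins.

```r
m <- as.matrix(mtcars)   # rows are cars, columns are variables

# scale = "column" standardizes each variable before colouring;
# margins = c(5, 10) leaves room for the long car names on the row axis
hm <- heatmap(m, scale = "column", margins = c(5, 10))

# heatmap() invisibly returns the row and column order it used
rownames(m)[hm$rowInd]
```

The returned ordering can be reused, for example to present a table in the same clustered order as the figure.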