A Brief Introduction to R Programming Darren J. Fitzpatrick, PhD The Bioinformatics Support Team 27/08/2015.


Overview
- What is R? Why might it be useful?
- An Overview of RStudio
- A First Program
- Basic Syntax of R
- Indexing Rows and Columns
- Exploratory Data Analysis using R/RStudio

Trinity College Dublin, The University of Dublin
What is R and why bother?
- R is fundamentally a programming language suitable for data analysis.
- R has ~4000 packages enabling advanced data analytics, exploration and visualisation.
- Bioconductor, a suite of specialised tools for biological data analysis, integrates with R.
- R has a learning curve, but once the basics are mastered it offers the flexibility to deal with almost any analytics problem.

What can be done?

An Overview of RStudio
- Inbuilt text editor for writing and saving R code
- Console/interpreter for running R code
- Plots, packages and help

A First Program
Write code, select it and press “run”. R then executes the selected code.

Basic Syntax of R
> print('hello world')
[1] "hello world"
- print() is an inbuilt R function.
- Functions are always of the form function().
- Arguments are passed to a function inside the brackets.
- 'hello world' is an argument.
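
To make the idea of functions and arguments concrete, here is a minimal sketch that passes arguments both by position and by name. The variable names (greeting, rounded) are made up for this illustration and are not from the slides.

```r
# An argument passed by position
print('hello world')

# paste() joins its arguments into one string; 'sep' is passed by name
greeting <- paste('hello', 'world', sep = ' ')
print(greeting)

# round() takes the number to round and, by name, how many digits to keep
rounded <- round(3.14159, digits = 2)
print(rounded)
```

Naming an argument (sep = ' ', digits = 2) tends to make a call easier to read than relying on argument order alone.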

Basic Syntax of R
R has many useful inbuilt functions, some of which we will use today. Examples include the following:
sum()           add numbers together
mean()          calculate the mean of a set of numbers
sd()            calculate the standard deviation of a set of numbers
t.test()        perform a Student’s t-test
wilcox.test()   perform a Wilcoxon/Mann-Whitney test
fisher.test()   perform a Fisher’s exact test
chisq.test()    perform a Chi-squared test
plot()          basic plotting function
hist()          plot a histogram
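
As a quick illustration of the first three functions, the sketch below applies them to a small vector of made-up numbers. The vector x and the variable names are assumptions for this example only.

```r
# A small vector of made-up measurements
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

total   <- sum(x)    # add the numbers together
average <- mean(x)   # arithmetic mean
spread  <- sd(x)     # sample standard deviation (divides by n - 1)

print(total)
print(average)
print(spread)
```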

The Iris Data Set
Let’s look at the data! We will explore the famous Fisher’s Iris Data Set, which is available with R. The data is stored in a data structure called a data frame. A data frame is a tabular representation of data using rows and columns.
> attach(iris)
> # Fetch data
> x <- as.matrix(iris[,-5])
> # Make an ugly heatmap
> heatmap(x, cexCol=0.7)

The Iris Data Set
> nrow(iris) # No. of rows
[1] 150
> ncol(iris) # No. of columns
[1] 5
> dim(iris) # The dimensions
[1] 150   5
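
Beyond counting rows and columns, it can help to glance at the contents of a data frame. The following is a small sketch using head(), str() and sapply(), which are standard R functions but are not used on the slides. The names first_rows and col_types are made up for this example.

```r
# Show the first three rows of the data frame
first_rows <- head(iris, 3)
print(first_rows)

# Compact summary of the structure: one line per column
str(iris)

# The class of each column: four numeric measurements and one factor
col_types <- sapply(iris, class)
print(col_types)
```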

The Iris Data Set
A nicer heatmap! We will learn to make these plots in an extended R workshop.

Indexing Rows and Columns
Data frames have a matrix-like structure comprising rows and columns. To access rows and columns we use indexing. Indexing is of the form:
dataset[from row:to row, from col:to col]
Some examples:
> iris[1,] # The first row of the data
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa

Indexing Rows and Columns
> iris[1:5,] # The first 5 rows of the data
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
> iris[1:5, 1] # The first 5 rows of the first column
[1] 5.1 4.9 4.7 4.6 5.0
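
The indexing pattern dataset[rows, columns] can be sketched as follows. The variable names are made up for this example. Leaving one side of the comma empty selects everything along that dimension.

```r
first_row   <- iris[1, ]        # row 1, all columns
first_five  <- iris[1:5, ]      # rows 1 to 5, all columns
five_by_two <- iris[1:5, 1:2]   # rows 1 to 5, columns 1 to 2
first_col   <- iris[1:5, 1]     # a single column comes back as a plain vector

print(dim(five_by_two))         # rows and columns of the selection
print(first_col)
```

Note that selecting a single column with a number returns a vector rather than a data frame, which is why the output on the slide starts with [1].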

Indexing Rows and Columns
Find the species
> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
> unique(iris[,5]) # Using column number
[1] setosa     versicolor virginica
> unique(iris$Species) # Using $ and column name
Extract Sepal.Length values for the setosa species
> setosa_sepal_length <- iris[which(iris$Species=='setosa'), 1]
How would you extract the sepal length data for the virginica species?
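
The which() call above works by first building a TRUE/FALSE value for every row. The sketch below spells out that intermediate step and shows an equivalent form that indexes by the logical vector directly and by column name. The names is_setosa, by_which and by_logic are made up for this illustration.

```r
# TRUE for rows where the species is setosa, FALSE otherwise
is_setosa <- iris$Species == 'setosa'

# which() converts the TRUE/FALSE vector into row numbers (as on the slide)
by_which <- iris[which(is_setosa), 1]

# The logical vector can also be used directly, with the column given by name
by_logic <- iris[is_setosa, 'Sepal.Length']

# Both approaches select the same values
print(identical(by_which, by_logic))

# How many rows there are per species
print(table(iris$Species))
```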

Exploratory Data Analysis
> virginica_sepal_length <- iris[which(iris$Species=='virginica'), 1]
We have now defined two variables containing sepal length data for two species:
setosa_sepal_length
virginica_sepal_length
How do we begin to explore this data?
- Calculate the means of both data sets
- Calculate the standard deviation for both data sets
- Plot histograms of both data sets
- Perform statistics to ask if sepal length differs between species

Exploratory Data Analysis
> mean(setosa_sepal_length)
> mean(virginica_sepal_length)
> sd(setosa_sepal_length)
> sd(virginica_sepal_length)
Do the means and standard deviations differ? Would you expect the distributions of the data to differ?
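
As an alternative to computing each summary separately, the sketch below uses tapply(), a standard R function not covered on the slides, to apply mean() and sd() to the sepal lengths of every species in one call. The names species_means and species_sds are made up for this example.

```r
# Mean sepal length for each species
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# Standard deviation of sepal length for each species
species_sds <- tapply(iris$Sepal.Length, iris$Species, sd)

print(species_means)
print(species_sds)
```

The results are named by species, so species_means['setosa'] gives the same value as mean(setosa_sepal_length).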

Exploratory Data Analysis
R can render nice descriptive plots such as boxplots, various flavours of scatterplots and histograms. These require additional knowledge - today we will keep it simple. Code for the plots here can be found in the ‘Additional_Plots.R’ file on

Exploratory Data Analysis
Look up the hist() function using the help manual. R help always gives the following:
- The arguments that a function can take
- A description (not always clear!) of what those arguments are
Try the following:
> hist(setosa_sepal_length)
> hist(setosa_sepal_length, breaks=10, main='Sepal Length (Setosa)', col='darkred', xlab='Sepal Length')
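
One way to look up hist() is sketched below. In an interactive session, ?hist or help(hist) opens the help page; they are left as comments here so the block also runs as a script. The remaining lines list the argument names programmatically and compute a histogram without drawing it. The names hist_default, arg_names and h are made up for this example.

```r
# In an interactive session, either of these opens the help page:
# ?hist
# help(hist)

# hist() dispatches to a default method; list the arguments it accepts
hist_default <- getS3method('hist', 'default')
arg_names <- names(formals(hist_default))
print(arg_names)

# Compute the histogram without drawing it, to inspect the bins and counts
setosa_sepal_length <- iris[which(iris$Species == 'setosa'), 1]
h <- hist(setosa_sepal_length, breaks = 10, plot = FALSE)
print(h$breaks)
print(h$counts)
```

The breaks argument is treated as a suggestion, so the number of bins drawn may not be exactly 10.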

Exploratory Data Analysis
You should see something like this!

Exploratory Data Analysis
Use the hist() function to plot the sepal lengths for the virginica species.
- Change the title of the graph
- Change the colour (darkgreen, darkslategrey, purple)
Tell R to plot two histograms side by side:
> par(mfrow=c(1,2))
Now, run your histogram code for both data sets.
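
One possible solution to this exercise is sketched below. The title and colour for the virginica plot are arbitrary choices, and saving and restoring the old graphics settings (old_par) is an addition that is not on the slides.

```r
setosa_sepal_length    <- iris[which(iris$Species == 'setosa'), 1]
virginica_sepal_length <- iris[which(iris$Species == 'virginica'), 1]

# Split the plotting area into 1 row and 2 columns, keeping the old settings
old_par <- par(mfrow = c(1, 2))

hist(setosa_sepal_length, breaks = 10, main = 'Sepal Length (Setosa)',
     col = 'darkred', xlab = 'Sepal Length')
hist(virginica_sepal_length, breaks = 10, main = 'Sepal Length (Virginica)',
     col = 'darkgreen', xlab = 'Sepal Length')

# Restore the previous layout so later plots fill the whole window again
par(old_par)
```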

Exploratory Data Analysis
You should see something like this!

Hypothesis Testing
We want to test whether the mean sepal lengths of Setosa and Virginica differ from each other.
H0: mean setosa = mean virginica
H1: mean setosa ≠ mean virginica
Use the help utility to work out how to do a two-sample unpaired t-test. Is there a significant difference in sepal lengths between the two species?

Hypothesis Testing
> t.test(setosa_sepal_length, virginica_sepal_length)
        Welch Two Sample t-test
data:  setosa_sepal_length and virginica_sepal_length
p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
The output also reports the t statistic, the degrees of freedom, a 95 percent confidence interval for the difference in means, and the two sample estimates (mean of x, mean of y).
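
The printed output is convenient to read, but the individual numbers can also be pulled out of the object that t.test() returns. The following is a minimal sketch of that. The variable name result is made up for this example.

```r
setosa_sepal_length    <- iris[which(iris$Species == 'setosa'), 1]
virginica_sepal_length <- iris[which(iris$Species == 'virginica'), 1]

# By default t.test() does not assume equal variances (Welch's test)
result <- t.test(setosa_sepal_length, virginica_sepal_length)

print(result$method)      # name of the test that was run
print(result$statistic)   # the t statistic
print(result$parameter)   # degrees of freedom
print(result$p.value)     # the p-value as a number
print(result$conf.int)    # 95 percent confidence interval for the difference
print(result$estimate)    # the two sample means
```

Passing var.equal = TRUE to t.test() would run the classical Student's t-test, which assumes equal variances in the two groups.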

Resources
- The R official website for downloading software and help
- A free online book – “Statistics in R Using Biological Examples”
- Quick-R – a site with nice examples of how to do various analyses in R
- Bioconductor – a suite of R packages for biological data analysis

Conclusions
- You have been briefly introduced to the RStudio environment and coding in R.
- You are familiar with the basics of variables, data frames, indexing, plotting and hypothesis testing.
A more comprehensive R course planned for the near future will include topics such as:
- Coding in R – writing functions, loops and scripts
- Further exploratory data analysis
- Further hypothesis testing (Fisher's exact, Chi-squared, Mann-Whitney)
- Statistical modelling (linear regression, ANOVA)
- Biological data analysis – GWAS, differential expression, your interests!

Thank You