Second Annual Cytomics Workshop April, 2017

Introduction to R

Outline
- Background
- Bioconductor
- Motivating examples
- Starting R, entering commands
- How to get help
- R fundamentals
  - Sequences and Repeats
  - Characters and Numbers
  - Vectors and Matrices
  - Data Frames and Lists
- Importing data from spreadsheets
Speaker notes: briefly emphasize that R is an excellent tool for data (statistical) analysis
- powerful array of analysis tools
- for flow, can help eliminate human bias
- automate repetitive analysis
- operate on very large data sets

R
R is an integrated suite of software facilities for data manipulation, simulation, calculation and graphical display.
- It handles and analyzes data very effectively, and it contains a suite of operators for calculations on arrays and matrices.
- It has graphical capabilities for very sophisticated graphs and data displays.
- It is an elegant, object-oriented programming language.
- Started by Robert Gentleman and Ross Ihaka (hence “R”) in 1995 as a free, independent, open-source implementation of the S programming language (now part of Spotfire).
- Currently maintained by the R Core development team, an international group of hard-working volunteer developers.
http://www.r-project.org
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf

Bioconductor
Bioconductor “is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.”
Goals
- To provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.
- To provide a common software platform that enables the rapid development and deployment of extensible, scalable, and interoperable software.
- To further scientific understanding by producing high-quality documentation and reproducible research.
- To train researchers on computational and statistical methods for the analysis of genomic data.
http://bioconductor.org/overview

Flow Cytometry in Bioconductor
- About 40 packages specific to flow cytometry are available in Bioconductor.
- What’s so different about flow cytometry anyway?

A motivating example
I’ve just collected data from a T cell stimulation experiment in a 96-well plate format. I need to gate the data on CD3/CD4. How consistent are the distributions, so that I can establish one set of gates for the whole plate and be confident that the results are valid for all of the wells?

A motivating example

Another motivating example
I’m concerned that drawing gates to analyze my data introduces unintended bias. Additionally, since I have multiple data files, drawing multiple gates is time consuming. Can I use R to compute gates and then apply these same objective gating criteria to multiple data files?

Another motivating example: automated gating of rare events

A third example: cleaning data when the tube runs out
I often drain my tubes since I’m trying to acquire as many events as I can from a limited sample for a rare event assay. I’m concerned that the disruption of flow near the beginning and end of the acquisition (and sometimes in the middle due to minor clogs) may introduce an “artificial phenotype”. Is there some way to automatically detect and edit out portions of a file that aren’t consistent with the rest?

A third example: cleaning data when the tube runs out

Back to the basics
- R is a command-line driven program.
  - The prompt is: >
  - You type a command (shown in blue), and R executes the command and gives the answer (shown in black).
- R follows exactly the directions you give it, even if these are not the directions you mean to give it!
- You must be very precise, since R is case sensitive and has many syntactical requirements.
- However, once you learn these simple rules, it is an extremely fast and dynamic tool to analyze data.
- And don’t worry, there are many help tools, which we will explore later…

Simple example: enter a set of measurements
- Use the function c() to combine terms together.
- Create a variable named mfi.
- Put the result of c() into mfi using the assignment operator <- (you can also use =).
- The [1] at the start of the printed output is the index of the first element shown on that line; the result is a vector.
- Everything that happens in R is a function call [e.g. c()].
Speaker notes: emphasize the create-a-variable step as crucial. In R you are constantly defining and redefining variables, so it’s a good idea to keep track of the variables you assign!
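
The steps on this slide can be sketched as follows. The measurement values are made up for illustration and are not taken from the workshop.

```r
# Combine four hypothetical measurements into one vector with c()
# and store the result in a variable named mfi.
mfi <- c(1520, 1610, 1495, 1580)

# Typing the variable name prints its contents, prefixed by [1].
mfi

# Assigning to the same name again replaces the old value,
# so it pays to keep track of what each variable currently holds.
mfi <- c(mfi, 1602)
length(mfi)   # 5
```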

RStudio
RStudio is an Integrated Development Environment (IDE) for R.

RStudio: Console

RStudio: Editor

RStudio: Environment, History

RStudio: Your best friend

RStudio: lower right pane

RStudio: help

Package Vignette – really good help!
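
The help slides are screenshots that did not survive in the transcript, so the following is only a sketch of common ways to reach help from the console, using functions that ship with base R. The choice of mean() and of the "read." pattern is arbitrary.

```r
# Interactive forms (type these at the console or in RStudio):
#   ?mean                   open the help page for mean()
#   help.search("median")   search installed help pages by keyword
#   browseVignettes()       browse the vignettes of installed packages

# Forms that return a value instead of opening a viewer:
mean_help <- help("mean")         # the same page that ?mean opens
read_funs <- apropos("^read\\.")  # names of functions starting with "read."
print(read_funs)
```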

Basic data structures

Sequences and Repeats
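
The slide content is an image that is missing from the transcript. As a rough illustration of sequences and repeats, assuming the slide covered the usual base R tools (the colon operator, seq() and rep()):

```r
# Sequences
s1 <- 1:5                            # 1 2 3 4 5
s2 <- seq(from = 0, to = 1, by = 0.25)  # 0.00 0.25 0.50 0.75 1.00
s3 <- seq(2, 10, length.out = 5)     # 2 4 6 8 10

# Repeats
r1 <- rep(3, times = 4)              # 3 3 3 3
r2 <- rep(c("a", "b"), times = 2)    # "a" "b" "a" "b"
r3 <- rep(c("a", "b"), each = 2)     # "a" "a" "b" "b"

print(s2)
print(r3)
```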

Characters and Numbers
- Characters and character strings are enclosed in straight double quotes "" or single quotes ''.
- Special numbers
  - NA: “Not Available”
  - Inf: “Infinity”
  - NaN: “Not a Number”
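
A short sketch of how these values behave. The marker names and numbers are arbitrary examples.

```r
# Character strings can use either kind of straight quote.
marker <- "CD4"
other  <- 'CD8'
class(marker)                 # "character"
class(3.14)                   # "numeric"

# NA marks a missing value and propagates through calculations.
x <- c(1, NA, 3)
na_flags  <- is.na(x)         # FALSE TRUE FALSE
mean_raw  <- mean(x)          # NA
mean_kept <- mean(x, na.rm = TRUE)  # 2

# Inf and NaN come out of arithmetic.
pos_inf <- 1 / 0              # Inf
not_num <- 0 / 0              # NaN
is.nan(not_num)               # TRUE
```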

Factors
- Factors capture categorical data (variables that take on discrete, often descriptive, values).
- We’ll see more about factors when we talk about data frames …
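
As an illustration, a factor can be built from a character vector with factor(). The stimulation condition names below are hypothetical.

```r
# Five wells, three distinct (hypothetical) stimulation conditions.
stim <- factor(c("unstim", "SEB", "unstim", "CMV", "SEB"))

levels(stim)       # "CMV" "SEB" "unstim" (sorted by default)
nlevels(stim)      # 3
as.integer(stim)   # 3 2 3 1 2, the level index behind each value
table(stim)        # counts per level
```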

Vectors and Matrices

Vectors and Matrices
The subset operator for vectors and matrices is [ ]. A subset operator selects elements of an object by position, by name, or by a logical condition.
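
A minimal sketch of [ ] on a vector and on a matrix, with arbitrary values:

```r
v <- c(10, 20, 30, 40, 50)
v[2]           # 20           one element by position
v[c(1, 3)]     # 10 30        several positions
v[-1]          # 20 30 40 50  everything except the first
v[v > 25]      # 30 40 50     by logical condition

# A matrix is filled column by column and indexed as [row, column].
m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, 2]        # 3
m[2, ]         # 2 4 6   the whole second row
m[, 3]         # 5 6     the whole third column
```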

Vectors and Matrices
You can extend the length of a vector via subsetting … but not a matrix.
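
The code on this slide is not in the transcript. The behaviour it describes can be sketched like this, with made-up values:

```r
# Assigning past the end of a vector extends it; the gap is filled with NA.
v <- c(1, 2, 3)
v[5] <- 10
v                      # 1 2 3 NA 10

# The same trick on a matrix fails: row 3 does not exist in a 2 x 2 matrix.
m <- matrix(1:4, nrow = 2)
outcome <- tryCatch({
  m[3, 1] <- 99
  "extended"
}, error = function(e) "error")
outcome                # "error"
dim(m)                 # still 2 2
```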

Vectors and Matrices
However, all's not lost if you want to extend either the columns … or the rows.
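
The slide's code is missing from the transcript. Presumably it used cbind() and rbind(), the standard base R functions for adding columns and rows; a sketch under that assumption:

```r
m <- matrix(1:4, nrow = 2)

# cbind() binds a new column onto the right-hand side.
m_wide <- cbind(m, c(5, 6))
dim(m_wide)            # 2 3

# rbind() binds a new row onto the bottom.
m_tall <- rbind(m, c(7, 8))
dim(m_tall)            # 3 2
```

Both functions return a new matrix, so the result has to be assigned to a variable to be kept.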

Data Frames
- A Data Frame is like a matrix, except that the data type in each column need not be the same (data polymorphism).
- Often, a Data Frame is created from an Excel spreadsheet using the function read.table() or read.csv().
- To do this, first use Save As… in Excel to export the sheet as a tab-delimited text file.
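
A self-contained sketch of the import step. Instead of a real Excel export, it writes a small tab-delimited file to a temporary location; the column names and values are hypothetical.

```r
# Stand-in for a sheet saved from Excel as tab-delimited text.
path <- tempfile(fileext = ".txt")
writeLines(c("well\tstim\tpct_cd4",
             "A1\tunstim\t41.2",
             "A2\tSEB\t39.8",
             "A3\tCMV\t40.5"), path)

# header = TRUE uses the first line as column names; sep = "\t" means tabs.
plate <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

str(plate)             # columns of different types in one object
plate$pct_cd4          # pick out one column by name with $
mean(plate$pct_cd4)    # 40.5
```

For a file saved as comma-separated values, read.csv(path) plays the same role.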

Data Frames from spreadsheets

Lists
Lists are to Vectors as Data Frames are to Matrices: the elements of a list need not share one data type.
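
A brief sketch of the analogy, using made-up element names and values:

```r
# A list can mix a string, a numeric vector and a logical value.
run <- list(sample = "PBMC_01", mfi = c(1520, 1610, 1495), gated = TRUE)

run$sample             # "PBMC_01"   access an element by name
run[["mfi"]][2]        # 1610        [[ ]] extracts the element itself
sub <- run["mfi"]      # [ ] keeps the list wrapper (a list of length 1)
length(run)            # 3

# A data frame is itself a list of equal-length columns.
df <- data.frame(id = 1:2, label = c("a", "b"))
is.list(df)            # TRUE
```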