© 2010 by University of Pennsylvania School of Medicine Making Sense out of Flow Cytometry Data Overload A crash course in R/Bioconductor and flow cytometry.

1 Making Sense out of Flow Cytometry Data Overload: A crash course in R/Bioconductor and flow cytometry fingerprinting
© 2010 by University of Pennsylvania School of Medicine

2 Outline Background  R  Bioconductor Motivating examples Starting R, entering commands How to get help R fundamentals  Sequences and Repeats  Characters and Numbers  Vectors and Matrices  Data Frames and Lists  Importing data from spreadsheets flowCore  Loading flow cytometry (FCS) data  gating  compensation  transformation  visualization flowFP  Binning  Fingerprinting  Comparing multivariate distributions Writing your own functions Installing and running R on your computer Suggestions for further reading and reference

3 Background R  Is an integrated suite of software facilities for data manipulation, simulation, calculation and graphical display.  It handles and analyzes data very effectively and it contains a suite of operators for calculations on arrays and matrices.  In addition, it has the graphical capabilities for very sophisticated graphs and data displays.  It is an elegant, object-oriented programming language.  Started by Robert Gentleman and Ross Ihaka (hence “R”) in 1995  as a free, independent, open-source implementation of the S programming language (now part of Spotfire)  Currently, maintained by the R Core development team – an international group of hard-working volunteer developers http://www.r-project.org http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf

4 Background Bioconductor  “Is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.”  Goals  To provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.  To provide a common software platform that enables the rapid development and deployment of extensible, scalable, and interoperable software.  To further scientific understanding by producing high-quality documentation and reproducible research.  To train researchers on computational and statistical methods for the analysis of genomic data. http://bioconductor.org/overview

5 A motivating example I’ve just collected data from a T cell stimulation experiment in a 96-well plate format. I need to gate the data on CD3/CD4. How consistent are the distributions, so that I can establish one set of gates for the whole plate?

6 A motivating example

7 Another motivating example I’m concerned that drawing gates to analyze my data introduces unintended bias. Additionally, since I have multiple data files, drawing multiple gates is time consuming. Can I use R to compute gates and then apply these same objective gating criteria to multiple data files?

8 Another motivating example
- Autogate lymphocytes and monocytes
- Automatically analyze FMO (fluorescence-minus-one) control tubes

9 Back to the basics
- R is a command-line driven program
  - the prompt is: >
  - you type a command (shown in blue), and R executes the command and gives the answer (shown in black)

10 Simple example: enter a set of measurements
- Use the function c() to combine terms together
- Create a variable named mfi
- Put the result of c() into mfi using the assignment operator <- (you can also use =)
- The [1] at the start of the output line is the index of the first element printed on that line; R treats every result, even a single number, as a vector
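A minimal sketch of the same steps typed at the prompt. The four MFI values are made up for illustration.

```r
# Combine four (made-up) MFI measurements into one vector with c()
mfi <- c(120.5, 98.2, 143.7, 110.0)

# Typing the name prints the vector; [1] is the index of the first value shown
mfi

# "=" also assigns at the prompt, although "<-" is the conventional choice
mfi2 = c(120.5, 98.2, 143.7, 110.0)
identical(mfi, mfi2)   # TRUE

# Functions operate on the whole vector at once
length(mfi)            # 4
mean(mfi)
```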

11 Help, functions, polymorphism  help (log)  ?log  apropos(“log”)
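A short sketch of these help commands and of what "polymorphism" means in practice: the same function adapts to the arguments and the class of object it is given. The help() and ? calls open a help viewer, so they appear only as comments; the example values are arbitrary.

```r
# Interactive help (opens the documentation page for log):
#   help(log)
#   ?log

# apropos() lists objects on the search path whose names match a pattern
hits <- apropos("log")
head(hits)

# The same function adapts to its arguments
log(100)                        # natural logarithm
log(100, base = 10)             # 2
log(c(1, 10, 100), base = 10)   # element-wise on a vector: 0 1 2

# summary() is a generic function: it dispatches on the class of its argument
summary(c(2, 4, 6, 8))                    # numeric: quartiles and mean
summary(factor(c("CD4", "CD8", "CD4")))   # factor: counts per level
```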

12 Vignettes – really good help!

13 Sequences and Repeats
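A few common ways to build sequences and repeats in base R. The last line is an illustrative use, not from the talk: it labels every well of a 96-well plate (rows A to H, columns 1 to 12), as in the motivating example.

```r
1:5                            # 1 2 3 4 5 (the colon operator)
seq(0, 1, by = 0.25)           # 0.00 0.25 0.50 0.75 1.00
seq(10, 100, length.out = 4)   # 10 40 70 100
rep("CD4", times = 3)          # "CD4" "CD4" "CD4"
rep(c(1, 2), times = 3)        # 1 2 1 2 1 2
rep(c(1, 2), each = 3)         # 1 1 1 2 2 2

# Combining both: well labels for a 96-well plate, row by row
wells <- paste0(rep(LETTERS[1:8], each = 12), rep(1:12, times = 8))
head(wells)   # "A1" "A2" "A3" "A4" "A5" "A6"
tail(wells)   # "H7" "H8" "H9" "H10" "H11" "H12"
```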

14 Characters and Numbers
- Characters and character strings are enclosed in "" or ''
- Special numbers
  - NA: “Not Available”
  - Inf: “Infinity”
  - NaN: “Not a Number”
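A quick sketch of strings and the special values at the prompt. The marker names and numbers are arbitrary examples.

```r
# Double and single quotes both create character strings
a <- "CD3"
b <- 'CD3'
identical(a, b)                   # TRUE
nchar("CD45RA")                   # 6 characters
paste("CD4", "FITC", sep = "-")   # "CD4-FITC"

# NA marks a missing value and propagates through most calculations
x <- c(12, NA, 30)
mean(x)                # NA
mean(x, na.rm = TRUE)  # 21: drop missing values first

# Inf and NaN arise from arithmetic
1 / 0                  # Inf
-1 / 0                 # -Inf
0 / 0                  # NaN
is.na(NaN)             # TRUE: NaN also counts as missing
is.nan(NA)             # FALSE: but NA is not NaN
```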

15 Vectors and Matrices

16 The subset operator for vectors and matrices is [ ]
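A sketch of the [ ] operator on a vector (one index) and a matrix (row, column). The values are arbitrary; note that matrix() fills column by column by default.

```r
v <- c(10, 20, 30, 40, 50)
v[2]          # 20
v[c(1, 3)]    # 10 30
v[-1]         # negative index drops elements: 20 30 40 50
v[v > 25]     # logical subsetting: 30 40 50

m <- matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)
m[2, 3]       # row 2, column 3: 6
m[1, ]        # empty column index = whole first row: 1 3 5
m[, 2]        # empty row index = whole second column: 3 4
```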

17 Vectors and Matrices
You can extend the length of a vector by assigning to it via subsetting (for example, to an index past its end) … but not a matrix

18 Vectors and Matrices
However, all’s not lost if you want to extend a matrix by either columns … or rows
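A sketch of both behaviors. The slides do not name the functions used to add columns and rows; cbind() and rbind() are the usual base-R tools for it, shown here with arbitrary values.

```r
# A vector grows when you assign past its end; the gap is filled with NA
v <- c(1, 2, 3)
v[5] <- 9
v                # 1 2 3 NA 9

# A matrix does not grow this way: the assignment raises an error
m <- matrix(1:4, nrow = 2)
res <- tryCatch({ m[3, 1] <- 7L; "grew" },
                error = function(e) conditionMessage(e))
res              # "subscript out of bounds"

# cbind() adds columns and rbind() adds rows
m_cols <- cbind(m, c(5L, 6L))
m_rows <- rbind(m, c(7L, 8L))
dim(m_cols)      # 2 3
dim(m_rows)      # 3 2
```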

19 Data Frames
- A data frame is like a matrix, except that the columns need not all hold the same data type
- Often, a data frame is created from an Excel spreadsheet: use Save As… to save the sheet as a tab-delimited text file, then read it with the function read.table()
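A small data frame built by hand, to show columns of different types side by side. The well labels, stimulation conditions and %CD4 values are invented for illustration.

```r
df <- data.frame(
  well   = c("A1", "A2", "A3"),
  stim   = c("none", "SEB", "SEB"),
  pctCD4 = c(41.2, 38.5, 44.0),
  stringsAsFactors = FALSE   # keep text as character (the default only since R 4.0)
)
str(df)
sapply(df, class)            # "character" "character" "numeric"
df$pctCD4                    # one column, as a vector
df[df$stim == "SEB", ]       # rows where stim is "SEB", all columns
```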

20 Data Frames from spreadsheets
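A self-contained sketch of the spreadsheet route: a temporary tab-delimited file stands in for the sheet exported with Save As…, and its contents are made up.

```r
# Stand-in for a spreadsheet saved as tab-delimited text
path <- tempfile(fileext = ".txt")
writeLines(c("well\tstim\tpctCD4",
             "A1\tnone\t41.2",
             "A2\tSEB\t38.5"), path)

# header = TRUE takes column names from the first row; sep = "\t" splits on tabs
plate <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
plate

# read.delim() uses header = TRUE and sep = "\t" by default
plate2 <- read.delim(path, stringsAsFactors = FALSE)
all.equal(plate, plate2)   # TRUE
```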

21

22

23 Lists
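A sketch of a list holding elements of different types and sizes. The file name and counts are hypothetical.

```r
sample1 <- list(
  file    = "sample01.fcs",                    # hypothetical file name
  markers = c("CD3", "CD4", "CD8"),
  counts  = matrix(c(120, 80, 30, 15), nrow = 2),
  gated   = TRUE
)
names(sample1)              # "file" "markers" "counts" "gated"
sample1$markers[2]          # "CD4"
sample1[["counts"]][1, 2]   # 30: [[ ]] extracts the element itself
sample1["gated"]            # [ ] returns a smaller list
length(sample1)             # 4
```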

24 Handling Flow Cytometry Data: flowCore
- flowCore is a base package that supports reading and manipulation of FCS data files
- The fundamental object that encapsulates the data in an FCS file is a flowFrame
- A container object that holds a collection of flowFrames is called a flowSet
- In the next slides we will go over
  - reading an FCS file
  - gating
  - compensation
  - transformation
  - visualization

25 Check out the example data

26 Read an FCS file, summarize the flowFrame

27

28

29

30 Apply the lymphocyte gate with Subset
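In flowCore, a gate is an object (for example one built with rectangleGate()) that Subset() applies to keep only the events inside it. The sketch below is not flowCore code: it shows the same idea in plain base R on simulated forward/side scatter values, where a rectangular gate is just a logical test on two parameters. The populations and gate bounds are assumptions chosen for illustration.

```r
set.seed(1)
# Simulated scatter values: a "lymphocyte" cloud and a larger, more granular "monocyte" cloud
events <- data.frame(
  FSC = c(rnorm(700, mean = 400, sd = 40), rnorm(300, mean = 700, sd = 80)),
  SSC = c(rnorm(700, mean = 150, sd = 20), rnorm(300, mean = 500, sd = 60))
)

# A rectangular gate as a logical test (illustrative bounds)
in_gate <- events$FSC > 300 & events$FSC < 500 &
           events$SSC > 80  & events$SSC < 250

# Keep only the gated events, analogous to Subset() on a flowFrame
lymph <- events[in_gate, ]
nrow(lymph)
round(100 * mean(in_gate), 1)   # percentage of events inside the gate
```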

31 The display needs to be transformed, because it is rendering the linear-scale data stored in the FCS file
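flowCore provides transformation functions such as arcsinhTransform() and logicleTransform(). As a base-R sketch of why a transform helps, the code below applies an inverse hyperbolic sine with a cofactor to simulated linear data. The cofactor of 150 and the simulated populations are assumptions, not values from the talk.

```r
set.seed(2)
# Simulated linear-scale fluorescence: a dim population near zero (some values
# negative, as after background subtraction) and a bright population
linear <- c(rnorm(500, mean = 50, sd = 30),
            rlnorm(500, meanlog = log(5000), sdlog = 0.5))

cofactor <- 150   # assumption: a commonly used cofactor for fluorescence data
transformed <- asinh(linear / cofactor)

# asinh is nearly linear around zero (and defined for negatives) but grows like
# log(2 * x / cofactor) for large x, so both populations become visible
range(linear)
range(transformed)

if (interactive()) {
  op <- par(mfrow = c(1, 2))
  hist(linear, breaks = 50, main = "linear scale")
  hist(transformed, breaks = 50, main = "asinh(x / 150)")
  par(op)
}
```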

32 The data hasn’t been compensated!
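Compensation removes spectral spillover: each fluorochrome's light is partly recorded in other detectors. In flowCore this is done with compensate() and a spillover matrix (often stored in the FCS file's keywords). The base-R sketch below shows the underlying linear algebra under one common convention, rows = fluorochromes and columns = detectors: observed = true %*% S, so multiplying by solve(S) recovers the true signals. The spillover values and simulated signals are assumptions.

```r
# Spillover matrix (assumed values): row = fluorochrome, column = detector.
# 15% of the FITC signal also lands in the PE detector, and 2% of PE in FITC.
spill <- matrix(c(1.00, 0.15,
                  0.02, 1.00),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("FITC", "PE"), c("FITC", "PE")))

set.seed(3)
# "True" per-dye signals for 200 simulated events
true_signal <- cbind(FITC = rlnorm(200, meanlog = log(1000), sdlog = 0.3),
                     PE   = rlnorm(200, meanlog = log(200),  sdlog = 0.3))

# What the detectors record: each dye spills into the other detector
observed <- true_signal %*% spill

# Compensation undoes the mixing with the inverse spillover matrix
compensated <- observed %*% solve(spill)
max(abs(compensated - true_signal))   # tiny: floating-point error only
```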

33

34
- Lines require library(fields)
- Percentages are in summary(fres)$p[1:4]
- Percentages are drawn in the graph with text()
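A base-R sketch of a quadrant analysis with the percentages written onto the plot using text(). It does not use flowCore's filter results (fres) or the fields package; the simulated CD4/CD8 data, the quadrant cut points and the label positions are all assumptions for illustration.

```r
set.seed(4)
# Simulated (already transformed) CD4 and CD8 intensities
cd4 <- c(rnorm(600, 3, 0.5), rnorm(400, 0.5, 0.5))
cd8 <- c(rnorm(600, 0.5, 0.5), rnorm(400, 3, 0.5))
cut4 <- 1.75   # illustrative quadrant boundaries
cut8 <- 1.75

# Assign each event to a quadrant; the level order matches the label positions below
quad <- factor(ifelse(cd4 >= cut4,
                      ifelse(cd8 >= cut8, "CD4+CD8+", "CD4+CD8-"),
                      ifelse(cd8 >= cut8, "CD4-CD8+", "CD4-CD8-")),
               levels = c("CD4-CD8+", "CD4+CD8+", "CD4-CD8-", "CD4+CD8-"))
pct <- 100 * prop.table(table(quad))
round(pct, 1)

if (interactive()) {
  plot(cd4, cd8, pch = ".", xlab = "CD4", ylab = "CD8")
  abline(v = cut4, h = cut8, col = "red")
  # upper-left, upper-right, lower-left, lower-right
  text(x = c(0, 4, 0, 4), y = c(4.5, 4.5, -1, -1),
       labels = sprintf("%.1f%%", as.numeric(pct)))
}
```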

35 Fingerprinting Flow Cytometry Data: flowFP
flowFP
- aims to transform flow cytometric data into a form amenable to algorithmic analysis tools
- acts as an intermediate step between acquisition of high-throughput FCM data and empirical modeling, machine learning and knowledge discovery
- implements ideas from
  - Roederer M, Moore W, Treister A, Hardy RR & Herzenberg LA. Probability binning comparison: a metric for quantitating multivariate distribution differences. Cytometry 45:47-55, 2001.
  - Rogers WT, Moser AR, Holyst HA, Bantly A, Mohler ER III, Scangas G, and Moore JS. Cytometric Fingerprinting: Quantitative Characterization of Multivariate Distributions. Cytometry 73A:430-441, 2008.

36 The basic idea
- Subdivide multivariate space into bins
  - Call this a “model” of the space
- For each flowFrame in a flowSet, count the number of events in each bin in the model
- Flatten the collection of counts for a flowFrame into a 1D feature vector
- Combine all of the feature vectors into an n × m matrix
  - n = number of flowFrames (instances)
  - m = number of bins in the model (features)
- Also, tag each event with its bin membership
  - facilitates visualization and interpretation
  - can be used for gating
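The steps above can be sketched in base R. To keep the example short, the "model" here is a simple grid built from quartile cut points of the pooled data (4 × 4 = 16 bins) rather than flowFP's probability binning; the three simulated two-parameter samples stand in for the flowFrames of a flowSet.

```r
set.seed(5)
# Three simulated 2-parameter samples (stand-ins for flowFrames)
make_sample <- function(shift) cbind(p1 = rnorm(1000, mean = shift), p2 = rnorm(1000))
samples <- list(s1 = make_sample(0), s2 = make_sample(0), s3 = make_sample(1))

# Model: quartile cut points of the pooled data for each parameter -> 16 bins
pooled <- do.call(rbind, samples)
breaks <- lapply(colnames(pooled), function(p)
  quantile(pooled[, p], probs = seq(0, 1, 0.25)))
names(breaks) <- colnames(pooled)

# Tag each event with its bin number (1..16)
tag_events <- function(x) {
  i <- findInterval(x[, "p1"], breaks$p1, rightmost.closed = TRUE, all.inside = TRUE)
  j <- findInterval(x[, "p2"], breaks$p2, rightmost.closed = TRUE, all.inside = TRUE)
  (i - 1) * 4 + j
}

# Count events per bin -> one feature vector per sample, stacked into n x m
features <- t(sapply(samples, function(x) tabulate(tag_events(x), nbins = 16)))
dim(features)      # 3 16: n = 3 samples (instances), m = 16 bins (features)
rowSums(features)  # every event of every sample lands in exactly one bin
```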

37 Probability Binning
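A minimal sketch of probability binning as described by Roederer et al. (2001), as we understand it: recursively split the events at the median of the parameter with the largest variance, so every split halves the data and n_rec recursions give 2^n_rec bins of nearly equal probability. This is an illustration in base R on simulated data; flowFP's own flowFPModel may choose split parameters and handle details differently.

```r
# Build the model: a binary tree of (parameter, cut point) splits
pb_model <- function(x, n_rec) {
  if (n_rec == 0) return(NULL)                 # leaf: no further split
  p <- which.max(apply(x, 2, var))             # parameter with the largest variance
  cut <- median(x[, p])
  left <- x[, p] <= cut
  list(param = p, cut = cut,
       lo = pb_model(x[left, , drop = FALSE], n_rec - 1),
       hi = pb_model(x[!left, , drop = FALSE], n_rec - 1))
}

# Walk one event down the tree to find its bin number (1 .. 2^n_rec)
pb_bin <- function(model, event, n_rec) {
  bin <- 1
  for (level in seq_len(n_rec)) {
    go_hi <- event[model$param] > model$cut
    bin <- 2 * (bin - 1) + 1 + go_hi
    model <- if (go_hi) model$hi else model$lo
  }
  bin
}

set.seed(6)
x <- cbind(p1 = rnorm(1024), p2 = rnorm(1024, sd = 3))
n_rec <- 4
model <- pb_model(x, n_rec)

# Tag each training event with its bin: 16 bins of 64 events each
bins <- apply(x, 1, function(e) pb_bin(model, e, n_rec))
table(bins)

# Apply the same model to a shifted sample to get its fingerprint (counts per bin)
y <- cbind(p1 = rnorm(1024, mean = 1), p2 = rnorm(1024, sd = 3))
fp_y <- tabulate(apply(y, 1, function(e) pb_bin(model, e, n_rec)), nbins = 2^n_rec)
fp_y
```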

38

39

40 Bin Number
> plot(mod, fs)

41 Class Constructors
- flowFPModel (base class)
  - Consumes a flowFrame or flowSet
  - Produces a model, which is a recipe for subdividing multivariate space
- flowFP
  - Consumes a flowFrame or flowSet, and a flowFPModel
  - Produces a flowFP, which represents the multivariate probability density function as a fingerprint
  - Also tags each event with its bin membership
- flowFPPlex
  - Consumes a collection of flowFPs
  - The flowFPPlex is a container object to facilitate handling large and complex collections of flowFPs
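Once two samples are fingerprinted with the same model, their bin counts can be compared. The base-R sketch below uses hypothetical counts in 8 equal-probability bins and shows two comparisons: a log2 normalization against the expected count per bin (0 means "as expected", +1 means twice as many events), and the probability-binning chi-square statistic as we read it from Roederer et al. (2001). This illustrates the ideas only; it is not flowFP's API.

```r
# Hypothetical fingerprints: event counts in the same 8 bins (1000 events each)
control <- c(125, 130, 118, 122, 127, 131, 124, 123)
test    <- c( 60,  90, 118, 122, 160, 190, 135, 125)
expected <- sum(control) / length(control)   # equal-probability bins: 125 per bin

# Log2-normalized fingerprint of the test sample
log2_fp <- log2(test / expected)
round(log2_fp, 2)

# Probability-binning chi-square on bin proportions
p <- control / sum(control)
q <- test / sum(test)
pb_chi2 <- sum((p - q)^2 / (p + q))
pb_chi2
```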

42

43

44

45

46

47 Writing Your Own Functions
(Callouts on the slide label the comments, the declaration, the assignment, the return value and the code block.)

```r
#
# It's a good idea to comment your code
#
myfunc <- function(arg1 = 10, arg2, ...) {
  # your code goes here
  answer <- log(arg1, base = arg2)
  return(answer)
}
```
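A sketch of calling the function from the slide, showing positional and named arguments and a default value. The second function, pct_positive(), is a hypothetical helper (not from the talk) that illustrates how "..." passes extra arguments through to another function; myfunc() accepts "..." but does not use it.

```r
# The function from the slide
myfunc <- function(arg1 = 10, arg2, ...) {
  answer <- log(arg1, base = arg2)
  return(answer)
}
myfunc(100, 10)            # 2: arguments matched by position
myfunc(arg2 = 2)           # arg1 takes its default of 10, so this is log2(10)
myfunc(c(1, 10, 100), 10)  # log() is vectorized, so myfunc is too: 0 1 2

# Hypothetical helper: percentage of events above a threshold.
# "..." forwards extra arguments (here na.rm) to mean().
pct_positive <- function(x, threshold, ...) {
  100 * mean(x > threshold, ...)
}
pct_positive(c(50, 300, 900, NA), threshold = 200, na.rm = TRUE)
```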

48 Writing Your Own Functions

49

50 Obtaining R and Bioconductor R  http://cran.r-project.org/ Bioconductor  http://bioconductor.org/GettingStarted

51 General Reference Material
- A good beginner’s guide to R
  - http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
- A nice one-page reference card
  - http://cran.r-project.org/doc/contrib/Short-refcard.pdf
- Outstanding summary of R/Bioconductor, with many examples
  - http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#R_favorite
- The definitive reference for writing R extensions (advanced!)
  - http://cran.r-project.org/doc/manuals/R-exts.pdf
- Books
  - William N. Venables and Brian D. Ripley. Modern Applied Statistics with S. Fourth Edition. Springer, New York, 2002. ISBN 0-387-95457-0.
  - John M. Chambers. Programming with Data. Springer, New York, 1998. ISBN 0-387-98503-4 (aka “the Green Book”).

52 Flow-Specific References
Vignettes
- http://bioconductor.org/packages/2.6/bioc/vignettes/flowCore/inst/doc/HowTo-flowCore.pdf
- http://bioconductor.org/packages/2.6/bioc/vignettes/flowViz/inst/doc/filters.pdf
- http://bioconductor.org/packages/2.6/bioc/vignettes/flowStats/inst/doc/GettingStartedWithFlowStats.pdf
- http://bioconductor.org/packages/2.6/bioc/vignettes/flowQ/inst/doc/DataQualityAssessment.pdf
- http://bioconductor.org/packages/2.6/bioc/vignettes/flowFP/inst/doc/flowFP_HowTo.pdf
Original Articles
- flowCore
  - Hahne, F., N. LeMeur, et al. (2009). "flowCore: a Bioconductor package for high throughput flow cytometry." BMC Bioinformatics 10: 106.
- Fingerprinting
  - Rogers, W. T., A. R. Moser, et al. (2008). "Cytometric fingerprinting: quantitative characterization of multivariate distributions." Cytometry A 73(5): 430-41.
  - Rogers, W. T. and H. A. Holyst (2009). "flowFP: A Bioconductor Package for Fingerprinting Flow Cytometric Data." Advances in Bioinformatics 2009 (Article ID 193947): 11.

53 Contact Me! Wade Rogers rogersw@mail.med.upenn.edu 267-350-9680 (o) 610-368-5821 (m)

