Published by Milton Gregory. Modified over 9 years ago.
1
Making Sense out of Flow Cytometry Data Overload
A crash course in R/Bioconductor and flow cytometry fingerprinting
© 2010 by University of Pennsylvania School of Medicine
2
Outline
Background
  R
  Bioconductor
  Motivating examples
Starting R, entering commands
  How to get help
R fundamentals
  Sequences and Repeats
  Characters and Numbers
  Vectors and Matrices
  Data Frames and Lists
  Importing data from spreadsheets
flowCore
  Loading flow cytometry (FCS) data
  gating
  compensation
  transformation
  visualization
flowFP
  Binning
  Fingerprinting
  Comparing multivariate distributions
Writing your own functions
Installing and running R on your computer
Suggestions for further reading and reference
3
Background: R
R is an integrated suite of software facilities for data manipulation, simulation, calculation and graphical display.
It handles and analyzes data very effectively and it contains a suite of operators for calculations on arrays and matrices.
In addition, it has the graphical capabilities for very sophisticated graphs and data displays.
It is an elegant, object-oriented programming language.
Started by Robert Gentleman and Ross Ihaka (hence “R”) in 1995 as a free, independent, open-source implementation of the S programming language (now part of Spotfire).
Currently maintained by the R Core development team, an international group of hard-working volunteer developers.
http://www.r-project.org
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
4
Background: Bioconductor
“Is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.”
Goals
  To provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.
  To provide a common software platform that enables the rapid development and deployment of extensible, scalable, and interoperable software.
  To further scientific understanding by producing high-quality documentation and reproducible research.
  To train researchers on computational and statistical methods for the analysis of genomic data.
http://bioconductor.org/overview
5
A motivating example I’ve just collected data from a T cell stimulation experiment in a 96-well plate format. I need to gate the data on CD3/CD4. How consistent are the distributions, so that I can establish one set of gates for the whole plate?
6
A motivating example
7
Another motivating example I’m concerned that drawing gates to analyze my data introduces unintended bias. Additionally, since I have multiple data files, drawing multiple gates is time consuming. Can I use R to compute gates and then apply these same objective gating criteria to multiple data files?
8
Another motivating example Autogate lymphocytes and monocytes Automatically analyze FMO tubes
9
Back to the basics
R is a command-line driven program.
The prompt is: >
You type a command (shown in blue), and R executes the command and gives the answer (shown in black).
10
Simple example: enter a set of measurements
Use the function c() to combine terms together.
Create a variable named mfi.
Put the result of c() into mfi using the assignment operator <- (you can also use =).
The [1] at the start of the printed output is the index of the first element on that line; the result is a vector.
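A minimal sketch of the steps on this slide. The four intensity values are made up for illustration; only c(), <- and the name mfi come from the slide.

```r
# Combine four hypothetical fluorescence intensities into one vector
mfi <- c(220.5, 310.2, 198.7, 275.0)

# Typing the variable name prints it; the leading [1] is the index of
# the first element shown on that line of output
mfi

# Functions operate on the whole vector at once
mean(mfi)
length(mfi)
```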
11
Help, functions, polymorphism
help(log)
?log
apropos("log")
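A short sketch of searching for help and of what "polymorphism" means in practice: the same function name does different work depending on the class of its argument. The example vectors are invented; summary() stands in here as one familiar polymorphic function.

```r
# apropos() returns the names of all visible objects matching a pattern
hits <- apropos("log")
head(hits)

# help(log) and ?log open the same help page in an interactive session

# Polymorphism: the same generic function behaves differently by class
x <- c(1.5, 2.5, 10)
f <- factor(c("CD4", "CD8", "CD4"))
summary(x)   # numeric input: quartiles and mean
summary(f)   # factor input: counts per level
```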
12
Vignettes – really good help!
13
Sequences and Repeats
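A few ways to build sequences and repeats, as a minimal sketch using the base functions seq() and rep() and the colon operator. Variable names are illustrative only.

```r
one_to_ten <- 1:10                         # colon operator: integers 1 through 10
by_half    <- seq(0, 2, by = 0.5)          # fixed step size
five_pts   <- seq(0, 1, length.out = 5)    # fixed number of evenly spaced values
repeated   <- rep(c("A", "B"), times = 3)  # repeat the whole vector three times
each_twice <- rep(c("A", "B"), each = 2)   # repeat each element twice
```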
14
Characters and Numbers
Characters and character strings are enclosed in "" or ''
Special numbers
  NA – "Not Available"
  Inf – "Infinity"
  NaN – "Not a Number"
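A small sketch of how strings and the special values behave. The marker names are just example strings.

```r
marker <- "CD3"          # double quotes
other  <- 'CD4'          # single quotes work the same way
label  <- paste(marker, other, sep = "/")

x <- c(1, NA, 3)
mean(x)                  # NA: missing values propagate through calculations
mean(x, na.rm = TRUE)    # drop missing values first

1 / 0                    # Inf
0 / 0                    # NaN
is.na(c(NA, NaN, Inf))   # NaN also counts as missing; Inf does not
```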
15
Vectors and Matrices
16
The subset operator for vectors and matrices is [ ]
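Some common forms of the [ ] operator, sketched on a small made-up vector and matrix.

```r
v <- c(10, 20, 30, 40, 50)
v[2]            # second element
v[c(1, 5)]      # first and fifth elements
v[-1]           # everything except the first element
v[v > 25]       # elements that satisfy a condition

m <- matrix(1:6, nrow = 2)   # filled column by column
m[1, 2]         # row 1, column 2
m[, 3]          # all of column 3
m[2, ]          # all of row 2
```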
17
Vectors and Matrices You can extend the length of a vector via subsetting … but not a matrix
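A sketch of the difference described on this slide. The failed matrix assignment is wrapped in try() so the script keeps running; the values are arbitrary.

```r
v <- c(1, 2, 3)
v[5] <- 10          # the vector grows to length 5; the gap is filled with NA
v

m <- matrix(1:4, nrow = 2)
# Assigning outside a matrix's dimensions is an error, caught here with try()
result <- try(m[3, 1] <- 99, silent = TRUE)
class(result)       # "try-error"
dim(m)              # the matrix is unchanged
```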
18
Vectors and Matrices However, all’s not lost if you want to extend either the columns … … or rows
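One way to extend a matrix, assuming the slide refers to the base functions cbind() and rbind() (the slide text itself does not name them).

```r
m <- matrix(1:4, nrow = 2)
wider  <- cbind(m, c(5, 6))      # add a column
taller <- rbind(m, c(7, 8))      # add a row
dim(wider)    # 2 rows, 3 columns
dim(taller)   # 3 rows, 2 columns
```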
19
Data Frames
A Data Frame is like a matrix, except that the data type in each column need not be the same.
Often, a Data Frame is created from an Excel spreadsheet: in Excel, use Save As… to write a tab-delimited text file, then read that file with the function read.table().
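A minimal sketch of reading a tab-delimited file into a data frame. To stay self-contained it first writes a tiny stand-in file; the column names and values are hypothetical and not from the presentation.

```r
# Stand-in for a spreadsheet saved as tab-delimited text (hypothetical values)
path <- tempfile(fileext = ".txt")
writeLines(c("sample\tstim\tpct_cd4",
             "A01\tnone\t41.2",
             "A02\tSEB\t38.9",
             "A03\tCMV\t44.5"), path)

# header = TRUE uses the first row as column names; sep = "\t" splits on tabs
plate <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

str(plate)                    # each column keeps its own type
plate$pct_cd4                 # one column, by name
plate[plate$pct_cd4 > 40, ]   # rows that meet a condition
```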
20
Data Frames from spreadsheets
23
Lists
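A list can hold elements of different types and lengths. A short sketch with invented element names:

```r
run <- list(sample = "A01",
            mfi    = c(220.5, 310.2, 198.7),
            gated  = TRUE)

run$mfi           # access an element by name
run[["sample"]]   # double brackets return the element itself
run["sample"]     # single brackets return a list of length one
run$plate <- 1    # add a new element by assignment
names(run)
```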
24
Handling Flow Cytometry Data: flowCore
flowCore is a base package that supports reading and manipulation of FCS data files.
The fundamental object that encapsulates the data in an FCS file is a flowFrame.
A container object that holds a collection of flowFrames is called a flowSet.
In the next slides we will go over:
  reading an FCS file
  gating
  compensation
  transformation
  visualization
25
Check out the example data
26
Read an FCS file, summarize the flowFrame
30
Apply the lymphocyte gate with Subset
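What applying a gate amounts to can be sketched in base R: keep only the events that fall inside the gate's limits. This is an illustration on simulated scatter data, not flowCore code; the cluster positions and gate limits are invented. In flowCore the same step is done by passing a gate to Subset().

```r
set.seed(1)
# Simulated events: a compact "lymphocyte-like" cluster plus scattered debris
events <- rbind(
  cbind(FSC = rnorm(800, 400, 40), SSC = rnorm(800, 150, 30)),
  cbind(FSC = runif(200, 0, 1000), SSC = runif(200, 0, 1000))
)

# A rectangular gate is a pair of limits per parameter (hypothetical values)
gate <- list(FSC = c(300, 500), SSC = c(50, 250))

# TRUE for every event that lies inside the gate on both parameters
inside <- events[, "FSC"] >= gate$FSC[1] & events[, "FSC"] <= gate$FSC[2] &
          events[, "SSC"] >= gate$SSC[1] & events[, "SSC"] <= gate$SSC[2]

gated <- events[inside, , drop = FALSE]   # keep only events inside the gate
pct_in_gate <- 100 * mean(inside)         # percentage of events in the gate
pct_in_gate
```

Because the gate is just a set of numbers, the same limits can be applied unchanged to every file in an experiment.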
31
The data need to be transformed, because the plot is rendering the linear data in the FCS file.
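Why a transformation helps can be sketched in base R. The slide does not say which transform was applied; asinh is used here only as one common choice, and the simulated intensities and the cofactor of 150 are assumptions.

```r
set.seed(2)
# Simulated linear fluorescence: a dim and a bright population (made-up values)
linear <- c(rlnorm(500, meanlog = 3, sdlog = 0.5),
            rlnorm(500, meanlog = 8, sdlog = 0.5))

# asinh behaves like a log for large values but is defined at zero and below
cofactor <- 150                       # assumed scaling constant
transformed <- asinh(linear / cofactor)

range(linear)        # bright events dominate a linear axis
range(transformed)   # both populations fit on a comparable scale
```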
32
The data haven’t been compensated!
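The arithmetic behind compensation can be sketched with a small matrix example in base R. The spillover values and the three events are hypothetical; flowCore's compensate() applies the same kind of correction to real data.

```r
# Hypothetical 2-colour spillover matrix: rows are fluorochromes,
# columns are detectors, each row scaled so its primary detector is 1
spill <- matrix(c(1.00, 0.15,
                  0.05, 1.00),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("FITC", "PE"), c("FL1", "FL2")))

# True signal per fluorochrome for three simulated events
truth <- cbind(FITC = c(1000, 0, 500), PE = c(0, 800, 500))

# What the detectors record: each dye leaks into the other detector
observed <- truth %*% spill

# Compensation multiplies by the inverse of the spillover matrix
compensated <- observed %*% solve(spill)
round(compensated)
```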
34
Lines require library(fields).
Percentages are in summary(fres)$p[1:4].
Percentages are drawn in the graph with text().
35
Fingerprinting Flow Cytometry Data: flowFP
flowFP aims to transform flow cytometric data into a form amenable to algorithmic analysis tools.
Acts as an intermediate step between acquisition of high-throughput FCM data and empirical modeling, machine learning and knowledge discovery.
Implements ideas from
  Roederer M, Moore W, Treister A, Hardy RR & Herzenberg LA. Probability binning comparison: a metric for quantitating multivariate distribution differences. Cytometry 45:47-55, 2001.
and
  Rogers WT, Moser AR, Holyst HA, Bantly A, Mohler ER III, Scangas G, and Moore JS. Cytometric Fingerprinting: Quantitative Characterization of Multivariate Distributions. Cytometry 73A:430-441, 2008.
36
The basic idea
Subdivide multivariate space into bins
  Call this a “model” of the space
For each flowFrame in a flowSet, count the number of events in each bin in the model
Flatten the collection of counts for a flowFrame into a 1D feature vector
Combine all of the feature vectors together into an n x m matrix
  n = number of flowFrames (instances)
  m = number of bins in the model (features)
Also, tag each event with its bin membership
  facilitates visualization, interpretation
  can be used for gating
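The steps above can be sketched in base R on simulated data. This is an illustration of the idea, not flowFP's implementation: the splitting rule (median of the highest-variance parameter, in the spirit of probability binning) and the names build_model, assign_bins and fingerprint are assumptions made for this example.

```r
# Build a binning model: split at the median of the parameter with the
# largest variance, then recurse into each half until depth reaches 0
build_model <- function(x, depth) {
  if (depth == 0 || nrow(x) < 2) return(NULL)       # leaf: no further split
  d   <- which.max(apply(x, 2, var))                # parameter to split on
  cut <- median(x[, d])
  lo  <- x[, d] <= cut
  list(dim = d, cut = cut,
       low  = build_model(x[lo, , drop = FALSE],  depth - 1),
       high = build_model(x[!lo, , drop = FALSE], depth - 1))
}

# Tag each event with its bin number (1 .. 2^depth) by following the splits
assign_bins <- function(model, x, depth) {
  if (is.null(model)) return(rep(1L, nrow(x)))
  lo  <- x[, model$dim] <= model$cut
  bin <- integer(nrow(x))
  bin[lo]  <- assign_bins(model$low,  x[lo, , drop = FALSE],  depth - 1)
  bin[!lo] <- assign_bins(model$high, x[!lo, , drop = FALSE], depth - 1) +
              as.integer(2^(depth - 1))
  bin
}

# Flatten one sample into a 1D feature vector of per-bin event counts
fingerprint <- function(model, x, depth) {
  tabulate(assign_bins(model, x, depth), nbins = 2^depth)
}

set.seed(42)
# Three simulated samples; the third is shifted in its first parameter
make_sample <- function(shift) cbind(CD3 = rnorm(2000, shift), CD4 = rnorm(2000))
samples <- list(s1 = make_sample(0), s2 = make_sample(0), s3 = make_sample(1))

depth <- 4                                          # 2^4 = 16 bins
model <- build_model(do.call(rbind, samples), depth)  # model from pooled events

# n x m matrix: one row per sample (instance), one column per bin (feature)
fp <- t(sapply(samples, fingerprint, model = model, depth = depth))
dim(fp)
fp
```

Because the model is built from the pooled events, every bin holds the same total number of events across all samples, so a sample whose counts deviate from an even spread stands out in its row of the matrix.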
37
Probability Binning
40
> plot (mod, fs)
[Figure: plot of the model and flowSet; axis label: Bin Number]
41
Class Constructors
flowFPModel (base class)
  Consumes a flowFrame or flowSet
  Produces a model, which is a recipe for subdividing multivariate space
flowFP
  Consumes a flowFrame or flowSet, and a flowFPModel
  Produces a flowFP, which represents the multivariate probability density function as a fingerprint
  Also tags each event with its bin membership
flowFPPlex
  Consumes a collection of flowFPs
  The flowFPPlex is a container object to facilitate handling large and complex collections of flowFPs
47
Writing Your Own Functions
Parts of a function: comments, declaration, assignment, return, code block
#
# It's a good idea to comment your code
#
myfunc <- function (arg1=10, arg2, ...) {
    # your code goes here
    answer <- log (arg1, base=arg2)
    return (answer)
}
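A short sketch of calling the function from this slide. The definition is repeated so the example runs on its own; the argument values are arbitrary.

```r
# A function with one default argument (arg1) and one required argument (arg2)
myfunc <- function(arg1 = 10, arg2, ...) {
  answer <- log(arg1, base = arg2)
  return(answer)
}

myfunc(100, 10)              # positional arguments: log base 10 of 100
myfunc(arg2 = 10)            # arg1 falls back to its default of 10
myfunc(arg2 = 2, arg1 = 8)   # named arguments can come in any order
```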
48
Writing Your Own Functions
50
Obtaining R and Bioconductor
R: http://cran.r-project.org/
Bioconductor: http://bioconductor.org/GettingStarted
51
General Reference Material
A good beginner’s guide to R
  http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
A nice one-page reference card
  http://cran.r-project.org/doc/contrib/Short-refcard.pdf
Outstanding summary of R/Bioconductor, with many examples
  http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#R_favorite
The definitive reference for writing R extensions (advanced!)
  http://cran.r-project.org/doc/manuals/R-exts.pdf
Books
  William N. Venables and Brian D. Ripley. Modern Applied Statistics with S. Fourth Edition. Springer, New York, 2002. ISBN 0-387-95457-0.
  John M. Chambers. Programming with Data. Springer, New York, 1998. ISBN 0-387-98503-4 (aka “the Green Book”).
52
Flow-Specific References
Vignettes
  http://bioconductor.org/packages/2.6/bioc/vignettes/flowCore/inst/doc/HowTo-flowCore.pdf
  http://bioconductor.org/packages/2.6/bioc/vignettes/flowViz/inst/doc/filters.pdf
  http://bioconductor.org/packages/2.6/bioc/vignettes/flowStats/inst/doc/GettingStartedWithFlowStats.pdf
  http://bioconductor.org/packages/2.6/bioc/vignettes/flowQ/inst/doc/DataQualityAssessment.pdf
  http://bioconductor.org/packages/2.6/bioc/vignettes/flowFP/inst/doc/flowFP_HowTo.pdf
Original Articles
flowCore
  Hahne, F., N. LeMeur, et al. (2009). "flowCore: a Bioconductor package for high throughput flow cytometry." BMC Bioinformatics 10: 106.
Fingerprinting
  Rogers, W. T., A. R. Moser, et al. (2008). "Cytometric fingerprinting: quantitative characterization of multivariate distributions." Cytometry A 73(5): 430-41.
  Rogers, W. T. and H. A. Holyst (2009). "flowFP: A Bioconductor Package for Fingerprinting Flow Cytometric Data." Advances in Bioinformatics 2009(Article ID 193947): 11.
53
Contact Me! Wade Rogers rogersw@mail.med.upenn.edu 267-350-9680 (o) 610-368-5821 (m)