Introduction to Bioc Aedín Culhane aedin@jimmy.harvard.edu http://bcb.dfci.harvard.edu/~aedin http://www.hsph.harvard.edu/research/aedin-culhane/
Bioconductor
To install, use the script on the Bioconductor website:
    source("http://www.bioconductor.org/biocLite.R")
    biocLite()
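On recent Bioconductor releases the biocLite() script has been retired in favour of the BiocManager package from CRAN. A minimal sketch of the equivalent installation, assuming a current R version; the package names in `pkgs` are only illustrative picks from packages mentioned elsewhere in these slides:

```r
# Assumption: a recent R / Bioconductor release where BiocManager replaces biocLite().
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")   # BiocManager itself is installed from CRAN
}

# Illustrative package list (not prescribed by the slides)
pkgs <- c("affy", "GEOquery")

# Install only the packages that are not already available
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  BiocManager::install(missing_pkgs)
}
```

Checking for missing packages first avoids reinstalling what is already present.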
What packages do I need? This is specific to your data and analysis pipeline, but for examples see: Bioconductor Workshops, Bioconductor Workflows
Packages Overview Bioconductor website, BiocViews task view: Software, Annotation Data, Experimental Data
Main types of Annotation Packages
Gene-centric AnnotationDbi packages:
  Organism: org.Mm.eg.db
  Technology/Platform: hgu133plus2.db
  Gene sets and pathways (biology level): GO.db or KEGG.db
  .db packages can be queried with SQL or accessed using annotation package functions (toTable, get, mget)
Genome-centric GenomicFeatures packages:
  Transcriptome level: TxDb.Hsapiens.UCSC.hg19.knownGene
  Generic features: can be generated via GenomicFeatures
biomaRt: query web-based `biomart` resources for genes, sequence, SNPs, etc.
See http://www.bioconductor.org/help/course-materials/2011/BioC2011/LabStuff/AnnotationSlidesBioc2011.pdf
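As a rough illustration of the gene-centric access functions listed above, the sketch below looks up gene symbols for two probe sets on the hgu133plus2 platform. It assumes hgu133plus2.db is installed; the two probe IDs are examples chosen for illustration, and AnnotationDbi::select() is an addition here, a further interface not named on the slide.

```r
# Assumption: hgu133plus2.db is installed; loading it also attaches AnnotationDbi.
library(hgu133plus2.db)

# Example probe set IDs (illustrative choice)
probes <- c("1007_s_at", "1053_at")

# mget(): returns a named list, one entry per probe set
symbols <- mget(probes, hgu133plus2SYMBOL)

# toTable(): flattens the whole probe-to-symbol map into a data frame
symbol_table <- toTable(hgu133plus2SYMBOL)
head(symbol_table)

# select(): retrieves several annotation columns in one call
res <- AnnotationDbi::select(hgu133plus2.db,
                             keys    = probes,
                             columns = c("SYMBOL", "ENTREZID"),
                             keytype = "PROBEID")
res
```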
Bioconductor resources
Mailing list (sign up for the daily digest)
Documentation and workshop/course material online: slides from talks, PDFs of tutorials, R code
Help is available for each software package
Each package MUST contain a vignette (how-to)
Other resources: www.Rseek.org, www.r-bloggers.com
Vignette
Tutorials that provide a worked example of the package
Required in Bioconductor packages
    library("Biobase")
    library("GOstats")   # Load package of interest
    openVignette()
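Vignettes can also be listed and opened with functions from base R's utils package, which may be convenient outside an interactive Biobase session. A small sketch, assuming Biobase is installed:

```r
# List the vignettes shipped with a package (here Biobase, assumed installed)
vigs <- vignette(package = "Biobase")
vigs$results[, c("Item", "Title")]   # vignette names and their titles

# Open an HTML index of the package's vignettes in the web browser
browseVignettes("Biobase")
```

Any name in the `Item` column can then be passed to `vignette()` together with `package = "Biobase"` to open that document.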
Getting Data into R & Bioconductor
Simple Excel spreadsheet data
Simple table: read.table(), read.csv(), scan()
However, most data types have more specialized readers. See Technologies on BiocViews: http://www.bioconductor.org/packages/release/BiocViews.html
GDC: GenomicDataCommons
Microarray data: GEOquery, ArrayExpress
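The simple table readers can be tried without any external file. The sketch below writes a tiny made-up expression table to a temporary CSV file (the gene names and values are invented for illustration) and reads it back in three ways:

```r
# Write a small, made-up table to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
writeLines(c("gene,sample1,sample2",
             "BRCA1,5.2,6.1",
             "TP53,7.8,7.5"), tmp)

# read.csv(): comma-separated, header row assumed; first column becomes row names
expr <- read.csv(tmp, row.names = 1)

# read.table(): the general reader; separator and header must be stated explicitly
expr2 <- read.table(tmp, header = TRUE, sep = ",", row.names = 1)

# scan(): low-level reader that returns a flat vector of fields
fields <- scan(tmp, what = character(), sep = ",", quiet = TRUE)

# Most Bioconductor functions expect a numeric matrix rather than a data frame
expr_mat <- as.matrix(expr)
str(expr_mat)
```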
A Microarray Overview
Reading Affymetrix Data May 2011
    library(affy)   # Alternative: require(affy)
    affybatch <- ReadAffy(celfile.path="[Location of your data]")
    eSet <- justRMA()
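For readers without CEL files at hand, the same steps can be tried on example data. This sketch assumes the affydata package (which provides the `Dilution` AffyBatch) and the matching hgu95av2cdf package are installed; it uses rma() on an existing AffyBatch, whereas justRMA() reads the CEL files and normalizes in one step.

```r
# Assumption: affy, affydata and hgu95av2cdf are installed.
library(affy)
library(affydata)

data(Dilution)           # example AffyBatch shipped with affydata
Dilution                 # prints a summary of the arrays

# RMA: background correction, quantile normalization, probe-set summarization
eset <- rma(Dilution)

# The result is an ExpressionSet; exprs() extracts the log2 expression matrix
dim(exprs(eset))
head(exprs(eset))
```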
Sample R code
Other Arrays
Illumina: lumi package
2-color spotted arrays: limma package
Other arrays: http://www.bioconductor.org/help/workflows/oligo-arrays/
R Code
More on GEOquery
    require(GEOquery)
Let's try to load the GDS810 dataset, which contains data on Alzheimer's disease at various stages of severity.
    GDS810 <- getGEO("GDS810")
The getGEO function returns an object of class GEOData. You can get a description of this class like this:
    help("GEOData-class")
    Meta(GDS810)
    Columns(GDS810)
    head(Table(GDS810))
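A GDS object is often converted to an ExpressionSet before further analysis. The sketch below uses GEOquery's GDS2eSet() for this; it needs network access to GEO, and the choice of `do.log2 = TRUE` is an assumption that the stored values are not already log-transformed.

```r
# Assumption: GEOquery and Biobase are installed and GEO is reachable.
library(GEOquery)
library(Biobase)

gds <- getGEO("GDS810")              # downloads the dataset from GEO

# Convert to an ExpressionSet; do.log2 = TRUE log2-transforms the values
# (assumption: the stored values are on the raw scale)
eset <- GDS2eSet(gds, do.log2 = TRUE)

eset                                 # summary of features and samples
head(pData(eset))                    # sample annotation (disease state, etc.)
dim(exprs(eset))                     # expression matrix: probes x samples
```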
Assessing Data Quality
ExpressionSet Class in R
R basics: Getting help
To get help:
    help.search("mean")
    help(mean)
    apropos("mean")
    example(mean)
http://www.bioconductor.org/help/
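The help functions also have shorthand forms, and their results can be stored and inspected like any other R object. A short sketch using only base R:

```r
?mean                     # shorthand for help(mean)
??"mean"                  # shorthand for help.search("mean")

# apropos() returns a character vector of matching object names
hits <- apropos("mean")
head(hits)

# args() shows a function's arguments without opening the help page
args(weighted.mean)

# Overview of everything documented in a package (here the stats package)
help(package = "stats")
```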