Getting Data into R & Bioconductor


1 Getting Data into R & Bioconductor
Aedín Culhane

2 Simple Excel spreadsheet data
Already described: read.table(), read.csv() and scan(). There are other formats, e.g. netCDF, but these are more data-type specialized. Look under Technologies in biocViews.
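A minimal base-R sketch of the three readers mentioned above. It assumes the spreadsheet has first been saved from Excel as tab-delimited or comma-separated text (base R does not read .xls/.xlsx files directly). The gene names, sample names and values are made up for illustration, and temporary files stand in for your own files.

```r
# Write a small tab-delimited "spreadsheet" to a temporary file so the
# example is self-contained; in practice this would be a sheet exported
# from Excel as tab-delimited text.
tsv_file <- tempfile(fileext = ".txt")
writeLines(c("gene\tsample1\tsample2",
             "g1\t5.2\t6.1",
             "g2\t7.8\t7.3",
             "g3\t3.4\t4.0"), tsv_file)

# read.table(): header = TRUE uses the first line as column names,
# sep = "\t" splits on tabs, row.names = 1 takes the gene IDs as row names.
tab <- read.table(tsv_file, header = TRUE, sep = "\t", row.names = 1)

# read.csv(): the same data as comma-separated text; header = TRUE and
# sep = "," are already its defaults.
csv_file <- tempfile(fileext = ".csv")
write.csv(tab, csv_file)
csv <- read.csv(csv_file, row.names = 1)

# scan(): a lower-level reader that returns a vector rather than a
# data.frame; skip = 1 drops the header line, quiet = TRUE hides the
# "Read N items" message.
num_file <- tempfile(fileext = ".txt")
writeLines(c("value", "1.5", "2.5", "3.5"), num_file)
vals <- scan(num_file, what = numeric(), skip = 1, quiet = TRUE)

print(tab)
print(vals)
```

read.table() is the general tool; read.csv() is the same function with comma-friendly defaults, and scan() is useful when you only need a plain vector of values.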

3 Some common data types: microarray, SNP and, increasingly, NGS (next-generation sequencing)
May 2011

4 A Microarray Overview

5 Reading Affymetrix Data
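library(affy)
require(affy)   # alternative to library(); returns FALSE with a warning, instead of an error, if affy is not installed
affybatch <- ReadAffy(celfile.path="[Location of your data]")   # read the CEL files in that folder into an AffyBatch
eSet <- rma(affybatch)   # RMA: background correction, quantile normalization, median-polish summarization
eSet <- justRMA(celfile.path="[Location of your data]")   # alternative: RMA directly from the CEL files, without keeping an AffyBatch in memory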

6 Sample R code

7 ExpressionSet Class in R
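A minimal sketch of building an ExpressionSet by hand with Biobase, assuming a made-up matrix of 4 probes by 3 samples and a hypothetical "disease" column; real objects would normally come from justRMA() or getGEO(). It shows the main accessors and that an ExpressionSet subsets like a matrix, with features as rows and samples as columns.

```r
library(Biobase)

# Hypothetical toy data: 4 probes (rows) x 3 samples (columns).
mat <- matrix(c(5.1, 6.2, 7.3, 8.4,
                5.0, 6.0, 7.1, 8.9,
                4.8, 6.5, 7.0, 8.2),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("s", 1:3)))

# Sample annotation (phenoData): row names must match colnames(mat).
pheno <- data.frame(disease = c("control", "case", "case"),
                    row.names = colnames(mat))

eset <- ExpressionSet(assayData = mat,
                      phenoData = AnnotatedDataFrame(pheno))

exprs(eset)         # the expression matrix
pData(eset)         # the sample annotation as a data.frame
featureNames(eset)  # probe IDs
sampleNames(eset)   # sample IDs

# eset$disease reads a phenoData column; subsetting keeps the
# expression values and annotation in step.
cases <- eset[, eset$disease == "case"]
```

Keeping the matrix and its annotation in one object is the point of the class: subsetting samples cannot leave exprs() and pData() out of sync.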

8 Assessing Data Quality

9 Public Microarray Data
ArrayExpress: 21,997 studies (622,617 profiles)
GEO: 22,735 studies (558,074 profiles)
Statistics, May 2011

10 >500,000 arrays × $500 = $250,000,000. Cancer studies account for >14% of all studies in the databases…

11 R Code

12 More on GEOquery
require(GEOquery)
Let's try to load the GDS810 dataset, which contains data on Alzheimer's disease at various stages of severity.
GDS810 <- getGEO("GDS810")
The getGEO function returns an object of class GEOData (for a GDS accession, its subclass GDS). You can get a description of this class like this:
help("GEOData-class")
Meta(GDS810)          # header metadata (title, description, platform, ...)
Columns(GDS810)       # descriptions of the sample columns
head(Table(GDS810))   # first rows of the data table

13 Affy SNP Arrays

14 Process: Affy SNP Arrays (oligo package)

15 Other Arrays
Illumina: lumi package
Two-color spotted arrays: limma package
Other arrays

16 Next Generation Sequencing Data

17 R Code

18 Exercise: From GEO, bring down a GSE
Download the dataset GSE1297 using getGEO. getGEO returns a list of ExpressionSets (one per platform), so take the first element, e.g. gse <- getGEO("GSE1297")[[1]]. To see the expression data and phenoData, use exprs() and pData(). Use arrayQualityMetrics to assess the quality of these data.

19 With thanks to

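20 Quick Aside: Interpreting hierarchical clustering trees
The results of hierarchical clustering are viewed as a dendrogram (tree). The distance between nodes is read from the height scale: the height at which two branches join shows how dissimilar they are. The left-right ordering of nodes is not important (like a baby's mobile, each branch can spin around its join), so trees A and B are equivalent.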
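A small base-R illustration of why two trees that differ only in leaf order are equivalent. It uses random toy data (set.seed(1), 5 hypothetical samples) and the default hclust() settings; rev() flips the branches at every node of the dendrogram, which changes the leaf order but not the cophenetic distances, i.e. the height at which each pair of samples is joined.

```r
# Toy data: 5 hypothetical samples measured on 4 variables
# (random values with a fixed seed, for illustration only).
set.seed(1)
x <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("s", 1:5), NULL))

hc <- hclust(dist(x))        # Euclidean distance, complete linkage (defaults)
tree_a <- as.dendrogram(hc)
tree_b <- rev(tree_a)        # flip the children at every node

# The leaf order, as drawn left to right, differs between the trees...
print(labels(tree_a))
print(labels(tree_b))

# ...but the cophenetic distances agree once rows and columns are
# aligned by sample name, because flipping never changes join heights.
coph <- function(tree) {
  m <- as.matrix(cophenetic(tree))
  ord <- sort(rownames(m))
  m[ord, ord]
}
print(all.equal(coph(tree_a), coph(tree_b)))
```

Comparing cophenetic matrices rather than plots is a convenient way to check whether two dendrograms encode the same clustering.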

