0
Microarray Data Analysis of Illumina Data Using R/Bioconductor
Reddy Gali, Ph.D.
1
Agenda
Introduction to microarrays
Workflow of a gene expression microarray experiment
Microarray experimental design
Public microarray databases
Microarray preprocessing - Quality control and Diagnostic analysis
2
Agenda
Introduction to R/Bioconductor
Installation of R and Bioconductor Packages
General data analysis and strategies
Data analysis using lumi package
Data analysis using limma package
3
Workflow of Gene Expression
Biological question
Experimental design
QC
Tissue / sample preparation
Extraction of Total RNA
Probe amplification & labeling
Microarray hybridization & processing
Image analysis
Data analysis
  Expression measures - Normalization - Statistical Filtering - Clustering - Pathway analysis
Biological Verification
4
Pitfalls of Microarray Experiment
Gene expression changes detected by microarray analysis cannot be validated by other methods
- Inadequate design
- Data quality is low
- Statistical approach is not adequate
- Expression level of gene is below detection limit
- Change in gene expression is small
- Microarray detection probe is not specific or not sensitive
5
Questions usually asked
What kind of technology or microarrays do I have to use?
How many replicates do I need?
What is a real replicate?
Do I need statistical advice?
Should I do technical replicates?
Should I pool my samples?
How do I analyze my dataset?
What software should I use?
6
Design of Microarray Experiment
Replicates
  Goal, resources, technology, quality, design and analysis
  Two-fold change: 3 replicates
  Smaller change: 5 replicates
  Technical replicates and Biological replicates
Sample pooling
  Amount of sample
  Replicates of pooled sample
  No way to find variance between samples
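The replicate counts on this slide are rules of thumb. As a rough illustration of how the number of replicates relates to the size of change that can be detected, here is a sketch using base R's power.t.test. The standard deviation (0.4 on the log2 scale) and the 1.5-fold "smaller change" are hypothetical values chosen for the example, not figures from the slides, and the default 5% significance level ignores multiple testing across thousands of probes.

```r
# Hypothetical assumption: per-gene standard deviation of 0.4 on the log2 scale.
assumed_sd <- 0.4

# Power of a two-sample t-test for a two-fold change (log2 difference = 1)
# with 3 replicates per group.
p_twofold_3 <- power.t.test(n = 3, delta = log2(2), sd = assumed_sd)$power

# Power for a smaller, 1.5-fold change with 3 and with 5 replicates per group.
p_small_3 <- power.t.test(n = 3, delta = log2(1.5), sd = assumed_sd)$power
p_small_5 <- power.t.test(n = 5, delta = log2(1.5), sd = assumed_sd)$power

print(round(c(twofold_n3 = p_twofold_3, small_n3 = p_small_3, small_n5 = p_small_5), 2))
```

Changing assumed_sd to a value estimated from pilot data gives a more realistic picture for a particular experiment.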
7
Gene Expression Omnibus- GEO
8
Public Microarray Databases
BodyMap
SMD
RIKEN
MGI
GEO
CIBEX
ArrayExpress
9
Microarray Platforms
Agilent Microarrays 60-mer format
Codelink Bioarrays 30-mer format
Affymetrix GeneChips 25-mer format
Illumina Beadchips
NimbleGen 60-mer format
10
Illumina Bead Array Technology
Silica Beads
Each bead is covered with hundreds of thousands of copies of a specific oligonucleotide
11
Some Facts
Each bead carries copies of one probe; on average there are 30 replicates of every bead type per array
Around 10^5 copies of a particular DNA sequence of interest are covalently attached to each bead
DNA sequences (oligonucleotides) attached to the beads are 75 base pairs in length, with 25 base pairs used for decoding and 50 base pairs used for target hybridization
A pool of different bead types is created; beads of the same type have the same probe sequence attached
12
Box Plots of unnormalized data
13
Raw vs Normalized data
[Figure: two panels, "Raw Data" and "Normalized Data"]
14
Histograms of unnormalized data
15
Why Normalize
Normalization adjusts the individual hybridization intensities to balance them appropriately so that meaningful biological comparisons can be made.
Unequal quantities of starting RNA
Differences in labeling or detection efficiencies between the fluorescent dyes used
Systematic biases in the measured expression levels
  Sample preparation
  Variability in hybridization
  Spatial effects
  Scanner settings
  Experimenter bias
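To make the first point concrete, here is a small base R sketch on simulated data. It mimics arrays that received unequal amounts of RNA (a constant shift on the log2 scale) and removes the shift by median centering. Median centering is a deliberately simple stand-in chosen for illustration; it is not claimed to be the method used in this workshop, and all values are made up.

```r
set.seed(1)
n_genes <- 1000
# Simulated "true" log2 expression shared by all arrays (hypothetical data).
true_log2 <- rnorm(n_genes, mean = 8, sd = 1.5)

# Three arrays: same biology, but arrays a2 and a3 received more or less RNA,
# which appears as a constant shift on the log2 scale.
shift <- c(a1 = 0, a2 = 1, a3 = -0.5)
raw <- sapply(shift, function(s) true_log2 + s + rnorm(n_genes, sd = 0.1))

# Median centering: subtract each array's median, then add back the overall median.
array_medians <- apply(raw, 2, median)
centered <- sweep(raw, 2, array_medians - median(raw))

print(round(apply(raw, 2, median), 2))       # medians differ between arrays
print(round(apply(centered, 2, median), 2))  # medians agree after centering
```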
16
Free Software – Data analysis
Bioconductor is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.
TMEV 4.0 is an application that allows the viewing of processed microarray slide representations and the identification of genes and expression patterns of interest.
17
R / Bioconductor
R and Bioconductor packages
R is a comprehensive statistical environment and programming language for professional data analysis and graphical display.
Bioconductor is an open source and open development software project for the analysis of microarray, sequence and genome data.
More than 300 Bioconductor packages.
18
R / Bioconductor - Installation
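The installation steps on this slide were shown as a screenshot. A minimal sketch for current Bioconductor releases follows, assuming the BiocManager installer is used (the original slides may have used an older mechanism). The do_install switch is a hypothetical safety flag added for this example so that nothing is downloaded until it is set to TRUE.

```r
# Packages used later in this deck.
pkgs <- c("lumi", "limma")

# Which of them are not yet installed?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Set to TRUE to actually download and install (needs internet access).
do_install <- FALSE

if (do_install && length(missing_pkgs) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(missing_pkgs)
}
```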
19
Preparing R for analysis
24
Analysis using lumi R package
- Loading data into R/Bioconductor
> library(lumi)
> lumi.Rdata <- lumiR('worshop_data.csv')
- Summary of the loaded data
> lumi.Rdata
- Quality control of loaded data
> summary(lumi.Rdata, 'QC')
25
>density(lumi.Rdata)
26
>boxplot(lumi.Rdata)
27
>MAplot(lumi.Rdata)
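For readers new to MA plots, here is a base R sketch of the two quantities being plotted, using simulated intensities for a single pair of arrays. It only illustrates the definition of M and A; it is not a reimplementation of the lumi function, and the data are hypothetical.

```r
set.seed(2)
# Hypothetical raw intensities for two arrays measuring the same 500 probes.
array1 <- 2^rnorm(500, mean = 8, sd = 1.5)
array2 <- array1 * 2^rnorm(500, mean = 0, sd = 0.2)

# M: log2 ratio between the arrays; A: average log2 intensity.
M <- log2(array1) - log2(array2)
A <- (log2(array1) + log2(array2)) / 2

plot(A, M, pch = 20, main = "MA plot (simulated data)")
abline(h = 0, col = "red")  # with no systematic difference, M scatters around 0
```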
28
> plot(lumi.Rdata, what='sampleRelation')
> plot(lumi.Rdata, what='cv')
> plot(lumi.Rdata, what='outlier')
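A sample-relation plot asks whether samples group as expected. The base R sketch below shows one common way to look at this, hierarchical clustering of samples on simulated data. The sample names, group sizes and effect size are hypothetical, and the distance and linkage choices here are assumptions for the example, not necessarily what lumi uses internally.

```r
set.seed(3)
n_genes <- 200
# Hypothetical log2 expression: 3 control and 3 affected samples,
# with the first 50 genes shifted upward in the affected group.
expr <- matrix(rnorm(n_genes * 6, mean = 8, sd = 0.3), nrow = n_genes,
               dimnames = list(NULL, c("C1", "C2", "C3", "A1", "A2", "A3")))
expr[1:50, 4:6] <- expr[1:50, 4:6] + 2

# Cluster samples on the Euclidean distance between their expression profiles.
sample_tree <- hclust(dist(t(expr)), method = "average")

# Cut the tree into two clusters and see which samples fall together.
groups <- cutree(sample_tree, k = 2)
print(groups)
```

Calling plot(sample_tree) draws the dendrogram. A sample that clusters with the wrong group is a candidate outlier worth checking.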
29
Variance Stabilization
> lumi.Tdata <- lumiT(lumi.Rdata)
> lumi.VSdata <- plotVST(lumi.Tdata)
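The idea behind variance stabilization can be shown without the package. The sketch below does not reimplement lumi's transform; it uses a plain log2 transform on simulated intensities with multiplicative noise, purely to illustrate how a transform can make the spread of dim and bright probes comparable. All numbers are hypothetical.

```r
set.seed(4)
# Hypothetical intensities with multiplicative noise: spread grows with the mean.
low  <- 100   * 2^rnorm(1000, sd = 0.15)
high <- 10000 * 2^rnorm(1000, sd = 0.15)

# On the raw scale the bright probe is far more variable than the dim one.
raw_sd_ratio <- sd(high) / sd(low)

# After a log2 transform the two spreads are comparable.
log_sd_ratio <- sd(log2(high)) / sd(log2(low))

print(round(c(raw = raw_sd_ratio, log2 = log_sd_ratio), 2))
```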
30
Normalization
> lumi.Ndata <- lumiN(lumi.Tdata)
Or do all the default preprocessing in one step
> lumi.N.Q <- lumiExpresso(lumi.Rdata)
Background Correction: bgAdjust
Variance Stabilizing Transform method: vst
Normalization method: quantile
Perform all the QC again
> summary(lumi.Ndata, 'QC')
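The slide names quantile normalization as a normalization method. As a sketch of the general technique, here is a textbook version in base R on simulated data. The function quantile_normalize is a hypothetical helper written for this example; it ignores ties and missing values and is not the code the package runs.

```r
# Quantile normalization of a genes x arrays matrix (no special tie or NA handling).
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its array
  sorted <- apply(x, 2, sort)                         # each array sorted ascending
  target <- rowMeans(sorted)                          # mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) target[r])       # map ranks back to target values
  dimnames(out) <- dimnames(x)
  out
}

set.seed(5)
# Hypothetical log2 data: three arrays with different location and spread.
x <- cbind(a1 = rnorm(100, 8, 1), a2 = rnorm(100, 9, 2), a3 = rnorm(100, 7.5, 0.5))
xn <- quantile_normalize(x)

print(round(apply(x, 2, quantile), 2))   # distributions differ before
print(round(apply(xn, 2, quantile), 2))  # distributions agree after
```

After this step every array has the same set of values; only the order of probes within each array differs.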
31
Differential expression
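> library(limma)
# Design matrix: 4 control arrays followed by 4 affected arrays, no intercept
> design <- model.matrix(~ -1 + factor(c(1, 1, 1, 1, 2, 2, 2, 2)))
> colnames(design) <- c("control", "affected")
# Fit a linear model to every probe
> fit <- lmFit(lumi.Ndata, design)
# Contrast of interest: affected minus control
> cont.matrix <- makeContrasts(signature = affected - control, levels = design)
> fit2 <- contrasts.fit(fit, cont.matrix)
# Moderated statistics
> ebFit <- eBayes(fit2)
# Top 100 probes by B statistic, re-sorted by log fold change
> results <- topTable(ebFit, number = 100, sort.by = "B", resort.by = "M")
> print(results)
# Tab-delimited table of all results with FDR-adjusted p-values
> write.table(topTable(ebFit, coef = 1, adjust = "fdr", sort.by = "B", number = 25000), file = "results.xls", row.names = FALSE, sep = "\t")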
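To see what the design matrix and contrast mean for a single probe, here is a base R sketch that needs no Bioconductor packages. The expression values in y are made up for illustration. The sketch covers only the ordinary least-squares step; the variance moderation added by eBayes is not reproduced here.

```r
# Same two-group layout as in the slide: 4 control arrays, then 4 affected arrays.
group <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))
design <- model.matrix(~ -1 + group)
colnames(design) <- c("control", "affected")
print(design)

# Hypothetical log2 expression values for a single probe across the 8 arrays.
y <- c(7.9, 8.1, 8.0, 8.2, 9.0, 9.3, 9.1, 9.2)

# Least-squares fit without an intercept: the coefficients are the group means.
coefs <- lm.fit(design, y)$coefficients

# The contrast "affected - control" is the weight vector c(-1, 1) on those means.
contrast <- c(control = -1, affected = 1)
log_fc <- sum(contrast * coefs)

print(coefs)
print(log_fc)
```

Because the data are on the log2 scale, the contrast value is a log2 fold change, which is what the logFC column of the results table reports.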
32
Thank you
http://catalyst.harvard.edu
Reddy Gali, Ph.D.