Microarray Analysis Using R/Bioconductor
Reddy Gali, Ph.D.
Agenda: Introduction to Microarrays
- Workflow of a gene expression microarray experiment
- Publishing microarray data (MIAME format)
- Microarray experimental design
- Public microarray databases
- Microarray preprocessing: quality control and diagnostic analysis
Agenda: Introduction to R/Bioconductor
- Installation of R and Bioconductor packages
- General data analysis strategies
- Data analysis using affylmGUI
Microarray Applications
- Analyze and compare patterns of gene expression:
  - before and after an intervention
  - between tissue types
  - between transgenic strains
  - in neighboring cells (laser capture microdissection)
- Find DNA copy-number variations
- SNP detection
- Tool for genotyping
- High-throughput screening tool for drug discovery
- Elucidate gene function (RNAi microarrays; Silva et al., PNAS 2004)
- Investigate interactions between DNA and protein (ChIP-on-chip)
Workflow of a Gene Expression Microarray Experiment
- Biological question
- Experimental design
- QC
- Tissue/sample preparation
- Extraction of total RNA
- Probe amplification and labeling
- Microarray hybridization and processing
- Image analysis
- Data analysis: expression measures, normalization, statistical filtering, clustering, pathway analysis
- Biological verification
Pitfalls of Microarray Experiments
Pitfall: gene expression changes detected by microarray analysis cannot be validated by other methods. Possible causes:
- Inadequate design
- Low data quality
- Inadequate statistical approach
- The expression level of the gene is below the detection limit
- The change in gene expression is small
- The microarray detection probe is not specific or not sensitive
Microarray Processing
Two-Color vs. Single-Color
Homemade microarray (two-color):
- Tissue: normal and diseased
- Total RNA
- cDNA synthesis
- Cy5- and Cy3-labeled cDNA
- Mixing
- Hybridization
- Raw data output
- Expression ratio
Affymetrix GeneChip (single-color):
- Tissue: normal and diseased
- Total RNA
- First-strand cDNA synthesis
- Double-stranded cDNA
- In vitro transcription
- Biotin-labeled cRNA
- Hybridization and staining
- Raw data output
- Absolute expression values
Affymetrix Probe Design
- A perfect-match (PM) and a mismatch (MM) probe form a probe pair
- 11 probe pairs per probe set
- Multiple probe sets per gene
Lipshutz et al., 1999, Nature Genetics 21(1):20-24
Questions Usually Asked
- What kind of technology or microarray should I use?
- How many replicates do I need?
- What is a real replicate?
- Do I need statistical advice?
- Should I do technical replicates?
- Should I do dye swaps?
- Should I pool my samples?
- How do I analyze my dataset?
- What software should I use?
Design of Microarray Experiment
Replicates
- The number depends on the goal, resources, technology, quality, design and analysis
- Two-fold change: 3 replicates
- Smaller change: 5 replicates
- Technical replicates and biological replicates
Sample pooling
- Amount of sample
- Replicates of pooled sample
- With pooling, there is no way to estimate the variance between individual samples
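To see why smaller changes need more replicates, here is a minimal sketch using base R's power.t.test for an ordinary two-sample t-test on log2 expression values. The per-gene standard deviation (0.5 log2 units), the 80% power and the per-gene 5% significance level are assumptions, so the exact numbers will not necessarily match the 3- and 5-replicate rules of thumb above; the point is the trend.

```r
# Sketch: replicates per group needed to detect a given log2 fold change
# with an ordinary two-sample t-test. sd = 0.5 (log2 units) is an assumed
# per-gene biological standard deviation, not a value from the slides.
replicates_needed <- function(log2_fc, sd = 0.5, power = 0.8, alpha = 0.05) {
  res <- power.t.test(delta = log2_fc, sd = sd, sig.level = alpha,
                      power = power, type = "two.sample")
  ceiling(res$n)  # res$n is per group; round up to whole arrays
}

n_two_fold   <- replicates_needed(1)          # two-fold change = 1 on the log2 scale
n_small_fold <- replicates_needed(log2(1.5))  # smaller, 1.5-fold change

cat("two-fold:", n_two_fold, "replicates per group\n")
cat("1.5-fold:", n_small_fold, "replicates per group\n")
```

Because thousands of genes are tested at once, a stricter per-gene significance level (after multiple-testing correction) pushes these numbers up further.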
MIAME: How to Publish
Minimum Information About a Microarray Experiment (MIAME)
MIAME Checklist
- Type of experiment: for example, is it a comparison of normal vs. diseased tissue, a time course, or is it designed to study the effects of a gene knockout?
- Experimental factors: the parameters or conditions tested, such as time, dose, or genetic variation.
- The number of hybridizations performed in the experiment.
- The type of reference used for the hybridizations, if any.
- Hybridization design: if applicable, a description of the comparisons made in each hybridization, whether to a standard reference sample or between experimental samples. An accompanying diagram or table may be useful.
- Quality control steps taken: for example, replicates or dye swaps.
MIAME Checklist
- The origin of the biological sample (for instance, name of the organism, the provider of the sample) and its characteristics: for example, gender, age, developmental stage, strain, or disease state.
- Manipulation of biological samples and protocols used: for example, growth conditions, treatments, separation techniques.
- Protocol for preparing the hybridization extract: for example, the RNA or DNA extraction and purification protocol.
- Labeling protocol(s).
- External controls (spikes).
MIAME Checklist
- Type of scanning hardware and software used: this information is appropriate for a materials and methods section.
- Type of image analysis software used: specifications should be stated in the materials and methods.
- A description of the measurements produced by the image analysis software and a description of which measurements were used in the analysis.
- The complete output of the image analysis before data selection and transformation (spot quantitation matrices).
- Data selection and transformation procedures.
- Final gene expression data table(s) used by the authors to make their conclusions after data selection and transformation (gene expression data matrices).
Gene Expression Omnibus (GEO)
Public Microarray Databases
- BodyMap
- SMD
- RIKEN
- MGI
- GEO
- CIBEX
- ArrayExpress
Microarray Platforms
- Agilent microarrays: 60-mer format
- CodeLink Bioarrays: 30-mer format
- Affymetrix GeneChips: 25-mer format
- Illumina BeadChips
- NimbleGen: 60-mer format
RNA Quality
- OD 260/280 ratio: 1.8-2.0
- Electropherograms: degradation, rRNA peaks
- Bioanalyzer graphs
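The OD 260/280 criterion is a simple ratio check. A minimal sketch in R, with hypothetical sample names and absorbance readings; it covers purity only, not the integrity information that electropherograms provide.

```r
# Flag RNA samples whose A260/A280 ratio falls outside the 1.8-2.0 range.
# Sample names and absorbance readings are hypothetical.
rna_purity_ok <- function(a260, a280, lower = 1.8, upper = 2.0) {
  ratio <- a260 / a280
  ratio >= lower & ratio <= upper
}

a260 <- c(s1 = 1.90, s2 = 1.50, s3 = 2.00)
a280 <- c(s1 = 1.00, s2 = 1.00, s3 = 0.95)
ok <- rna_purity_ok(a260, a280)
print(ok)  # named logical vector, one entry per sample
```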
Microarray Data Mining
- Biological question
- Experimental design
- Microarray experiment
- Pre-processing: image analysis, normalization, expression quantification
- Data analysis: estimation/testing, clustering, classification/prediction
- Biological verification/interpretation
Microarray Data Mining
- CDF/CEL files
- Quality assessment
- Background correction
- Probe-level normalization
- Probe set summary
- Log ratios / log intensities
- Identify genes, clustering, etc.
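The probe set summary step turns the probe-level intensities of one probe set (for example, the 11 probes of an Affymetrix probe set) into a single expression value per array. RMA does this with median polish on log2 PM intensities, after background correction and quantile normalization. A minimal sketch of only the summary step, using base R's medpolish on a hypothetical 5-probe set:

```r
# Median-polish summary of one probe set: log2 intensities are modelled as
# overall + probe effect + array effect + residual, and the array-level
# expression value is overall + array effect. Values below are hypothetical.
summarize_probeset <- function(pm) {
  fit <- medpolish(log2(pm), trace.iter = FALSE)
  fit$overall + fit$col
}

pm <- matrix(c(100, 250,  80, 400, 120,    # array1
               200, 500, 160, 800, 240),   # array2: every probe doubled
             nrow = 5,
             dimnames = list(paste0("probe", 1:5), c("array1", "array2")))

expr <- summarize_probeset(pm)
print(expr)
```

Because every probe in array2 is exactly twice its array1 value, the two summaries differ by 1 on the log2 scale.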
Microarrays: Image Inspection
- Visual inspection of the chip: scratches, bubbles, uneven hybridization
- Outlier detection
Diagnostic Plots: RNA Degradation
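An RNA degradation plot shows mean probe intensity by probe position from the 5' to the 3' end of the probe sets; a steep slope suggests degraded RNA. In Bioconductor, affy's AffyRNAdeg computes this from an AffyBatch and plotAffyRNAdeg draws it. The sketch below is a simplified stand-in for that idea, not the AffyRNAdeg algorithm: it fits a straight line to per-position mean log2 intensities, which are simulated here, and the input layout is an assumption.

```r
# Simplified degradation check: for each array, regress mean log2 PM
# intensity on probe position (0 = most 5' probe).
# Assumed input layout: rows = probe positions, columns = arrays.
degradation_slope <- function(pos_means) {
  pos <- seq_len(nrow(pos_means)) - 1
  apply(pos_means, 2, function(y) unname(coef(lm(y ~ pos))[2]))
}

# Simulated per-position means for 11 probe positions on two arrays.
positions <- 0:10
pos_means <- cbind(good     = 8 + 0.1 * positions,
                   degraded = 8 + 0.4 * positions)

slopes <- degradation_slope(pos_means)
print(round(slopes, 2))
```

Comparing slopes across arrays, rather than judging one slope in isolation, is what makes an unusually degraded array stand out.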
Box Plots of Unnormalized Data
Raw vs. Normalized Data (figure panels: Raw Data, Normalized Data)
Histograms of Unnormalized Data
QC Stats
Why Normalize?
Normalization adjusts the individual hybridization intensities so that meaningful biological comparisons can be made. It corrects for:
- Unequal quantities of starting RNA
- Differences in labeling or detection efficiencies between the fluorescent dyes used
- Systematic biases in the measured expression levels, arising from:
  - Sample preparation
  - Variability in hybridization
  - Spatial effects
  - Scanner settings
  - Experimenter bias
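One common way to remove array-wide intensity differences like these is quantile normalization, the method RMA uses (in Bioconductor it is provided by preprocessCore::normalize.quantiles). A minimal base-R sketch, assuming a matrix of intensities with probes in rows and arrays in columns; ties are broken by order of appearance, which is a simplification.

```r
# Quantile normalization: give every array (column) the same intensity
# distribution, namely the mean of the sorted columns, while keeping each
# probe's rank within its array.
quantile_normalize <- function(x) {
  ranks     <- apply(x, 2, rank, ties.method = "first")
  sorted    <- apply(x, 2, sort)
  reference <- rowMeans(sorted)              # mean distribution across arrays
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

raw <- matrix(c(5, 2, 3, 4,
                4, 1, 6, 2,
                3, 4, 6, 8),
              ncol = 3,
              dimnames = list(paste0("p", 1:4), paste0("array", 1:3)))
normalized <- quantile_normalize(raw)
print(normalized)
```

After normalization every column contains exactly the same set of values, so box plots of the arrays line up, while the ordering of probes within each array is unchanged.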
Data Analysis Workflow
Free Software: Data Analysis
- Bioconductor is an open-source, open-development software project that provides tools for the analysis and comprehension of genomic data.
- TMEV 4.0 is an application for viewing processed microarray slide representations and identifying genes and expression patterns of interest.
- DNA-Chip Analyzer (dChip) is a software package for probe-level (e.g., Affymetrix platform) and high-level analysis of gene expression microarrays and SNP microarrays.
R/Bioconductor: R and Bioconductor Packages
- R is a comprehensive statistical environment and programming language for professional data analysis and graphical display.
- Bioconductor is an open-source, open-development software project for the analysis of microarray, sequence and genome data.
- More than 300 Bioconductor packages.
R/Bioconductor: Installation
oneChannelGUI
- A graphical user interface (GUI) for Bioconductor libraries, used for quality control, normalization, filtering, statistical validation and data mining of single-channel microarrays.
- Affymetrix IVT, Human Gene 1.0 ST and exon arrays are supported.
- oneChannelGUI is an add-on Bioconductor package that provides a new set of functions extending the capabilities of the affylmGUI package.
Tcl and Tk Packages
ActiveTcl is ActiveState's distribution of Tcl. It is most commonly used for rapid prototyping, scripted applications and GUIs.
- Install Tcl with the Tcl/Tk packages BWidget and Tktable
- Install it in the C:\Tcl directory
Installing R / ActiveTcl
Installing affylmGUI Packages for Affymetrix Data
> install.packages("affylmGUI", contriburl = "…")
> source("…")
> biocLite("affylmGUI", dependencies = TRUE)
> biocLite("affylmGUI")
> biocLite("tkrplot")
> biocLite("affyPLM")
> biocLite("R2HTML")
> biocLite("xtable")
> library(affylmGUI)
affylmGUI Browser
oneChannelGUI Installation
> source("…")
> biocLite("oneChannelGUI")
> biocLite("oneChannelGUI", dependencies = TRUE)
> library(oneChannelGUI)
oneChannelGUI
Target File Creation
Create, with Excel, a tab-delimited file named targets.txt.
- The targets file has three columns with the header: Name, FileName, Target
- In column Name, place a brief name (e.g., c1, c2, etc.)
- In column FileName, place the name of the corresponding .CEL file
- In column Target, place the experimental condition (e.g., control, treatment, etc.)
Place targets.txt and the CEL files into the same folder (directory).
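The same targets file can also be written from R instead of Excel. A minimal sketch with hypothetical sample and CEL file names; it writes to a temporary directory, whereas in practice you would write it into the folder that holds the .CEL files.

```r
# Build the three-column targets table described above (hypothetical values).
targets <- data.frame(
  Name     = c("c1", "c2", "t1", "t2"),
  FileName = c("c1.CEL", "c2.CEL", "t1.CEL", "t2.CEL"),
  Target   = c("control", "control", "treatment", "treatment"),
  stringsAsFactors = FALSE
)

# Write it as a tab-delimited file without quotes or row names.
out_dir <- tempdir()  # replace with the directory containing the .CEL files
path <- file.path(out_dir, "targets.txt")
write.table(targets, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to check the header and contents.
check <- read.delim(path, stringsAsFactors = FALSE)
print(check)
```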
Target File
Working with oneChannelGUI
Working with oneChannelGUI
- Click on “File” and then “New” to start a new project
- Select 3’ IVT arrays
- Select the working directory that contains the .CEL files and the targets.txt file
Working with oneChannelGUI
Working with oneChannelGUI
- Quality control
- Statistical analysis
- Normalization
- Filtering
- Biological knowledge extraction
- Annotation
Quality Control Plots
Click on the Quality Control menu.
QC Plots/Reports: Work with Your Data Set
- Plot various QC plots and identify which arrays are not of good quality
- Plot the RNA degradation plot
- Install the affyQCReport package and create a QC report for the data set you are working with:
> library(affyQCReport)
> QCReport(mydata, file = "reddy.pdf")
Working with oneChannelGUI
- Quality control
- Statistical analysis
- Normalization
- Filtering
- Biological knowledge extraction
- Annotation
Probe Set Summary
Click on the probe set menu and select the probe set summary and normalization option.
Normalization
Exercise 4
- Calculate probe set summaries with GCRMA and RMA
- Export and save the normalized values
Working with oneChannelGUI
- Quality control
- Statistical analysis
- Normalization
- Filtering
- Biological knowledge extraction
- Annotation
Filtering: oneChannelGUI
Signal features:
- Percentage of intensities greater than a user-defined value
- Interquartile range (IQR) greater than a defined value
Annotation features:
- Specific gene features (e.g., GO terms, presence of transcriptional regulatory elements in promoters, etc.)
- Using the Ingenuity pathway knowledge base
Filtering
- Perform an IQR filter at 0.25, followed by an intensity filter that keeps probe sets with an intensity over 100 in at least 50% of the arrays.
- Export the data as a tab-delimited file.
- Question: How many probe sets are left after the first and the second filter?
- Using transcription factors from Ingenuity, create a file containing only the Entrez gene IDs, without a header, and use it to filter the data set.
- Save the data set.
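The filters in this exercise can be sketched in base R as follows. Assumptions: the data are a matrix of normalized log2 expression values (probe sets in rows, arrays in columns), so the intensity cut-off of 100 is compared on the log2 scale as log2(100); "IQR at 0.25" is read as "keep probe sets whose IQR exceeds 0.25"; and probe_to_entrez is a hypothetical named vector standing in for the array's annotation. genefilter's pOverA() implements the same "proportion of samples over A" idea for use with filterfun() and genefilter().

```r
# Keep probe sets whose interquartile range across arrays exceeds min_iqr.
iqr_filter <- function(x, min_iqr = 0.25) {
  keep <- apply(x, 1, IQR) > min_iqr
  x[keep, , drop = FALSE]
}

# Keep probe sets above min_value in at least min_fraction of the arrays.
intensity_filter <- function(x, min_value, min_fraction = 0.5) {
  keep <- rowMeans(x > min_value) >= min_fraction
  x[keep, , drop = FALSE]
}

# Keep probe sets whose Entrez ID is listed in a header-less, one-column file.
# probe_to_entrez is a hypothetical named vector: probe set ID -> Entrez ID.
gene_list_filter <- function(x, probe_to_entrez, id_file) {
  ids  <- scan(id_file, what = character(), quiet = TRUE)
  keep <- probe_to_entrez[rownames(x)] %in% ids
  x[keep, , drop = FALSE]
}

# Hypothetical log2 expression matrix: 4 probe sets x 4 arrays.
expr <- rbind(
  flat      = c(5.0, 5.1, 5.0, 5.1),  # almost no variation
  var_low   = c(3, 4, 5, 6),          # variable but never above log2(100)
  var_high  = c(6, 7, 8, 9),          # above log2(100) in 3 of 4 arrays
  half_high = c(5, 6, 7, 8)           # above log2(100) in 2 of 4 arrays
)

step1 <- iqr_filter(expr, 0.25)
step2 <- intensity_filter(step1, log2(100), 0.5)
cat("after IQR filter:", nrow(step1), " after intensity filter:", nrow(step2), "\n")

# Gene-list filter with a hypothetical ID file and mapping.
id_file <- tempfile(fileext = ".txt")
writeLines(c("1001", "1003"), id_file)
probe_to_entrez <- c(var_high = "1001", half_high = "1002")
step3 <- gene_list_filter(step2, probe_to_entrez, id_file)

# Export as a tab-delimited file, keeping probe set IDs as the first column.
write.table(step3, file.path(tempdir(), "filtered.txt"),
            sep = "\t", quote = FALSE, col.names = NA)
```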
Linear Modeling (limma)
Differential Expression
Computing contrasts builds the differential expression results.
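In limma the contrasts are fitted with lmFit, contrasts.fit and eBayes, and ranked with topTable, which reports columns such as logFC, AveExpr, t, P.Value and adj.P.Val. As a self-contained illustration only, the sketch below swaps in an ordinary (unmoderated) two-sample t-test per probe set with Benjamini-Hochberg adjustment; it is not limma's moderated t-statistic, and the data and group labels are hypothetical.

```r
# Per-probe-set differential expression for one two-group contrast
# (second factor level minus first), using an ordinary pooled-variance t-test.
de_table <- function(x, group) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  ref <- group == levels(group)[1]   # e.g. "control"
  trt <- !ref                        # e.g. "treatment"
  tab <- t(apply(x, 1, function(v) {
    tt <- t.test(v[trt], v[ref], var.equal = TRUE)
    c(logFC   = mean(v[trt]) - mean(v[ref]),  # log2 fold change
      AveExpr = mean(v),
      t       = unname(tt$statistic),
      P.Value = tt$p.value)
  }))
  res <- data.frame(tab)
  res$adj.P.Val <- p.adjust(res$P.Value, method = "BH")
  res[order(res$P.Value), ]
}

expr <- rbind(
  up    = c(5.0, 5.1, 4.9, 7.0, 7.2, 6.9),
  noise = c(6.0, 6.3, 5.8, 6.1, 5.9, 6.2)
)
group <- c("control", "control", "control", "treatment", "treatment", "treatment")
res <- de_table(expr, group)
print(res)
```

With only a few replicates per group, per-gene variance estimates are noisy; limma's empirical Bayes step exists to stabilize them, which is why it is preferred over this plain t-test in practice.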
MA and Volcano Plots
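Both plots are simple transformations. For two log2-scale intensities x and y of the same probe set, the MA plot shows M = x - y (log2 ratio) against A = (x + y)/2 (average log2 intensity); the volcano plot shows log2 fold change against -log10(p-value). A minimal sketch of the coordinates, with hypothetical inputs; pass them to plot() to draw the figures.

```r
# MA-plot coordinates from two log2-scale intensity vectors.
ma_values <- function(log2_x, log2_y) {
  data.frame(M = log2_x - log2_y,        # log2 ratio
             A = (log2_x + log2_y) / 2)  # average log2 intensity
}

# Volcano-plot coordinates: effect size vs. significance.
volcano_coords <- function(logFC, p_value) {
  data.frame(logFC = logFC, neg_log10_p = -log10(p_value))
}

ma  <- ma_values(c(8, 10, 6), c(6, 10, 7))
vol <- volcano_coords(c(1.5, -2, 0.1), c(0.01, 0.001, 0.8))
print(ma)
print(vol)
# plot(ma$A, ma$M); abline(h = 0)        # MA plot
# plot(vol$logFC, vol$neg_log10_p)       # volcano plot
```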
Columns of the results table: AffyID, Gene Symbol, Gene Description, Average intensity, Log2 FC, T statistics, P-values, Log-odds statistics, Expression values
Differential Expression
- Use the “Table of Genes Ranked in order of Differential Expression” to filter the genes and export the normalized expression values.
- Plot differentially expressed genes with raw p-value ≤ 0.05 and an absolute log2 fold change ≥ 1 (i.e., at least two-fold) for the two contrasts.
- Using “Venn Diagram between probe set lists”, evaluate the level of overlap between the two sets. Hint: make two sets from the two contrasts.
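The selection and overlap steps reduce to a filter and set operations. A minimal sketch, assuming each contrast's results are a data frame with logFC (log2 scale) and P.Value columns and probe set IDs as row names, as in limma's topTable output; the tables below are hypothetical. The cut-off is applied as |log2 FC| ≥ 1, since an absolute signed linear fold change is always at least 1 and would exclude nothing.

```r
# Probe sets passing raw p <= p_cut and |log2 FC| >= lfc_cut in one contrast.
select_de <- function(tab, p_cut = 0.05, lfc_cut = 1) {
  rownames(tab)[tab$P.Value <= p_cut & abs(tab$logFC) >= lfc_cut]
}

# Counts for the three regions of a two-set Venn diagram.
venn_counts <- function(a, b) {
  c(only_first  = length(setdiff(a, b)),
    both        = length(intersect(a, b)),
    only_second = length(setdiff(b, a)))
}

contrast1 <- data.frame(logFC = c(1.5, -2.0, 0.3), P.Value = c(0.01, 0.04, 0.001),
                        row.names = c("p1", "p2", "p3"))
contrast2 <- data.frame(logFC = c(1.2, 0.5, -1.1), P.Value = c(0.02, 0.01, 0.03),
                        row.names = c("p1", "p2", "p4"))

set1 <- select_de(contrast1)
set2 <- select_de(contrast2)
print(venn_counts(set1, set2))
```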
Thank you
http://catalyst.harvard.edu
Reddy Gali, Ph.D.
Phone: