Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Microarry and Related High Throughput Analysis BMI 705

Similar presentations


Presentation on theme: "Introduction to Microarry and Related High Throughput Analysis BMI 705"— Presentation transcript:

1 Introduction to Microarry and Related High Throughput Analysis BMI 705
Kun Huang Department of Biomedical Informatics Ohio State University

2 What is microarray? Affymetrix-like arrays – single channel (background-green, foreground-red) cDNA arrays – two channel (red, green, yellow) CGH array, DNA methylation array, SNP array, etc. CHIP-on-Chip Tissue microarray Future - Sequencing The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

3 How is microarray manufactured?

4 How does two-channel microarray work?
Printed microarrays Long probe oligonucleotides (80-100) long are “printed” on the glass chip

5 How does two-channel microarray work?
Printing process introduces errors and larger variance Comparative hybridization experiment

6 How does microarray work?

7 How is microarray manufactured? Affymetrix GeneChip
silicon chip oligonucleiotide probes lithographically synthesized on the array cRNA is used instead of cDNA

8 How does Affymetrix microarray work?

9 How does microarray work?

10 How does microarray work?

11 How does microarray work?

12 How does microarray work?

13 How does microarray work?
Fabrication expense and frequency of error increases with the length of probe, therefore 25 oligonucleotide probes are employed. Problem: cross hybridization Solution: introduce mismatched probe with one position (central) different with the matched probe. The difference gives a more accurate reading.

14 How do we use microarray?
Profiling Clustering

15 Spatial Images of the Microarrays
Data for the same brain voxel but for the untreated control mouse Background levels are much higher than those for the Parkinson’s disearse model mouse There appears to be something non random affecting the background of the green channel of this slide

16 How do we take readings from microarray (measurement)?
cDNA array – ratio, log ratio Affymetrix array

17 How do we process microarray data
(McShane, NCI)

18 How do we process microarray data
Normalization Intensity imbalance between RNA samples Affect all genes Not due to biology of samples, but due to technical reasons Reasons include difference in the settings of the photodetector voltage, imbalance in total amount of RNA in each sample, difference in uptaking of the dyes, etc. The objective is to adjust the gene expression values of all genes so that the ones that are not really differentially expressed have similar values across the array(s).

19 Normalization Two major issues to consider Housekeeping genes
Which genes to use for normalization Which normalization algorithm to use Housekeeping genes Genes involved in essential activities of cell maintenance and survival, but not in cell function and proliferation. These genes will be similarly expressed in all samples but they may be difficult to identify – need to be confirmed. Affymetrix GeneChip provides a set of house keeping genes (but still no guarantee). Spiked controls Genes that are not usually found in the samples (both control and test sample). E.g., yeast gene in human tissue samples. Note: Affy GeneChip protocol includes the spiking of control oligonucleotides into each sample. They are NOT for normalization. Instead, they are for other purposes such as gridding of slide by the image analysis software. Using all genes Simplest approach – use all adequately expressed genes for normalization The assumption is that the majority of genes on the array are housekeeping genes and the proportion of over expressed genes is similar to that of the under expressed genes. If the genes one the chip are specially selected, then this method will not work.

20 Ratio-intensity (RI) or MA plot
Normalization Which normalization algorithm to use For two-color cDNA arrays - Intra-slide normalization Ratio-intensity (RI) or MA plot Scatter plot Slope = 1

21 Linear (global) normalization
Simplest but most consistent Move the median to zero (slope 1 in scatter plot, this only changes the intersection) No clear nonliearity or slope in MA plot

22 Intensity-based (Lowess) normalization
Overall magnitude of the spot intensity has an impact on the relative intensity between the channels. “Straighten” the Lowess fit line in MA plot to horizontal line and move it to zero

23 Intensity-based (Lowess) normalization Nonlinear
Gene-by-gene, could introduce bias Use only when there is a compelling reason (McShane, NCI)

24 Other normalization method
Combination of location and intensity-based normalization Location Quantile

25 Normalization Which normalization algorithm to use Inter-slide normalization Not just for Affymetrix arrays

26 Normalization Box plot Upper quartile Median Low quartile

27 Normalization Linear (global) – the chips have equal median (or mean) intensity Intensity-based (Lowess) – the chips have equal medians (means) at all intensity values Quantile – the chips have identical intensity distribution Quantile is the “best” in term of normalizing the data to desired distribution, however it also changes the gene expression level individually Avoid overfitting Avoid bias

28 Gene Discovery and T-tests
Student’s t-test

29 Gene Discovery and Multiple T-tests
Controlling False Positives Statistical tests to control the false positives Controlling for no false positives (very stringent, e.g., Bonferroni test) Controlling the number of false positives Controlling the proportion of false positives Note that in the screening stage, false positive is better than false negative as the later means missing of possibly important discovery.

30 Microarray Databases Gene Expression Ominbus (GEO) database – NCBI
EMBL-EBI microarray database ArrayExpress Stanford Microarray Database (SMD) Other specialized, regional and aggregated databases

31 Microarray Softwares DChip Open source R, Bioconductor
BRBArray tools (NCI biometric research branch) Affymetrix GeneSpring GX GenePattern

32 How do we use microarray (clustering)?

33 How do we process microarray data (clustering)?
Unsupervised Learning – Hierarchical Clustering

34 ChIP-on-chip, “also known as genome-wide location analysis, is a technique for isolation and identification of the DNA sequences occupied by specific DNA binding proteins in cells.” ( Identify protein binding sites on DNA Study transcriptional factors – identify the genes that controlled by the specific TFs Identify TFs Identify regulatory regions such as promoters, enhancers, repressors, silencing elements, insulators, and boundary elements Determine sequences controlling DNA replication (e.g., histone binding sites)

35 ChIP-on-Chip ChIP – Chromatin immunoprecipitation Chip – Microarray

36 ChIP-on-Chip – Example
Simon I., Barnett J., Hannett N., Harbison C.T., Rinaldi N.J., Volkert T.L., Wyrick J.J., Zeitlinger J., Gifford D.K., Jaakkola T.S., et al. "Serial regulation of transcriptional regulators in the yeast cell cycle", Cell, Volume: 106, (2001), pp Figure 2. Genome-wide Location of the Nine Cell Cycle Transcription Factors(A) 213 of the 800 cell cycle genes whose promoter regions were bound by a myc-tagged version of at least one of the nine cell cycle transcription factors (p < 0.001) are represented as horizontal lines. The weight-averaged binding ratios are displayed using a blue and white color scheme (genes with p value < are displayed in blue). The expression ratios of an α factor synchronization time course from Spellman et al. (1998) are displayed using a red (induced) and green (repressed) color scheme.(B) The circle represents a smoothed distribution of the transcription timing (phase) of the 800 cell cycle genes (Spellman et al., 1998). The intensity of the red color, normalized by the maximum intensity value for each factor, represents the fraction of genes expressed at that point that are bound by a specific activator. The similarity in the distribution of color for specific factors (with Swi4, Swi6, and Mbp1, for example) shows that these factors bind to genes that are expressed during the same time frame

37 ChIP-on-Chip – Example
Simon I., et al. "Serial regulation of transcriptional regulators in the yeast cell cycle", Cell, Volume: 106, (2001), pp

38 ChIP-on-Chip Problem : Probe design Most TF binding sites are not in exon Binding sequences are short Cover entire genome? Signal may be small Tiling array – divide the sequence into chunks, called tiling path. The distance between the center of neighboring chunks is called resolution. A path can be overlapped or spaced. Affymetrix tiling array for yeasts – 5bp resolution, 3.2 million probes Affymetrix tiling array for human – 35bp spacing, 90 million probes

39 Sequencing Solexa http://www.illumina.com/pages.ilmn?ID=203 SOLiD
Mikkelsen et al


Download ppt "Introduction to Microarry and Related High Throughput Analysis BMI 705"

Similar presentations


Ads by Google