Introduction to Microarry and Related High Throughput Analysis BMI 705

Slides:



Advertisements
Similar presentations
Combined analysis of ChIP- chip data and sequence data Harbison et al. CS 466 Saurabh Sinha.
Advertisements

Introduction to Microarry Data Analysis - II BMI 730
Microarray technology and analysis of gene expression data Hillevi Lindroos.
Getting the numbers comparable
DNA Microarray Bioinformatics - #27612 Normalization and Statistical Analysis.
Microarray Data Preprocessing and Clustering Analysis
Chip arrays and gene expression data. With the chip array technology, one can measure the expression of 10,000 (~all) genes at once. Can answer questions.
Figure 1: (A) A microarray may contain thousands of ‘spots’. Each spot contains many copies of the same DNA sequence that uniquely represents a gene from.
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
Introduction to Microarry Data Analysis - II BMI 730 Kun Huang Department of Biomedical Informatics Ohio State University.
Chip arrays and gene expression data. Motivation.
Normalization of 2 color arrays Alex Sánchez. Dept. Estadística Universitat de Barcelona.
Bacterial Physiology (Micr430)
Microarray Technology Types Normalization Microarray Technology Microarray: –New Technology (first paper: 1995) Allows study of thousands of genes at.
Introduce to Microarray
Genomics I: The Transcriptome RNA Expression Analysis Determining genomewide RNA expression levels.
Analysis of High-throughput Gene Expression Profiling
Introduction to Microarray Data Analysis BMI/IBGP 730 Kun Huang Department of Biomedical Informatics The Ohio State University Autumn 2010.
Analysis of microarray data
Computational Molecular Biology Biochem 218 – BioMedical Informatics Gene Regulatory.
with an emphasis on DNA microarrays
Special Topics in Genomics Lecture 1: Introduction Instructor: Hongkai Ji Department of Biostatistics
CDNA Microarrays Neil Lawrence. Schedule Today: Introduction and Background 18 th AprilIntroduction and Background 25 th AprilcDNA Mircoarrays 2 nd MayNo.
Affymetrix vs. glass slide based arrays
Introduction to DNA Microarray Technology Steen Knudsen Uma Chandran.
CDNA Microarrays MB206.
Data Type 1: Microarrays
Panu Somervuo, March 19, cDNA microarrays.
Gene Expression Data Qifang Xu. Outline cDNA Microarray Technology cDNA Microarray Technology Data Representation Data Representation Statistical Analysis.
Applying statistical tests to microarray data. Introduction to filtering Recall- Filtering is the process of deciding which genes in a microarray experiment.
Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies.
Kristen Horstmann, Tessa Morris, and Lucia Ramirez Loyola Marymount University March 24, 2015 BIOL398-04: Biomathematical Modeling Lee, T. I., Rinaldi,
Agenda Introduction to microarrays
Microarray - Leukemia vs. normal GeneChip System.
ARK-Genomics: Centre for Comparative and Functional Genomics in Farm Animals Richard Talbot Roslin Institute and R(D)SVS University of Edinburgh Microarrays.
ChIP-chip Data. DNA-binding proteins Constitutive proteins (mostly histones) –Organize DNA –Regulate access to DNA –Have many modifications Acetylation,
A Short Overview of Microarrays Tex Thompson Spring 2005.
Microarrays and Gene Expression Analysis. 2 Gene Expression Data Microarray experiments Applications Data analysis Gene Expression Databases.
Summarization of Oligonucleotide Expression Arrays BIOS Winter 2010.
Model-based analysis of oligonucleotide arrays, dChip software Statistics and Genomics – Lecture 4 Department of Biostatistics Harvard School of Public.
Genomics I: The Transcriptome
Gene expression. The information encoded in a gene is converted into a protein  The genetic information is made available to the cell Phases of gene.
Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002.
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
Gene Expression Analysis. 2 DNA Microarray First introduced in 1987 A microarray is a tool for analyzing gene expression in genomic scale. The microarray.
ABC D EF GH I JKL. Supplementary figure S1: Exemplary overview of the quality assessment plots generated by Robin. All plots have been generated using.
Alistair Chalk, Elisabet Andersson Stem Cell Biology and Bioinformatic Tools, DBRM, Karolinska Institutet, September Day 5-2 What bioinformatics.
Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein might be more direct, but is currently.
Microarray Technology. Introduction Introduction –Microarrays are extremely powerful ways to analyze gene expression. –Using a microarray, it is possible.
Microarray hybridization Usually comparative – Ratio between two samples Examples – Tumor vs. normal tissue – Drug treatment vs. no treatment – Embryo.
Overview of Microarray. 2/71 Gene Expression Gene expression Production of mRNA is very much a reflection of the activity level of gene In the past, looking.
Microarray analysis Quantitation of Gene Expression Expression Data to Networks BIO520 BioinformaticsJim Lund Reading: Ch 16.
ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern.
DNA Microarray Overview and Application. Table of Contents Section One : Introduction Section Two : Microarray Technique Section Three : Types of DNA.
Oigonucleotide (Affyx) Array Basics Joseph Nevins Holly Dressman Mike West Duke University.
Transcriptome What is it - genome wide transcript abundance How do you obtain it - Arrays + MPSS What do you do with it when you have it - ?
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Lecture 26 GWAS Based on chapter 9 Functional and Comparative Genomics Copyright © 2010 Pearson Education Inc.
Introduction to Oligonucleotide Microarray Technology
Microarray: An Introduction
Microarray Technology and Data Analysis Roy Williams PhD Sanford | Burnham Medical Research Institute.
DNA Microarray. Microarray Printing 96-well-plate (PCR Products) 384-well print-plate Microarray.
Microarray - Leukemia vs. normal GeneChip System.
Volume 1, Issue 1, Pages (February 2002)
Getting the numbers comparable
Optimal gene expression analysis by microarrays
Volume 106, Issue 6, Pages (September 2001)
Volume 122, Issue 6, Pages (September 2005)
Biomedical Discovery with DNA Arrays
Presentation transcript:

Introduction to Microarry and Related High Throughput Analysis BMI 705 Kun Huang Department of Biomedical Informatics Ohio State University

What is microarray? Affymetrix-like arrays – single channel (background-green, foreground-red) cDNA arrays – two channel (red, green, yellow) CGH array, DNA methylation array, SNP array, etc. CHIP-on-Chip Tissue microarray Future - Sequencing The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

How is microarray manufactured?

How does two-channel microarray work? Printed microarrays Long probe oligonucleotides (80-100) long are “printed” on the glass chip

How does two-channel microarray work? Printing process introduces errors and larger variance Comparative hybridization experiment

How does microarray work?

How is microarray manufactured? Affymetrix GeneChip silicon chip oligonucleiotide probes lithographically synthesized on the array cRNA is used instead of cDNA

How does Affymetrix microarray work?

How does microarray work?

How does microarray work?

How does microarray work?

How does microarray work?

How does microarray work? Fabrication expense and frequency of error increases with the length of probe, therefore 25 oligonucleotide probes are employed. Problem: cross hybridization Solution: introduce mismatched probe with one position (central) different with the matched probe. The difference gives a more accurate reading.

How do we use microarray? Profiling Clustering

Spatial Images of the Microarrays Data for the same brain voxel but for the untreated control mouse Background levels are much higher than those for the Parkinson’s disearse model mouse There appears to be something non random affecting the background of the green channel of this slide

How do we take readings from microarray (measurement)? cDNA array – ratio, log ratio Affymetrix array

How do we process microarray data (McShane, NCI)

How do we process microarray data Normalization Intensity imbalance between RNA samples Affect all genes Not due to biology of samples, but due to technical reasons Reasons include difference in the settings of the photodetector voltage, imbalance in total amount of RNA in each sample, difference in uptaking of the dyes, etc. The objective is to adjust the gene expression values of all genes so that the ones that are not really differentially expressed have similar values across the array(s).

Normalization Two major issues to consider Housekeeping genes Which genes to use for normalization Which normalization algorithm to use Housekeeping genes Genes involved in essential activities of cell maintenance and survival, but not in cell function and proliferation. These genes will be similarly expressed in all samples but they may be difficult to identify – need to be confirmed. Affymetrix GeneChip provides a set of house keeping genes (but still no guarantee). Spiked controls Genes that are not usually found in the samples (both control and test sample). E.g., yeast gene in human tissue samples. Note: Affy GeneChip protocol includes the spiking of control oligonucleotides into each sample. They are NOT for normalization. Instead, they are for other purposes such as gridding of slide by the image analysis software. Using all genes Simplest approach – use all adequately expressed genes for normalization The assumption is that the majority of genes on the array are housekeeping genes and the proportion of over expressed genes is similar to that of the under expressed genes. If the genes one the chip are specially selected, then this method will not work.

Ratio-intensity (RI) or MA plot Normalization Which normalization algorithm to use For two-color cDNA arrays - Intra-slide normalization Ratio-intensity (RI) or MA plot Scatter plot Slope = 1

Linear (global) normalization Simplest but most consistent Move the median to zero (slope 1 in scatter plot, this only changes the intersection) No clear nonliearity or slope in MA plot

Intensity-based (Lowess) normalization Overall magnitude of the spot intensity has an impact on the relative intensity between the channels. “Straighten” the Lowess fit line in MA plot to horizontal line and move it to zero

Intensity-based (Lowess) normalization Nonlinear Gene-by-gene, could introduce bias Use only when there is a compelling reason (McShane, NCI)

Other normalization method Combination of location and intensity-based normalization Location Quantile …

Normalization Which normalization algorithm to use Inter-slide normalization Not just for Affymetrix arrays

Normalization Box plot Upper quartile Median Low quartile

Normalization Linear (global) – the chips have equal median (or mean) intensity Intensity-based (Lowess) – the chips have equal medians (means) at all intensity values Quantile – the chips have identical intensity distribution Quantile is the “best” in term of normalizing the data to desired distribution, however it also changes the gene expression level individually Avoid overfitting Avoid bias

Gene Discovery and T-tests Student’s t-test

Gene Discovery and Multiple T-tests Controlling False Positives Statistical tests to control the false positives Controlling for no false positives (very stringent, e.g., Bonferroni test) Controlling the number of false positives Controlling the proportion of false positives Note that in the screening stage, false positive is better than false negative as the later means missing of possibly important discovery.

Microarray Databases Gene Expression Ominbus (GEO) database – NCBI http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed EMBL-EBI microarray database http://www.ebi.ac.uk/Databases/microarray.html ArrayExpress Stanford Microarray Database (SMD) http://genome-www5.stanford.edu/ Other specialized, regional and aggregated databases http://psi081.ba.ars.usda.gov/SGMD/ http://www.oncomine.org/main/index.jsp http://ihome.cuhk.edu.hk/~b400559/arraysoft_public.html …

Microarray Softwares DChip Open source R, Bioconductor BRBArray tools (NCI biometric research branch) Affymetrix GeneSpring GX GenePattern …

How do we use microarray (clustering)?

How do we process microarray data (clustering)? Unsupervised Learning – Hierarchical Clustering

ChIP-on-chip, “also known as genome-wide location analysis, is a technique for isolation and identification of the DNA sequences occupied by specific DNA binding proteins in cells.” (http://www.chiponchip.org) Identify protein binding sites on DNA Study transcriptional factors – identify the genes that controlled by the specific TFs Identify TFs Identify regulatory regions such as promoters, enhancers, repressors, silencing elements, insulators, and boundary elements Determine sequences controlling DNA replication (e.g., histone binding sites)

ChIP-on-Chip ChIP – Chromatin immunoprecipitation Chip – Microarray

ChIP-on-Chip – Example Simon I., Barnett J., Hannett N., Harbison C.T., Rinaldi N.J., Volkert T.L., Wyrick J.J., Zeitlinger J., Gifford D.K., Jaakkola T.S., et al. "Serial regulation of transcriptional regulators in the yeast cell cycle", Cell, Volume: 106, (2001), pp. 697-708. Figure 2. Genome-wide Location of the Nine Cell Cycle Transcription Factors(A) 213 of the 800 cell cycle genes whose promoter regions were bound by a myc-tagged version of at least one of the nine cell cycle transcription factors (p < 0.001) are represented as horizontal lines. The weight-averaged binding ratios are displayed using a blue and white color scheme (genes with p value < 0.001 are displayed in blue). The expression ratios of an α factor synchronization time course from Spellman et al. (1998) are displayed using a red (induced) and green (repressed) color scheme.(B) The circle represents a smoothed distribution of the transcription timing (phase) of the 800 cell cycle genes (Spellman et al., 1998). The intensity of the red color, normalized by the maximum intensity value for each factor, represents the fraction of genes expressed at that point that are bound by a specific activator. The similarity in the distribution of color for specific factors (with Swi4, Swi6, and Mbp1, for example) shows that these factors bind to genes that are expressed during the same time frame

ChIP-on-Chip – Example Simon I., et al. "Serial regulation of transcriptional regulators in the yeast cell cycle", Cell, Volume: 106, (2001), pp. 697-708.

ChIP-on-Chip Problem : Probe design Most TF binding sites are not in exon Binding sequences are short Cover entire genome? Signal may be small Tiling array – divide the sequence into chunks, called tiling path. The distance between the center of neighboring chunks is called resolution. A path can be overlapped or spaced. Affymetrix tiling array for yeasts – 5bp resolution, 3.2 million probes Affymetrix tiling array for human – 35bp spacing, 90 million probes

Sequencing Solexa http://www.illumina.com/pages.ilmn?ID=203 SOLiD Mikkelsen et al