Introduction to Microarray Analysis and Technology Dave Lin - November 5, 2001.

Overview
- Why biologists care about genomics
- Why statisticians/computer scientists may care about genomics
- Preprocessing issues
- Sources of variability in constructing microarrays
- Postprocessing issues
- Analysis of data

What makes one cell different from another?
- Liver vs. brain
- Cancerous vs. non-cancerous
- Treatment vs. control

Old Days
- 100,000 genes in mammalian genome
- each cell expresses 15,000 of these genes
- each gene is expressed at a different level
- estimated total of 100,000 copies of mRNA/cell
- 1-5 copies/cell - “rare” - ~30% of all genes
- copies/cell - “moderate”
- 200 copies/cell and up - “abundant”

Cells can be defined by:
- Complement of genes (which genes are expressed)
- How much of each gene is expressed (quantity)
What makes one cell different from another?
- Try to find genes that are differentially expressed
- Study the function of these genes
- Find which genes interact with your favorite gene
Extremely time-consuming. Huge amounts of effort are expended to find individual genes that may differ between two conditions.

Genomics
Almost useless term: it defines many different concepts and applications.
Microarrays
- massively parallel analysis of gene expression
- screen an entire genome at once
- find not only individual genes that differ, but groups of genes that differ
- find relative expression level differences
- how quantitative can they be?

Microarrays
- Based on an old technique
- Many flavors; the majority are of two essential varieties
- cDNA Arrays: printing on glass slides; miniaturization, throughput; fluorescence based detection
- Affymetrix Arrays: in situ synthesis of oligonucleotides
Will not consider Affymetrix arrays further.

Department of Statistics, University of California, Berkeley, and Division of Genetics and Bioinformatics, The Walter and Eliza Hall Institute of Medical Research

Building the Chip:
- MASSIVE PCR: Full yeast genome = 6,500 reactions
- PCR PURIFICATION and PREPARATION: IPA precipitation + EtOH washes, well format
- PREPARING SLIDES: Polylysine coating for adhering PCR products to glass slides
- PRINTING: The arrayer is a high precision spotting device capable of printing 10,000 products in 14 hrs, with a plate change every 25 mins
- POST PROCESSING: Chemically converting the positive polylysine surface to prevent non-specific hybridization

Fabrication of “Spotted Arrays”
- Arrayed library (normalized/subtracted)
- 20,000 PCR reactions
- 20,000 precipitations
- 20,000 resuspensions
- Consolidate for printing
- Spot on glass slides

Micro Spotting pin

Hybing the Chip:
- PROBE LABELING: Two RNA samples are labelled with Cy3 or Cy5 monofunctional dyes via a chemical coupling to AA-dUTP. Samples are purified using a PCR cleanup kit.
- ARRAY HYBRIDIZATION: Cy3 and Cy5 RNA samples are simultaneously hybridized to chip. Hybs are performed for 5-12 hours and then chips are washed.
- DATA ANALYSIS: Ratio measurements are determined via quantification of 532 nm and 635 nm emission values. Data are uploaded to the appropriate database where statistical and other analyses can then be performed.
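The ratio measurement in the data analysis step can be sketched as follows. This is a minimal illustration, assuming background-corrected, positive intensities for the Cy5 (635 nm, "red") and Cy3 (532 nm, "green") channels. The function name `log_ratio`, the M/A naming, and the example intensities are illustrative choices, not values from the slides.

```python
import math

def log_ratio(red, green):
    """Return (M, A) for one spot.

    M = log2(red / green) is the log ratio between the two channels.
    A = average log2 intensity of the two channels.
    Both intensities are assumed to be background-corrected and positive.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    m = math.log2(red) - math.log2(green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a

# Hypothetical (red, green) intensities for three spots.
spots = [(1600.0, 400.0), (500.0, 500.0), (250.0, 1000.0)]
ma_values = [log_ratio(r, g) for r, g in spots]
```

Working on the log2 scale makes a fourfold increase (M = 2) and a fourfold decrease (M = -2) symmetric around zero, which is convenient for the normalization steps discussed later.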

Labeling of RNAs with Cy3 or Cy5
Two general methods
- Dye conjugated nucleotide
- Amino-allyl indirect labeling

Direct labeling of RNA
[Figure: poly(A) RNA primed with oligo(dT); cDNA synthesis incorporates Cy5-dUTP or Cy3-dUTP into the cDNA.]

Indirect labeling of RNA
[Figure: poly(A) RNA primed with oligo(dT); cDNA synthesis incorporates a modified nucleotide, followed by Cy3 addition.]

Dye effect issues
Direct method
- Unequal incorporation of Cy5 vs. Cy3
- Very poor overall incorporation of direct-conjugated nucleotide, so more starting RNA is needed for labeling
Indirect method
- Presumably less bias in initial incorporation of activated nucleotide, but not clear if more or less dye is added
Both methods
- Cy3 fluoresces more brightly than Cy5
- Labeling is very highly sequence dependent

Micrograph of a portion of hybridization probe from a yeast microarray (after hybridization).

Layout of the cDNA Microarrays
- Sequence verified, normalized mouse cDNAs
- 19,200 spots in two print groups of 9,600 each
  - 4 x 4 grid, each with 25 x 24 spots
  - Controls on the first 2 rows of each grid
[Figure: slide layout with print groups pg1 and pg2.]
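The layout arithmetic can be checked with a short sketch: 2 print groups x 16 grids x 600 spots = 19,200 spots. The slide does not say which dimension of "25 x 24" is rows, nor how spots are ordered in the data file, so the 25-rows-by-24-columns reading and the group-then-grid-then-row ordering are assumptions. The helper names `locate_spot` and `is_control` are hypothetical.

```python
PRINT_GROUPS = 2
GRIDS_PER_GROUP = 16   # 4 x 4 grids per print group
ROWS_PER_GRID = 25     # assumption: 25 rows x 24 columns per grid
COLS_PER_GRID = 24
CONTROL_ROWS = 2       # controls occupy the first 2 rows of each grid

SPOTS_PER_GRID = ROWS_PER_GRID * COLS_PER_GRID
SPOTS_PER_GROUP = GRIDS_PER_GROUP * SPOTS_PER_GRID
TOTAL_SPOTS = PRINT_GROUPS * SPOTS_PER_GROUP

def locate_spot(index):
    """Map a 0-based spot index to 0-based (print_group, grid, row, col).

    Assumes spots are numbered group by group, grid by grid, row by row.
    """
    if not 0 <= index < TOTAL_SPOTS:
        raise IndexError("spot index out of range")
    group, rest = divmod(index, SPOTS_PER_GROUP)
    grid, rest = divmod(rest, SPOTS_PER_GRID)
    row, col = divmod(rest, COLS_PER_GRID)
    return group, grid, row, col

def is_control(index):
    """True if the spot lies in one of the control rows of its grid."""
    return locate_spot(index)[2] < CONTROL_ROWS

n_controls = sum(is_control(i) for i in range(TOTAL_SPOTS))
```

Under these assumptions, each grid corresponds to one print tip, so the `(print_group, grid)` pair is the natural key for print-tip-specific corrections.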

Practical Problems 1
Comet Tails. Likely caused by insufficiently rapid immersion of the slides in the succinic anhydride blocking solution.

Practical Problems 2

Practical Problems 3
High Background. 2 likely causes:
- Insufficient blocking.
- Precipitation of the labeled probe.
Weak Signals

Practical Problems 4
Spot overlap. Likely cause: too much rehydration during post-processing.

Practical Problems 5
Dust

Pin-specific printing differences

Normalization - lowess
Global lowess
Assumption: changes roughly symmetric at all intensities.
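A rough sketch of what global lowess normalization does: fit a smooth curve to the log ratio M as a function of average log intensity A, then subtract it. Under the symmetry assumption, any intensity-dependent trend is treated as dye bias rather than biology. The code below is a simplified, pure-Python local linear fit with tricube weights and no robustness iterations, so it is an illustration of the idea and not the implementation used for the slides. `lowess_fit`, `lowess_normalize`, and `frac` are names chosen here.

```python
import math

def lowess_fit(x, y, frac=0.5):
    """Tricube-weighted local linear fit of y on x, evaluated at each x[i].

    Simplified sketch: O(n^2), no robustness iterations.
    frac is the fraction of points used in each local window.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points per local window
    fitted = []
    for xi in x:
        dists = [abs(xj - xi) for xj in x]
        h = sorted(dists)[k - 1]  # bandwidth: distance to k-th nearest point
        if h > 0:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        else:
            w = [1.0 if d == 0 else 0.0 for d in dists]
        sw = sum(w)
        mx = sum(wi * xj for wi, xj in zip(w, x)) / sw
        my = sum(wi * yj for wi, yj in zip(w, y)) / sw
        sxx = sum(wi * (xj - mx) ** 2 for wi, xj in zip(w, x))
        sxy = sum(wi * (xj - mx) * (yj - my) for wi, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to local mean
        fitted.append(my + slope * (xi - mx))
    return fitted

def lowess_normalize(a_values, m_values, frac=0.5):
    """Subtract the fitted intensity-dependent trend from each log ratio."""
    trend = lowess_fit(a_values, m_values, frac)
    return [m - t for m, t in zip(m_values, trend)]

# Hypothetical data: a purely intensity-dependent dye bias, no real changes.
a_values = [float(a) for a in range(6, 16)]
m_values = [0.2 * a - 2.0 for a in a_values]
normalized = lowess_normalize(a_values, m_values)
```

In this toy case every log ratio is pure dye bias, so the normalized values come out at approximately zero. With real data the spots that remain far from zero after subtraction are the candidates for differential expression.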

Normalization - print-tip-group
Assumption: for every print-tip group, changes roughly symmetric at all intensities.
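The per-group idea can be sketched by splitting spots by print-tip group and normalizing each group on its own. To keep the example short, this sketch centers each group at its median log ratio instead of fitting a lowess curve per group. That substitution is a deliberate simplification, not the method named on the slide, and `normalize_by_group` and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import median

def normalize_by_group(log_ratios, groups):
    """Center log ratios separately within each print-tip group.

    log_ratios: one M value per spot.
    groups: the print-tip group label of each spot, in the same order.
    Simplification: subtracts the group median rather than a per-group
    lowess curve fitted against intensity.
    """
    by_group = defaultdict(list)
    for m, g in zip(log_ratios, groups):
        by_group[g].append(m)
    centers = {g: median(values) for g, values in by_group.items()}
    return [m - centers[g] for m, g in zip(log_ratios, groups)]

# Hypothetical log ratios: tip 0 prints "high", tip 1 prints "low".
m_values = [0.5, 0.7, 0.6, -0.4, -0.2, -0.3]
tip_ids = [0, 0, 0, 1, 1, 1]
normalized = normalize_by_group(m_values, tip_ids)
```

Replacing the median with a per-group curve fit against intensity would give the intensity-dependent version, at the cost of needing enough spots in every group to fit a stable curve.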

Pre-processing Issues
- Definition of what a real signal is: what is a spot, and how to determine what should be included in the analysis?
- How to determine background: local (surrounding spot) vs. global (across slide)
- How to correct for dye effect
- How to correct for spatial effect, e.g. print-tip, others
- How to correct for differences between slides, e.g. scale normalization
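As one illustration of the last item, scale normalization between slides can be sketched by rescaling each slide's log ratios so that all slides share the same spread. The sketch below uses the median absolute deviation (MAD) as the spread estimate and the geometric mean of the MADs as the common target. Both are assumptions made here for illustration; the slides do not specify an estimator. It also assumes the log ratios have already been centered by a location normalization.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])

def scale_normalize(slides):
    """Rescale each slide's log ratios to a common MAD.

    slides: list of lists, one list of log ratios per slide.
    The target spread is the geometric mean of the per-slide MADs.
    """
    mads = [mad(s) for s in slides]
    if any(d == 0 for d in mads):
        raise ValueError("a slide has zero spread; cannot rescale")
    target = math.exp(sum(math.log(d) for d in mads) / len(mads))
    return [[m * target / d for m in s] for s, d in zip(slides, mads)]

# Hypothetical centered log ratios from two slides with different spread.
slides = [[-1.0, 0.0, 1.0], [-4.0, 0.0, 4.0]]
scaled = scale_normalize(slides)
scaled_mads = [mad(s) for s in scaled]
```

A robust estimator such as the MAD is used so that a handful of genuinely changing genes does not dominate the estimate of each slide's spread.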

Experimental Design Issues
What is the best means of performing the experiment to obtain the desired answer?
Biologists’ assumptions and statisticians’ differ.
- Biologist viewpoint: make everything exactly the same so that differences will stand out
- Statistician viewpoint: make everything as random as possible so that real trends will stand out

Most biologists will ask: what are the differences between two samples?
Implicit questions associated with microarrays:
- What is the best way to determine this? e.g. design; replicates; conditions.
- How do I obtain the most reliable results? e.g. measurements, normalization
- How do I determine what a significant difference is? Do I care about “subtle” changes, or just the extremes?
- How is information best extracted? Is correlation useful? What type of clustering?
- How is information combined? How do you model the interactions of 1000s of genes?

Design: Two Ways to Do the Comparisons

Advantages of Our Design
- Lower variability
- Increased precision
- Increase in measurement of expression -> increased precision