Introduction to Microarray


Introduction to Microarray Dr G. P. S. Raghava

Molecular Biology Overview
Figure labels: cell, nucleus, chromosome, gene (DNA), gene (mRNA, single strand), protein

Measuring Gene Expression
Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell.
Measuring protein would be more direct, but is currently harder.

(RT)

The Goals
Basic understanding
  Arrays can take a snapshot of which subset of genes in a cell is actively making proteins
  Heat shock experiments
Medical diagnosis
  Microarrays can indicate where mutations lie that might be linked to a disease
  Still others are used to determine whether a person’s genetic profile would make him or her more or less susceptible to drug side effects
  1999: a gene chip containing 6,800 human genes was used to distinguish between myeloid leukemia and lymphoblastic leukemia, using a set of 50 genes that have different activity levels
Drug design
  Pharmaceutical firms are in a rush to translate the human genome results into new products
  Potential profits are huge
  First, though, they must figure out what the genes do, how they interact, and how they relate to diseases
  Evaluation, specificity, response

Microarray Potential Applications
Biological discovery
  new and better molecular diagnostics
  new molecular targets for therapy
  finding and refining biological pathways
Recent examples
  molecular diagnosis of leukemia, breast cancer, ...
  appropriate treatment for genetic signature
  potential new drug targets

History
1980s: antibody-based assay (protein chip?)
~1991: high-density DNA-synthetic chemistry (Affymetrix/oligo chips)
~1995: microspotting (Stanford Univ/cDNA chips)
Replacing porous surface with solid surface
Replacing radioactive label with fluorescent label
Improvement in sensitivity

What is a DNA Microarray?
Genes or gene fragments attached to a substrate (glass)
Tens of thousands of spots/genes = entire genome in 1 experiment
A revolution in biology
Figure labels: hybridized slide, two dyes, image analyzed

Gene Expression Microarrays
The main types of gene expression microarrays:
  Short oligonucleotide arrays (Affymetrix)
  cDNA or spotted arrays (Brown/Botstein)
  Long oligonucleotide arrays (Agilent inkjet)

Terms/Jargon
Stanford/cDNA chip
  one slide per experiment
  one spot
  1 gene => one spot or a few spots (replicates)
  control: control spots
  control: two fluorescent dyes (Cy3/Cy5)
Affymetrix/oligo chip
  one chip per experiment
  one probe/feature/cell
  1 gene => many probes (20~25 mers)
  control: match and mismatch cells

Affymetrix Microarrays
Figure labels: raw image, 1.28 cm, 50 µm
~10^7 oligonucleotides: half perfectly match the mRNA (PM), half have one mismatch (MM)
Raw gene expression is the intensity difference: PM - MM
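The PM - MM idea on the slide above can be sketched in a few lines. This is only an illustration: the function name and the sample intensities are hypothetical, and averaging the differences over a gene's probe pairs is an assumption, since the slide states only that raw expression is the PM - MM intensity difference.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one gene."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for four probe pairs of a single gene
pm = [1200.0, 980.0, 1500.0, 1100.0]   # perfect-match cells
mm = [300.0, 280.0, 500.0, 300.0]      # mismatch cells
print(average_difference(pm, mm))
```

A mismatch cell that is brighter than its perfect-match partner gives a negative difference, which is one reason this simple summary needs care in practice.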

DNA Microarrays
Each probe consists of thousands of strands of identical oligonucleotides
The DNA sequences at each probe represent important genes (or parts of genes)
Printing systems (e.g. HP, Corning Inc.)
  Printing systems can build lengths of DNA up to 60 nucleotides long
  1.28 x 1.28+ cm glass wafer
  Each “print head” has a ~100 µm diameter, and the print heads are separated by ~100 µm (5,000-20,000 probes)
Photolithographic chips (e.g. Affymetrix GeneChip)
  1.28 x 1.28 cm glass/silicon wafer
  24 x 24 µm probe site (500,000 probes)
  Lengths of DNA up to 25 nucleotides long
  Requires a new set of masks for each new array type

The Process
Cells
-> Poly-A RNA
-> cDNA
-> IVT (in-vitro transcription) with 10% biotin-labeled uracil
-> Antisense cRNA
-> Fragment (heat, Mg2+)
-> Labeled fragments
-> Hybridize
-> Wash/stain
-> Scan

Hybridization and Staining
Figure labels: GeneChip, biotin-labeled cRNA, hybridized array, SAPE (streptavidin-phycoerythrin)

Microarray Data
First, the problems:
  The fabrication process is not error free
  Probes have a maximum length of 25-60 nucleotides
  Biological processes such as hybridization are stochastic
  Background light may skew the fluorescence
  How do we decide if, and how strongly, a particular gene is being expressed?
Solutions to these problems are still in their infancy

Affymetrix “Gene chip” system
  Uses 25-base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
  RNA is labeled and scanned in a single “color”: one sample per chip
  Can have as many as 20,000 genes on a chip
  Arrays get smaller every year (more genes)
  Chips are expensive
  Proprietary system: “black box” software, can only use their chips

cDNA Microarray Technologies
  Spot cloned cDNAs onto a glass microscope slide (usually PCR-amplified segments of plasmids)
  Label 2 RNA samples with 2 different colors of fluorescent dye: control vs. experimental
  Mix the two labeled RNAs and hybridize to the chip
  Make two scans, one for each color
  Combine the images to calculate ratios of the amounts of each RNA that bind to each spot

cDNA microarrays
Compare the gene expression in two samples of cells
PRINT: cDNA from one gene on each spot
SAMPLES: cDNA labelled red/green, e.g. treatment / control, normal / tumor tissue

HYBRIDIZE: add equal amounts of labelled cDNA samples to the microarray
SCAN: laser, detector

“Long Oligos”
  Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene
  Relies on genome sequence database and bioinformatics
  Reduces cross hybridization
  Cheaper and possibly more sensitive than the Affymetrix system

Images from scanner
Resolution
  standard 10 µm [currently, max 5 µm]
  100 µm spot on chip = 10 pixels in diameter
Image format
  TIFF (tagged image file format), 16 bit (65,536 levels of grey)
  1 cm x 1 cm image at 16 bit = 2 MB (uncompressed)
  other formats exist, e.g. SCN (used at Stanford University)
Separate image for each fluorescent sample
  channel 1, channel 2, etc.
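The figures on this slide follow from simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes square pixels at the standard 10 µm resolution and no file header overhead.

```python
def image_size_bytes(width_cm, height_cm, pixel_um, bits_per_pixel):
    """Uncompressed size in bytes of one scanned channel."""
    um_per_cm = 10_000
    width_px = round(width_cm * um_per_cm / pixel_um)
    height_px = round(height_cm * um_per_cm / pixel_um)
    return width_px * height_px * bits_per_pixel // 8


# 1 cm x 1 cm at 10 um per pixel is 1000 x 1000 pixels, 2 bytes each
print(image_size_bytes(1, 1, 10, 16))
# Number of pixels across a 100 um spot at 10 um per pixel
print(100 // 10)
# Number of grey levels in a 16-bit image
print(2 ** 16)
```

Halving the pixel size to 5 µm quadruples the pixel count, so the same area would need four times the storage per channel.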

Processing of images
Addressing or gridding
  Assigning coordinates to each of the spots
Segmentation
  Classification of pixels either as foreground or as background
Intensity determination for each spot
  Foreground fluorescence intensity pairs (R, G)
  Background intensities
  Quality measures
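A minimal sketch of the segmentation and intensity steps for a single, already gridded spot is shown below. The fixed threshold is an assumption made for brevity and is not the method used by any particular image analysis package; the patch values and names are hypothetical.

```python
from statistics import median


def segment_spot(patch, threshold):
    """Split the pixels of one spot's patch into foreground and background."""
    foreground, background = [], []
    for row in patch:
        for value in row:
            if value >= threshold:
                foreground.append(value)
            else:
                background.append(value)
    return foreground, background


# Hypothetical 4 x 4 pixel patch around one spot, one channel
patch = [
    [100, 120, 110, 105],
    [115, 900, 950, 100],
    [108, 920, 980, 112],
    [102, 118, 109, 104],
]
fg, bg = segment_spot(patch, threshold=500)
# Median foreground and background intensities for this spot
print(median(fg), median(bg))
```

The same steps would be run once per channel, giving the foreground and background values for red and green.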

Images in analysis software
The two 16-bit images (Cy3, Cy5) are compressed into 8-bit images
Display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image
RGB image:
  Blue values (B) are set to 0
  Red values (R) are used for Cy5 intensities
  Green values (G) are used for Cy3 intensities
Qualitative representation of results
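The overlay described above can be sketched as follows. Dropping the low 8 bits is an assumed compression step chosen for simplicity; analysis software may use a different transform. All names and pixel values are hypothetical.

```python
def to_8bit(value16):
    """Compress a 16-bit intensity (0-65535) to 8 bits (0-255)."""
    return value16 >> 8


def overlay_pixel(cy3, cy5):
    """Return an (R, G, B) pixel: Cy5 goes to red, Cy3 to green, blue is 0."""
    return (to_8bit(cy5), to_8bit(cy3), 0)


def overlay_image(cy3_image, cy5_image):
    """Combine two equally sized 16-bit images into one RGB overlay."""
    return [
        [overlay_pixel(g, r) for g, r in zip(cy3_row, cy5_row)]
        for cy3_row, cy5_row in zip(cy3_image, cy5_image)
    ]


# Hypothetical 2 x 2 images for the two channels
cy3 = [[65535, 0], [65535, 256]]
cy5 = [[65535, 65535], [0, 256]]
print(overlay_image(cy3, cy5))
```

Equal strong signal in both channels gives yellow, Cy5 alone gives red, and Cy3 alone gives green.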

Images: examples
Pseudo-colour overlay (Cy3, Cy5)
Spot colour   Signal strength       Gene expression
yellow        Control = perturbed   unchanged
red           Control < perturbed   induced
green         Control > perturbed   repressed

Quantification of expression
For each spot on the slide we calculate
  Red intensity = Rfg - Rbg
  Green intensity = Gfg - Gbg
(fg = foreground, bg = background) and combine them in the log (base 2) ratio
  log2(Red intensity / Green intensity)
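The calculation above can be written as a short function. Returning None when a background-corrected intensity is not positive is an assumption made here so the example is safe to run; the slides do not say how such spots are handled.

```python
import math


def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """log2 of background-corrected red over green intensity for one spot."""
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None  # ratio undefined (assumed handling, not from the slides)
    return math.log2(red / green)


# Hypothetical spots: red four times green, equal, and green four times red
print(log_ratio(900, 100, 300, 100))
print(log_ratio(500, 100, 500, 100))
print(log_ratio(300, 100, 900, 100))
```

The log scale treats induction and repression symmetrically: a fourfold increase and a fourfold decrease sit at the same distance from zero.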

Gene Expression Data
Gene expression data on p genes for n slides: p is O(10,000), n is O(10-100), but growing
Genes   slide 1   slide 2   slide 3   slide 4   slide 5   ...
1         0.46      0.30      0.80      1.51      0.90    ...
2        -0.10      0.49      0.24      0.06      0.46    ...
3         0.15      0.74      0.04      0.10      0.20    ...
4        -0.45     -1.03     -0.79     -0.56     -0.32    ...
5        -0.06      1.06      1.35      1.09     -1.09    ...
Each entry, e.g. the gene expression level of gene 5 in slide 4 (1.09), is log2(Red intensity / Green intensity)
These values are conventionally displayed on a red (>0), yellow (0), green (<0) scale
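As a small illustration, the values on this slide can be held as a genes-by-slides matrix and mapped to the display colours. The function names are hypothetical, and real software uses a continuous colour gradient instead of the three-way split shown here.

```python
# Rows are genes 1-5, columns are slides 1-5 (values from the slide above)
expression = [
    [0.46, 0.30, 0.80, 1.51, 0.90],
    [-0.10, 0.49, 0.24, 0.06, 0.46],
    [0.15, 0.74, 0.04, 0.10, 0.20],
    [-0.45, -1.03, -0.79, -0.56, -0.32],
    [-0.06, 1.06, 1.35, 1.09, -1.09],
]


def level(gene, slide):
    """Look up a log2 ratio using 1-based gene and slide numbers."""
    return expression[gene - 1][slide - 1]


def colour(value):
    """Map a log2 ratio to the conventional display colour."""
    if value > 0:
        return "red"
    if value < 0:
        return "green"
    return "yellow"


print(level(5, 4), colour(level(5, 4)))
print(level(4, 2), colour(level(4, 2)))
```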

Biological question (differentially expressed genes, sample class prediction, etc.)
-> Experimental design
-> Microarray experiment
-> 16-bit TIFF files
-> Image analysis: (Rfg, Rbg), (Gfg, Gbg)
-> Normalization: R, G
-> Estimation, testing, clustering, discrimination
-> Biological verification and interpretation

Quality control (-> Flag)
How good are the foreground and background measurements?
  Variability measures in pixel values within each spot mask
  Spot size
  Circularity measure
  Relative signal to background intensity
  Dapple:
    b-value: fraction of background intensities less than the median foreground intensity
    p-score: extent to which the position of a spot deviates from a rigid rectangular grid
Flag spots based on these criteria
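The b-value defined above translates directly into code. This sketch follows the slide's definition only; it is not taken from the Dapple software, and the flagging threshold `min_b` and the pixel values are hypothetical.

```python
from statistics import median


def b_value(foreground, background):
    """Fraction of background pixels below the median foreground intensity."""
    fg_median = median(foreground)
    below = sum(1 for value in background if value < fg_median)
    return below / len(background)


def flag_spot(foreground, background, min_b=0.9):
    """Flag a spot whose b-value falls below min_b (hypothetical cutoff)."""
    return b_value(foreground, background) < min_b


# Hypothetical pixel intensities; one background pixel is unusually bright
fg = [900, 920, 950, 980]
bg = [100, 110, 120, 1000]
print(b_value(fg, bg), flag_spot(fg, bg))
```

A b-value near 1 means the spot stands out clearly from its surroundings; lower values suggest a weak spot or contaminated background.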

Replication
Why?
  To reduce variability
  To increase generalizability
What is it?
  Duplicate spots
  Duplicate slides
  Technical replicates
  Biological replicates

Practical Application of DNA Microarrays
DNA microarrays are used to study gene activity (expression)
  What proteins are being actively produced by a group of cells?
  “Which genes are being expressed?”
How?
  When a cell is making a protein, it transcribes the genes (made of DNA) which code for the protein into RNA used in its production
  The RNA present in a cell can be extracted
  If a gene has been expressed in a cell, its RNA will bind to “a copy of itself” (a complementary probe) on the array
  RNA with no complementary site will wash off the array
  The RNA can be “tagged” with a fluorescent dye to determine its presence
DNA microarrays provide a high-throughput technique for quantifying the presence of specific RNA sequences
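Binding to a complementary site can be illustrated with an idealized check on DNA sequences. This sketch assumes exact base pairing over the whole probe and ignores partial matches, temperature, and all other hybridization chemistry; the sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hybridizes(probe, target):
    """Idealized test: the target contains the probe's exact reverse complement."""
    return reverse_complement(probe) in target


probe = "AGCCT"
print(reverse_complement(probe))
print(hybridizes(probe, "TTAGGCTAA"))   # contains the complementary site
print(hybridizes(probe, "TTTTTTTTT"))   # no complementary site, washes off
```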

Analysis and Management of Microarray Data
Magnitude of data
Experiments
  50,000 genes in human
  320 cell types
  2,000 compounds
  3 time points
  2 concentrations
  2 replicates
Data volume
  4 x 10^11 data points
  10^15 bytes = 1 petabyte of data
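The data-point count on this slide is the product of the experimental factors, as the sketch below shows. The bytes-per-data-point figure is only what the slide's two totals imply when divided; it is not stated in the source.

```python
from math import prod

# Experimental factors listed on the slide
factors = {
    "genes": 50_000,
    "cell types": 320,
    "compounds": 2_000,
    "time points": 3,
    "concentrations": 2,
    "replicates": 2,
}

data_points = prod(factors.values())
print(f"{data_points:.2e} data points")   # roughly 4 x 10^11

# Storage per data point implied by a total of 10^15 bytes (derived, not sourced)
implied_bytes_per_point = 10**15 / data_points
print(round(implied_bytes_per_point))
```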

Thanks