DNA microarray and array data analysis

Presentation transcript:

DNA microarray and array data analysis
Some of the slides are adapted from the lecture notes of Dr. Patrick Leahy of the Gene Expression Array Core Facility at CWRU.

What is a DNA Microarray?
- A DNA microarray is a new technology to measure the level of the mRNA gene products of a living cell.
- A microarray chip is a rectangular chip on which a grid of DNA spots is imposed. These spots form a two-dimensional array.
- Each spot in the array contains millions of copies of a particular DNA strand, bonded to the chip.
- Chips are made tiny so that only a small amount of RNA is needed from experimental cells.

DNA Microarray
- Many applications in both basic and clinical research: determining the role a gene plays in a pathway, disease, diagnostics and pharmacology, …
- There are three main platforms for performing microarray analyses:
  - cDNA arrays (generic, multiple manufacturers)
  - Oligonucleotide arrays (GeneChips) (Affymetrix)
  - cDNA membranes (radioactive detection)

cDNA Microarray
- Spot cloned cDNAs onto a glass/nylon microscope slide (usually PCR-amplified segments of plasmids)
- Complementary hybridization:
    CTAGCAGG  actual gene
    GATCGTCC  cDNA (reverse transcriptase)
    CUAGCAGG  mRNA
- Label 2 mRNA samples with 2 different colors of fluorescent dye: control vs. experimental
- Mix the two labeled mRNAs and hybridize to the chip
- Make two scans, one for each color
- Combine the images to calculate ratios of the amounts of each mRNA that bind to each spot
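The base pairing on this slide and the per-spot ratio can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the strands are compared position by position exactly as written on the slide (no reversal into 5'-to-3' order), and taking the log2 of the ratio is a common convention assumed here, not something the slide specifies.

```python
import math

# Watson-Crick pairing tables (DNA -> DNA, and DNA template -> RNA).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def complement_dna(seq):
    """Return the position-by-position DNA complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in seq)


def rna_complement(template):
    """Return the RNA strand that pairs with a DNA template, base by base."""
    return "".join(DNA_TO_RNA[base] for base in template)


def log2_ratio(test_intensity, control_intensity):
    """Log2 of test/control for one spot (assumed convention, see lead-in)."""
    return math.log2(test_intensity / control_intensity)


gene = "CTAGCAGG"            # sequence shown on the slide
cdna = complement_dna(gene)  # strand complementary to the gene
mrna = rna_complement(cdna)  # RNA strand that pairs with the cDNA
print(gene, cdna, mrna)

# Hypothetical spot intensities: test channel four times brighter than control.
print(log2_ratio(800.0, 200.0))
```

A positive log ratio means the spot is brighter in the test channel, a negative one that it is brighter in the control channel.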

Spotted Microarray Process (figure: control (CTRL) and test (TEST) samples)

cDNA Array Experiment Movie http://www.bio.davidson.edu/courses/genomics/chip/chip.html

“Long Oligos”
- Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene
- Relies on genome sequence database and bioinformatics
- Reduces cross-hybridization
- Cheaper and possibly more sensitive than the Affymetrix system

Affymetrix
- Uses 25-base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
- cRNA is labeled and scanned in a single “color”: one sample per chip
- Can cover as many as 47,000 transcripts on a chip (HG-U133 Plus 2.0 Array)
- Arrays get smaller every year (more genes)
- Chips are expensive (about $400/chip)
- Proprietary system: “black box” software, can only use their chips

Affymetrix Genome Arrays

Affymetrix GeneChip® Probe Array

Affymetrix GeneChip® Probe Arrays
- GeneChip Probe Array: 1.28 cm
- Hybridized probe cell: 24~50 µm
- Oligonucleotide probe; single-stranded, fluorescently labeled cRNA target
- Each probe cell or feature contains millions of copies of a specific oligonucleotide probe
- Image of hybridized probe array

Affymetrix GeneChip
- Probe: 25-base-long single-stranded DNA oligo
- Probe Cell: single square-shaped feature on an array containing one type of probe; contains millions of probe molecules
- Probe Pair: Perfect Match/Mismatch
- Probe Set

Array Design
- Twenty oligo probes are selected from the last 600 bases at the 3’ end of the gene
- Each probe is a 25-mer DNA oligo (Perfect Match, PM)
- For each probe selected, a partner containing a central mutation is also made (Mismatch, MM)
- A PM probe and its MM partner form a probe pair; each occupies a probe cell (24 µm)
- For each gene, a total of 20 probe pairs (one probe set) are arrayed on the chip
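As a rough illustration of the perfect match/mismatch pairing, the sketch below derives a mismatch partner from a 25-mer by swapping the central (13th) base for its complement. The slide only says "a central mutation"; choosing the complementary base is an assumption here, and both the function name and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match):
    """Return a mismatch partner: same 25-mer with the central base changed.

    Assumption: the central mutation replaces the middle base with its
    complement; the slide does not say which base is substituted.
    """
    if len(perfect_match) != 25:
        raise ValueError("expected a 25-base probe")
    middle = len(perfect_match) // 2  # index 12, i.e. the 13th base
    return (
        perfect_match[:middle]
        + COMPLEMENT[perfect_match[middle]]
        + perfect_match[middle + 1:]
    )


pm_probe = "ACGT" * 6 + "A"          # made-up 25-mer, for illustration only
mm_probe = mismatch_probe(pm_probe)
print(pm_probe)
print(mm_probe)
```

The two probes differ at exactly one position, so the mismatch signal serves as an estimate of non-specific hybridization for its partner.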

Probe Sub-types on chips
- Known genes
- Specific transcripts
- Exemplars
- Consensus
- Housekeeping genes
- Expressed sequence tags (ESTs)
- Spiked control transcripts

cRNA preparation
- Total RNA (5-8 µg), with poly(A) tails (AAAAAAAAA)
- cDNA strand 1 synthesis: SS II reverse transcriptase, TTTTTTTTT primer with T7 RNA pol. promoter
- cDNA strand 2 synthesis: E. coli DNA pol. I
- In vitro transcription (IVT) synthesis: T7 RNA pol.; IVT cRNA synthesis amplifies and labels transcripts with biotin
- Fragmented cRNA: the cRNA is now ready for hybridization to the test chip
- SS II reverse transcriptase may not finish the job; that is why the 3’ end of the gene is chosen for the probes

Post-hybridization washes
- Figure labels: cRNA labeled targets, cDNA probes, specific binding, non-specific binding
- B: biotin; S: streptavidin; FL: fluorescent
- Biotinylated cRNA was generated from the cDNA by an in vitro transcription reaction in which biotin-11-CTP and biotin-16-UTP were included.

Streptavidin (figure: biotin (B) on the targets, with fluorescently labeled streptavidin (S, FL))

Microarray experiment
- Cells
- Poly(A)+ RNA (AAAA)
- cDNA
- IVT (B-UTP): biotin-labeled cRNA transcript
- Fragment (heat, Mg2+): biotin-labeled cRNA fragments
- Hybridize (1-18 hours)
- Wash
- Stain
- Scan

The chip image data file (“.dat” file)
- The chip image data file (or “.dat” file) is the first part of data acquisition and appears on the computer screen upon completion of the laser scan.
- Here, we zoom in to see an individual probe set that has been highlighted.

The “.dat” and “.cel” files
- The first image is “sample1.dat”. Note the pixel-to-pixel variation within a probe cell.
- A “*.cel” file is automatically generated when the “*.dat” image first appears on the screen.
- Note that this derivative file has homogeneous signal intensity within its probe cells.

Affymetrix Algorithms
1. Signal
1.1 Adjusting MMs to purge negative values
- All MMs < PMs: no adjustment necessary
- Few MMs > PMs: change MMs based on weighted mean of other MMs
- Most MMs > PMs: change MMs to be slightly less than PM
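The three cases can be sketched roughly as follows. This is a simplified illustration, not Affymetrix's implementation: it summarizes the probe set by the plain mean of log2(PM/MM) where a robust mean would normally be used, the function name is hypothetical, and the constants `tau` and `scale_tau` are assumed values.

```python
import math


def adjust_mismatches(pm, mm, tau=0.03, scale_tau=10.0):
    """Return adjusted MM values that are always below their PM partner.

    Simplified sketch; tau and scale_tau are assumed constants.
    """
    # Probe-set summary of how far PM sits above MM, on a log2 scale.
    log_ratios = [math.log2(p / m) for p, m in zip(pm, mm)]
    set_level = sum(log_ratios) / len(log_ratios)

    adjusted = []
    for p, m in zip(pm, mm):
        if m < p:
            # Case 1: MM already below PM, keep it.
            adjusted.append(m)
        elif set_level > tau:
            # Case 2: only a few MMs exceed their PM; borrow the
            # probe-set-wide PM/MM relationship for this pair.
            adjusted.append(p / 2 ** set_level)
        else:
            # Case 3: most MMs exceed their PM; place the value
            # slightly below PM.
            adjusted.append(p / 2 ** (tau / (1 + (tau - set_level) / scale_tau)))
    return adjusted


# Hypothetical probe sets, for illustration only.
few_bad = adjust_mismatches([1000, 500, 800], [900, 600, 200])
most_bad = adjust_mismatches([100, 100], [200, 300])
print(few_bad)
print(most_bad)
```

After this step every PM-MM difference is positive, which is the point of the adjustment: the signal calculation never has to deal with negative values.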

Affymetrix Algorithms: Signal Calculation
Having adjusted the MM values, we now calculate the signal.

PM      1000  5000  430  765  355  98  3005  413  20333  590
MM       900  2000  230   25  331  40  1200  203   6197  230
PM-MM    100  3000  200  740   24  58  1805  210  14136  360

Unweighted mean = 2063

The unweighted mean is vulnerable to outlier data. To protect against this, we dampen the effect of outliers by using the Tukey biweight mean. PM-MM values that are a number of standard deviations away from the mean are given low weights, in accordance with the weight curve (figure: weight factor plotted against the number of standard deviations from the mean). Individual PM-MM data are multiplied by the weight factor before calculation of the mean. The weighted mean is then called the “signal.”

Using Tukey’s biweight: mean = 1780
Signal (expression level) = 1780
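The calculation on this slide can be sketched with the PM and MM values it lists. The unweighted mean reproduces the slide's 2063. The robust mean below is a generic one-step Tukey biweight centered on the median and scaled by the median absolute deviation, with assumed tuning constants `c` and `eps`; the slide's weight curve (based on standard deviations from the mean) cannot be recovered from the text, so this sketch is not expected to reproduce the slide's value of 1780.

```python
from statistics import mean, median

# PM and MM intensities as listed on the slide.
PM = [1000, 5000, 430, 765, 355, 98, 3005, 413, 20333, 590]
MM = [900, 2000, 230, 25, 331, 40, 1200, 203, 6197, 230]


def biweight_weights(values, c=5.0, eps=1e-4):
    """Tukey biweight weights: 1 near the center, falling to 0 for outliers.

    c and eps are assumed tuning constants, not taken from the slide.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    weights = []
    for v in values:
        u = (v - center) / (c * mad + eps)  # scaled distance from center
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return weights


def tukey_biweight_mean(values, c=5.0, eps=1e-4):
    """Weighted mean of the values using the biweight weights."""
    weights = biweight_weights(values, c, eps)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


diffs = [p - m for p, m in zip(PM, MM)]
unweighted = mean(diffs)
robust = tukey_biweight_mean(diffs)

print(diffs)
print(round(unweighted))
print(robust)
```

The large difference of 14136 receives zero weight in this sketch, which shows why the robust mean falls below the unweighted mean.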

.xls file

ALL_vs_AML_train_set_38_sorted.res

ALL_vs_AML_train_set_38_sorted.cls
38 2 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
(27 samples labeled 0, 11 samples labeled 1)
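A class-label file like this one can be read with a few lines of Python. The sketch assumes the layout visible on the slide: a header line holding the number of samples, the number of classes, and a trailing 1, followed by one label per sample. It also skips any line starting with "#", on the assumption that some files carry class names there. The function name is hypothetical.

```python
from collections import Counter


def parse_cls(text):
    """Parse class labels from .cls-style text (layout assumed, see lead-in)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_samples, n_classes, _ = (int(tok) for tok in lines[0].split())
    # Skip optional comment lines (assumed to hold class names).
    body = [ln for ln in lines[1:] if not ln.startswith("#")]
    labels = [int(tok) for tok in " ".join(body).split()]
    if len(labels) != n_samples:
        raise ValueError("label count does not match header")
    if len(set(labels)) != n_classes:
        raise ValueError("class count does not match header")
    return labels


# Content as shown on the slide: 27 samples of class 0, 11 of class 1.
example = "38 2 1\n" + " ".join(["0"] * 27 + ["1"] * 11) + "\n"
labels = parse_cls(example)
counts = Counter(labels)
print(len(labels), counts[0], counts[1])
```

Checking the label and class counts against the header catches truncated or mis-pasted files before any analysis starts.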