1
DNA microarray and array data analysis
Some of the slides are adapted from the lecture notes of Dr. Patrick Leahy of the Gene Expression Array Core Facility at CWRU
3
What is a DNA Microarray?
A DNA microarray is a new technology for measuring the levels of the mRNA gene products of a living cell.
A microarray chip is a rectangular chip on which a grid of DNA spots is laid out. These spots form a two-dimensional array.
Each spot in the array contains millions of copies of one DNA strand, bonded to the chip.
Chips are made tiny so that only a small amount of RNA is needed from the experimental cells.
4
DNA Microarray
Many applications in both basic and clinical research: determining the role a gene plays in a pathway or disease, diagnostics, pharmacology, …
There are three main platforms for performing microarray analyses:
cDNA arrays (generic, multiple manufacturers)
Oligonucleotide arrays (GeneChips, Affymetrix)
cDNA membranes (radioactive detection)
5
cDNA Microarray
Spot cloned cDNAs onto a glass/nylon microscope slide (usually PCR-amplified segments of plasmids)
Complementary hybridization:
CTAGCAGG: actual gene
GATCGTCC: cDNA (reverse transcriptase)
CUAGCAGG: mRNA
Label 2 mRNA samples with 2 different colors of fluorescent dye (control vs. experimental)
Mix the two labeled mRNAs and hybridize them to the chip
Make two scans, one for each color
Combine the images to calculate the ratio of the amounts of each mRNA that bind to each spot
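The complementary-hybridization example and the two-color ratio can be sketched in a few lines of Python. This is only an illustration: the function names and the spot intensities (1200 and 300) are hypothetical, and the log2 ratio is one common way to express the "ratio of amounts" that the slide mentions, not something the slide specifies.

```python
import math

# Watson-Crick pairing for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(dna):
    """Return the base-by-base complement, written position by position
    as on the slide (no strand reversal)."""
    return "".join(DNA_COMPLEMENT[base] for base in dna)


def transcribe(dna):
    """Return the mRNA with the same sequence as the gene (T becomes U)."""
    return dna.replace("T", "U")


def log2_ratio(test_intensity, control_intensity):
    """Log2 of the test/control intensity ratio for one spot.
    Positive means more signal in the test channel."""
    return math.log2(test_intensity / control_intensity)


gene = "CTAGCAGG"
cdna = complement(gene)   # GATCGTCC, as on the slide
mrna = transcribe(gene)   # CUAGCAGG, as on the slide

# Hypothetical spot intensities from the two scans (test and control).
spot_ratio = log2_ratio(1200.0, 300.0)
print(cdna, mrna, spot_ratio)
```

A log2 ratio of 0 means equal signal in both channels, so up- and down-regulation appear symmetrically around zero.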
6
Spotted Microarray Process
[Figure: spotted microarray process with control (CTRL) and test (TEST) samples]
7
cDNA Array Experiment Movie
8
“Long Oligos”
Like cDNAs, but instead of using a cloned gene, design a base probe to represent each gene
Relies on genome sequence databases and bioinformatics
Reduces cross-hybridization
Cheaper and possibly more sensitive than the Affymetrix system
9
Affymetrix
Uses 25-base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
cRNA is labeled and scanned in a single “color”: one sample per chip
Can cover as many as 47,000 transcripts on a chip (HG-U133 Plus 2.0 Array)
Array features get smaller every year (more genes per chip)
Chips are expensive (about $400/chip)
Proprietary system: “black box” software, can only use their chips
10
Affymetrix Genome Arrays
11
Affymetrix GeneChip® Probe Array
12
Affymetrix GeneChip® Probe Arrays
[Figure: GeneChip probe array (1.28 cm) and a hybridized probe cell (24~50 µm), showing the oligonucleotide probe and the single-stranded, fluorescently labeled cRNA target; image of a hybridized probe array]
Each probe cell, or feature, contains millions of copies of a specific oligonucleotide probe.
13
Probe Set
Affymetrix GeneChip Probe: a 25-base-long, single-stranded DNA oligo
Probe Cell: a single square-shaped feature on an array containing one type of probe; it contains millions of probe molecules
Probe Pair: Perfect Match/Mismatch
14
Array Design
[Figure: gene drawn 5’ to 3’, with a probe set of Perfect Match (PM) and Mismatch (MM) probe cells, each 24 µm]
Twenty oligo probes are selected from the last 600 bases at the 3’ end of the gene.
Each Perfect Match probe is a 25-mer DNA oligo.
For each probe selected, a Mismatch partner containing a central mutation is also made. Together they form a probe pair.
For each gene, a total of 20 probe pairs are arrayed on the chip; these make up the probe set.
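A rough sketch of this design in Python is given here. Several details are assumptions rather than statements from the slides: the "central mutation" is implemented as replacing the middle (13th) base with its complement, the 20 probes are simply spaced evenly across the last 600 bases (real probe selection follows design rules not described here), the probes are written in the same sense as the input sequence, and the input sequence itself is random, made-up data.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm):
    """Return the MM partner of a PM probe: same sequence, but with the
    central base swapped for its complement (assumed convention)."""
    mid = len(pm) // 2  # index 12 in a 25-mer, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def probe_pairs(sequence, n_probes=20, probe_len=25, region=600):
    """Pick n_probes evenly spaced PM probes from the last `region` bases
    (the 3' end) and pair each one with its MM partner."""
    tail = sequence[-region:]
    step = (len(tail) - probe_len) // (n_probes - 1)
    pairs = []
    for i in range(n_probes):
        start = i * step
        pm = tail[start:start + probe_len]
        pairs.append((pm, mismatch_probe(pm)))
    return pairs


# Hypothetical 1000-base transcript, generated only for illustration.
rng = random.Random(0)
transcript = "".join(rng.choice("ACGT") for _ in range(1000))

probe_set = probe_pairs(transcript)
for pm, mm in probe_set[:3]:
    print(pm, mm)
```

Each PM/MM pair differs at exactly one position, which is what lets the MM probe act as a control for non-specific hybridization.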
15
Probe Sub-types on chips
Known genes
Specific transcripts
Exemplars
Consensus
Housekeeping genes
Expressed sequence tags (ESTs)
Spiked control transcripts
16
cRNA preparation
Start from total RNA (5-8 µg); the mRNA carries a poly(A) tail.
cDNA strand 1 synthesis: SS II reverse transcriptase, primed by an oligo(dT) primer that carries a T7 RNA polymerase promoter.
cDNA strand 2 synthesis: E. coli DNA polymerase I.
In vitro transcription (IVT) synthesis: T7 RNA polymerase makes cRNA; this step amplifies the transcripts and labels them with biotin.
Fragmentation: the fragmented cRNA is now ready for hybridization to a test chip.
Note: SS II reverse transcriptase may not finish the job, which is why the probes are chosen from the 3’ end of the gene.
17
Post-hybridization washes
[Figure: biotin-labeled cRNA targets hybridized to the cDNA probes; the post-hybridization washes remove non-specific binding and leave specific binding; fluorescently labeled streptavidin then attaches to the biotin]
B: biotin; S: streptavidin; FL: fluorescent label
Biotinylated cRNA was generated from the cDNA by an in vitro transcription reaction in which biotin-11-CTP and biotin-16-UTP were included.
18
[Figure: streptavidin (S) carrying a fluorescent label (FL) bound to biotin (B) on the hybridized targets]
19
Microarray experiment
Cells
Poly(A)+ RNA
cDNA
IVT (B-UTP): biotin-labeled cRNA transcript
Fragment (heat, Mg2+): biotin-labeled cRNA fragments
Hybridize (1-18 hours)
Wash
Stain
Scan
20
.dat file
The chip image data file (or “.dat” file) is the first part of data acquisition and appears on the computer screen upon completion of the laser scan.
Here, we zoom in to see an individual probe set that has been highlighted.
21
.cel file
The first image is “sample1.dat”. Note the pixel-to-pixel variation within a probe cell.
A “*.cel” file is automatically generated when the “*.dat” image first appears on the screen. Note that this derivative file has a homogeneous signal intensity within each of its probe cells.
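The step from per-pixel values in the .dat image to one value per probe cell in the .cel file can be sketched as below. The slides do not say how the single value is computed; this sketch assumes the commonly described rule of dropping the border pixels of each cell and taking the 75th percentile of the remaining ones. The function name and the pixel grid are made up for illustration.

```python
import math


def cell_intensity(pixels, percentile=75):
    """Summarize one probe cell (a list of pixel rows) as a single value.

    Assumed rule: ignore the outermost ring of pixels, then take the
    nearest-rank percentile of the interior pixels.
    """
    interior = [value for row in pixels[1:-1] for value in row[1:-1]]
    ordered = sorted(interior)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical 5x5 pixel grid for one probe cell; the border is darker.
cell = [
    [0, 0, 0, 0, 0],
    [0, 10, 20, 30, 0],
    [0, 40, 50, 60, 0],
    [0, 70, 80, 90, 0],
    [0, 0, 0, 0, 0],
]

print(cell_intensity(cell))
```

Whatever the exact rule, the effect is the one the slide points out: every probe cell in the .cel file carries a single, uniform intensity.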
22
Affymetrix Algorithms
1. Signal
1.1 Adjusting MMs to purge negative PM-MM values
All MMs < PMs: no adjustment necessary
Few MMs > PMs: change those MMs based on the weighted mean of the other MMs
Most MMs > PMs: change the MMs to be slightly less than the PMs
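The three cases can be sketched as follows. This is an approximation under stated assumptions, not the vendor's code: it follows the general shape of the published MAS 5.0 "ideal mismatch" rule, uses the plain median of the log2(PM/MM) ratios as a stand-in for the weighted (biweight) mean, and treats the constants 0.03 and 10 as assumed defaults. All intensities are invented.

```python
import math
from statistics import median

CONTRAST_TAU = 0.03  # assumed threshold on the typical log2(PM/MM)
SCALE_TAU = 10.0     # assumed scaling constant for the third case


def adjust_mismatches(pm, mm):
    """Return adjusted MM values so that every PM - MM is positive."""
    # Typical log ratio of the probe set (median used as a simple stand-in).
    typical = median(math.log2(p / m) for p, m in zip(pm, mm))
    adjusted = []
    for p, m in zip(pm, mm):
        if m < p:
            # Case 1: MM already below PM, keep it.
            adjusted.append(m)
        elif typical > CONTRAST_TAU:
            # Case 2: a few bad MMs, so borrow the typical PM/MM ratio.
            adjusted.append(p / 2 ** typical)
        else:
            # Case 3: most MMs are bad, so set MM slightly below PM.
            shrink = CONTRAST_TAU / (1 + (CONTRAST_TAU - typical) / SCALE_TAU)
            adjusted.append(p / 2 ** shrink)
    return adjusted


# Hypothetical probe set where one MM exceeds its PM.
pm_few = [100.0, 200.0, 300.0, 400.0]
mm_few = [50.0, 100.0, 350.0, 100.0]
adjusted_few = adjust_mismatches(pm_few, mm_few)

# Hypothetical probe set where most MMs exceed their PMs.
pm_most = [100.0, 100.0, 100.0]
mm_most = [200.0, 300.0, 90.0]
adjusted_most = adjust_mismatches(pm_most, mm_most)

print(adjusted_few, adjusted_most)
```

In both examples every adjusted MM ends up below its PM, so the PM-MM differences used in the signal calculation are all positive.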
23
Affymetrix Algorithms
Signal Calculation
Having adjusted the MM values, we now calculate the signal from the PM-MM values.
[Figure: the PM values, the MM values, and the calculated PM-MM values, with unweighted mean = 2063; plot of the weight factor (maximum 1) against the number of standard deviations from the mean]
The unweighted mean is vulnerable to outlier data. To protect against this, we dampen the effect of outliers by using the Tukey biweight mean. PM-MM values that are a number of standard deviations away from the mean are given low weights, in accordance with the weight graph. Individual PM-MM data are multiplied by the weight factor before calculation of the mean. The weighted mean is then called the “signal.”
Using Tukey’s biweight, mean = 1780, so Signal (expression level) = 1780.
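A minimal sketch of an outlier-resistant mean of this kind is shown below. It uses the usual one-step Tukey biweight, which measures distance from the median in units of the median absolute deviation rather than in standard deviations from the mean as drawn on the slide. The tuning constants (c = 5, epsilon = 0.0001) are assumptions, and the PM-MM values are invented; they are not the data behind the 2063 and 1780 figures.

```python
from statistics import mean, median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight mean of a list of numbers."""
    center = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median([abs(v - center) for v in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - center) / (c * mad + epsilon)
        # Weight is 1 at the center and falls to 0 for distant values.
        w = (1 - u * u) ** 2 if abs(u) <= 1 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical PM-MM differences for one probe set, with one outlier.
differences = [1500.0, 1700.0, 1800.0, 1900.0, 2100.0, 6000.0]

print(mean(differences), tukey_biweight(differences))
```

The outlier at 6000 pulls the plain mean up to 2500, while it receives zero weight in the biweight mean, which stays close to the bulk of the data.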
24
.xls file
25
ALL_vs_AML_train_set_38_sorted.res
26
ALL_vs_AML_train_set_38_sorted.cls
[Figure: contents of the .cls class file; header values 38 2 1; class sizes 27 and 11]
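The numbers on this slide fit a simple three-line class-label layout, which can be parsed as sketched below. The layout is an assumption (a header with sample count, class count and a constant 1; a line of class names starting with "#"; then one label per sample), and so are the class names ALL and AML, which are inferred from the file name. The example text is constructed from the counts on the slide, not copied from the real file.

```python
def parse_cls(text):
    """Parse a three-line class-label file and count samples per class."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    n_samples, n_classes, _ = (int(field) for field in lines[0].split())
    class_names = lines[1].lstrip("#").split()
    labels = [int(field) for field in lines[2].split()]
    if len(labels) != n_samples or len(class_names) != n_classes:
        raise ValueError("header does not match the labels or class names")
    counts = {name: labels.count(i) for i, name in enumerate(class_names)}
    return class_names, labels, counts


# Constructed example: 38 samples in 2 classes, 27 of one and 11 of the other.
example_cls = "38 2 1\n# ALL AML\n" + " ".join(["0"] * 27 + ["1"] * 11)

names, sample_labels, class_counts = parse_cls(example_cls)
print(names, class_counts)
```

The parsed labels line up, sample by sample, with the columns of the matching expression file, which is how the class of each array is attached to its measurements.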