Published by Blaze Crockford. Modified over 10 years ago.
1
Biology and Cells All living organisms consist of cells. Humans have trillions of cells; yeast has one. Cells are of many different types (blood, skin, nerve), but all arose from a single cell (the fertilized egg). Each cell contains a complete copy of the genome (the program for making the organism), encoded in DNA.
2
DNA DNA molecules are long double-stranded chains; 4 types of bases are attached to the backbone: adenine (A), guanine (G), cytosine (C), and thymine (T). A pairs with T, C with G. A gene is a segment of DNA that specifies how to make a protein. Human DNA has about 30-35,000 genes; Rice -- about 50-60,000, but shorter genes.
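The pairing rule on this slide (A with T, C with G) can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this example.

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the complementary strand, reversed so it reads in the opposite direction."""
    return complement(strand)[::-1]

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Because the two strands of the double helix determine each other, taking the complement twice returns the original sequence.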
3
Exons and Introns: Data and Logic? Exons are coding DNA (translated into a protein) and make up only about 2% of the human genome. Introns are non-coding DNA, which provides structural integrity and regulatory (control) functions. Exons can be thought of as program data, while introns provide the program logic. Humans have much more control structure than rice.
4
Gene Expression Cells are different because of differential gene expression. About 40% of human genes are expressed at any one time. A gene is expressed by transcribing DNA into single-stranded mRNA; the mRNA is later translated into a protein. Microarrays measure the level of mRNA expression.
5
Gene Expression Measurement mRNA expression represents the dynamic aspects of the cell. mRNA expression can be measured with the latest technology. mRNA is isolated and labeled with a fluorescent dye. The labeled mRNA is hybridized to the target; the level of hybridization corresponds to the light emission, which is measured with a laser.
6
Molecular Biology Overview [Figure: cell, nucleus, chromosome, gene (DNA), gene (mRNA, single strand), protein. Graphics courtesy of the National Human Genome Research Institute.]
7
Gene Expression Microarrays The main types of gene expression microarrays: Short oligonucleotide arrays (Affymetrix); cDNA or spotted arrays (Brown/Botstein). Long oligonucleotide arrays (Agilent Inkjet); Fiber-optic arrays...
8
DNA Chip Microarrays Put a large number (~100K) of cDNA sequences or synthetic DNA oligomers onto a glass slide in known locations on a grid. Label an RNA sample and hybridize (label 2 RNA samples with 2 different colors of fluorescent dye: control vs. experimental). Mix the two labeled RNAs and hybridize to the chip. Measure the amount of RNA bound to each square in the grid. Make comparisons: –Cancerous vs. normal tissue –Treated vs. untreated –Time course
9
Spot your own Chip (plans available for free from Pat Brown’s website) Robot spotter Ordinary glass microscope slide
10
cDNA Spotted Microarrays
11
Affymetrix “Gene chip” system Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene) RNA labeled and scanned in a single “color” –one sample per chip Can have as many as 20,000 genes on a chip Arrays get smaller every year (more genes) Chips are expensive Proprietary system: “black box” software, can only use their chips
12
Affymetrix Microarrays 50 µm, 1.28 cm. ~$10^7$ oligonucleotides: half perfectly match the mRNA (PM), half have one mismatch (MM). Raw gene expression is the intensity difference: PM - MM. [Figure: raw image]
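As a rough sketch of the PM - MM idea, the snippet below averages the difference over the probe pairs of one gene. Averaging across pairs is an assumption made here for illustration (the slide only states the difference itself), and all intensities are made-up values.

```python
from statistics import mean

def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one gene (illustrative summary)."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    return mean(p - m for p, m in zip(pm, mm))

# Hypothetical intensities for four probe pairs of a single gene.
pm = [1200.0, 980.0, 1500.0, 1100.0]
mm = [300.0, 420.0, 350.0, 1300.0]

signal = average_difference(pm, mm)
print(signal)  # differences are 900, 560, 1150 and -200
```

Note that an individual pair can give a negative difference when the mismatch probe is brighter than the perfect match, so a summary of this kind can itself be negative.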
15
Data Acquisition Scan the arrays Quantitate each spot Subtract background Normalize Export a table of fluorescent intensities for each gene in the array
16
Normalization Can control for many of the experimental sources of variability (systematic, not random or gene specific). Bring each image to the same average brightness. Can use simple or more elaborate math: –divide by the mean (whole chip or by sectors) –LOESS (locally weighted regression). No sure biological standards.
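The simplest option above, dividing by the whole-chip mean, might look like the following sketch. The target value of 1000 and the chip intensities are arbitrary assumptions for illustration.

```python
from statistics import mean

def scale_to_target(intensities, target=1000.0):
    """Rescale one chip so that its mean intensity equals `target`."""
    chip_mean = mean(intensities)
    if chip_mean <= 0:
        raise ValueError("chip mean must be positive")
    factor = target / chip_mean
    return [x * factor for x in intensities]

# Two hypothetical chips with the same relative pattern but different brightness.
chip_a = [100.0, 200.0, 300.0]    # dim scan, mean 200
chip_b = [400.0, 800.0, 1200.0]   # bright scan, mean 800

norm_a = scale_to_target(chip_a)
norm_b = scale_to_target(chip_b)
print(norm_a, norm_b)
```

After scaling, both chips have the same average brightness, so the remaining differences between them are not due to overall scan intensity. This corrects only a global factor; intensity-dependent effects need a method such as LOESS.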
17
Multiple Comparisons In a microarray experiment, each gene (each probe or probe set) is really a separate experiment Yet if you treat each gene as an independent comparison, you will always find some with significant differences –(the tails of a normal distribution)
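To make the problem concrete, the sketch below counts how many genes would look significant by chance alone and shows two standard corrections. The slide does not name a correction method; Bonferroni and Benjamini-Hochberg are introduced here as common choices, and the p-values are invented for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of chance 'hits' if no gene truly differs."""
    return n_tests * alpha

def bonferroni_threshold(n_tests, alpha):
    """Per-gene p-value cutoff that bounds the chance of any false positive by alpha."""
    return alpha / n_tests

def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its stepped threshold.
        if p_values[i] <= rank * alpha / m:
            cutoff = rank
    return sorted(order[:cutoff])

print(expected_false_positives(10_000, 0.05))  # about 500 genes by chance
print(bonferroni_threshold(10_000, 0.05))      # about 5e-06 per gene

# Hypothetical p-values for ten genes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))
```

With 10,000 genes tested at the 0.05 level, roughly 500 would pass by chance, which is why an uncorrected list of "significant" genes is not informative on its own.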
18
Microarray Potential Applications Biological discovery –new and better molecular diagnostics –new molecular targets for therapy –finding and refining biological pathways Recent examples –molecular diagnosis of leukemia, breast cancer,... –appropriate treatment for genetic signature –potential new drug targets
19
Microarray Data Analysis Types Gene Selection –find genes for therapeutic targets –avoid false positives (FDA approval?) Classification (Supervised) –identify disease (biomarker study) –predict outcome / select best treatment Clustering (Unsupervised) –find new biological classes / refine existing ones –understand regulatory relationships/pathways –exploration …
20
Microarray Data Mining Challenges Too few records (samples), usually < 100. Too many columns (genes), usually > 1,000. Too many columns are likely to lead to false positives. For exploration, a large set of all relevant genes is desired. For diagnostics or identification of therapeutic targets, the smallest set of genes is needed. The model needs to be explainable to biologists.
21
Microarray Data Classification [Figure labels: microarray chips; images scanned by laser; datasets; data mining model; new sample; prediction: ALL or AML] Example rows from a dataset:
Gene              Value
D26528_at         193
D26561_cds1_at    -70
D26561_cds2_at    144
D26561_cds3_at    33
D26579_at         318
D26598_at         1764
D26599_at         1537
D26600_at         1204
D28114_at         707
22
Data Preparation Issues Thresholding: usually min 20, max 16,000 –for older Affy chips (new Affy chips do not have negative values) Filtering: remove genes with insufficient variation –e.g. MaxVal - MinVal < 500 and MaxVal/MinVal < 5 –biological reasons –feature reduction for algorithmic reasons For clustering, normalize each gene (sample) separately to Mean = 0, Std. Dev = 1
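The three preparation steps can be sketched as follows. The removal rule is read literally from the slide (a gene is dropped only when both conditions hold), which is an interpretation; the gene names and values are hypothetical.

```python
from statistics import mean, pstdev

FLOOR, CEILING = 20, 16000  # thresholds quoted on the slide

def threshold(values, lo=FLOOR, hi=CEILING):
    """Clip every expression value into the range [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]

def has_enough_variation(values, min_diff=500, min_ratio=5):
    """False when MaxVal - MinVal < min_diff and MaxVal/MinVal < min_ratio.

    Expects thresholded values, so the minimum is positive.
    """
    hi, lo = max(values), min(values)
    return not (hi - lo < min_diff and hi / lo < min_ratio)

def standardize(values):
    """Rescale one gene's values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("constant gene cannot be standardized")
    return [(v - mu) / sigma for v in values]

# Hypothetical expression values across four samples.
genes = {
    "flat_gene": [300, 350, 400, 320],
    "variable_gene": [-70, 1764, 1537, 20000],
}
clipped = {name: threshold(vals) for name, vals in genes.items()}
kept = [name for name, vals in clipped.items() if has_enough_variation(vals)]
print(kept)
z_scores = standardize(clipped["variable_gene"])
```

Thresholding first matters here: it removes the negative values that would otherwise make the MaxVal/MinVal ratio meaningless.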
23
Normalization issues Within-slide –What genes to use –Location –Scale Paired-slides (dye swap) –Self-normalization Between slides
24
Control RNA sample and test RNA sample. Reverse transcription produces radio-labelled cDNA probes. Hybridization to microarray filters. Use a Phosphor Imager laser scanner to obtain the densities of each spot on the filter. Compare densities at each spot to determine if treatment changes gene expression. Compile the subset of differentially expressed genes.
Gene    Control    Test
A       1X         3X
:       :          :
Z       1X         0.5X
25
Normalization continued Intensity-dependent normalization (Yang, YH, 2002) –Make an M-A plot to check the data distribution, where $M = \log_2(R/G)$ and $A = \frac{1}{2}\log_2(R \cdot G)$. –Use the lowess function in R to perform the normalization, where $c(A)$ is the lowess fit to the M-A plot. –Transform the data by $M' = M - c(A)$. –This is a locally nonparametric method and is robust to a small number of differentially expressed genes.
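The transformation M' = M - c(A) can be sketched in Python with a simplified stand-in for the fit c(A): a tricube-weighted local linear regression without robustness iterations. This is not R's lowess function, only an illustration of the same idea; the function names, the `frac` parameter, and the data are assumptions.

```python
import math

def local_linear_fit(xs, ys, x0, frac=0.5):
    """Tricube-weighted least-squares line through the points nearest x0, evaluated at x0."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        u = abs(x - x0) / h
        if u >= 1:
            continue  # outside the local window
        w = (1 - u ** 3) ** 3
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    if sw == 0.0:
        return sum(ys) / n  # no point inside the window: fall back to the global mean
    denom = sw * swxx - swx ** 2
    if abs(denom) < 1e-12:
        return swy / sw  # too few distinct x values: use the weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0

def normalize_ma(m_values, a_values, frac=0.5):
    """Return M' = M - c(A), with c(A) the local fit of M on A."""
    return [m - local_linear_fit(a_values, m_values, a, frac)
            for m, a in zip(m_values, a_values)]

# Hypothetical spots whose log ratio drifts linearly with intensity (pure dye bias).
a_values = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
m_values = [0.3 * a + 0.1 for a in a_values]
m_norm = normalize_ma(m_values, a_values)
print(max(abs(m) for m in m_norm))  # close to zero: the trend is removed
```

In this artificial case every spot follows the bias trend exactly, so the normalized log ratios collapse to zero. With real data, a small number of truly differentially expressed genes would remain away from zero after the trend is subtracted.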
26
(R,G) to (M,A) Transformation “Observed” data {(R,G)}: R = red channel signal, G = green channel signal (background corrected or not). Transformed data {(M,A)}: $M = \log_2(R/G)$ (ratio), $A = \log_2 (R \cdot G)^{1/2} = \frac{1}{2}\log_2(R \cdot G)$ (intensity). Inverse: $R = (2^{2A+M})^{1/2}$, $G = (2^{2A-M})^{1/2}$.
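A short Python sketch of the transformation and its inverse, following the formulas on this slide. The function names and the example intensities are hypothetical.

```python
import math

def to_ma(r, g):
    """Map red/green intensities to (M, A): log ratio and mean log intensity."""
    m = math.log2(r / g)
    a = 0.5 * math.log2(r * g)
    return m, a

def from_ma(m, a):
    """Invert the transformation: recover (R, G) from (M, A)."""
    r = math.sqrt(2 ** (2 * a + m))
    g = math.sqrt(2 ** (2 * a - m))
    return r, g

# A spot four times brighter in red than in green.
m, a = to_ma(800.0, 200.0)
print(m)              # 2.0, since log2(800/200) = log2(4)
print(from_ma(m, a))  # approximately (800.0, 200.0)
```

The inverse works because 2A + M = log2(R^2) and 2A - M = log2(G^2), so taking the square root of the corresponding power of two recovers each channel.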
27
Normalization Regression normalization: –Fit the linear regression model: –Assumption: all the genes on the array have the same variance (homogeneity). –Test the significance of the intercept. Fit a linear regression without the intercept if it is insignificant. –Transform the treatment data: –Problem: the assumption may not hold, and the trend may be nonlinear (the third replicate of the RL95 data has a slight quadratic trend).
28
Scatter plot of log intensity before and after regression normalization