Download presentation
Presentation is loading. Please wait.
1
Microarrays
2
Regulation of Gene Expression
3
Cells respond to environment Heat Food Supply Responds to environmental conditions Various external messages
4
Where gene regulation takes place Opening of chromatin Transcription Translation Protein stability Protein modifications
5
Transcriptional Regulation Strongest regulation happens during transcription Best place to regulate: No energy wasted making intermediate products However, slow response time After a receptor notices a change: 1.Cascade message to nucleus 2.Open chromatin & bind transcription factors 3.Recruit RNA polymerase and transcribe 4.Splice mRNA and send to cytoplasm 5.Translate into protein
6
Transcription Factors Binding to DNA Transcription regulation: Certain transcription factors bind DNA Binding recognizes DNA substrings: Regulatory motifs
7
RNA Polymerase TBP Promoter and Enhancers Promoter necessary to start transcription Enhancers can affect transcription from afar Enhancer 1 TATA box Gene X DNA binding sites Transcription factors
8
Example: A Human heat shock protein TATA box: positioning transcription start TATA, CCAAT: constitutive transcription GRE: glucocorticoid response MRE: metal response HSE: heat shock element TATASP1 CCAAT AP2 HSE AP2CCAAT SP1 promoter of heat shock hsp70 0 --158 GENE Motifs:
9
The Cell as a Regulatory Network AB Make D C If C then D If B then NOT D If A and B then D D Make B D If D then B C gene D gene B B Promoter D Promoter B
10
DNA Microarrays Measuring gene transcription in a high- throughput fashion
11
Measuring transcription AAAAAAAAA Gene (DNA) Transcript (RNA) RNA polymerase – cellular enzyme AAAAAAAAA TTTTTTTTT Synthetic primer (oligo dT) Reverse transcriptase (RT) – Retroviral enzyme - Flourescence tags Extract RNA Complementary DNA (cDNA) Expression ~ RNA ~ flourescence
12
What is a microarray
13
What is a microarray (2) A 2D array of DNA sequences from thousands of genes Each spot has many copies of same gene Allow mRNAs from a sample to hybridize Measure number of hybridizations per spot
14
How to make a microarray Method 1: Printed Slides (Stanford) –Use PCR to amplify a 1 kb portion of each gene / EST –Apply each sample on glass slide Method 2: DNA Chips (Affymetrix) –Grow oligonucleotides (20bp) on glass –Several words per gene (choose unique words) If we know the gene sequences, Can sample all genes in one experiment!
15
Microarray Experiment RT-PCR LASER DNA “Chip” High glucose Low glucose
16
Raw data – images Red (Cy5) dot – overexpressed or up-regulated Green (Cy3) dot – underexpressed or down- regulated Yellow dot –equally expressed Intensity - “absolute” level cDNA plotted microarray
17
Levels of analysis Level 1: Which genes are induced / repressed? Gives a good understanding of the biology Methods: Factor-2 rule, t-test. Level 2: Which genes are co-regulated? Inference of function. -Clustering algorithms. Level 3: Which genes regulate others? Reconstruction of networks. - Transcriptions factor binding sites.
18
Experiment: time course Time 0 Genes Sample annotations Gene annotations Intensity (Red) Intensity (Green)
19
Experiment: time course Time 0.5 Genes Intensity (Red) Intensity (Green) Time 0
20
Experiment: time course Genes 0 0 0.5 0 2 0 5 0 7 0 9 0 11 0 Time (hours)
21
Gene expression database Genes Gene expression levels Samples Sample annotations Gene annotations Gene expression matrix
22
Gene expression database Samples Genes Gene expression matrix Timeseries, Conditions A, B, … Mutants in genes a, b … Etc.
23
Data normalization expression of gen x in experiment i expression of gen x in reference Logarithm of ratio - treats induction and repression of identical magnitude as numerical equal but with opposite sign. red/green - ratio of expression – 2 - 2x overexpressed – 0.5 - 2x underexpressed log 2 ( red/green ) - “log ratio” – 1 2x overexpressed – -1 2x underexpressed
24
Analysis of multiple experiments Expression of gene x in m experiments can be represented by an expression vector with m elements Z-transformation: If X ~ N( ),
25
Level 1 2-fold rule: Is a gene 2-fold up (or down) regulated? Students t-test: Is the regulation significantly different from background variation? (Needs repeated measurements)
26
T-test X ~ N( ), Cannot reject H 0 Reject H 0 The p-value is the probability of drawing the wrong conclusion by rejecting a null hypothesis
27
Multiple testing In a microarray experiment, we perform 1 test / gene Prob (correct) = 1 – c Prob (globally correct) = (1 – c n Prob (wrong somewhere) = 1 - (1 – c n e = 1 - (1 – c n For small e : c e n Bonferroni correction for multiple testing of independent events Single comparison Experiment comparison
28
Multiple testing GenesTreated 1Treated 2Control 1Control 2p-value Gene 10.6590810.972340.3726750.695110.010362 Gene 20.3411190.1005490.560260.2859650.052948 Gene 30.6671360.295540.4982840.0192790.150739 Gene 40.8807880.8717840.5520850.2081670.20722 Gene 50.0929420.7566290.4882660.845950.358535 Gene 60.079580.7360490.0228730.4064690.391526 Gene 70.5344970.1469250.6597460.9517310.401714 Gene 80.0620870.6780390.9798140.7959040.418683 Gene 90.2241660.170820.6502150.162220.512849 Gene 100.3729980.1847380.3538790.4511970.545602 Gene 110.5376190.8539970.6067660.0831490.556954 Gene 120.2328550.775750.2757460.4386220.58056 Gene 130.7608630.5085160.8239470.0746370.591919 Gene 140.5685070.9327710.723730.0270960.60806 Gene 150.8384370.5493770.926730.1007890.623721 Gene 160.0174070.7237510.3109770.2204520.836162 Gene 170.8936380.2934720.5422730.8862850.840617 Gene 180.5364790.8879430.8595210.3824040.861986 Gene 190.6756220.6046960.4457130.9164730.904506 Gene 200.8366530.3970730.4385220.7787420.986562 0.05 Significance level
29
Clustering Hierachical clustering: - Transforms n (genes) * m (experiments) matrix into a diagonal n * n similarity (or distance) matrix Similarity (or distance) measures: Euclidic distance Pearsons correlation coefficent Eisen et al. 1998 PNAS 95:14863-14868
30
Vectors in space: distances Gene 1 Gene 2 Experiment 1 Experiment 3 Experiment 2 d
31
Distance Measures: Minkowski Metric r r m i ii m m yxyxd yyyy xxxx myx ||),( )( )( 1 21 21 by defined is metric Minkowski The :features have both and objects two Suppose
32
Most Common Minkowski Metrics ||max),( ||),( 1 ||),( 2 1 1 2 2 1 ii m i m i ii m i ii yxyxd r yxyxd r yxyxd r )distance sup"(" 3, distance) (Manhattan 2, ) distance (Euclidean 1,
33
An Example 4 3 x y
34
Similarity Measures: Correlation Coefficient. and :averages )()( ))(( ),( 1 1 1 1 11 22 1 m i i m m i i m m i m i ii m i ii yyxx yyxx yyxx yxs
35
Similarity Measures: Correlation Coefficient Time Gene A Gene B Gene A Time Gene B Expression Level Time Gene A Gene B
36
Clustering of Genes and Conditions Unsupervised: –Hierarchical clustering –K-means clustering –Self Organizing Maps (SOMs)
37
Ordered dendrograms Hierachical clustering: Hypothesis: guilt-by-association Common regulation -> common function Eisen98
38
Hierarchical Clustering Given a set of n items to be clustered, and an n*n distance (or similarity) matrix, the basic process hierarchical clustering is this: 1.Start by assigning each item to its own cluster, so that if you have n items, you now have n clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain. 2.Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster. 3.Compute distances (similarities) between the new cluster and each of the old clusters. 4.Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
39
Merge two clusters by: Single-Link Method / Nearest Neighbor (NN): minimum of pairwise dissimilarities Complete-Link / Furthest Neighbor (FN): maximum of pairwise dissimilarities Unweighted Pair Group Method with Arithmetic Mean (UPGMA): average of pairwise dissimilarities
40
Single-Link Method Diagonal n*n distance Matrix Euclidean Distance b a cd (1) cd a,b (2) a,b,c d (3) a,b,c,d
41
Complete-Link Method b a Distance Matrix Euclidean Distance (1) (2) (3) a,b ccd d c,d a,b,c,d
42
Compare Dendrograms 2 4 6 0 Single-LinkComplete-Link
43
Serum stimulation of human fibroblasts (24h) Cholesterol biosynthesis Celle cyclus I-E response Signalling/ Angiogenesis Wound healning
44
Partitioning k-means clustering Self organizing maps (SOMs)
45
k-means clustering Tavazoie et al. 1999 Nature Genet. 22:281-285
46
k-Means Clustering Algorithm 1) Select an initial partition of k clusters 2) Assign each object to the cluster with the closest centre 3) Compute the new centres of the clusters 4) Repeat step 2 and 3 until no object changes cluster
48
1. centroide
49
2. centroide 3. centroide 4. centroide 5. centroide 6. centroide k = 6
50
1. centroide 2. centroide 3. centroide 5. centroide 6. centroide k = 6
51
1. centroide 2. centroide 3. centroide 4. centroide 5. centroide 6. centroide k = 6
52
Self organizing maps Tamayo et al. 1999 PNAS 96:2907-2912
54
1. centroide2. centroide3. centroide 4. centroide 5. centroide6. centroide k = (2,3) = 6
55
k = 6
58
Cluster Co-regulation (DeRisi et al, 1997)
59
Cluster of co-expressed genes, pattern discovery in regulatory regions 600 basepairs Expression profiles Upstream regions Retrieve Pattern over-represented in cluster
60
Some Discovered Patterns Pattern Probability ClusterNo.Total ACGCG 6.41E-3996751088 ACGCGT 5.23E-389452 387 CCTCGACTAA 5.43E-382718 23 GACGCG 7.89E-318640 284 TTTCGAAACTTACAAAAAT 2.08E-292614 18 TTCTTGTCAAAAAGC 2.08E-292614 18 ACATACTATTGTTAAT 3.81E-282213 18 GATGAGATG 5.60E-286824 83 TGTTTATATTGATGGA 1.90E-272413 18 GATGGATTTCTTGTCAAAA 5.04E-271812 18 TATAAATAGAGC 1.51E-262713 18 GATTTCTTGTCAAA 3.40E-262012 18 GATGGATTTCTTG 3.40E-262012 18 GGTGGCAA 4.18E-264020 96 TTCTTGTCAAAAAGCA 5.10E-262913 18 Vilo et al. 2001
61
The " GGTGGCAA " Cluster
62
Clustering and promoter elements Harmer et al. 2000 Science 290:2110-2113
63
Microarray and cancer Alizadeh et al. 2000 Nature 403:505-5011
64
Diffuse large B-cell lymphoma
65
Human tumor patient and normal cells; various conditions Cluster genes across tumors Classify tumors according to genes
67
Regulatory pathways: KEGG
68
Regulatory pathway reconstruction Ideker et al Science 2001
70
Perturbations Selected genes are deleted. RNA is extracted from -strains and from WT under +/- Galactose conditions Repeated measurements enable estimation of statistical significance Compare data – model –Design new experiments Clustering : Self Organizing Maps Protein – mRNA correlations Network correlations –Protein-DNA (Promoter analysis) –Protein-Protein
72
Correlation mRNA – protein levels Mass-spectrometry
73
ICAT reagent Isotope coded affinity tags
74
ICAT procedure
76
Mapping of gene expression changes onto interaction network Yellow: Protein-DNA Blue: Protein-protein
77
Hierarchical clustering of -perturbations
78
Conclusion Significance Database (matrix), data normalization Distances HCL, SOM, k-means Two-sided clustering Promoter elements Metabolic / regulatory pathways Deletion mutants ICAT technology; MS/MS
79
Two sided Clustering
80
-Deletion mutations Vector Chromosomes Homologous recombination
81
Transcriptional profiling of mutants -Mutants Genes
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.