Identification of regulatory elements
Transcriptional Regulation
Strongest regulation happens during transcription
Best place to regulate: No energy wasted making intermediate products
However, slow response time
After a receptor notices a change:
1. Cascade message to nucleus
2. Open chromatin & bind transcription factors
3. Recruit RNA polymerase and transcribe
4. Splice mRNA and send to cytoplasm
5. Translate into protein
Transcription Factors Binding to DNA
Transcription regulation: Certain transcription factors bind DNA
Binding recognizes DNA substrings: Regulatory motifs
Promoter and Enhancers
Promoter necessary to start transcription
Enhancers can affect transcription from afar
Figure labels: RNA Polymerase, TBP, Enhancer 1, TATA box, Gene X, DNA binding sites, Transcription factors
Example: A Human heat shock protein
Motifs:
TATA box: positioning transcription start
TATA, CCAAT: constitutive transcription
GRE: glucocorticoid response
MRE: metal response
HSE: heat shock element
Figure labels (promoter of heat shock hsp gene): TATA, SP1, CCAAT, AP2, HSE, AP2, CCAAT, SP1
The Cell as a Regulatory Network
Gene D (D promoter):
If C then D
If B then NOT D
If A and B then D
Make D
Gene B (B promoter):
If D then B
Make B
Figure labels: A, B, C, D, gene D, gene B, D promoter, B promoter
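The rules on this slide can be read as a small Boolean network. The sketch below is one possible reading, not the slide's own model: it assumes that B on its own represses D even when C is present, that A together with B overrides that repression, and that all genes update in synchronous steps. The function names `step` and `simulate` are made up for illustration.

```python
def step(state):
    """One synchronous update of the toy network.

    state maps the names "A", "B", "C", "D" to True/False.
    A and C are treated as external inputs and stay fixed.
    """
    a, b, c, d = state["A"], state["B"], state["C"], state["D"]
    # Assumed precedence: "A and B" activates D; otherwise C activates D
    # only when the repressor B is absent.
    next_d = (a and b) or (c and not b)
    # "If D then B": B is made whenever D was present in the previous step.
    next_b = d
    return {"A": a, "B": next_b, "C": c, "D": next_d}


def simulate(state, n_steps):
    """Return the list of states visited, starting with the initial one."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    start = {"A": False, "B": False, "C": True, "D": False}
    for t, s in enumerate(simulate(start, 4)):
        print(t, "B =", s["B"], "D =", s["D"])
```

With only C switched on, D activates B and B then shuts D off again, so under these assumptions the two genes cycle rather than settling to a fixed state.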
Cluster Co-regulation (DeRisi et al, 1997)
Cluster of co-expressed genes, pattern discovery in regulatory regions
Expression profiles
Retrieve upstream regions (600 basepairs)
Pattern over-represented in cluster
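One common way to score "over-represented in cluster" is a hypergeometric tail probability: how likely is it to see at least this many cluster genes carrying the pattern if the cluster were a random draw from all upstream regions? The sketch below assumes that test; the slides do not say which statistic was actually used, and the function names are illustrative only.

```python
from math import comb


def count_regions_with_pattern(pattern, upstream_regions):
    """Number of upstream regions containing the pattern at least once."""
    return sum(1 for seq in upstream_regions if pattern in seq)


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n regions are drawn from N, of which K carry the pattern."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def overrepresentation_p(pattern, cluster_regions, all_regions):
    """Tail probability for a pattern; cluster_regions must be a subset of all_regions."""
    k = count_regions_with_pattern(pattern, cluster_regions)
    K = count_regions_with_pattern(pattern, all_regions)
    return hypergeom_tail(k, len(cluster_regions), K, len(all_regions))


if __name__ == "__main__":
    with_site = ["TTACGCGTAA", "ACGCGTTTTT", "GGGACGCGTC"]
    without_site = ["TTTTTTTTTT"] * 7
    print(overrepresentation_p("ACGCGT", with_site, with_site + without_site))
```

Small probabilities mean the pattern occurs in the cluster more often than chance would suggest. In practice every candidate word is tested, so the values need a multiple-testing correction.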
Some Discovered Patterns
Pattern               Probability  Cluster  No.  Total
ACGCG                 6.41E
ACGCGT                5.23E
CCTCGACTAA            5.43E
GACGCG                7.89E
TTTCGAAACTTACAAAAAT   2.08E
TTCTTGTCAAAAAGC       2.08E
ACATACTATTGTTAAT      3.81E
GATGAGATG             5.60E
TGTTTATATTGATGGA      1.90E
GATGGATTTCTTGTCAAAA   5.04E
TATAAATAGAGC          1.51E
GATTTCTTGTCAAA        3.40E
GATGGATTTCTTG         3.40E
GGTGGCAA              4.18E
TTCTTGTCAAAAAGCA      5.10E
Vilo et al. 2001
The " GGTGGCAA " Cluster
Pattern discovery strategies
Sequence based: Suffix tree scanning
Alignment based:
–MEME (Expectation Maximization)
–GibbsMotif (Gibbs Sampler)
–CisModule (Gibbs Sampler for combinations of several modules)
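To make the alignment-based idea concrete, here is a minimal textbook-style Gibbs site sampler. It is not the GibbsMotif or CisModule implementation: it assumes exactly one motif occurrence per sequence, a uniform background, a pseudocount of 1, and uppercase A/C/G/T input. The function name and parameters are chosen for illustration.

```python
import random


def gibbs_motif_sampler(seqs, width, n_iter=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    alphabet = "ACGT"
    # Start from random motif positions.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        for i, seq in enumerate(seqs):
            # Build a count profile from every sequence except the held-out one.
            counts = [{base: 1.0 for base in alphabet} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j == i:
                    continue
                for pos in range(width):
                    counts[pos][other[starts[j] + pos]] += 1.0
            total = (len(seqs) - 1) + len(alphabet)
            # Weight each window of the held-out sequence by
            # (probability under the profile) / (probability under background).
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for pos in range(width):
                    w *= (counts[pos][seq[start + pos]] / total) / 0.25
                weights.append(w)
            # Sample a new position in proportion to the weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    upstream = ["TTTTGGTGGCAATTTT", "AAGGTGGCAAAAAAAA", "CCCCCCGGTGGCAACC"]
    print(gibbs_motif_sampler(upstream, width=8))
```

The sampler is stochastic, so different seeds can give different alignments; real tools run several restarts and keep the highest-scoring one.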
Circadian rhythm
Follow the daily cycle (circa diem: about a day)
Many biological systems follow circadian rhythms
Microarray analysis of circadian rhythm
Grow plants in 12 hours light and 12 hours dark
Switch off the light, and start collecting plants every 4th hour
Figure labels: light, dark, Start experiment, Collect RNA every 4th hour
Microarray analysis of circadian rhythm, continued
Extract RNA
Label and hybridize to microarray chip
Cluster by SOM
Identify cluster with circadian memory
Retrieve promoters of genes
Look for overrepresented words
Clustering and promoter elements
Harmer et al., Science 290:
Bayesian Networks Analysis
Friedman et al., J. Comp. Biol., 7:
- Can only represent acyclic relations.
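The acyclicity restriction matters because regulatory networks often contain feedback. In the earlier slide "The Cell as a Regulatory Network", D makes B and B represses D, which is a cycle. The sketch below checks whether a set of regulator-to-target edges could be the structure of a static Bayesian network; it uses Python's standard `graphlib` module (Python 3.9 or later), and the function name `is_acyclic` is illustrative.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """edges: iterable of (regulator, target) pairs. True if the graph is a DAG."""
    predecessors = {}
    for regulator, target in edges:
        predecessors.setdefault(target, set()).add(regulator)
        predecessors.setdefault(regulator, set())
    try:
        # A topological order exists only for acyclic graphs.
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(is_acyclic([("A", "D"), ("C", "D")]))   # feed-forward inputs only
    print(is_acyclic([("B", "D"), ("D", "B")]))   # B/D feedback loop
```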
From Gifford 2001 Science 293: genes, 140 experiments
Chromatin IP Chip (ChIP-chip)
Iyer et al. 2000
Chromatin Immuno Precipitation - Chip
Assembling motifs into networks
1. Define set of genes G that are bound by a set of regulators S at high significance
2. Find coexpressed subset of G
3. Establish core expression profile
4. Drop genes of G that do not match this profile
5. Include new genes that do, even if the binding by regulator is less significant
6. Repeat 4 and 5 until all combinations have been considered
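The steps above can be sketched for a single regulator set as an iterative refinement. This is a simplified illustration, not the published procedure: it assumes the core profile is the mean profile of the current gene set, that "matching" means a Pearson correlation above a threshold, and that iteration stops when the gene set no longer changes. All thresholds and names (`strict_p`, `relaxed_p`, `min_corr`, `refine_module`) are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def mean_profile(profiles):
    """Column-wise mean of a list of expression profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def refine_module(binding_p, expression, strict_p=0.001, relaxed_p=0.01,
                  min_corr=0.8, max_rounds=20):
    """binding_p: gene -> binding p-value; expression: gene -> list of values."""
    # Step 1: genes bound at high significance.
    module = {g for g, p in binding_p.items() if p <= strict_p}
    for _ in range(max_rounds):
        if not module:
            break
        # Steps 2-3 (simplified): core profile = mean of the current set.
        core = mean_profile([expression[g] for g in module])
        # Step 4: drop genes that do not match the core profile.
        kept = {g for g in module if pearson(expression[g], core) >= min_corr}
        # Step 5: add matching genes with weaker binding evidence.
        added = {g for g, p in binding_p.items()
                 if p <= relaxed_p and pearson(expression[g], core) >= min_corr}
        new_module = kept | added
        if new_module == module:
            break
        module = new_module
    return module
```

Step 6 on the slide repeats this over combinations of regulators; the sketch handles one regulator set, so an outer loop over regulator combinations would be needed to cover that.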
Heterochromatin
Dark staining regions, remain condensed in interphase (centromeres and telomeres)
Heterochromatic knobs
–Discovered in maize by McClintock (1951)
In heterochromatin, there is heavy methylation of
–Histones (H3mK9)
–DNA (5mC)
In euchromatin, DNA is not methylated, and histones are not K9 methylated but K4 methylated (H3mK4) and acetylated
DDM1
Decrease in DNA methylation 1
Encodes a chromatin remodelling ATPase
Pleiotropic phenotype
–Late flowering
siRNA Changes in DNA methylation
1.5 Mb hk4S Gene island
Chromatin IP – Chip (ChIP – Chip)
Antibodies against
–Histones (H3mK4)
–Histones (H3mK9)
–DNA (5mC)
are used to immunoprecipitate fragmented DNA from DDM1 and WT.
Labeled red – green
Hybridized to chip with 1 kb PCR products spanning the hk4S region
Tiling path:
Gene traps