Sequence Analysis
Programme
1. A Motif-based Framework for Recognizing Sequence Families (Sharan, Myers), 9:45-10:10am
Coffee Break, 10:10-10:40am
2. An HMM Posterior Decoder for Sequence Feature Prediction that Includes Homology Information (Käll, Krogh, Sonnhammer), 10:40-11:05am
3. Self-Organized Clustering Methods for Familial Binding Profiles (Mahony, Golden, Smith, Benos), 11:05-11:30am
ISCB Open Business Meeting, 12:15-1:30pm
4. Statistics of Local Multiple Alignments (Prakash, Tompa), 1:45-2:10pm
5. Computing the P-value of the Information Content from an Alignment of Multiple Sequences (Nagarajan, Jones, Keich), 2:10-2:35pm
What Controls Gene Expression?
Transcription factors
Regulatory RNAs
–miRNA
–smRNA
–siRNA
Methylation
Chromatin
Wasserman and Sandelin (2004). Applied bioinformatics for the identification of regulatory elements. Nature Reviews Genetics (5):
Transcription Factors
Proteins that bind DNA
Enhance or repress gene expression
Families, e.g.:
–Homeodomain (Hox)
–POU domain (Oct-1)
–Helix-loop-helix (c-Myc)
–Zinc fingers (TFIIIA)
–Leucine zipper (C/EBP)
–Winged helix (Fox family)
Approx. 10% of genes in the human genome encode TFs
TF Noise
577 TFBS
TF Problems
TFBS are small and degenerate:
–TGTGGT (AML-1a)
–NNNWAAAYAAAYANNNNN (FOXJ2_1)
–AYMAYAATATTTKN (FOXJ2_2)
–TYAAGTG (NKX2-5)
Upstream sequences (even conserved) are large
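The degeneracy problem can be made concrete: a short consensus written in IUPAC ambiguity codes matches random sequence often. The Python sketch below (standard library only) turns a consensus such as TGTGGT or TYAAGTG into a regular expression, lists matches on one strand, and estimates how many hits to expect by chance. It assumes a uniform, independent background of 0.25 per base and scans only the given strand; real genomes have skewed composition and both strands should be scanned. The function names are illustrative, not taken from any particular tool.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Number of bases each code accepts.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern."""
    return "".join(IUPAC[c] for c in consensus.upper())


def find_sites(consensus, seq):
    """0-based start positions of all matches (overlaps allowed), given strand only."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


def match_probability(consensus):
    """Chance that one random position matches, assuming 0.25 per base, i.i.d."""
    p = 1.0
    for c in consensus.upper():
        p *= DEGENERACY[c] / 4
    return p


def expected_hits(consensus, length):
    """Expected chance matches on one strand of a random sequence of this length."""
    positions = max(length - len(consensus) + 1, 0)
    return positions * match_probability(consensus)
```

For example, expected_hits("TGTGGT", 10_000) gives roughly 2.4 chance matches per strand per 10 kb, which is why unfiltered scans of long upstream regions return so many predicted sites.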
Wasserman and Sandelin (2004)
Conserved Sites
577 TFBS
101 TFBS
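Restricting predictions to conserved regions (phylogenetic footprinting) is one way to shrink a list of predicted sites like this. Below is a minimal Python sketch, assuming per-base conservation scores in [0, 1] have already been computed from a cross-species alignment; the Site record, that input format and the 0.7 default threshold are hypothetical choices for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A predicted binding site (hypothetical record for this sketch)."""
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def conserved_sites(sites, conservation, min_mean=0.7):
    """Keep sites whose mean per-base conservation is at least min_mean.

    conservation: list of per-base scores in [0, 1], e.g. percent identity
    in a sliding window of a pairwise alignment (assumed input format).
    """
    kept = []
    for site in sites:
        window = conservation[site.start:site.end]
        # Skip sites that fall outside the scored region entirely.
        if window and sum(window) / len(window) >= min_mean:
            kept.append(site)
    return kept
```

The threshold trades sensitivity for specificity: a stricter cutoff removes more false positives but also discards functional sites that happen to lie in poorly conserved sequence.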
Motif Collections?
Databases/experimental data
–TRANSFAC
–JASPAR
De novo searches/motif finding
–Xiaohui Xie, Jun Lu, E. J. Kulbokas, Todd Golub, Vamsi Mootha, Kerstin Lindblad-Toh, Eric Lander, Manolis Kellis (2005). Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature, 2005 Feb 27, doi: /nature03441
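Collections such as TRANSFAC and JASPAR distribute motifs as position frequency (count) matrices. A common way to use one is to convert the counts into a log-odds position weight matrix and score every window of a sequence. The Python sketch below does this; the pseudocount of 0.25 per base, the uniform background and the toy count matrix in the usage note are assumptions, not values taken from either database.

```python
import math

BASES = "ACGT"


def counts_to_pwm(counts, background=None, pseudocount=0.25):
    """Convert a count matrix {base: [count at each position]} into a
    log2-odds position weight matrix. The pseudocount is an assumption."""
    background = background or {b: 0.25 for b in BASES}
    width = len(counts["A"])
    pwm = {b: [] for b in BASES}
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        for b in BASES:
            freq = (counts[b][i] + pseudocount) / total
            pwm[b].append(math.log2(freq / background[b]))
    return pwm


def scan(pwm, seq):
    """Score every window of seq (given strand only); returns (start, score) pairs."""
    width = len(pwm["A"])
    seq = seq.upper()
    results = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[b][i] for i, b in enumerate(window))
        results.append((start, score))
    return results
```

With a toy TGTGGT count matrix (10 aligned sites, all T-G-T-G-G-T), the highest-scoring window in AAATGTGGTAAA starts at position 3. Sites are then usually called by a score cutoff, which is exactly where the noise seen in genome-wide scans comes from.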
Motif Finding
From unaligned DNA?
–Pattern finding
–Local multiple alignment
Benchmark test sets
–M. Tompa, N. Li, T. L. Bailey, G. M. Church, B. De Moor, E. Eskin, A. V. Favorov, M. C. Frith, Y. Fu, W. J. Kent, V. J. Makeev, A. A. Mironov, W. S. Noble, G. Pavesi, G. Pesole, M. Regnier, N. Simonis, S. Sinha, G. Thijs, J. van Helden, M. Vandenbogaert, Z. Weng, C. Workman, C. Ye, and Z. Zhu (2005). Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites. Nature Biotechnology, vol. 23, no. 1,
–Compared 13 motif finders: AlignACE, ANN-Spec, Consensus, GLAM, Improbizer, MEME, MEME3, MITRA, MotifSampler, oligo/dyad-analysis, QuickScore, SeSiMCMC, Weeder, YMF
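As an illustration of the pattern-finding formulation (not of any of the benchmarked tools), the exhaustive median-string search below enumerates every k-mer and keeps the one whose best match in each sequence has the smallest total Hamming distance. It is exact but costs 4^k pattern evaluations, so it is only practical for short motifs; the benchmarked tools use heuristics, or probabilistic local multiple alignment such as expectation maximization (MEME) and Gibbs sampling (AlignACE, MotifSampler), instead.

```python
from itertools import product


def min_hamming(pattern, seq):
    """Smallest Hamming distance between pattern and any same-length window of seq.
    seq must be at least as long as pattern."""
    k = len(pattern)
    return min(
        sum(a != b for a, b in zip(pattern, seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    )


def median_string(seqs, k):
    """Exhaustive pattern search: the k-mer minimizing the summed best-match
    distance over all sequences. Ties keep the lexicographically first k-mer."""
    best, best_dist = None, float("inf")
    for letters in product("ACGT", repeat=k):
        pattern = "".join(letters)
        dist = sum(min_hamming(pattern, s) for s in seqs)
        if dist < best_dist:
            best, best_dist = pattern, dist
    return best, best_dist
```

For example, median_string(["AAAAATGTGGTCCCCC", "CCTGTGGTAAAAA"], 6) returns ("TGTGGT", 0), the only 6-mer that occurs exactly in both sequences.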
Cre-bp1_c_Jun 7.7 HSF213.0 Cart ER22.1 : HSF80.4 SP186.7
…so how do we determine significance? TFBS frequency?
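One common way to ask whether a site count is surprising is a one-sided binomial test: if each scanned position matches with a background probability p (estimated, for example, from shuffled or background sequences), the p-value is the chance of seeing at least the observed number of hits. The Python sketch below computes that upper tail in log space so that it works for thousands of positions. It treats positions as independent, which overlapping windows violate, so the result is an approximation rather than an exact significance level.

```python
from math import exp, lgamma, log


def binomial_sf(k, n, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p), 0 < p < 1.

    Each term is computed in log space, so large n (e.g. 10^4 scanned
    positions) does not overflow the binomial coefficients.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = log(p), log(1 - p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += exp(log_term)
    return min(total, 1.0)  # guard against rounding just above 1
```

For example, binomial_sf(9, 10, 0.5) is 11/1024, about 0.011. When many TFs are tested at once, the resulting p-values also need a multiple-testing correction.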