Intro to Probabilistic Models PSSMs Computational Genomics, Lecture 6b Partially based on slides by Metsada Pasmanik-Chor
Biological Motifs Many biological units with a common function exhibit similarities at the sequence level. These range from very short “motifs”, such as gene splice sites, DNA regulatory binding sites recognized by transcription factors (proteins that bind to the promoter and control gene expression), and microRNAs, all the way to protein families. It is often desirable to model such motifs so that new instances can be searched for. Probabilistic models are very useful for this. Today we deal with the PSSM, the simplest such model.
Promoter…
Regulation of Genes (figure labels: Gene, Regulatory Element, DNA, Transcription Factor (protein), RNA polymerase (protein), New protein)
Motif Logo Motifs can mutate at less important bases. The five motif instances TGGGGGA, TGAGAGA, TGGGGGA, TGAGAGA and TGAGGGA differ only at positions 3 and 5. Representations called motif logos illustrate the conserved regions of a motif.
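The claim about positions 3 and 5 can be checked by counting bases column by column. The following is a minimal sketch using the five instances from the slide; the function names `column_counts` and `variable_positions` are illustrative and not part of the lecture.

```python
from collections import Counter

# The five motif instances listed on the slide.
instances = ["TGGGGGA", "TGAGAGA", "TGGGGGA", "TGAGAGA", "TGAGGGA"]


def column_counts(seqs):
    """Return one Counter of bases per column of a gapless alignment."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all instances must have the same length")
    # zip(*seqs) walks the alignment column by column.
    return [Counter(col) for col in zip(*seqs)]


def variable_positions(seqs):
    """Return the 1-based positions at which more than one base occurs."""
    return [
        pos
        for pos, counts in enumerate(column_counts(seqs), start=1)
        if len(counts) > 1
    ]


if __name__ == "__main__":
    for pos, counts in enumerate(column_counts(instances), start=1):
        print(pos, dict(counts))
    print("variable positions:", variable_positions(instances))
```

Columns with a single observed base are fully conserved; the remaining columns are the ones a logo would draw with shorter, stacked letters.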
Example: Calmodulin-Binding Motif (calcium-binding proteins)
PSSM Starting Point A gapless multiple sequence alignment (MSA) of known instances of a given motif. The motif is represented by either: 1. Consensus. 2. Position Specific Scoring Matrix (PSSM). Consider now a specific “motif server”, called Consite.
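Both representations can be derived from the same gapless alignment. The sketch below is one common construction, not necessarily the one used later in the lecture: it assumes a DNA alphabet, a uniform background of 0.25 per base, a pseudocount of 1 per base, and log2-odds scores. The alignment reuses the five instances from the Motif Logo slide, and all function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

# Gapless alignment: the five instances from the Motif Logo slide.
instances = ["TGGGGGA", "TGAGAGA", "TGGGGGA", "TGAGAGA", "TGAGGGA"]


def consensus(seqs):
    """Most frequent base per column; ties go to the earlier base in ALPHABET."""
    out = []
    for col in zip(*seqs):
        counts = Counter(col)
        out.append(max(ALPHABET, key=lambda b: counts[b]))
    return "".join(out)


def pssm(seqs, pseudocount=1.0, background=None):
    """Log2-odds matrix, one {base: score} dict per column.

    Assumptions (not from the slides): uniform background and an
    additive pseudocount so unseen bases get a finite score.
    """
    background = background or {b: 0.25 for b in ALPHABET}
    total = len(seqs) + pseudocount * len(ALPHABET)
    matrix = []
    for col in zip(*seqs):
        counts = Counter(col)
        matrix.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background[b])
            for b in ALPHABET
        })
    return matrix


def score(matrix, window):
    """Sum of per-column scores for a window of the same width as the matrix."""
    return sum(col[base] for col, base in zip(matrix, window))


def best_hit(matrix, sequence):
    """Return (score, 0-based start) of the highest-scoring window."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    m = pssm(instances)
    print("consensus:", consensus(instances))
    print("best hit:", best_hit(m, "CCTGAGGGACC"))
```

The consensus keeps only the winning base per column, while the PSSM keeps a score for every base, so a near match still scores well instead of being rejected outright.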
Sequence logos: Visualizing PSSMs
Sequence logos: Visualizing PSSMs (2)
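In a standard DNA sequence logo, the total height of the letter stack at each position is its information content in bits: 2 minus the Shannon entropy of the base frequencies in that column. The sketch below computes these heights for the five instances from the Motif Logo slide. It assumes the plain formula without the small-sample correction that some logo tools apply, and the function name is illustrative.

```python
import math
from collections import Counter

# The five instances from the Motif Logo slide.
instances = ["TGGGGGA", "TGAGAGA", "TGGGGGA", "TGAGAGA", "TGAGGGA"]


def information_content(seqs, alphabet="ACGT"):
    """Per-column information content in bits (no small-sample correction)."""
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    n = len(seqs)
    heights = []
    for col in zip(*seqs):
        counts = Counter(col)
        # Shannon entropy over the bases actually observed in this column.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        heights.append(max_bits - entropy)
    return heights


if __name__ == "__main__":
    for pos, bits in enumerate(information_content(instances), start=1):
        print(f"position {pos}: {bits:.3f} bits")
```

Fully conserved columns reach the maximum of 2 bits, while the two variable columns come out at roughly 1 bit each, which is why they appear shorter in the logo.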