Tutorial 5 Motif discovery
Multiple sequence alignments and motif discovery
Tools: MEME, MAST, TOMTOM, GOMO, PROSITE
Can we find motifs using multiple sequence alignment?
A widespread pattern with biological significance:
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
[Table: frequency of residues A, D, E, G, H, N and Y at alignment positions 1-10]
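The table on this slide is a position frequency matrix: for each aligned column, the fraction of sequences carrying each residue. A minimal Python sketch that builds one from the five aligned fragments listed on the slide (the slide's denominators suggest a sixth sequence that is not shown, so these fractions cover only the five fragments; the function name is illustrative):

```python
from collections import Counter

# The five aligned fragments from the slide, with the flanking ".." removed.
ALIGNED = ["YDEEGGDAEE", "YGEEGADYED", "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]

def position_frequencies(seqs):
    """Return one dict per alignment column mapping residue -> fraction of sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for col in range(length):
        counts = Counter(s[col] for s in seqs)
        matrix.append({res: n / len(seqs) for res, n in sorted(counts.items())})
    return matrix

for pos, column in enumerate(position_frequencies(ALIGNED), start=1):
    print(pos, column)
```

Fully conserved columns (such as the leading Y or the central G) get a single residue with frequency 1.0, which is exactly what makes them stand out as motif positions.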
Can we find motifs using multiple sequence alignment (MSA)?
Sometimes yes, but usually no.
Using MSA for motif discovery
This can only work if the sequences align nicely on their own. For most motifs this is not the case!
ClustalW - Input: input sequences, scoring matrix, gap scoring, output format, email address
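The scoring matrix and the gap-scoring parameters together decide how an alignment is scored. As a hedged illustration (not ClustalW's own scoring code, matrices or default values), this sketch scores an already-aligned pair with a toy substitution function and an affine gap penalty, where opening a gap costs more than extending it. The function names and penalty values are hypothetical:

```python
def substitution(a, b, match=2, mismatch=-1):
    """Toy scoring matrix (hypothetical values): +2 for identity, -1 otherwise."""
    return match if a == b else mismatch

def score_alignment(top, bottom, gap_open=-3, gap_extend=-1):
    """Score two aligned rows of equal length; '-' marks a gap.

    A gap run of length L costs gap_open + (L - 1) * gap_extend (affine penalty).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    prev_gap = None  # which row had a gap in the previous column: "top", "bottom" or None
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            continue  # a column of gaps only is ignored
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            score += gap_extend if prev_gap == row else gap_open
            prev_gap = row
        else:
            score += substitution(a, b)
            prev_gap = None
    return score

# A/A +2, gap opened -3, gap extended -1, T/T +2 -> 0
print(score_alignment("ACGT", "A--T"))
```

Because a long gap is charged only one opening penalty, the aligner prefers a single longer gap over several short ones.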
MUSCLE - Input: input sequences, output format, email address
Motif search: from de novo motifs to motif annotation
[Figure: overview of motif tools; labels include "gapped motifs" and "large DNA data"]
MEME - Multiple EM* for Motif Elicitation
- Motif discovery from unaligned sequences
- Genomic or protein sequences
- Flexible model of motif presence (a motif can be absent from some sequences or appear several times in one sequence)
*EM = expectation-maximization
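To make the expectation-maximization idea concrete, here is a much-simplified EM for DNA under the one-occurrence-per-sequence (OOPS) assumption, one of the occurrence models MEME offers. MEME's other models (zero or several occurrences per sequence), its starting-point search and its E-value statistics are not reproduced. The seeding strategy (every W-mer of the first sequence), the pseudocounts and all function names are assumptions of this sketch:

```python
import math
import random

ALPHABET = "ACGT"  # sequences must contain only these letters

def background(seqs):
    """Letter frequencies over all sequences, with a pseudocount of 1 per letter."""
    counts = {b: 1.0 for b in ALPHABET}
    for s in seqs:
        for ch in s:
            counts[ch] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in ALPHABET}

def seed_pwm(kmer, weight=0.5):
    """Starting model: the seed letter gets `weight`, the other letters share the rest."""
    other = (1.0 - weight) / (len(ALPHABET) - 1)
    return [{b: (weight if b == c else other) for b in ALPHABET} for c in kmer]

def e_step(seqs, pwm, bg):
    """E-step: posterior of each start position per sequence, plus the log-likelihood ratio."""
    w = len(pwm)
    z, loglik = [], 0.0
    for s in seqs:
        ratios = []
        for j in range(len(s) - w + 1):
            r = 1.0
            for k in range(w):
                r *= pwm[k][s[j + k]] / bg[s[j + k]]
            ratios.append(r)
        total = sum(ratios)
        z.append([r / total for r in ratios])
        loglik += math.log(total / len(ratios))  # uniform prior over start positions
    return z, loglik

def m_step(seqs, z, w, pseudo=0.1):
    """M-step: re-estimate the PWM from expected letter counts in each motif column."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(w)]
    for s, zi in zip(seqs, z):
        for j, weight in enumerate(zi):
            for k in range(w):
                counts[k][s[j + k]] += weight
    pwm = []
    for row in counts:
        total = sum(row.values())
        pwm.append({b: row[b] / total for b in ALPHABET})
    return pwm

def em_oops(seqs, w, iters=20):
    """Run EM from every W-mer of the first sequence and keep the best-scoring model."""
    bg = background(seqs)
    best = None
    first = seqs[0]
    for start in range(len(first) - w + 1):
        pwm = seed_pwm(first[start:start + w])
        for _ in range(iters):
            z, _ = e_step(seqs, pwm, bg)
            pwm = m_step(seqs, z, w)
        z, ll = e_step(seqs, pwm, bg)
        if best is None or ll > best[0]:
            best = (ll, pwm, z)
    ll, pwm, z = best
    sites = [max(range(len(zi)), key=zi.__getitem__) for zi in z]  # most likely start
    return pwm, sites, ll

def consensus(pwm):
    return "".join(max(row, key=row.get) for row in pwm)

# Usage on synthetic data: plant the same 6-mer once in each random sequence.
rng = random.Random(0)
seqs = []
for _ in range(6):
    s = [rng.choice(ALPHABET) for _ in range(25)]
    pos = rng.randrange(0, 20)
    s[pos:pos + 6] = "TTGACA"
    seqs.append("".join(s))
pwm, sites, ll = em_oops(seqs, 6)
print(consensus(pwm), sites)
```

The E-step turns the current model into soft guesses about where the motif sits; the M-step turns those guesses back into a model. Restarting from many seeds is a crude guard against EM's tendency to stop at a local optimum.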
MEME - Input
- Email address
- Input file (FASTA file)
- How many times does the motif occur in each sequence?
- Range of motif lengths
- How many motifs?
- How many sites?
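The input file is in FASTA format: each record starts with a '>' header line followed by one or more sequence lines. A minimal reader sketch (the function name is illustrative; in practice an established parser such as Biopython's `Bio.SeqIO.parse` may be preferable):

```python
import io

def read_fasta(handle):
    """Parse FASTA text into a list of (name, sequence) pairs.

    The name is the first word after '>'; sequence lines are joined and upper-cased.
    """
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records

example = io.StringIO(">seq1 first example\nACGTTG\nACA\n>seq2\nttgacaGG\n")
print(read_fasta(example))  # [('seq1', 'ACGTTGACA'), ('seq2', 'TTGACAGG')]
```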
MEME - Output: motif score, motif length, number of sites (times the motif occurs)
MEME - Output
Low uncertainty = high information content
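For DNA, the information content of a motif column is log2(4) = 2 bits minus the column's entropy, so a column with low uncertainty carries high information content. A small sketch of that calculation (the function name is illustrative; small-sample corrections that some logo tools apply are not included):

```python
import math

def information_content(column, alphabet_size=4):
    """Information content (bits) of one motif column: log2(alphabet_size) - entropy.

    `column` maps each letter to its frequency; zero frequencies contribute nothing.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy

print(information_content({"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}))  # fully conserved: 2 bits
print(information_content({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}))  # uniform: 0 bits
```

For protein motifs the same formula applies with `alphabet_size=20`, giving a maximum of about 4.32 bits per column.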
MEME - Output: multilevel consensus sequence
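A multilevel consensus lists, for each motif column, the most frequent letter on the top line and other reasonably frequent letters on the lines below. A sketch of that idea, assuming a frequency cut-off of 0.2 (the cut-off, the function name and the example PWM are assumptions, not taken from MEME's output):

```python
def multilevel_consensus(pwm, threshold=0.2):
    """For each column, list letters with frequency >= threshold, most frequent first.

    Line 0 of the result is the ordinary consensus; later lines hold runner-up letters.
    """
    columns = []
    for col in pwm:
        letters = sorted((b for b, p in col.items() if p >= threshold),
                         key=lambda b: (-col[b], b))
        columns.append(letters)
    depth = max(len(c) for c in columns)
    # Pad shorter columns with spaces so that each level prints as one line.
    return ["".join(c[i] if i < len(c) else " " for c in columns) for i in range(depth)]

pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.5, "G": 0.3, "T": 0.1},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
for level in multilevel_consensus(pwm):
    print(level)
```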
Patterns can be presented as regular expressions
[AG]-x-V-x(2)-{YW}
[] - Any one of the enclosed residues
x - Any residue
x(2) - Any residue at each of the next 2 positions
{} - Any residue except the enclosed ones
Examples: AYVACM, GGVGAA
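These rules map directly onto ordinary regular expressions: x becomes `.`, `[..]` stays a character class, `{..}` becomes a negated class `[^..]`, and `(n)` or `(n,m)` becomes a `{n}` or `{n,m}` quantifier. A sketch of that translation covering the syntax on this slide plus the `<` and `>` terminus anchors (the function name is hypothetical; other PROSITE features, such as `>` inside brackets, are not handled):

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex (subset of the syntax)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")  # N-terminus anchor
        anchor_end = element.endswith(">")      # C-terminus anchor
        element = element.strip("<>")
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, lo, hi = m.groups()
        if token == "x":
            regex = "."
        elif token.startswith("{"):
            regex = "[^" + token[1:-1] + "]"
        else:
            regex = token  # a single residue or a [..] class is already valid regex
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if anchor_start else "") + regex + ("$" if anchor_end else ""))
    return "".join(parts)

rx = prosite_to_regex("[AG]-x-V-x(2)-{YW}")
print(rx)
for m in re.finditer(rx, "MKAYVACMLGGVGAAQ"):
    print(m.start(), m.group())
```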
MEME - Output: sequence names, position of each site in the sequence, strength of match, motif within the sequence
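One common way to express how strongly a site matches a motif is a log-odds score: the sum, over motif columns, of log2(motif probability / background probability). A sketch that scans one strand of a sequence with such a score and reports the best position (the PWM values and function names are hypothetical; converting scores into p-values, as MEME's output does, is not shown):

```python
import math

def log_odds_pwm(pwm, bg):
    """Convert letter probabilities to log2(p / background) scores."""
    return [{b: math.log2(col[b] / bg[b]) for b in col} for col in pwm]

def scan(seq, scores):
    """Score every window of `seq` (forward strand only); return (start, score) pairs."""
    w = len(scores)
    return [(j, sum(scores[k][seq[j + k]] for k in range(w)))
            for j in range(len(seq) - w + 1)]

bg = {b: 0.25 for b in "ACGT"}
# Hypothetical PWM favouring TTGACA: 0.7 for the consensus letter, 0.1 for the others.
pwm = [{b: (0.7 if b == c else 0.1) for b in "ACGT"} for c in "TTGACA"]
hits = scan("AAAATTGACAAAAA", log_odds_pwm(pwm, bg))
best_start, best_score = max(hits, key=lambda h: h[1])
print(best_start, round(best_score, 2))
```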
MEME - Output: sequence names, motif location in the input sequence, overall strength of motif matches
What can we do with motifs?
- MAST - Search for them in non-annotated sequence databases (protein and DNA).
- TOMTOM - Compare DNA motifs against databases of known motifs, e.g. to find the protein that binds them.
- GOMO - Find putative target genes (DNA) of motifs and analyze their associated annotation terms.
- PROSITE - Search for them in annotated protein sequence databases.
MAST
Searches for motifs (one or more) in sequence databases:
- Like BLAST, but with motifs as input
- Similar to iterations of PSI-BLAST
- The profile defines the strength of a match
- Multiple motif matches per sequence
- Combined E-value for all motifs
MEME uses MAST to summarize its results: each MEME result is accompanied by the MAST result of searching the discovered motifs in the given sequences.
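A standard way to combine independent per-motif p-values into one score per sequence is the p-value of their product, P(prod <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!; MAST's combined sequence p-value is based on this idea, and its E-value scales that p-value by the number of sequences in the database. A hedged sketch of the arithmetic (how the per-motif p-values are obtained is not shown, and the function names are illustrative):

```python
import math

def product_pvalue(pvalues):
    """P-value of the product of n independent, uniformly distributed p-values."""
    x = math.prod(pvalues)
    if x <= 0.0:
        return 0.0
    y = -math.log(x)
    return x * sum(y ** i / math.factorial(i) for i in range(len(pvalues)))

def evalue(pvalues, n_sequences):
    """Expected number of database sequences that would score at least this well."""
    return product_pvalue(pvalues) * n_sequences

print(product_pvalue([0.01, 0.002]), evalue([0.01, 0.002], 10000))
```

Note that the combined p-value is larger than the raw product: multiplying several p-values always makes them look smaller, and the correction accounts for that.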
MAST - Input: email address, database, input file (motifs)
MAST - Output: input motifs, presence of the motifs in a given database
TOMTOM Searches one or more query DNA motifs against one or more databases of target motifs, and reports for each query a list of target motifs, ranked by p-value. The output contains results for each query, in the order that the queries appear in the input file.
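Comparing two motifs means aligning their columns at different offsets and scoring how similar the aligned columns are. The Pearson correlation coefficient is one of the column-comparison measures TOMTOM offers; this sketch uses it with ungapped offsets on one strand only. TOMTOM's default measure, reverse-complement handling and p-value computation are not reproduced, and the function names are illustrative:

```python
import math

ALPHABET = "ACGT"

def pearson(col_a, col_b):
    """Pearson correlation between two motif columns (dicts letter -> frequency)."""
    xs = [col_a[b] for b in ALPHABET]
    ys = [col_b[b] for b in ALPHABET]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def best_offset(query, target, min_overlap=3):
    """Slide query along target (no gaps); return (mean column score, offset)."""
    best = None
    for offset in range(-(len(query) - min_overlap), len(target) - min_overlap + 1):
        pairs = [(query[i], target[i + offset]) for i in range(len(query))
                 if 0 <= i + offset < len(target)]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(a, b) for a, b in pairs) / len(pairs)
        if best is None or score > best[0]:
            best = (score, offset)
    return best

# Hypothetical motifs: the query is columns 2-4 of the target.
target = [{b: (0.7 if b == c else 0.1) for b in ALPHABET} for c in "TGACG"]
query = target[1:4]
score, offset = best_offset(query, target)
print(offset, round(score, 3))
```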
TOMTOM - Input: input motif, background frequencies, database
DNA IUPAC* code
A --> adenine
C --> cytosine
G --> guanine
T --> thymine
R --> A G (purine)
Y --> C T (pyrimidine)
M --> A C (amino)
K --> G T (keto)
S --> C G (strong)
W --> A T (weak)
B --> C G T (not A)
D --> A G T (not C)
H --> A C T (not G)
V --> A C G (not T)
N --> A C G T (any)
Example: YCAY = [TC]CA[TC]
*IUPAC = International Union of Pure and Applied Chemistry
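Expanding an IUPAC motif into a regular expression, as in the YCAY example, is a direct table lookup. A sketch (the function name is illustrative; it searches one strand only and reports non-overlapping matches):

```python
import re

# IUPAC nucleotide codes; letters within each set are in alphabetical order.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(motif):
    """Expand an IUPAC DNA motif into a regular expression, e.g. YCAY -> [CT]CA[CT]."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]  # raises KeyError for letters that are not IUPAC codes
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

pattern = iupac_to_regex("YCAY")
print(pattern, [m.start() for m in re.finditer(pattern, "GTCATTCACG")])
```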
TOMTOM - Output: input motif, matching motifs
TOMTOM - Output: wrong input, yet OK-looking results
JASPAR
- Transcription factor binding site profiles
- Multicellular eukaryotes
- Derived from published collections of experiments
- Open data access
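JASPAR profiles are distributed as count matrices: a '>' header with the matrix ID and name, then one row per nucleotide with the counts for each motif position in square brackets. A small parser sketch for a single matrix in that layout (the matrix below is made up for illustration, not a real JASPAR entry; for real work, Biopython's `Bio.motifs`, which reads the JASPAR format, may be preferable):

```python
def parse_jaspar(text):
    """Parse one JASPAR-format matrix into (matrix_id, name, counts per letter)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines[0].startswith(">"):
        raise ValueError("expected a '>' header line")
    header = lines[0][1:].split(None, 1)
    matrix_id = header[0]
    name = header[1].strip() if len(header) > 1 else ""
    counts = {}
    for line in lines[1:]:
        letter, rest = line.split(None, 1)
        numbers = rest.replace("[", " ").replace("]", " ").split()
        counts[letter] = [float(x) for x in numbers]
    return matrix_id, name, counts

def to_frequencies(counts):
    """Normalise each column of a count matrix so that it sums to 1."""
    letters = sorted(counts)
    width = len(counts[letters[0]])
    totals = [sum(counts[b][i] for b in letters) for i in range(width)]
    return {b: [counts[b][i] / totals[i] for i in range(width)] for b in letters}

# Illustrative matrix with a made-up ID and counts.
EXAMPLE = """
>EX0001.1 EXAMPLE
A  [ 10  0  1 ]
C  [  0 10  1 ]
G  [  0  0  8 ]
T  [  0  0  0 ]
"""
mid, name, counts = parse_jaspar(EXAMPLE)
print(mid, name, to_frequencies(counts))
```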
Each match shows: logo, name of gene/protein, organism, score
GOMO
GOMO takes DNA-binding motifs, finds their putative target genes, and analyzes the Gene Ontology (GO) terms associated with those genes. It returns a list of GO terms that are significantly associated with the target genes of the motif. Gene Ontology provides a controlled vocabulary to describe gene and gene product attributes in any organism.
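Deciding whether a GO term is "significantly associated" with a motif amounts to asking whether genes annotated with that term score higher for the motif than the remaining genes; GOMO is described as using a rank-sum (Mann-Whitney) test for this. A minimal sketch of that test with a normal approximation, using hypothetical scores (GOMO's promoter scoring and its multiple-testing correction are not shown, and the function name is illustrative):

```python
import math

def rank_sum_test(in_group, out_group):
    """One-sided Mann-Whitney U test: are scores in `in_group` larger than in `out_group`?

    Returns (U, p) using the normal approximation without tie correction.
    """
    u = 0.0
    for a in in_group:
        for b in out_group:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n1, n2 = len(in_group), len(out_group)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return u, p

# Hypothetical motif scores: genes annotated with one GO term vs. all other genes.
annotated = [5.1, 6.3, 4.8, 7.0]
others = [1.2, 2.5, 3.3, 0.7, 4.9, 2.2]
print(rank_sum_test(annotated, others))
```

A rank-based test only uses the ordering of the scores, so it does not assume anything about how motif scores are distributed.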
GOMO - Input: email address, database, input file (motifs)
GOMO - Output: input motifs and their GO annotation
MF - Molecular function
BP - Biological process
CC - Cellular component
PROSITE
PROSITE is a database of protein domains and motifs that can be searched with either regular expression patterns or sequence profiles.
PROSITE - Input: input motif (a regular expression), database, filters
PROSITE - Output: input motif, matching proteins, location in the protein sequence