1
Exploring Protein Sequences Tutorial 5
2
Exploring Protein Sequences
Multiple alignment
–ClustalW
Motif discovery
–MEME
–Jaspar
3
Multiple Sequence Alignment
More than two sequences
–DNA
–Protein
Evolutionary relation
–Homology
Phylogenetic tree
–Detect motif

GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC

(Figure: tree with leaves A, D, B, C)

GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC
4
Multiple Sequence Alignment
Dynamic Programming
–Optimal alignment
–Exponential in #Sequences
Progressive
–Efficient
–Heuristic

GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC

(Figure: tree with leaves A, D, B, C)

GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC
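The dynamic programming idea is easiest to see for just two sequences. The sketch below is a plain Needleman-Wunsch global alignment, not ClustalW itself; the function name and the match, mismatch and gap scores are illustrative assumptions. Extending the same table to N sequences needs one dimension per sequence, which is why the optimal method becomes exponential in the number of sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; returns (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not ClustalW defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Usage: align the first two sequences from the slide.
s, x, y = needleman_wunsch("GTCGTAGTCGGCTCGAC", "GTCTAGCGAGCGTGAT")
print(s)
print(x)
print(y)
```

The table has (n+1)(m+1) cells for two sequences, so time and memory grow with the product of the sequence lengths.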
5
ClustalW “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J D Thompson et al
6
ClustalW
Progressive
–At each step align two existing alignments or sequences
–Gaps present in older alignments remain fixed

GTCGTAGTCG-GC-T
GTC-TAG-CGAGCGT
GC-GAAG-AG-GCG-
GCCGTCG-CG-TCGT

GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC
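The rule that older gaps remain fixed can be illustrated with a small helper. This is a sketch of the bookkeeping only, not ClustalW code: the function name and the `positions` argument are hypothetical. When a later progressive step decides that an existing alignment needs new gaps, whole gap columns are added to every row, and the gaps already present are never removed or moved.

```python
def insert_gap_columns(alignment, positions):
    """Insert an all-gap column before each column index in `positions`.

    `alignment` is a list of equal-length gapped strings. Indices refer to
    columns of the original alignment; len(row) means "append at the end".
    Existing gaps are copied through unchanged.
    """
    rows = []
    for row in alignment:
        out = []
        for i, ch in enumerate(row):
            out.append("-" * positions.count(i))  # new gap column(s) here
            out.append(ch)                        # original character, gap or not
        out.append("-" * positions.count(len(row)))
        rows.append("".join(out))
    return rows


# Usage: an existing two-row alignment gets one new gap column before column 3.
old = ["GTC-TAG", "GC-GAAG"]
print(insert_gap_columns(old, [3]))
```

Removing the newly inserted columns always gives back the older alignment, which is the sense in which earlier decisions are frozen.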
7
ClustalW - Input
Scoring matrix
Gap scoring
Input sequences
8
ClustalW - Output
9
Input sequences
Pairwise alignment scores
Building alignment
Final score
10
ClustalW - Output
11
ClustalW Output
Sequence names
Sequence positions
Match strength in decreasing order: * : .
12
http://www.megasoftware.net/
13
Can we find motifs using multiple sequence alignment?

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    0.5  1/6  1/3  0    0
D    0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E    0    0    2/3  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    1/3  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    0.5  0    0

  1 3 5 7 9
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
  * :**   *:

Motif: A widespread pattern with a biological significance
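A frequency matrix like the one on this slide can be computed column by column from an ungapped alignment. The following is a minimal sketch; the function name is an assumption. It is run on the five aligned sequences shown on the slide. The slide's matrix uses sixths, so it was presumably built from a set of six sequences, and the fractions computed here (fifths) will differ in places.

```python
from collections import Counter
from fractions import Fraction


def position_frequency_matrix(aligned):
    """Return {residue: [frequency at column 0, column 1, ...]}.

    `aligned` is a list of equal-length strings (one motif instance each).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    matrix = {}
    for col in range(length):
        counts = Counter(s[col] for s in aligned)
        for residue, n in counts.items():
            row = matrix.setdefault(residue, [Fraction(0)] * length)
            row[col] = Fraction(n, len(aligned))
    return matrix


# The five motif instances shown on the slide.
instances = ["YDEEGGDAEE", "YGEEGADYED", "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
pfm = position_frequency_matrix(instances)
for residue in sorted(pfm):
    print(residue, " ".join(str(f) for f in pfm[residue]))
```

Columns where a single residue has frequency 1 correspond to the `*` marks under the alignment.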
14
Can we find motifs using multiple sequence alignment? YES! NO
15
MEME – Multiple EM for Motif Elicitation
http://meme.sdsc.edu/
Motif discovery from unaligned sequences
–Genomic or protein sequences
Flexible model of motif presence (a motif can be absent in some sequences or appear several times in one sequence)
16
MEME - Input
Email address
Multiple input sequences
How many times in each sequence?
How many motifs?
How many sites?
Range of motif lengths
17
MEME - Output
Motif length
Number of times
Like BLAST
18
MEME - Output
Probability * 10
‘a’=10, ‘:’=0
19
MEME - Output
Low uncertainty = High information content
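The relation between uncertainty and information content can be made concrete per motif column. The sketch below uses the common definition of information content as the maximum possible entropy minus the observed Shannon entropy, in bits. It is an assumption that this matches what MEME plots exactly; in particular, no small-sample correction is applied, and the function name is hypothetical.

```python
import math


def information_content(column_freqs, alphabet_size=20):
    """Information content of one motif column, in bits.

    column_freqs: frequencies of the residues observed in the column
    (they should sum to 1). alphabet_size is 20 for protein, 4 for DNA.
    """
    # Shannon entropy = uncertainty of the column; zero-frequency terms add 0.
    entropy = -sum(p * math.log2(p) for p in column_freqs if p > 0)
    return math.log2(alphabet_size) - entropy


# A fully conserved DNA column carries the maximum of 2 bits,
# a uniform column carries none.
print(information_content([1.0], alphabet_size=4))
print(information_content([0.25, 0.25, 0.25, 0.25], alphabet_size=4))
```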
20
MEME - Output
Multilevel Consensus
21
MEME - Output
Sequence names
Reverse complement (genomic input only)
Position in sequence
Strength of match
Motif within sequence
22
MEME - Output
Overall strength of motif matches
Sequence lengths
Motif instance
‘-’=Other strand
23
MAST
Searches for motifs (one or more) in sequence databases:
–Like BLAST but motifs for input
–Similar to iterations of PSI-BLAST
Profile defines strength of match
–Multiple motif matches per sequence
–Combined E value for all motifs
MEME uses MAST to summarize results:
–Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
24
JASPAR
Profiles
–Transcription factor binding sites
–Multicellular eukaryotes
–Derived from published collections of experiments
Open data access
25
JASPAR profiles
–Modeled as matrices.
–Can be converted into PSSM for scanning genomic sequences.

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    0.5  1/6  1/3  0    0
D    0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E    0    0    2/3  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    1/3  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    0.5  0    0
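Converting a frequency matrix into a PSSM and scanning a sequence with it might look like the sketch below. It is an illustration under stated assumptions, not JASPAR's own procedure: the function names, the pseudocount scheme, the uniform background, and the toy three-column DNA matrix are all hypothetical. Each score is a log-odds value, log2 of the smoothed motif frequency over the background frequency, and a window's score is the sum over its positions.

```python
import math


def to_pssm(freqs, background, pseudocount=0.01):
    """Turn {residue: [frequency per position]} into log-odds scores.

    A small background-weighted pseudocount (an assumed scheme) keeps
    zero frequencies from producing log(0).
    """
    length = len(next(iter(freqs.values())))
    pssm = {}
    for residue, bg in background.items():
        column_freqs = freqs.get(residue, [0.0] * length)
        pssm[residue] = [
            math.log2(((p + pseudocount * bg) / (1 + pseudocount)) / bg)
            for p in column_freqs
        ]
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; returns [(start, score), ...].

    Assumes every character of `sequence` has a row in `pssm`.
    """
    length = len(next(iter(pssm.values())))
    hits = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        score = sum(pssm[ch][i] for i, ch in enumerate(window))
        hits.append((start, score))
    return hits


# Toy DNA profile (hypothetical) that matches exactly "TAT".
toy_freqs = {"A": [0.0, 1.0, 0.0], "T": [1.0, 0.0, 1.0]}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pssm = to_pssm(toy_freqs, uniform)
best_start, best_score = max(scan("GGTATCC", pssm), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

In practice a score threshold decides which windows are reported as putative binding sites; choosing it is outside the scope of this sketch.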
26
Search profile http://jaspar.cgb.ki.se/