Published by Darrell Norman, modified over 9 years ago
1
Chapter 6 - Profiles
Assume we have a family of sequences. To search for other members of the family we can:
Search with a single sequence from the family
Search with several sequences from the family together:
– Consensus sequences (regular expressions). Example: the regular expression A-[FR]-X(2,3)-M (A, then F or R, then any 2 to 3 amino acids, then M) occurs in both GARCCMH and LCAFARLMLMA
– Weight matrices, or position-specific scoring matrices (not considering gaps)
– Profiles
– Profiles as hidden Markov models
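A consensus pattern like A-[FR]-X(2,3)-M can be searched with an ordinary regular-expression engine. The sketch below assumes PROSITE-style syntax: elements separated by "-", X or x for any amino acid, [..] for allowed alternatives, {..} for excluded residues, and (n) or (n,m) for repeats. It translates such a pattern into a Python regular expression. The function name pattern_to_regex is hypothetical.

```python
import re

def pattern_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (e.g. 'A-[FR]-X(2,3)-M') into a Python regex.

    Assumed syntax: single residues, [..] alternatives, {..} exclusions,
    X/x wildcards, and optional (n) or (n,m) repeat counts.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core in ("X", "x"):
            core = "."                          # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any amino acid except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)

regex = pattern_to_regex("A-[FR]-X(2,3)-M")
print(regex)  # A[FR].{2,3}M
for seq in ["GARCCMH", "LCAFARLMLMA", "GAFCM"]:
    print(seq, bool(re.search(regex, seq)))  # True, True, False
```

Note that `re.search` only reports the leftmost match. Collecting every (possibly overlapping) occurrence in a sequence would need a lookahead or a scan over start positions.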
2
Search with a family of sequences
1. Align the sequences (multiple alignment).
2. Make a profile from part of the alignment.
3. Search the database with the profile.
4. Optionally, revise the profile and search again (iteratively).
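The four steps can be sketched as a loop. This is a toy illustration built on strong simplifying assumptions. The family is taken as already aligned without gaps, so step 1 is treated as done. A profile position scores each amino acid by the fraction of family members carrying it. Each database sequence is scored by its best gapless window, and windows at or above a hypothetical threshold join the family for the next round. All function names are hypothetical, and this is not how any particular search program works internally.

```python
def make_profile(family):
    """Position r -> {amino acid: fraction of family members with that residue at r}."""
    n = len(family)
    return [{a: col.count(a) / n for a in set(col)} for col in zip(*family)]

def best_window(profile, seq):
    """Highest-scoring gapless placement of the profile on seq, as (score, window)."""
    width = len(profile)
    best = (float("-inf"), "")
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(a, 0.0) for col, a in zip(profile, window))
        best = max(best, (score, window))
    return best

def iterative_search(seeds, database, threshold, max_rounds=10):
    """Steps 2-4: build a profile, search, revise the family, repeat until stable."""
    family = set(seeds)
    for _ in range(max_rounds):
        profile = make_profile(sorted(family))
        results = (best_window(profile, s) for s in database)
        hits = {window for score, window in results if score >= threshold}
        updated = set(seeds) | hits
        if updated == family:  # converged: the search found nothing new
            break
        family = updated
    return family

database = ["TTAFGKLMTT", "PPAYGKLMPP", "QQQQQQQQ", "CCAYGRIMCC"]
print(sorted(iterative_search(["AFGKLM", "AFGRLM"], database, threshold=4.0)))
# ['AFGKLM', 'AFGRLM', 'AYGKLM']
```

In the first round AYGKLM scores 4.5 and joins the family. The revised profile then tolerates Y at position 2. That is still not enough for AYGRIM, so the second round adds nothing and the loop stops.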
3
Multiple alignments and profiles
What weight does amino acid a have in position r in the profile?
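One common way to answer this question, assumed here, is a log-odds score. It compares the estimated probability of a at position r with its background probability, and adds pseudocounts so that unobserved amino acids do not get a probability of zero. The function name log_odds_column, the pseudocount weight, and the uniform background are all illustrative choices. The column used is the first column of the example alignment with RAT2 and SHEEP removed, and the values are not expected to match the profile table, which was built by a different method.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_column(column, background, pseudocount=1.0):
    """Score of each amino acid a at one alignment position r.

    score(a) = log2(p(a | r) / q(a)), where
    p(a | r) = (n_a + pseudocount * q(a)) / (N + pseudocount)
    mixes the observed counts n_a with pseudocounts spread by the background q.
    Gap characters are ignored.
    """
    residues = [c for c in column if c in AMINO_ACIDS]
    counts = Counter(residues)
    n = len(residues)
    return {
        a: math.log2((counts[a] + pseudocount * background[a]) / (n + pseudocount) / background[a])
        for a in AMINO_ACIDS
    }

background = {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}  # uniform, for illustration only
scores = log_odds_column("AAPPPPLPPLLPLL", background)
for a in "PLAW":
    print(a, round(scores[a], 2))
```

Observed residues get positive scores that grow with their frequency. Unobserved residues such as W get a finite negative score rather than minus infinity.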
4
Example
Clustal X (1.64b) multiple sequence alignment
XENLA1  ALVSGPQD------NELDG--MQL
XENLA2  AQVNGPQD------NELDG--MQF
MOUSE1  PQVEQLEL------GGSP---GDL
RAT1    PQVPQLEL------GGGPEA-GDL
MOUSE2  PQVAQLEL------GGGPGA-GDL
RAT2    PQVAQLEL------GGGPGA-GDL  (removed)
CRILO   PQVAQLEL------GGGPGA-DDL
RABIT   LQVGQAEL------GGGPGA-GGL
BOVIN   PQVGALEL------AGGPG-----
SHEEP   PQVGALEL------AGGPG-----  (removed)
PIG     PQAGAVEL------GGGLGG---L
CANFA   LQVRDVEL------AGAPGE-GGL
HUMAN   LQVGQVEL------GGGPGA-GSL
CHICK   P-LVSSPL------RGEAGV-LPF
ORENI   LLGFLPPKAGGAVVQGGEN---EV
VERMO   LLGFLPAKSGGAAAGG-ENEVAEF
        12345678******567890*234
* means the column was removed.

Col Cons   A   B   C   D   E   F   G   H   I   K   L   M   N   P   Q   R   S   T   V   W   X   Y   Z Gap Len
  1 P      1   0 -18 -17 -12 -14 -21 -13  -3 -10   1  -2 -15  26  -6 -12  -3  -2  -1 -32   0 -18   0 100 100
  2 q     -4   0 -18  -5   2 -10 -17   2  -3   3   0   1  -3  -7  11   3  -4  -3  -4 -17   0 -10   0  50 100
  3 V      1   0  -5 -23 -17  -6 -15 -17  15 -15   9   7 -17 -16 -13 -17  -7  -3  18 -26   0 -14   0 100 100
  4 G      0   0 -12  -8  -7 -14   0  -5 -13  -6 -14 -10  -2  -9  -5  -6  -1  -3  -8 -22   0 -11   0 100 100
  5 Q      2   0 -15   1   1 -25   4  -3 -17  -1 -15 -11   1  -7   3  -2   3  -1 -12 -30   0 -20   0 100 100
  6 P      1   0 -13 -17 -11 -14 -21 -13   0 -10   0  -1 -13  18  -7 -13  -1   0   3 -32   0 -17   0 100 100
  7 E      0   0 -29  12  19 -36 -10   0 -25   7 -24 -19   3  20  13   2   2   0 -17 -41   0 -26   0 100 100
  8 L     -8   0 -20 -15 -10  -1 -29 -10   7  -7  14   9 -13 -17  -6 -10 -12  -8   3 -20   0  -8   0 100 100
 15 g      3   0 -16   5   2 -36  21   0 -28   3 -28 -21  10  -8   4   5   4  -2 -20 -32   0 -25   0  34  34
 16 G      4   0 -21   6   0 -49  51 -10 -41  -6 -40 -32   4 -13  -4  -7   3  -9 -30 -40   0 -37   0 100 100
 17 G      3   0 -16  -3  -4 -31  23 -11 -22  -8 -20 -16  -2 -12  -5  -9   0  -6 -16 -33   0 -27   0 100 100
 18 P      3   0 -24   7   6 -32 -10  -5 -21  -1 -20 -17   0  27   2  -6   2   0 -14 -43   0 -25   0 100 100
 19 g      3   0 -19   5  -2 -45  49  -8 -39  -6 -38 -30   9 -13  -5  -6   4  -7 -28 -37   0 -33   0  50  78
 20 a      5   0  -3  -2   0 -12   0  -5  -3  -3  -6  -3  -2  -3  -1  -4   1   0   0 -19   0 -12   0  50  78
 22 g     -1   0 -11  -9  -9 -12   7  -9  -6  -9  -4   0  -6 -13  -7 -10  -4  -6  -6 -18   0 -14   0  50  78
 23 q      0   0 -22  13  11 -33   4   0 -26   3 -25 -19   6   6   7   0   3   0 -19 -36   0 -23   0  50  78
 24 L    -12   0 -10 -37 -28  28 -42 -13  22 -22  29  21 -27 -24 -17 -23 -20 -12  15   1   0  10   0 100 100
  *       17   0   0  10  17   3  52   0   0   1  36   2   4  22  21   2   5   0  16   0   0   0   0
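A profile like the one above can be used to score a database sequence. The sketch below is a gapless simplification: it ignores the Gap and Len columns and slides the profile along the sequence without insertions or deletions. It uses only the rows for alignment columns 1 to 4 (consensus P, q, V, G), copied from the table in its column order. The names PROFILE_ROWS, window_score, and best_hit are hypothetical.

```python
# Column order of the profile table (23 letters, including B, X and Z).
ALPHABET = "ABCDEFGHIKLMNPQRSTVWXYZ"

# Rows for alignment columns 1-4 of the profile table (Gap and Len omitted).
PROFILE_ROWS = [
    [1, 0, -18, -17, -12, -14, -21, -13, -3, -10, 1, -2, -15, 26, -6, -12, -3, -2, -1, -32, 0, -18, 0],
    [-4, 0, -18, -5, 2, -10, -17, 2, -3, 3, 0, 1, -3, -7, 11, 3, -4, -3, -4, -17, 0, -10, 0],
    [1, 0, -5, -23, -17, -6, -15, -17, 15, -15, 9, 7, -17, -16, -13, -17, -7, -3, 18, -26, 0, -14, 0],
    [0, 0, -12, -8, -7, -14, 0, -5, -13, -6, -14, -10, -2, -9, -5, -6, -1, -3, -8, -22, 0, -11, 0],
]

def window_score(rows, window):
    """Sum of the profile scores for the residues of one gapless window."""
    return sum(row[ALPHABET.index(a)] for row, a in zip(rows, window))

def best_hit(rows, seq):
    """Best (score, start) over all gapless placements; seq must be at least len(rows) long."""
    width = len(rows)
    return max((window_score(rows, seq[i:i + width]), i) for i in range(len(seq) - width + 1))

print(window_score(PROFILE_ROWS, "PQVG"))       # 55
print(best_hit(PROFILE_ROWS, "MKLPQVGAT"))      # (55, 3)
```

A real profile search would also allow gaps, using the Gap and Len weights in a dynamic-programming alignment. This sketch only shows how the per-position scores combine.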
5
What should be taken into account when creating a profile?
1. The amino acids observed in position r of the alignment.
2. The number of independent 'observations' that have been used to construct the alignment at position r (for example, the number of different amino acids in the column).
3. The similarity of a to the amino acids observed in column r, to allow for amino acids not yet observed. Amino acid a is more likely to occur in unknown family members if many amino acids similar to a occur in the known sequences. A 'background' scoring matrix should therefore be used.
4. The background (a priori) distribution of the amino acids.
5. The diversity and similarity of the sequences, which determine the importance (or weight) of each sequence. The known sequences are normally not uniformly distributed in the 'family space' and should therefore be given different weights in the calculation.
6. The number of gaps in column r and in the neighbouring columns.
These points are not independent, and how they are treated varies between methods for profile construction.
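For point 5, one well-known way to down-weight over-represented sequences is position-based sequence weighting (Henikoff and Henikoff). In each column that contains k different residues, a sequence whose residue occurs n times in the column receives 1/(k·n). A sequence's weight is the sum over columns, and the weights are then normalised to sum to 1. In this sketch gap characters simply contribute nothing, which is an assumption, and the function name henikoff_weights is hypothetical. The input is the first eight columns of MOUSE2, RAT2 and HUMAN from the example alignment.

```python
from collections import Counter

def henikoff_weights(aligned):
    """Position-based sequence weights for equal-length aligned sequences.

    In each column with k distinct residues, a sequence whose residue occurs
    n times there gets 1 / (k * n). Gaps ('-') are skipped. Weights are
    normalised to sum to 1 (assumes at least one non-gap residue).
    """
    weights = [0.0] * len(aligned)
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != "-")
        k = len(counts)
        for i, c in enumerate(column):
            if c != "-":
                weights[i] += 1.0 / (k * counts[c])
    total = sum(weights)
    return [w / total for w in weights]

w = henikoff_weights(["PQVAQLEL", "PQVAQLEL", "LQVGQVEL"])  # MOUSE2, RAT2, HUMAN
print([round(x, 3) for x in w])  # [0.302, 0.302, 0.396]
```

The two identical sequences share weight between them, so together they count for less than twice the distinct HUMAN sequence. This is the effect point 5 asks for, and it is similar in spirit to removing the duplicate RAT2 and SHEEP entries in the example.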
6
Database search with a profile
7
Notations
8
Position weight
Sequence weights are not considered here.
1. All amino acids in the column count equally.
2. Amino acids occurring many times are favored.
3. Amino acids occurring many times are 'punished'.
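One way to realise the first two schemes, assumed here, is to score amino acid a at position r as a weighted average of background substitution scores S(a, b) over the residues b in column r. In scheme 1 every distinct residue gets the same weight. In scheme 2 each residue is weighted by its frequency in the column. The substitution scores below are hypothetical toy values for I, L and V, not taken from a real matrix. The third scheme is not given precisely enough to implement, so it is left out. The names S, s and profile_score are hypothetical.

```python
from collections import Counter

# Hypothetical toy substitution scores S(a, b), symmetric, for three amino acids.
S = {
    ("I", "I"): 4, ("I", "L"): 2, ("I", "V"): 3,
    ("L", "L"): 4, ("L", "V"): 1,
    ("V", "V"): 4,
}

def s(a, b):
    """Look up S(a, b) in either order."""
    return S.get((a, b), S.get((b, a)))

def profile_score(column, a, scheme):
    """Weighted average of S(a, b) over the residues b observed in the column."""
    counts = Counter(column)
    if scheme == "equal":        # scheme 1: every distinct residue counts once
        weights = {b: 1 / len(counts) for b in counts}
    elif scheme == "frequency":  # scheme 2: frequent residues count more
        weights = {b: n / len(column) for b, n in counts.items()}
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return sum(w * s(a, b) for b, w in weights.items())

column = "LLLLI"
for a in "LIV":
    print(a, profile_score(column, a, "equal"), round(profile_score(column, a, "frequency"), 2))
```

With the column LLLLI, scheme 2 gives L a higher score (3.6) than scheme 1 does (3.0), because L is the dominant residue. Scheme 1 treats the single I as just as informative as the four Ls.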
9
PSI-BLAST
10
Hidden Markov Model