
Comparing Protein Sequences Tutorial 4. Comparing Protein Sequences Substitution Matrices –PAM –BLOSUM Advanced comparison tools –PSI-BLAST –PHI-BLAST.


1 Comparing Protein Sequences Tutorial 4

2 Comparing Protein Sequences Substitution Matrices –PAM –BLOSUM Advanced comparison tools –PSI-BLAST –PHI-BLAST

3 Substitution Matrix Scoring matrix S –20x20 for protein alignment (amino acids) –S i,j represents the gain/penalty due to substituting AA i by AA j –Based on the likelihood that this substitution is found in nature –Computed differently in PAM and BLOSUM

4 Substitution Matrix Computing S i,j: a logarithmic (log-odds) score derived from the substitution probability. The probability is computed differently in PAM and BLOSUM
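The formula on this slide did not survive in the transcript, so the following is only a minimal sketch of the usual log-odds form, assuming a score is a scaled, rounded logarithm of the observed pair frequency over the frequency expected by chance. The function name, the `scale` parameter, and the toy frequencies are illustrative assumptions, not values from the tutorial.

```python
import math

def log_odds_score(pair_freq, freq_i, freq_j, scale=2.0):
    """Scaled log-odds score for substituting residue i by residue j.

    pair_freq      : observed frequency of the aligned pair (i, j)
    freq_i, freq_j : background frequencies of the two residues
    scale          : assumed scaling factor (2.0 gives half-bit units)
    """
    expected = freq_i * freq_j  # chance of seeing the pair if residues were independent
    return round(scale * math.log2(pair_freq / expected))

# Toy numbers (hypothetical): a pair seen twice as often as chance scores positive,
# a pair seen four times less often than chance scores negative.
favoured = log_odds_score(0.02, 0.1, 0.1)
disfavoured = log_odds_score(0.0025, 0.1, 0.1)
```

A positive score marks a substitution seen more often than chance, a negative score one seen less often. PAM and BLOSUM differ in how the observed frequencies are obtained, not in this general shape.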

5 Computing probability of Mutation (M i,j ) PAM –Based on closely related proteins –Matrices for comparing divergent proteins are computed by extrapolation BLOSUM –Based on conserved blocks with bounded similarity (at least X% identical) –Matrices for divergent proteins are derived using an appropriate X%

6 PAM-1 Captures mutation rates between close proteins –1% divergence –M i,j = #(A → B) / #A, the number of observed A-to-B substitutions divided by the number of occurrences of A Problematic when comparing distant proteins –The 1% divergence does not capture more sporadic mutations –PAM250 is theoretical (extrapolation based)
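The extrapolation mentioned here is commonly done by multiplying the PAM1 mutation-probability matrix by itself, so that PAM250 corresponds to 250 successive PAM1 steps. A minimal sketch under that assumption, using a hypothetical two-letter alphabet instead of the real 20x20 matrix:

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result

# Toy "PAM1" (hypothetical): each residue has a 1% chance of changing per step.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]

# Extrapolated toy "PAM250": 250 successive 1% steps.
toy_pam250 = mat_pow(toy_pam1, 250)
```

Each row of the result still sums to 1, but the diagonal has dropped to nearly 0.5. After many steps the toy matrix carries almost no information about the starting residue, which is why extrapolated matrices are described as theoretical.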

7 PAM & BLOSUM PAM matrices are based on global alignments of closely related proteins. The PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Other PAM matrices are extrapolated from PAM1. BLOSUM matrices are based on local alignments. BLOSUM 62 is a matrix calculated from comparisons of sequences with at least 62% identity in the blocks. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.

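8 Use Recommendations
Approximate matrix equivalences, from closely related to highly divergent:
PAM100 ~ BLOSUM90 (closely related)
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45 (highly divergent)
Recommended matrix and gap costs by query length:
Query length   Matrix     Gap costs
<35            PAM30      9,1
35-50          PAM70      10,1
50-85          BLOSUM80   10,1
>85            BLOSUM62   11,1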
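The query-length table can be written as a small lookup. This is a sketch only: the function name is hypothetical, and because the table's ranges overlap at 50 and 85, assigning those two lengths to the shorter-query row is an assumption.

```python
def recommend_matrix(query_length):
    """Return (matrix name, gap costs) for a protein query of the given length.

    Assumption: the boundary lengths 50 and 85 fall in the shorter-query row.
    """
    if query_length < 35:
        return "PAM30", (9, 1)
    if query_length <= 50:
        return "PAM70", (10, 1)
    if query_length <= 85:
        return "BLOSUM80", (10, 1)
    return "BLOSUM62", (11, 1)

# A 162-residue query falls in the last row of the table.
matrix, gap_costs = recommend_matrix(162)
```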

9 Example Query: >ADRM1_HUMAN (a glycosylated plasma membrane protein which promotes cell adhesion). Database: nr, restricted to human. BLAST program: BLASTP. Matrices: PAM30, BLOSUM45

10 PAM30 BLOSUM45 What difference do we observe? With PAM30 we found only related sequences. With BLOSUM45 we found both related and divergent sequences.

11 PAM30 BLOSUM45 With BLOSUM45 we can discover interesting relations between proteins. Mucin-13: a glycosylated membrane protein that protects the cell by binding to pathogens

12 Using different scoring matrices can produce slightly different alignments: With PAM30 With BLOSUM45

13 The same alignment can be resolved in several ways, especially when using a matrix for highly divergent sequences (BLOSUM45):

14 PSI-BLAST Position-Specific Iterative BLAST We will analyze the following archaeal uncharacterized protein:
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVI
DEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNK
MENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIM
GSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS

15

16 Threshold for initial BLAST Search (default:10) Threshold for inclusion in PSI-BLAST iterations (default:0.005)
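To illustrate how the two thresholds interact, here is a minimal sketch assuming both are E-value cutoffs: hits up to the search threshold are reported, but only hits at or below the inclusion threshold are used to build the profile for the next iteration. The function name, the hit identifiers, and the E-values are hypothetical.

```python
def split_hits(hits, report_threshold=10.0, inclusion_threshold=0.005):
    """Separate reported hits from hits included in the next PSI-BLAST iteration.

    hits: list of (identifier, e_value) pairs (hypothetical data layout).
    Assumption: a hit is kept when its E-value is at or below the threshold.
    """
    reported = [h for h in hits if h[1] <= report_threshold]
    included = [h for h in reported if h[1] <= inclusion_threshold]
    return reported, included

# Hypothetical hits: only the first two are significant enough to shape the profile.
example_hits = [("hitA", 1e-40), ("hitB", 0.001), ("hitC", 0.3), ("hitD", 25.0)]
reported, included = split_hits(example_hits)
```

Lowering the inclusion threshold makes the profile more conservative. Raising it admits more distant sequences at the risk of pulling in unrelated ones.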

17 Results: the query itself; orthologous sequences in two other archaeal species; other homologous sequences

18

19 Is MJ0577 a filament protein? Is MJ0577 a cationic amino transporter? Is MJ0577 a universal stress protein?

20 PHI-BLAST Pattern-Hit Initiated BLAST Example pattern: A-T-X-[AVG]-R-S

21 Pattern symbols
[ ] = groups the amino acids that can occur at a given position
( ) = gives a count, when a residue (or group of residues) is repeated
- = separates positions

22 Making a pattern [LIVM](2)-D-E-A-D-[RKEN]-x-[LI] …LIDEADKTT… …IMDEADEFL… …LLDEADKCL… …ILDEADRIL… …VVDEADNFI… …LVDEADKGI… …LMDEADEFL… …MLDEADRSI… …LIDEADKML… …MLDEADNWI… …LVDEADRFL…
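The pattern symbols above map closely onto regular expressions, which gives a quick way to try a pattern on sequence fragments before running a search. A minimal sketch, assuming only the three symbols listed plus `x` for any residue; `pattern_to_regex` is a hypothetical helper and not part of BLAST:

```python
import re

def pattern_to_regex(pattern):
    """Translate a PHI-BLAST-style pattern into a Python regular expression.

    Supports [..] groups, (n) repeat counts, '-' separators and x/X wildcards.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Za-z])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = m.groups()
        if residue in ("x", "X"):
            residue = "."  # wildcard: any residue
        parts.append(residue + (f"{{{count}}}" if count else ""))
    return "".join(parts)

dead_box = pattern_to_regex("[LIVM](2)-D-E-A-D-[RKEN]-x-[LI]")
# dead_box is "[LIVM]{2}DEAD[RKEN].[LI]"

# One of the fragments listed on the slide matches the pattern.
hit = re.search(dead_box, "IMDEADEFL")
```

The same regular expression can be applied to a full protein sequence with `re.finditer` to list every position where the pattern occurs.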

23 Example
>gi|71154193|sp|P0A9P6|DEAD_ECOLI Cold-shock DEAD box protein A (ATP-dependent RNA helicase deaD)
MAEFETTFADLGLKAPILEALNDLGYEKPSPIQAECIPHLLNGRDVLGMAQTGSGKTAAFSLPLLQNLDP
ELKAPQILVLAPTRELAVQVAEAMTDFSKHMRGVNVVALYGGQRYDVQLRALRQGPQIVVGTPGRLLDHL
KRGTLDLSKLSGLVLDEADEMLRMGFIEDVETIMAQIPEGHQTALFSATMPEAIRRITRRFMKEPQEVRI
QSSVTTRPDISQSYWTVWGMRKNEALVRFLEAEDFDAAIIFVRTKNATLEVAEALERNGYNSAALNGDMN
QALREQTLERLKDGRLDILIATDVAARGLDVERISLVVNYDIPMDSESYVHRIGRTGRAGRAGRALLFVE
NRERRLLRNIERTMKLTIPEVELPNAELLGKRRLEKFAAKVQQQLESSDLDQYRALLSKIQPTAEGEELD
LETLAAALLKMAQGERTLIVPPDAPMRPKREFRDRDDRGPRDRNDRGPRGDREDRPRRERRDVGDMQLYR
IEVGRDDGVEVRHIVGAIANEGDISSRYIGNIKLFASHSTIELPKGMPGEVLQHFTRTRILNKPMNMQLL
GDAQPHTGGERRGGGRGFGGERREGGRNFSGERREGGRGDGRRFSGERREGRAPRRDDSTGRRRFGGDA
The DEAD box pattern: [LIVM](2)-D-E-A-D-[RKEN]-x-[LI]

