Published by Derek Shaw. Modified over 9 years ago.
1
(PSI-)BLAST & MSA via Max-Planck
2
General Issues
Where? (to find homologues)
- Structural templates: search against the PDB
- Sequence homologues: search against SwissProt or UniProt (recommended!)
How many?
- As many as possible, as long as the MSA looks good (next week…)
3
General Issues
How long? (length of homologues)
- Fragments: short homologues (less than 50-60% of the query’s length) = bad alignment
- Ensure your sequences exhibit the wanted domain(s)
- N/C termini tend to vary in length between homologues
How close? (distance from query sequence)
- All too close: no information
- Too many too far: bad alignment
- Ensure that you have a balanced collection!
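The fragment rule above can be sketched in a few lines of Python. This is only an illustration: the function name `filter_fragments`, the 0.6 cutoff (the upper end of the 50-60% range) and the toy sequences are assumptions, not part of the original slides.

```python
def filter_fragments(query_len, hits, min_fraction=0.6):
    """Keep hits whose length is at least min_fraction of the query length.

    hits maps a sequence name to its (unaligned) sequence string.
    min_fraction=0.6 is an assumed cutoff taken from the 50-60% rule of thumb.
    """
    min_len = min_fraction * query_len
    return {name: seq for name, seq in hits.items() if len(seq) >= min_len}


# Toy, made-up sequences (not real homologues).
query_len = 20
hits = {
    "full_length": "A" * 19,  # 95% of the query length: kept
    "fragment": "A" * 8,      # 40% of the query length: dropped
}
kept = filter_fragments(query_len, hits)
print(sorted(kept))
```

For the real query (273 aa), a 0.6 cutoff would drop hits shorter than about 164 residues.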
4
General Issues
From whom? (which species the sequence belongs to)
- Don’t care, all homologues are welcome
- Orthologues/paralogues may be helpful
- Sequences from distant/close species provide different types of information
Which method? (BLAST/PSI-BLAST)
- Depends on the protein, available homologues, the goal in mind…
5
General Issues
Rules For Choosing Sequences
- Very similar sequences have little information
- Very different sequences cause trouble… (<30% identical with more than half of the other sequences in the set)
- Choose sequences as distantly related as possible
- Sequences between 30-80% identical with more than half of the sequences in the set
- The more sequences the better
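A minimal sketch of the 30-80% rule, assuming the sequences are already aligned to equal length. Percent identity is computed here over columns where both sequences have a residue; that denominator is one of several common conventions and is an assumption, as are the function names and the toy alignment.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def select_balanced(aligned, low=30.0, high=80.0):
    """Keep sequences that are low..high % identical to more than half of the others."""
    names = list(aligned)
    kept = []
    for name in names:
        others = [n for n in names if n != name]
        in_range = sum(
            1
            for n in others
            if low <= percent_identity(aligned[name], aligned[n]) <= high
        )
        if others and in_range > len(others) / 2:
            kept.append(name)
    return kept


# Toy, made-up alignment (not real proteins).
aligned = {
    "s1": "AAAAAAAAAA",
    "s2": "AAAAAACCCC",  # 60% identical to s1
    "s3": "AAAACCCCCC",  # 40% identical to s1, 80% to s2
    "s4": "DDDDDDDDDD",  # 0% identical to everything: too far
}
selected = select_balanced(aligned)
print(selected)
```

The rule is applied once against the full set here; in practice, removing a sequence changes the "more than half" count for the others, so the selection may need to be repeated.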
6
Overall work steps
1. Run the search
   1. Select database
   2. E-value threshold
   3. BLAST or PSI-BLAST: how many rounds?
2. Take out sequences: HSP (slider region) or full sequences
3. Align sequences: choose alignment program
4. View alignment with BioEdit or another program
5. Calculate trees, conservation scores (ConSurf) etc…
7
(PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/sections/search
- Databases: SwissProt, TrEMBL, NR, env, PDB or any combination for proteins, but only NT for DNA
- All BLAST programs
- Main advantage: you can easily extract and filter the HSPs, on top of full sequences
8
The Query Protein
Name: Dihydrodipicolinate reductase
Enzyme reaction:
Molecular process: Lysine biosynthesis (early stages)
Organism: E. coli
Sequence length: 273 aa
9
The Query Protein
Query: DAPB_ECOLI
>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAV
KDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLL
EKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATV
RAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
10
(PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/psi_blast/
- Choose database or databases (select several using CTRL)
- Upload sequence or MSA
11
(PSI-)BLAST via Max-Planck
15
(PSI-)BLAST via Max-Planck E-value threshold can be assessed using the distribution
16
Forward results to MSA http://toolkit.tuebingen.mpg.de/sections/alignment
17
Forward results to MSA
- All marked hits or filter by E-value
- HSP (slider region) or full sequences
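Filtering hits by E-value before forwarding them amounts to a simple threshold test. The sketch below is not the toolkit's own code; the function name, the 1e-3 threshold and the hit list are made up for illustration.

```python
def filter_by_evalue(hits, threshold=1e-3):
    """Keep (name, evalue) pairs whose E-value is at or below the threshold.

    threshold=1e-3 is an assumed example value; choose it from the actual
    E-value distribution of your search.
    """
    return [(name, evalue) for name, evalue in hits if evalue <= threshold]


# Hypothetical hits with made-up E-values.
hits = [("hitA", 1e-50), ("hitB", 2e-5), ("hitC", 0.5)]
passing = filter_by_evalue(hits)
print([name for name, _ in passing])
```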
18
Forward results to MSA
19
Align via Max-Planck Alignment results: Save the alignment
20
Alignment viewing & editing
BioEdit http://www.mbio.ncsu.edu/BioEdit/BioEdit.html
- Easy-to-use sequence alignment editor
- View and manipulate alignments of up to 20,000 sequences
- Four modes of manual alignment: select and slide, dynamic grab and drag, gap insert and delete by mouse click, and on-screen typing which behaves like a text editor
- Reads and writes Genbank, Fasta, Phylip 3.2, Phylip 4, and NBRF/PIR formats
- Also reads GCG and Clustal formats
21
Easiest Using Bioedit http://www.mbio.ncsu.edu/BioEdit/bioedit.html Alignment viewing & editing
22
Alignment viewing & editing
Easiest Using BioEdit http://www.mbio.ncsu.edu/BioEdit/bioedit.html
- Find a specific sequence: “Edit-> search -> in titles”
- Erase/add sequences: “Edit-> cut/paste/delete sequence”
- “Sequence Identity matrix” under “Alignment”: useful for a rough evaluation of distances within the alignment
- After taking out sequences, “Minimize Alignment” under “Alignment” takes out unnecessary gaps
- Can save an image using “File -> Graphic View” and then “Edit -> Copy page as BITMAP”
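What "Minimize Alignment" does after sequences are removed can be approximated as dropping every column that holds only gaps. This is a sketch under that assumption, not BioEdit's actual implementation; the function name and the toy alignment are illustrative.

```python
def drop_gap_only_columns(aligned, gap="-"):
    """Remove alignment columns in which every sequence has a gap."""
    names = list(aligned)
    seqs = [aligned[n] for n in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns with at least one non-gap character.
    keep = [i for i, col in enumerate(zip(*seqs)) if any(c != gap for c in col)]
    return {n: "".join(aligned[n][i] for i in keep) for n in names}


# Toy alignment left with two gap-only columns after deleting a sequence.
msa = {"s1": "MK--LV", "s2": "M---LV", "s3": "MR--IV"}
minimized = drop_gap_only_columns(msa)
for name, seq in minimized.items():
    print(name, seq)
```

Columns where only some sequences have a gap (the second column here) are kept, since they still carry alignment information.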
23
A little about ConSurf
- Computes conservation scores
- Give it an MSA, or it will compute one for you (given a FASTA sequence, it runs BLAST & MSA)
- Main advantage: filters short HSPs, removes redundant sequences
- Shows conservation scores on the sequence or on a protein structure (if available)
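To get a feel for per-column conservation, here is a sketch that uses Shannon entropy as a stand-in. This is not ConSurf's scoring method, which also takes the phylogenetic tree into account; entropy is swapped in because it is simple to compute by hand. Function names and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one column, ignoring gaps.

    0.0 means fully conserved; higher values mean more variable.
    A gap-only column also returns 0.0 (a simplifying assumption).
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(aligned):
    """Entropy for every column of an alignment given as {name: aligned_seq}."""
    return [column_entropy(col) for col in zip(*aligned.values())]


# Toy, made-up alignment: columns 1 and 4 conserved, columns 2 and 3 variable.
msa = {"s1": "MKLV", "s2": "MRLV", "s3": "MKIV", "s4": "MRIV"}
profile = conservation_profile(msa)
print(profile)
```

Unlike a phylogeny-aware score, entropy treats all sequences as independent, so a cluster of near-identical homologues can make a column look more conserved than it is.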
24
ConSurf http://consurf.tau.ac.il/
25
ConSurf
26
http://consurf.tau.ac.il/results/1321532763/output.php
27
ConSurf http://consurf.tau.ac.il/results/1321532763/output.php
28
ConSurf
Results: MSA colored by conservation, PSI-BLAST result, MSA, Phylogenetic tree, Sequences used, Sequence conservation
29
ConSurf
30
Jmol- Easy web-based viewer
31
WebLogo http://weblogo.berkeley.edu/logo.cgi
32
WebLogo http://weblogo.berkeley.edu/logo.cgi
33
No “Miracle solution”
Each sequence is a different story: adjust parameters
- BLAST: E-value, substitution matrix, gap penalties, database, minimum length, redundancy level, fragment overlap…
- PSI-BLAST: BLAST parameters + PSSM inclusion threshold (or choose manually), number of rounds…
- Try using HSP or full sequences, different MSA programs…
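The slides run the search through the web toolkit. If one instead ran NCBI BLAST+ locally (an assumption, not something the slides describe), the same knobs map onto `psiblast` command-line options. The sketch below only builds the argument list and does not run anything; the file names, database name and default values are hypothetical.

```python
def build_psiblast_command(query, db, evalue=1e-3, rounds=3,
                           inclusion=1e-3, matrix="BLOSUM62",
                           out="psiblast_hits.txt"):
    """Build an argument list for NCBI BLAST+ psiblast (not executed here).

    All default values are illustrative assumptions to be tuned per protein.
    """
    return [
        "psiblast",
        "-query", query,                        # FASTA file with the query
        "-db", db,                              # name of a local BLAST database
        "-evalue", str(evalue),                 # reporting E-value threshold
        "-num_iterations", str(rounds),         # number of PSI-BLAST rounds
        "-inclusion_ethresh", str(inclusion),   # PSSM inclusion threshold
        "-matrix", matrix,                      # substitution matrix
        "-out", out,
    ]


# Hypothetical file and database names.
cmd = build_psiblast_command("dapb_ecoli.fasta", "swissprot")
print(" ".join(cmd))
```

Keeping the parameters in one function makes it easy to rerun the search with a different number of rounds or a stricter inclusion threshold and compare the resulting alignments.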