Published by Emerald Georgina Clark. Modified over 9 years ago.
1
database search
2
Overview:
1. FastA: suitable for protein sequence searching (nucleotide searches are also supported).
2. BLAST: suitable for DNA, RNA, and protein sequence searching.
3
FastA
History: FastA was developed by Lipman and Pearson in 1985 and was one of the first database search programs. EBI provides a FastA service at http://www.ebi.ac.uk/Tools/fasta/
Idea: identify short substrings (words) of the query that match the target sequence.
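The word-matching idea can be sketched in Python. This is a minimal illustration, not the FastA implementation: the function names `index_words` and `shared_words`, the word length k = 2, and the toy sequences are assumptions chosen for the example.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map every length-k substring (word) of seq to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_words(query, target, k=2):
    """Return (query_pos, target_pos) pairs where both sequences share a word.

    Hypothetical helper for illustration only.
    """
    index = index_words(target, k)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


# Toy input: a fragment of the example protein against a shorter target.
hits = shared_words("EDCIAVGQ", "CIAVG", k=2)
print(hits)
```

All four hits here have the same offset (query position minus target position equals 2), so they lie on one diagonal, which is the kind of signal a word-based search looks for before doing a more careful alignment.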
4
Other commonly used software: http://www.ebi.ac.uk/Tools/sss/
5
Example
Protein sequence: EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
[Screenshot callouts: parameters; input sequence; select database]
7
Results
[Screenshot callouts: 100% identity; 17/28 = 60.7% identity; 28 aa overlap]
8
BLAST
Basic Local Alignment Search Tool (BLAST). BLAST was developed at NCBI. It finds regions of local similarity between biological sequences.
9
Basic BLAST

Program   Query sequence   Database     Program description
Blastn    Nucleotide       Nucleotide   Search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast
Blastp    Protein          Protein      Search a protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast, delta-blast
Blastx    Nucleotide       Protein      Search a protein database using a translated nucleotide query
Tblastn   Protein          Nucleotide   Search a translated nucleotide database using a protein query
Tblastx   Nucleotide       Nucleotide   Search a translated nucleotide database using a translated nucleotide query

T: translation; n: nucleotide; p: protein; x: cross
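The program table can also be read as a lookup from (query type, database type) to program name. The sketch below encodes only the pairings listed on the slide; the names `PROGRAM_TABLE` and `candidate_programs` are hypothetical.

```python
# (query type, database type) -> programs, as listed in the Basic BLAST table.
PROGRAM_TABLE = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}


def candidate_programs(query_type, db_type):
    """Return the BLAST programs that accept this query/database pairing."""
    return PROGRAM_TABLE[(query_type, db_type)]


print(candidate_programs("protein", "nucleotide"))
```

For a nucleotide query against a nucleotide database there are two candidates: blastn compares the sequences directly, while tblastx translates both sides before comparing them.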
10
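BLASTALL
Query sequence, program, and database:
DNA sequence: TBLASTx searches a nucleotide database (translated); BLASTx searches a protein database; BLASTn searches a nucleotide database.
Amino acid sequence: TBLASTn searches a nucleotide database (translated); BLASTp searches a protein database.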
11
Blast source
1. NCBI:
http://blast.ncbi.nlm.nih.gov/Blast.cgi/ (online version)
ftp://ftp.ncbi.nih.gov/blast/ (standalone)
2. Other websites:
http://life.zsu.edu.cn/blast/
http://www.fruitfly.org/blast/
http://www.mcgb.uestc.edu.cn/blast/blast.html
…
13
BLAST
1. Online: run from a website
2. Standalone: download the software
14
Comparison between them
Web server advantages:
1. Easy to use.
2. Kept up to date.
3. No need to download databases.
Web server disadvantages:
1. Not suitable for large data sets.
2. You cannot define your own database.
15
Web BLAST provided by NCBI
Blastn for nucleotide sequences
Blastp for protein sequences
http://blast.ncbi.nlm.nih.gov/Blast.cgi
16
An example:
1. cctggcgataaccgtcttgtcggcggttgcgctgacgttgcgtcgtgatatcatcagggcAgaccggttacatccccctaa
2. gatcgaaaaacgcttgtgttaaaaatttgctaaattttgccaatttggtaaaacagttgcAtcacaacaggagatagcaat
17
the first sequence
18
The second sequence
[Screenshot callouts: sequence range; software; similarity from high to low; results shown in new window]
19
Results of pairwise alignment
[Screenshot callouts: No significant similarity found; information on the two sequences; parameters selected]
20
Blast (standalone version)
Why do we need the standalone version of BLAST?
1. Specific databases
2. Privacy
3. Batch processing
21
How to download BLAST
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release
File: blast-2.2.23-ia32-win32.exe
23
After unzipping, we get three folders:
bin: all the exe files
data: data for BLAST
doc: readme
24
Blast (standalone version)
We need to format the database for BLAST. First, save your database in FASTA format. Second, use formatdb, provided in the BLAST package, to format the database.
DOS command: formatdb -i sequence.fa -p T/F -o T/F -n db_name
25
An example
1. The file “Delta.txt” contains 13 proteins and serves as the database.
2. One protein is selected as the query sequence and stored in the file “seq.txt”.
26
1. Format Delta.txt: formatdb -i Delta.txt -p T
Parameters:
1. -i: database file
2. -p: T for protein, F for nucleotide
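For batch work, the formatdb call can be assembled in a script. The following is a minimal sketch that only builds the argument list from the flags shown on these slides (-i, -p, -o, -n); the function name `formatdb_command` and its parameter names are hypothetical, and the reading of -o as "parse sequence identifiers" is an assumption about the legacy tool.

```python
def formatdb_command(fasta_path, is_protein, db_name=None, parse_seqids=False):
    """Build the argument list for the legacy formatdb program."""
    # -i: input FASTA file; -p: T for protein, F for nucleotide.
    cmd = ["formatdb", "-i", fasta_path, "-p", "T" if is_protein else "F"]
    if parse_seqids:
        # -o T (assumed meaning: parse sequence identifiers and index them).
        cmd += ["-o", "T"]
    if db_name is not None:
        # -n: base name for the formatted database files.
        cmd += ["-n", db_name]
    return cmd


cmd = formatdb_command("Delta.txt", is_protein=True)
print(" ".join(cmd))
```

If the legacy BLAST executables are installed and on the PATH, the list can be handed to `subprocess.run(cmd, check=True)`; building it as a list rather than a single string avoids quoting problems with file names that contain spaces.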
27
2. Search Delta.txt using BLAST: blastall -p blastp -d Delta.txt -i seq.txt -o out.txt
Parameters:
1. -p: program name: blastp, blastn, blastx, tblastn, tblastx
2. -d: database name
3. -i: query sequences
4. -o: output file
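The search command can be built the same way. This sketch uses only the four options described on the slide and rejects program names that are not in the slide's list; `blastall_command` and `BLASTALL_PROGRAMS` are hypothetical names.

```python
# Program names listed on the slide for the -p option.
BLASTALL_PROGRAMS = ("blastp", "blastn", "blastx", "tblastn", "tblastx")


def blastall_command(program, database, query, output):
    """Build the argument list for a legacy blastall search."""
    if program not in BLASTALL_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    return ["blastall", "-p", program, "-d", database, "-i", query, "-o", output]


cmd = blastall_command("blastp", "Delta.txt", "seq.txt", "out.txt")
print(" ".join(cmd))
```

Looping over many query files and calling `blastall_command` once per file is one simple way to do the batch processing that motivates the standalone version.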
28
3. To see the other parameters, just type blastall
29
4. Results:

Sequences producing significant alignments:   Score (bits)   E Value
P83301|CXO_CONVE                              69             1e-017
P69749|CXD6A_CONBU                            20             0.009
P69750|CXD6A_CONCN                            18             0.036
P24159|CXDB_CONTE P18511|CXDA_CONTE           18             0.042
P60179|CXD66_CONAA                            17             0.066
P60513|CXD6A_CONER                            17             0.11
P69751|CXD6E_CONCT P69748|CXD6A_CONAI         16             0.19
P69754|CXD6B_CONMA P69753|CXD6A_CONMA         14             0.56
P69752|CXD6B_CONER P58913|CXD6A_CONPU         14             0.62
P69756|CXD6D_CONMA P69755|CXD6C_CONMA         13             0.89
Q9XZK5|CXSO6_CONST P69757|CXD6A_CONSE         12             2.6
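A hit summary like this is easy to filter by E-value in a script. The sketch below assumes each summary line ends with the bit score and the E-value, separated by whitespace; `parse_hit` and `significant_hits` are hypothetical helpers, the 0.01 threshold is an arbitrary choice for the example, and the three sample lines are copied from the results above.

```python
def parse_hit(line):
    """Split one summary line into (sequence id, bit score, E-value)."""
    fields = line.split()
    # The last two whitespace-separated fields are the score and the E-value.
    return fields[0], float(fields[-2]), float(fields[-1])


def significant_hits(lines, max_evalue=0.01):
    """Keep the ids of hits whose E-value is at or below the threshold."""
    hits = [parse_hit(line) for line in lines]
    return [seq_id for seq_id, _score, evalue in hits if evalue <= max_evalue]


sample = [
    "P83301|CXO_CONVE 69 1e-017",
    "P69749|CXD6A_CONBU 20 0.009",
    "P69750|CXD6A_CONCN 18 0.036",
]
print(significant_hits(sample))
```

A smaller E-value means fewer matches of that quality are expected by chance, so lowering `max_evalue` makes the filter stricter.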
30
>P83301|CXO_CONVE
Length = 33
Score = 69.3 bits (168), Expect = 1e-017, Method: Compositional matrix adjust.
Identities = 33/33 (100%)

Query: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33
         EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
Sbjct: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33

>P69749|CXD6A_CONBU
Length = 27
Score = 20.0 bits (40), Expect = 0.009, Method: Compositional matrix adjust.
Identities = 13/30 (43%), Gaps = 6/30 (20%)

Query: 1 EDCIAVGQLCVFWNIGRP--CCSGLCVFAC 28
           C A G  C      RP  CCS  C FAC
Sbjct: 1 DECSAPGAFCLI----RPGLCCSEFCFFAC 26
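The identity and gap figures in the second alignment can be checked by counting columns. In the sketch below the gap positions (written as "-") are an assumption: they are placed where the slide text breaks each sequence, which is consistent with the reported 13/30 identities and 6/30 gaps. The function name `alignment_stats` is hypothetical.

```python
def alignment_stats(query, sbjct):
    """Count identical columns, gapped columns, and alignment length."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(query, sbjct))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gaps, len(pairs)


# Gap placement assumed from the line breaks in the slide text.
query = "EDCIAVGQLCVFWNIGRP--CCSGLCVFAC"
sbjct = "DECSAPGAFCLI----RPGLCCSEFCFFAC"

identities, gaps, length = alignment_stats(query, sbjct)
print(f"Identities = {identities}/{length} ({100 * identities // length}%)")
print(f"Gaps = {gaps}/{length} ({100 * gaps // length}%)")
```

Note that the percentages are computed over the alignment length (30 columns), not over the length of either sequence.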
31
5. Pairwise alignment: bl2seq -p blastp -i seq.txt -j 1.txt -o out.txt
Parameters:
1. -p: program name: blastp, blastn……
2. -i: first sequence
3. -j: second sequence
4. -o: output file
To see the other parameters, just type bl2seq
32
6. Databases can be downloaded from: ftp://ftp.ncbi.nih.gov/blast/db/
Scoring matrices can be downloaded from: ftp://ftp.ncbi.nih.gov/blast/matrices/
33
PSI-blast
Position-Specific Iterative BLAST (PSI-BLAST).
Altschul et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389-3402.
Target: proteins only
34
PSI-blast
Position-Specific Iterative BLAST (PSI-BLAST) refers to a feature of BLAST 2.0 in which a profile is automatically constructed from the first set of BLAST alignments. PSI-BLAST is similar to NCBI BLAST2 except that it uses position-specific scoring matrices derived during the search. The tool is used to detect distant evolutionary relationships.
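To make the idea of a position-specific scoring matrix concrete, here is a heavily simplified sketch. It is not PSI-BLAST's actual procedure: it assumes ungapped, equal-length aligned sequences, a uniform background frequency, and a flat pseudocount, and the names `build_pssm` and `score` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned sequences."""
    background = 1.0 / len(ALPHABET)  # simplifying assumption: uniform background
    pssm = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in ALPHABET
        })
    return pssm


def score(pssm, seq):
    """Sum the column-specific scores of seq against the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


pssm = build_pssm(["CCS", "CCS", "CAS"])
print(score(pssm, "CCS") > score(pssm, "CAS") > score(pssm, "WWW"))
```

Unlike a fixed substitution matrix, the score for a residue depends on the column: here C scores higher than A at the second position because it was seen more often there in the aligned hits. In an iterative search, the matrix would be rebuilt from each round's hits and used as the query for the next round.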
35
Online sources:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_psiblast.html
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/blastpgp/