Download presentation
Presentation is loading. Please wait.
Published byGloria Kelley Barber Modified over 9 years ago
1
Identifying the ortholog of TNF (Tumor necrosis factor) in mosquito genomes Pet Projects:
2
Practice – Install the blast program (1) 1.Download one of the BLAST executable file from NCBI according to the OS of your computer. 2.Save the file in a folder, such as c:\GMS6014\blast\ 3.Run the installation program by double click. Note that three folders will be add to the blast\ folder, bin, data, and doc. 4.Add three more folders to your blast\ directory, “query”, “dbs”, and “out”.
3
Practice – Install the blast program (2) 5.Inspect the contents of the doc, data, and bin folder. Move the programs from blast\bin to the blast folder. 6.Bring a command (cmd) window by typing “cmd” in the Start Run box. 7.Go to the blast folder by typing “cd C:\GMS6014\blast” 8.Try to run the program by typing “blastall”, read the output.
4
Practice -- BLAST search in your own computer 1.Download data file from the course web page, or Ensemble. Save in the blast\dbs folder, rename to elephant141p 2.At the prompt “C:\seqtools\blast >” type the command “bin\makeblastdb –in dbs\Aedes –dbtype nucl” -- format the dataset for the program. 3.Compose the query sequence save as “3TNF.txt” in the “blast\query\” folder. 4.Initiated the search by typing “tblastn –db dbs\Aedes – query query\3TNF.txt –o out\3TNF_Aedes.html –html”
5
What’s in a command? Makeblastdb –in dbs\Aedes –dbtype nucl Program – format database for search. Feed me the input file name Tell me is it a protein sequence file? For more info, refer to the “user manual” file in the blast\doc folder.
6
Advantages of Running BLAST at Your Own Machine Do it at any time, no waiting on the line. Search for multiple sequences at once. Search a defined data set. Automate Blast analysis. Combine Blast with other analysis. …..
7
BLAST is a program implemented in C/C++ void BlastTickProc(Int4 sequence_number, BlastThrInfoPtr thr_info) { if(thr_info->tick_callback && (sequence_number > (thr_info->last_db_seq + thr_info->db_incr))) { NlmMutexLockEx(&thr_info->callback_mutex); thr_info->last_db_seq += thr_info->db_incr; thr_info->tick_callback(sequence_number, thr_info->number_of_pos_hits); thr_info->last_tick = Nlm_GetSecs(); NlmMutexUnlock(thr_info->callback_mutex); } return; } /* Sends out a message every PERIOD (i.e., 60 secs.) for the index. THis function runs as a separate thread and only runs on a threaded platform. Should I care ?
8
Programming language comparison /* TRANSLATION: 3 or 6 frame translate cDNA sequences */ //--------------------------------------------------------------------------- #include "translation.hpp" int main(int argc, char **argv) { int num_seq=0; char string[MAXLINE]; DSEQ * dseq; infile.getline (string,MAXLINE); if (string[0]=='>') strncpy (dbname,string,MAXLINE); while (!infile.eof()) { dseq=Get_Lib_Seq (); if (dseq->reverse==0) Translation (&dseq->name[1], dseq->seq); else Translation (&dseq->name[1], dseq->r_seq); num_seq++; if (num_seq%1000==0) { cout<<num_seq<<endl; cout name<<endl; } delete dseq; } infile.close(); outfile.close(); cout<<num_seq<<" translated"<<endl; getch(); return 0; } DSEQ* Get_Lib_Seq() { int i,n; char str[MAXLINE]; DSEQ* dseq; n = 0; dseq=new DSEQ; strcpy (dseq->name, dbname); while(infile.getline(str,MAXLINE)) {if (str[0] == '>') { strcpy( dbname, str); break; } for(i=0;i<strlen(str);i++) {if(n==MAXSEQ) break; dseq->seq[n++] = str[i]; } dseq->seq[n]='\0'; if(n==MAXSEQ) cout<<"WARNING: sequence"<<dbname<<"too long!"<<endl; dseq->len=n; if (dseq->name[9]=='3') Reverse (dseq); else dseq->reverse=0; return dseq; } void Reverse (DSEQ* dseq) //Reverse dseq {int i,j; j=0; for (i=(dseq->len-1);i>0;i--) { if (dseq->seq[i]=='A'||dseq->seq[i]=='a') dseq->r_seq[j++]='T'; if (dseq->seq[i]=='C'||dseq->seq[i]=='c') dseq->r_seq[j++]='G'; if (dseq->seq[i]=='G'||dseq->seq[i]=='g') dseq->r_seq[j++]='C'; if (dseq->seq[i]=='T'||dseq->seq[i]=='t') dseq->r_seq[j++]='A'; if (dseq->seq[i]=='N'||dseq->seq[i]=='n') dseq->r_seq[j++]='N'; } dseq->r_seq[j++]='\0'; dseq->reverse=1; } void Translation (char name[], char seq[]) { char ppseq[MAXSEQ/3]; for (int f=0; f<3; f++) { outfile "<<"F_"<<f<<name<<endl; int j=0; int len=strlen(seq); for( int i=f; i<len; i=i+3) ppseq[j++]=Translate(&seq[i]); ppseq[j++]='\0'; int m=strlen(ppseq)/50; // output 50 aa per line for (int n=0; n<=m; n++) {for (int i=n*50; i<50*(n+1); i++) {outfile<<ppseq[i]; if (ppseq[i]=='\0') break; } outfile<<endl; } char Translate(char s[]) { int c1,c2,c3; char P, code[3]; //***standard translation table, A(0),C(1), G(2), T(3)***** char table [4][4][4]= {{{'K','N','K','N'},{'T','T','T','T'},{'R','S','R','S'},{'I','I','M','I'}}, {{'Q','H','Q','H'},{'P','P','P','P'},{'R','R','R','R'},{'L','L','L','L'}}, {{'E','D','E','D'},{'A','A','A','A'},{'G','G','G','G'},{'V','V','V','V'}}, {{'*','Y','*','Y'},{'S','S','S','S'},{'*','C','W','C'},{'L','F','L','F'}}}; //*********** table2 for n at 3rd position******************** char table2 [4][4]={{'X','T','X','X'},{'X','P','R','L'}, {'X','A','G','V'},{'X','S','X','X'}}; strncpy (code, s, 3); c1=Convert(code[0]); c2=Convert(code[1]); c3=Convert(code[2]); if (c1>=4 || c2>=4) P='X'; //can be Optimized further here by considering.... else { if (c3>=4) P=table2[c1][c2]; else P=table[c1][c2][c3]; //P=table[Convert(code[0])][Convert(code[1])][Conve rt(code[2])]; } return (P); } int Convert (char c) { char s=c; if (s=='A'||s=='a') return (0); if (s=='C'||s=='c') return (1); if (s=='G'||s=='g') return (2); if (s=='T'||s=='t'||s=='U'||s=='u') return (3); if (s=='N'||s=='n') return (4); else return (5); } f#Translation -- read from fasta DNA file and translate into three frames # import string from Bio import Fasta from Bio.Tools import Translate from Bio.Alphabet import IUPAC from Bio.Seq import Seq ifile = "S:\\Seq\\test.fasta" parser = Fasta.RecordParser() file =open (ifile) iterator = Fasta.Iterator (file, parser) cur_rec = iterator.next() cur_seq = Seq (cur_rec.sequence,IUPACUnambiguousDNA()) translator = Translate.unambiguous_dna_by_id[1] translator.translate (cur_seq) Translation : C Translation : Python
9
Observe: programming is not that difficult Example: Python and bioPython. 1.Simple python scripts. 2.Batch Blast with a Python script.
10
Blast outputoutput
11
Practice – Retrieve the sequence with fastacmd 1.To retrieve the whole sequence for “ AAEL006022- RA ”, type “fastacmd –d dbs\Aedes.cDNA –s “ AAEL006022-RA” –o out\006022.txt” 2.View the sequence file with wordpad,
12
Questions after the Blast search? Questions: Is this a expressed gene in the Aedes mosquito? - Gene prediction & gene structure Is this the true ortholog of TNF? - Fundamentals of sequence comparison What can we learn from the comparison of sequences? -- protein dommains/motifs.
13
Sequence comparison – dot matrix alignment A T G G T A G G G C A T 1 1 A T G C T A G G C C A T Window = 1 Threshold = 1
14
Sequence comparison – dot matrix alignment A T G G T A G G G C A T 1 1 A T G C A A G G C C A T Window: 4 Threshold: 3
15
Practice: dotplot Copy /past the two DNA sequences to the DNA number 1 and 2 windows, respectively. First choose windows size =1 and mismatch limit =0, click the “Make plot” button. Change window size to 3. make a plot Change window size to 5, make a plot. Change mismatch limit to 1, make a plot
16
Sequence comparison – dot matrix alignment
17
How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P P Seq_B: M P P W I
18
Scoring matrix –BLOSUM 62
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.