Identifying the ortholog of TNF (Tumor necrosis factor) in mosquito genomes Pet Projects:


Practice – Install the BLAST program (1)
1. Download the BLAST executable file that matches your computer's operating system from NCBI.
2. Save the file in a folder, such as c:\GMS6014\blast\
3. Run the installation program by double-clicking it. Note that three folders will be added to the blast\ folder: bin, data, and doc.
4. Add three more folders to your blast\ directory: “query”, “dbs”, and “out”.

Practice – Install the BLAST program (2)
5. Inspect the contents of the doc, data, and bin folders. Move the programs from blast\bin to the blast folder.
6. Open a command (cmd) window by typing “cmd” in the Start → Run box.
7. Go to the blast folder by typing “cd C:\GMS6014\blast”.
8. Try to run the program by typing “blastall”, and read the output.

Practice -- BLAST search on your own computer
1. Download the data file from the course web page or from Ensembl. Save it in the blast\dbs folder and rename it to elephant141p.
2. At the prompt “C:\seqtools\blast >”, type the command “bin\makeblastdb -in dbs\Aedes -dbtype nucl” to format the dataset for the program.
3. Compose the query sequence and save it as “3TNF.txt” in the “blast\query\” folder.
4. Initiate the search by typing “tblastn -db dbs\Aedes -query query\3TNF.txt -out out\3TNF_Aedes.html -html”.
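The two commands in this practice can also be assembled and launched from Python. The following is a minimal sketch, assuming the BLAST+ programs makeblastdb and tblastn are on the PATH and that the output file is given with the BLAST+ option -out; the function names build_blast_commands and run_search are made up for this illustration.

```python
import subprocess


def build_blast_commands(database, query, out_html, dbtype="nucl"):
    """Return the makeblastdb and tblastn command lines as argument lists."""
    make_db = ["makeblastdb", "-in", database, "-dbtype", dbtype]
    search = ["tblastn", "-db", database, "-query", query,
              "-out", out_html, "-html"]
    return make_db, search


def run_search(database, query, out_html):
    """Format the database, then run the search (needs BLAST+ installed)."""
    for command in build_blast_commands(database, query, out_html):
        # check=True raises an error if the program exits with a failure code
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected.
    for command in build_blast_commands(r"dbs\Aedes", r"query\3TNF.txt",
                                        r"out\3TNF_Aedes.html"):
        print(" ".join(command))
```

Passing each command as a list of arguments avoids quoting problems when a folder name contains spaces.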

What’s in a command?
makeblastdb -in dbs\Aedes -dbtype nucl
makeblastdb: the program, which formats the database for searching.
-in dbs\Aedes: “Feed me the input file name.”
-dbtype nucl: “Tell me, is it a protein or a nucleotide sequence file?” (here, nucleotide).
For more info, refer to the “user manual” file in the blast\doc folder.

Advantages of Running BLAST on Your Own Machine
- Do it at any time, no waiting on the line.
- Search for multiple sequences at once.
- Search a defined data set.
- Automate BLAST analysis.
- Combine BLAST with other analyses.
- …

BLAST is a program implemented in C/C++

    void BlastTickProc(Int4 sequence_number, BlastThrInfoPtr thr_info)
    {
        if(thr_info->tick_callback &&
           (sequence_number > (thr_info->last_db_seq + thr_info->db_incr))) {
            NlmMutexLockEx(&thr_info->callback_mutex);
            thr_info->last_db_seq += thr_info->db_incr;
            thr_info->tick_callback(sequence_number, thr_info->number_of_pos_hits);
            thr_info->last_tick = Nlm_GetSecs();
            NlmMutexUnlock(thr_info->callback_mutex);
        }
        return;
    }

    /* Sends out a message every PERIOD (i.e., 60 secs.) for the index.
       This function runs as a separate thread and only runs on a
       threaded platform. */

Should I care?

Programming language comparison

Translation: C

    /* TRANSLATION: 3 or 6 frame translate cDNA sequences */
    // #include "translation.hpp"
    // The declarations of DSEQ, MAXLINE, MAXSEQ, infile, outfile and dbname
    // are not shown on the slide.

    int main(int argc, char **argv)
    {
        int num_seq=0;
        char string[MAXLINE];
        DSEQ * dseq;

        infile.getline (string,MAXLINE);
        if (string[0]=='>') strncpy (dbname,string,MAXLINE);
        while (!infile.eof()) {
            dseq=Get_Lib_Seq ();
            if (dseq->reverse==0) Translation (&dseq->name[1], dseq->seq);
            else Translation (&dseq->name[1], dseq->r_seq);
            num_seq++;
            if (num_seq%1000==0) {      // progress report every 1000 sequences
                cout<<num_seq<<endl;
                cout<<dseq->name<<endl;
            }
            delete dseq;
        }
        infile.close();
        outfile.close();
        cout<<num_seq<<" translated"<<endl;
        getch();
        return 0;
    }

    // Read the next sequence from the FASTA library file
    DSEQ* Get_Lib_Seq()
    {
        int i,n;
        char str[MAXLINE];
        DSEQ* dseq;

        n = 0;
        dseq=new DSEQ;
        strcpy (dseq->name, dbname);
        while(infile.getline(str,MAXLINE)) {
            if (str[0] == '>') {        // header of the next record
                strcpy( dbname, str);
                break;
            }
            for(i=0;i<strlen(str);i++) {
                if(n==MAXSEQ) break;
                dseq->seq[n++] = str[i];
            }
        }
        dseq->seq[n]='\0';
        if(n==MAXSEQ) cout<<"WARNING: sequence"<<dbname<<"too long!"<<endl;
        dseq->len=n;
        if (dseq->name[9]=='3') Reverse (dseq);
        else dseq->reverse=0;
        return dseq;
    }

    void Reverse (DSEQ* dseq) //Reverse dseq
    {
        int i,j;
        j=0;
        for (i=(dseq->len-1);i>=0;i--) {    // i>=0 so the first base is included
            if (dseq->seq[i]=='A'||dseq->seq[i]=='a') dseq->r_seq[j++]='T';
            if (dseq->seq[i]=='C'||dseq->seq[i]=='c') dseq->r_seq[j++]='G';
            if (dseq->seq[i]=='G'||dseq->seq[i]=='g') dseq->r_seq[j++]='C';
            if (dseq->seq[i]=='T'||dseq->seq[i]=='t') dseq->r_seq[j++]='A';
            if (dseq->seq[i]=='N'||dseq->seq[i]=='n') dseq->r_seq[j++]='N';
        }
        dseq->r_seq[j++]='\0';
        dseq->reverse=1;
    }

    void Translation (char name[], char seq[])
    {
        char ppseq[MAXSEQ/3];
        for (int f=0; f<3; f++) {
            outfile<<">"<<"F_"<<f<<name<<endl;
            int j=0;
            int len=strlen(seq);
            for( int i=f; i+2<len; i=i+3)   // complete codons only
                ppseq[j++]=Translate(&seq[i]);
            ppseq[j++]='\0';
            int m=strlen(ppseq)/50;         // output 50 aa per line
            for (int n=0; n<=m; n++) {
                for (int i=n*50; i<50*(n+1); i++) {
                    if (ppseq[i]=='\0') break;  // stop before writing the terminator
                    outfile<<ppseq[i];
                }
                outfile<<endl;
            }
        }
    }

    char Translate(char s[])
    {
        int c1,c2,c3;
        char P, code[3];
        //***standard translation table, A(0),C(1), G(2), T(3)*****
        char table [4][4][4]=
            {{{'K','N','K','N'},{'T','T','T','T'},{'R','S','R','S'},{'I','I','M','I'}},
             {{'Q','H','Q','H'},{'P','P','P','P'},{'R','R','R','R'},{'L','L','L','L'}},
             {{'E','D','E','D'},{'A','A','A','A'},{'G','G','G','G'},{'V','V','V','V'}},
             {{'*','Y','*','Y'},{'S','S','S','S'},{'*','C','W','C'},{'L','F','L','F'}}};
        //*********** table2 for n at 3rd position********************
        char table2 [4][4]={{'X','T','X','X'},{'X','P','R','L'},
                            {'X','A','G','V'},{'X','S','X','X'}};

        strncpy (code, s, 3);
        c1=Convert(code[0]);
        c2=Convert(code[1]);
        c3=Convert(code[2]);
        if (c1>=4 || c2>=4) P='X'; //can be Optimized further here by considering....
        else {
            if (c3>=4) P=table2[c1][c2];
            else P=table[c1][c2][c3];
            //P=table[Convert(code[0])][Convert(code[1])][Convert(code[2])];
        }
        return (P);
    }

    int Convert (char c)
    {
        char s=c;
        if (s=='A'||s=='a') return (0);
        if (s=='C'||s=='c') return (1);
        if (s=='G'||s=='g') return (2);
        if (s=='T'||s=='t'||s=='U'||s=='u') return (3);
        if (s=='N'||s=='n') return (4);
        else return (5);
    }

Translation: Python

    # Translation -- read from fasta DNA file and translate into three frames
    # (written against an early Biopython release; the Bio.Fasta and
    # Bio.Tools.Translate modules used here are not part of current Biopython)
    from Bio import Fasta
    from Bio.Tools import Translate
    from Bio.Alphabet import IUPAC
    from Bio.Seq import Seq

    ifile = "S:\\Seq\\test.fasta"
    parser = Fasta.RecordParser()
    file = open(ifile)
    iterator = Fasta.Iterator(file, parser)
    cur_rec = iterator.next()
    cur_seq = Seq(cur_rec.sequence, IUPAC.unambiguous_dna)
    translator = Translate.unambiguous_dna_by_id[1]
    translator.translate(cur_seq)
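For comparison, the three-frame translation can be written in plain Python without any library. This is a minimal sketch, not the slide author's script: the names CODON_TABLE, translate and three_frames are chosen for this illustration, the codon table is the standard one in the same A, C, G, T order as the C table above, and any codon containing an ambiguous base is simply reported as X (the C version is slightly smarter about N in the third position).

```python
from itertools import product

BASES = "ACGT"
# Standard genetic code, listed in the order AAA, AAC, AAG, AAT, ACA, ...
AMINO_ACIDS = ("KNKNTTTTRSRSIIMI"
               "QHQHPPPPRRRRLLLL"
               "EDEDAAAAGGGGVVVV"
               "*Y*YSSSS*CWCLFLF")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0):
    """Translate complete codons of dna, starting at offset frame (0, 1 or 2)."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons with N or any other unexpected character become X
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frames("ATGGCCTAA")):
        print("F_%d %s" % (frame, protein))
```

A dictionary lookup replaces the Convert function and the three-dimensional array of the C program, which is most of the reason the Python version is shorter.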

Observe: programming is not that difficult
Example: Python and Biopython.
1. Simple Python scripts.
2. Batch BLAST with a Python script.
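A batch BLAST script can be as short as a loop over the files in the query folder. The sketch below is one possible layout, not the script used in class: it assumes every query is a .txt file, that tblastn from BLAST+ is on the PATH, and that each result is named after its query and database; batch_commands and run_batch are hypothetical names.

```python
from pathlib import Path
import subprocess


def batch_commands(query_dir, database, out_dir, program="tblastn"):
    """Build one BLAST command per .txt query file found in query_dir."""
    commands = []
    for query in sorted(Path(query_dir).glob("*.txt")):
        # e.g. query 3TNF.txt against database Aedes -> 3TNF_Aedes.html
        out_file = Path(out_dir) / f"{query.stem}_{Path(database).name}.html"
        commands.append([program, "-db", str(database), "-query", str(query),
                         "-out", str(out_file), "-html"])
    return commands


def run_batch(query_dir, database, out_dir):
    """Run the searches one after another (needs BLAST+ installed)."""
    for command in batch_commands(query_dir, database, out_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Show what would be run for the folders created during installation.
    for command in batch_commands("query", "dbs/Aedes", "out"):
        print(" ".join(command))
```

Building the list of commands separately from running them makes it easy to check the file names before starting a long search.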

BLAST output

Practice – Retrieve the sequence with fastacmd
1. To retrieve the whole sequence for “ AAEL RA ”, type “fastacmd -d dbs\Aedes.cDNA -s “ AAEL RA” -o out\ txt”.
2. View the sequence file with WordPad.
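The same kind of lookup can be sketched in plain Python for a FASTA file that has not been formatted as a BLAST database. This is only an illustration of the idea, not a replacement for fastacmd: the function name fetch_fasta_record is made up, and the sequence ID is assumed to be the first word after the “>” on the header line.

```python
def fetch_fasta_record(path, seq_id):
    """Return (header, sequence) for the first record whose ID is seq_id.

    Returns None if no record with that ID is found.
    """
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    # The next record starts, so the wanted one is complete
                    return header, "".join(chunks)
                fields = line[1:].split()
                if fields and fields[0] == seq_id:
                    header = line[1:]
            elif header is not None:
                chunks.append(line)
    if header is not None:
        return header, "".join(chunks)
    return None
```

The function reads the file line by line, so it also works on large files, but it scans from the top on every call; an indexed database is much faster when many sequences are needed.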

Questions after the BLAST search
- Is this an expressed gene in the Aedes mosquito? -- Gene prediction & gene structure
- Is this the true ortholog of TNF? -- Fundamentals of sequence comparison
- What can we learn from the comparison of sequences? -- Protein domains/motifs

Sequence comparison – dot matrix alignment
Sequence 1: A T G G T A G G G C A T
Sequence 2: A T G C T A G G C C A T
Window = 1, Threshold = 1

Sequence comparison – dot matrix alignment
Sequence 1: A T G G T A G G G C A T
Sequence 2: A T G C A A G G C C A T
Window = 4, Threshold = 3

Practice: dotplot
1. Copy/paste the two DNA sequences into the DNA number 1 and DNA number 2 windows, respectively.
2. First choose window size = 1 and mismatch limit = 0, then click the “Make plot” button.
3. Change the window size to 3 and make a plot.
4. Change the window size to 5 and make a plot.
5. Change the mismatch limit to 1 and make a plot.
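The dot matrix idea is simple enough to sketch in a few lines of Python. The version below places a dot at position (i, j) when the window starting at i in the first sequence and the window starting at j in the second sequence agree in at least threshold positions. It is a minimal sketch: the function names are made up, and treating a mismatch limit as window size minus threshold is an assumption about how the practice tool defines that setting.

```python
def dot_matrix(seq1, seq2, window=1, threshold=1):
    """Return the (i, j) window start positions that receive a dot."""
    dots = []
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            # Count matching positions between the two windows
            matches = sum(a == b for a, b in
                          zip(seq1[i:i + window], seq2[j:j + window]))
            if matches >= threshold:
                dots.append((i, j))
    return dots


def dot_matrix_mismatch(seq1, seq2, window, mismatch_limit):
    """Same plot, with the cutoff given as an allowed number of mismatches."""
    return dot_matrix(seq1, seq2, window, window - mismatch_limit)


def render(seq1, seq2, window=1, threshold=1):
    """Draw the matrix as text: seq1 down the side, seq2 across the top."""
    dots = set(dot_matrix(seq1, seq2, window, threshold))
    lines = ["  " + " ".join(seq2)]
    for i, base in enumerate(seq1):
        row = " ".join("*" if (i, j) in dots else "."
                       for j in range(len(seq2)))
        lines.append(base + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("ATGGTAGGGCAT", "ATGCTAGGCCAT", window=1, threshold=1))
    print()
    print(render("ATGGTAGGGCAT", "ATGCAAGGCCAT", window=4, threshold=3))
```

With window = 1 every chance match between single bases gets a dot, so the plot is noisy; a larger window with a threshold close to the window size keeps the diagonal and removes most of the background.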

Sequence comparison – dot matrix alignment

How to measure the similarity between two sequences
Q: Which one is a better match to the query?
Query: M A T W L
Seq_A: M A T P P
Seq_B: M P P W I

Scoring matrix – BLOSUM62
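The question about Seq_A and Seq_B can be worked through with a few lines of Python. The sketch below compares the two ungapped alignments in two ways: by counting identical positions, and by summing substitution scores. Only the BLOSUM62 entries needed for these three sequences are included, and the names BLOSUM62_SUBSET, pair_score, identities and alignment_score are chosen for this illustration.

```python
# The BLOSUM62 entries needed for this example only (the matrix is symmetric)
BLOSUM62_SUBSET = {
    ("M", "M"): 5, ("A", "A"): 4, ("T", "T"): 5, ("W", "W"): 11, ("L", "L"): 4,
    ("W", "P"): -4, ("L", "P"): -3, ("A", "P"): -1, ("T", "P"): -1,
    ("L", "I"): 2,
}


def pair_score(a, b):
    """Look up the score of one aligned pair, in either order."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def identities(query, subject):
    """Count positions where the two aligned sequences are identical."""
    return sum(q == s for q, s in zip(query, subject))


def alignment_score(query, subject):
    """Sum the substitution scores over an ungapped alignment."""
    return sum(pair_score(q, s) for q, s in zip(query, subject))


if __name__ == "__main__":
    query = "MATWL"
    for name, seq in (("Seq_A", "MATPP"), ("Seq_B", "MPPWI")):
        print(name, "identities:", identities(query, seq),
              "score:", alignment_score(query, seq))
```

Seq_A has more identical positions, but Seq_B gets the higher score, because matching the rare residue W is rewarded strongly and L to I is a conservative substitution. This is why a scoring matrix is used instead of simple percent identity.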