MCB 5472 Assignment #5: RBH Orthologs and PSI-BLAST February 19, 2014.

Slides:



Advertisements
Similar presentations
Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
Advertisements

MCB 5472 Blast, Psi BLAST, Perl: Arrays, Loops J. Peter Gogarten Office: BPB 404 phone: ,
Bioinformatics Tutorial I BLAST and Sequence Alignment.
BLAST, PSI-BLAST and position- specific scoring matrices Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and.
Types of homology BLAST
Computer lab exercises #8. Comments on projects worth sharing: 1.Use BLINK whenever possible. It can save a lot of waiting and greatly accelerates explorations.
Linux Platform  Download the source tar ball from the BLAST source code link  ncbi-blast src.tar.gz  Compilation  cd /BLASTdirectory/c++ ./configure.
Run BLAST in command line mode Yanbin Yin Fall
MCB 5472 Psi BLAST, Perl: Arrays, Loops J. Peter Gogarten Office: BPB 404 phone: ,
Expect value Expect value (E-value) Expected number of hits, of equivalent or better score, found by random chance in a database of the size.
Readings for this week Gogarten et al Horizontal gene transfer….. Francke et al. Reconstructing metabolic networks….. Sign up for meeting next week for.
PSI (position-specific iterated) BLAST The NCBI page described PSI blast as follows: “Position-Specific Iterated BLAST (PSI-BLAST) provides an automated,
Overview of sequence database searching techniques and multiple alignment May 1, 2001 Quiz on May 3-Dynamic programming- Needleman-Wunsch method Learning.
Introduction to Bioinformatics - Tutorial no. 2 Global Alignment Local Alignment FASTA BLAST.
Chapter 2 Sequence databases A list of the databases’ uniform resource locators (URLs) discussed in this section is in Box 2.1.
12ex.1. 12ex.2 The BioPerl project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science.
MCB 5472 Psi BLAST, Perl: Arrays, Loops J. Peter Gogarten Office: BPB 404 phone: ,
Rationale for searching sequence databases June 22, 2005 Writing Topics due today Writing projects due July 8 Learning objectives- Review of Smith-Waterman.
Psi-Blast: Detecting structural homologs Psi-Blast was designed to detect homology for highly divergent amino acid sequences Psi = position-specific iterated.
MCB 372 PSI BLAST, scalars J. Peter Gogarten Office: BPB 404 phone: ,
What is Blast What/Why Standalone Blast Locating/Downloading Blast Using Blast You need: Your sequence to Blast and the database to search against.
© Wiley Publishing All Rights Reserved. Searching Sequence Databases.
Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch , Ch 5.1, get what you can.
BLAST What it does and what it means Steven Slater Adapted from pt.
Basic Introduction of BLAST Jundi Wang School of Computing CSC691 09/08/2013.
MCB 5472 Psi BLAST, Perl: Arrays, Loops, Hashes J. Peter Gogarten Office: BPB 404 phone: ,
Introduction to Bioinformatics CPSC 265. Interface of biology and computer science Analysis of proteins, genes and genomes using computer algorithms and.
BLAST : Basic local alignment search tool B L A S T !
MCB 5472 Assignment #6: HMMER and using perl to perform repetitive tasks February 26, 2014.
Tweaking BLAST Although you normally see BLAST as a web page with boxes to place data in and tick boxes, etc., it is actually a command line program that.
Blast 1. Blast 2 Low Complexity masking >GDB1_WHEAT MKTFLVFALIAVVATSAIAQMETSCISGLERPWQQQPLPPQQSFSQQPPFSQQQQQPLPQ QPSFSQQQPPFSQQQPILSQQPPFSQQQQPVLPQQSPFSQQQQLVLPPQQQQQQLVQQQI.
MCB 5472 Lecture #4: Probabilistic models of homology: Psi-BLAST and HMMs February 17, 2014.
Large scale genomes comparisons Practical sessions Fredj Tekaia Institut Pasteur EMBO Bioinformatic and Comparative Genome Analysis Course.
Searching Molecular Databases with BLAST. Basic Local Alignment Search Tool How BLAST works Interpreting search results The NCBI Web BLAST interface Demonstration.
Local alignment, BLAST and Psi-BLAST October 25, 2012 Local alignment Quiz 2 Learning objectives-Learn the basics of BLAST and Psi-BLAST Workshop-Use BLAST2.
What is BLAST? BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases.
Last lecture summary. Window size? Stringency? Color mapping? Frame shifts?
CISC667, F05, Lec9, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Sequence Database search Heuristic algorithms –FASTA –BLAST –PSI-BLAST.
1 P6a Extra Discussion Slides Part 1. 2 Section A.
BLAST Basic Local Alignment Search Tool (Altschul et al. 1990)
NCBI resources II: web-based tools and ftp resources Yanbin Yin Fall 2014 Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1.
Assignment feedback Everyone is doing very well!
Rationale for searching sequence databases June 25, 2003 Writing projects due July 11 Learning objectives- FASTA and BLAST programs. Psi-Blast Workshop-Use.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Database search. Overview : 1. FastA : is suitable for protein sequence searching 2. BLAST : is suitable for DNA, RNA, protein sequence searching.
Tweaking BLAST Although you normally see BLAST as a web page with boxes to place data in and tick boxes, etc., it is actually a command line program that.
David Wishart February 18th, 2004 Lecture 3 BLAST (c) 2004 CGDN.
Sequence Search Abhishek Niroula Department of Experimental Medical Science Lund University
Copyright OpenHelix. No use or reproduction without express written consent1.
MCB 3421 class 25. student evaluations Please go to husky CT and complete student evaluations !
What is BLAST? Basic BLAST search What is BLAST?
Practice -- BLAST search in your own computer 1.Download data file from the course web page, or Ensemble. Save in the blast\dbs folder. 2.Start a CMD window,
PROTEIN IDENTIFIER IAN ROBERTS JOSEPH INFANTI NICOLE FERRARO.
Bioinformatics Shared Resource Bioinformatics : How to… Bioinformatics Shared Resource Kutbuddin Doctor, PhD.
Lab 3.2: Database Similarity Searching “The BLAST Buffet” Stephanie Minnema University of Calgary.
BLAST: Basic Local Alignment Search Tool Robert (R.J.) Sperazza BLAST is a software used to analyze genetic information It can identify existing genes.
What is BLAST? Basic BLAST search What is BLAST?
Stand alone BLAST on Linux
A Practical Guide to NCBI BLAST
Basics of BLAST Basic BLAST Search - What is BLAST?
BLAST program selection guide
Identifying templates for protein modeling:
BLAST.
Identify D. melanogaster ortholog
Comparative Genomics.
Basic Local Alignment Search Tool
Blast, Psi BLAST, Perl: Arrays, Loops
Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool
Welcome - webinar instructions
Presentation transcript:

MCB 5472 Assignment #5: RBH Orthologs and PSI-BLAST February 19, 2014

Assignment feedback Always check your output files! PRINT is your friend Hash keys and array positions can (and often should) be called directly People seem to want to always loop through – this is unnecessary and seems to be leading people astray Protein vs. nucleotide paralogs – why?

This week 1.Count RBH protein orthologs between our complete E.coli genomes 2.Use PSI-BLAST to find divergent homologs of molecular parasites

Genome A Genome B Recall: RBH orthologs Genome A Genome B Best matches in both directions - ortholog Best matches in only 1 direction - not ortholog Different matches in each direction - not ortholog

Discuss: how will this code work? 1.BLAST proteins from each genome vs. themselves What alignment and similarity thresholds to use? 2.Store first set of BLAST hits in a hash Key: query IDValue: best match ID 3.Parse reciprocal BLAST Store in a second hash or evaluate as you go 4.Use first hash to identify RBH hits RBH if: $first_hash{2 nd _BLAST_best_hit} eq 2 nd _BLAST_query_id 5.Tabulate shared and unique proteins Partially do as you go

Submit for part 1: Number of shared orthologs, unique sequences in each genome Perl scripts Short (1-2 sentences) justification of BLAST similarity thresholds

Part 2: Find homologs of molecular parasites Make BLAST database of proteins and chromosomes for one or more complete genomes of your choice Can be our model E.coli Download the protein sequence for a molecular parasite of your choice

Part 2: BLASTp Query reference genomes using BLASTp blastp -db all.faa -query integrase1.fa -evalue 1e-5 - num_threads 2 -out blastp.out -outfmt 6 -comp_based_stats 1 -comp_based_stats : correction for compositional biases in query; only 0 (off) or 1 can be used with psiblast (default is 2 ) -num_threads : multithreading parameter for computational efficiency

Part 2 -outfmt 6 : same as –outfmt 7 but no header lines i.e., every line is a hit Can count hits from terminal using wc –l wc : word count -l flag: count number of lines

Part 2: PSI-BLAST Make pssm of query vs NCBI nr database psiblast -db nr -query integrase1.fa - num_iterations 5 -num_threads 2 - inclusion_ethresh 1e-5 -out blast.out -out_pssm integrase1.pssm - comp_based_stats 1 Search reference genome proteins using that pssm psiblast -db all.faa -in_pssm integrase1.pssm -num_iterations 1 - num_threads 2 -inclusion_ethresh 1e-5 -outfmt 6 -out psiblastp.out -evalue 1e-5 -comp_based_stats 1

Part 2: tBLASTn Search reference genome itself using tBLASTn combined with the pssm tblastn -in_pssm integrase1.pssm -db all.fna -evalue 1e-5 -num_threads 2 -out psitblastn.out -outfmt 6 - comp_based_stats 1 Discuss: what are we trying to demonstrate?

Submit for part 2: Number of homologs found using each approach Short (1-2 sentences) interpretation of these results Terminal commands used during each step You should be recording these anyway, just like any experiment you document in a lab book Scripts, if you use any