Assignment feedback Everyone is doing very well!

Slides:



Advertisements
Similar presentations
Lecture 6 More advanced Perl…. Substitute Like s/// function in vi: #cut with EcoRI and chew back $linker = “GGCCAATTGGAAT”; $linker =~ s/CAATTG/CG/g;
Advertisements

Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
SCHOOL OF COMPUTING ANDREW MAXWELL 9/11/2013 SEQUENCE ALIGNMENT AND COMPARISON BETWEEN BLAST AND BWA-MEM.
Bioinformatics Tutorial I BLAST and Sequence Alignment.
1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Linux Platform  Download the source tar ball from the BLAST source code link  ncbi-blast src.tar.gz  Compilation  cd /BLASTdirectory/c++ ./configure.
Run BLAST in command line mode Yanbin Yin Fall
Advanced Perl for Bioinformatics Lecture 5. Regular expressions - review You can put the pattern you want to match between //, bind the pattern to the.
Practice retrieving data and running stand alone BLAST. Step 1. Identify genes in the ABA biosynthesis pathway from the Arabidopsis Cyc database
Bioinformatics and Phylogenetic Analysis
PSI (position-specific iterated) BLAST The NCBI page described PSI blast as follows: “Position-Specific Iterated BLAST (PSI-BLAST) provides an automated,
Psi-Blast: Detecting structural homologs Psi-Blast was designed to detect homology for highly divergent amino acid sequences Psi = position-specific iterated.
MCB 372 PSI BLAST, scalars J. Peter Gogarten Office: BPB 404 phone: ,
Advanced Perl for Bioinformatics Lecture 5. Regular expressions - review You can put the pattern you want to match between //, bind the pattern to the.
Lecture 8: Basic concepts of subroutines. Functions In perl functions take the following format: – sub subname – { my $var1 = $_[0]; statements Return.
Wellcome Trust Workshop Working with Pathogen Genomes Module 3 Sequence and Protein Analysis (Using web-based tools)
An Introduction to Bioinformatics
Viewing & Getting GO COST Functional Modeling Workshop April, Helsinki.
Basic Introduction of BLAST Jundi Wang School of Computing CSC691 09/08/2013.
MCB 5472 Psi BLAST, Perl: Arrays, Loops, Hashes J. Peter Gogarten Office: BPB 404 phone: ,
MCB 5472 Assignment #5: RBH Orthologs and PSI-BLAST February 19, 2014.
MCB 5472 Assignment #6: HMMER and using perl to perform repetitive tasks February 26, 2014.
Tweaking BLAST Although you normally see BLAST as a web page with boxes to place data in and tick boxes, etc., it is actually a command line program that.
NCBI Review Concepts Chuong Huynh. NCBI Pairwise Sequence Alignments Purpose: identification of sequences with significant similarity to (a)
Input, Output, and Processing
Eric C. Rouchka, University of Louisville Sequence Database Searching Eric Rouchka, D.Sc. Bioinformatics Journal Club October.
Adding GO GO Workshop 3-6 August GOanna results and GOanna2ga 2. gene association files 3. getting GO for your dataset 4. adding more GO (introduction)
Identifying the ortholog of TNF (Tumor necrosis factor) in mosquito genomes Pet Projects:
CISC667, F05, Lec9, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Sequence Database search Heuristic algorithms –FASTA –BLAST –PSI-BLAST.
1 P6a Extra Discussion Slides Part 1. 2 Section A.
BLAST Basic Local Alignment Search Tool (Altschul et al. 1990)
NCBI resources II: web-based tools and ftp resources Yanbin Yin Fall 2014 Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1.
Copyright © 2010 Certification Partners, LLC -- All Rights Reserved Perl Specialist.
Introduction to C Programming Chapter 2 : Data Input, Processing and Output.
You have worked for 2 years to isolate a gene involved in axon guidance. You sequence the cDNA clone that contains axon guidance activity. What do you.
Computational Skills Course week 3 Mike Gilchrist NIMR May-July 2011.
2# BLAST & Regular Expression Searches Functionality Susie Stephens Life Sciences Product Manager Oracle Corporation.
Introducing Python CS 4320, SPRING Lexical Structure Two aspects of Python syntax may be challenging to Java programmers Indenting ◦Indenting is.
5 1 Data Files CGI/Perl Programming By Diane Zak.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Database search. Overview : 1. FastA : is suitable for protein sequence searching 2. BLAST : is suitable for DNA, RNA, protein sequence searching.
Copyright © 2003 ProsoftTraining. All rights reserved. Perl Fundamentals.
Using Local Tools: BLAST
Tweaking BLAST Although you normally see BLAST as a web page with boxes to place data in and tick boxes, etc., it is actually a command line program that.
David Wishart February 18th, 2004 Lecture 3 BLAST (c) 2004 CGDN.
Step 3: Tools Database Searching
Copyright OpenHelix. No use or reproduction without express written consent1.
Annotation of eukaryotic genomes
What is BLAST? Basic BLAST search What is BLAST?
Practice -- BLAST search in your own computer 1.Download data file from the course web page, or Ensemble. Save in the blast\dbs folder. 2.Start a CMD window,
Summer Bioinformatics Workshop 2008 BLAST Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State University – Rochester Center
PROTEIN IDENTIFIER IAN ROBERTS JOSEPH INFANTI NICOLE FERRARO.
Galaxy based BLAST submission to distributed high throughput computing resources Rob Quick and Soichi Hayashi Open Science Grid Operations Indiana University.
Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester.
What is BLAST? Basic BLAST search What is BLAST?
Using Local Tools: BLAST
Blast Basic Local Alignment Search Tool
Basics of BLAST Basic BLAST Search - What is BLAST?
Engineering Innovation Center
Genome Center of Wisconsin, UW-Madison
Bioinformatics and BLAST
BLAST.
Sequence alignment, Part 2
Comparative Genomics.
Basic Local Alignment Search Tool
Basic Local Alignment Search Tool (BLAST)
Using Local Tools: BLAST
Using Local Tools: BLAST
Basic Local Alignment Search Tool
Welcome - webinar instructions
Presentation transcript:

MCB 5472 Assignment #4: Introduction to command line BLAST February 12, 2014

Assignment feedback Everyone is doing very well! Most people lose marks because they have not read the question close enough (e.g., not handing in pseudocode for Assign. #2) Check that your output files match your input and that they contain what you think they should

Code hints: Filehandles should be in block capitols open (INFILE, $ARGV[0]); # good open (infile, $ARGV[0]); # less good Will work otherwise but not under “use warnings” and “use strict” pragmas, i.e., sloppy code

Code hints: Align you code blocks properly Purpose: so you can intuitively see your code logic foreach $word (@array){ if ($word =~ /fox/){ print “found the fox”; }

Code hints: When using pattern matching, you should always consider possible exceptions in your input file $line =~ /^>/; # match fasta header $line =~ />/; # matches any line with a “>” character $line =~ tr/ACGT/TGCA/; # compliments high quality sequence $line =~ tr/ACGTacgt/TGCAtgca/; # compliments high and low quality sequence $line =~ /^[ACGT]/; # line starts with a nucleotide $line !~ /^>/; # any line not a fasta header, accommodates degenerate bases: N, V, B, H, D, K, S, W, M, Y, R

Code hints: “\s” means “any white space character” Includes: “\t” # tab “\r” # return “\n” # new line

Code hints: Filehandles should be in block capitols open (INFILE, $ARGV[0]); # good open (infile, $ARGV[0]); # less good Will work otherwise but not under “use warnings” and “use strict” pragmas, i.e., sloppy code Keep your tabs aligned Matching /^>/ vs />/ tr/ATCG/TAGC/ vs tr/ATCGatcg/TAGCtagc/ “\s” matches: “ “, “\t”, “\r”, “\n”

Command line BLAST We will be using BLAST+ Need to run on the Biotechnology Center server Preinstalled on Biolinux so can be run locally Two parts to every BLAST Format the BLAST database Perform the BLAST itself

Formatting a BLAST database [jlklassen@bbcsrv3 ~]$ makeblastdb -in [name of input file] -dbtype [either ‘nucl’ or ‘prot’] e.g., [jlklassen@bbcsrv3 ~]$ makeblastdb -in all.fna - dbtype nucl Produces: nucleotide: [name].nhr, [name].nin, [name].nsq protein: [name].phr, [name].pin, [name].psq where [name] is whatever was entered for the makeblastdb -in flag For help: [jlklassen@bbcsrv3 ~]$ makeblastdb -help

Running BLAST [jlklassen@bbcsrv3 ~]$ blastn – query [query file name] –db [database name] e.g., [jlklassen@bbcsrv3 ~]$ blastn – query NC_018651.fna –db all.fna For other BLAST flavors: replace blastn with blastp, blastx, tblastn or tblastx For help: [jlklassen@bbcsrv3 ~]$ blastn –help Multiple fasta file can be used as query multiple BLAST outputs in the same output file

Other helpful BLAST options –evalue [maximum evalue threshold] –out [output file name] –outfmt [0 for normal alignment format; 7 for easy to parse table format]

Tabular output format Separated by tabs (“\t”) @blastline = split “\t”, $line;

This week: Question #1 Last week’s complete genomes each had plasmids Are the plasmids from each organism with a complete genome homologous? Are the plasmids present in any of the draft genome sequences?

Practically: Use BLASTn to compare plasmids with each other Use BLASTn to find homologous sequence to each plasmid type in the draft genomes Use your judgment to infer homologs – this is ultimately subjective and needs to be defended! YOU DO NOT NEED TO WRITE PERL SCRIPTS FOR THIS (unless you want to)

This week: Question #2 Find paralogous genes and proteins in the complete Escherichia coli O104:H4 str. 2009EL-2050 genome and its plasmids Compare number of gene and protein paralogs Tabulate paralog age estimated from their percent BLAST similarly

Practically: Download the genes from NCBI BLAST all genes & proteins against each other using blastn and blastp (respectively) Round percent identity to the nearest 10% and tabulate # of paralogs % ID

Rounding in perl is not trivial int: truncates decimals to integers print int(1.4); # returns 1 print int(1.6); # returns 1 So to round: Divide to convert to a decimal Add 0.5 Apply int Multiply to revert divide $number1 = 19; print int($number1/10+0.5)*10; # returns 20 $number2 = 11; print int($number2/10+0.5)*10; # returns 10

Discuss: how will we tabulate rounded %IDs %tabulated_IDs = ( 100 => “”, 90 => “”, 80 => “”, 70 => “”, 60 => “”, 50 => “”, 40 => “”, 30 => “”, 20 => “”, 10 => “”, 0 => “”, ); # set up output hash $rounded_ID = int($blast_table[2]/10+0.5)*10; # perform rounding $tabulated_IDs{$rounded_ID}++; # tabulate in output hash or: $tabulated_IDs{$rounded_ID} = $tabulated_IDs{$rounded_ID} + 1; # same thing as above

To submit for next week Your conclusions from your results and your justification of them (esp. question #1) Your scripts and/or representative terminal commands detailed enough that I can reproduce your results You don’t have to submit input files or pseudocode