Condor: BLAST. Monday, July 19th, 3:15pm. Alain Roy, OSG Software Coordinator, University of Wisconsin-Madison

Presentation transcript:

Condor: BLAST
Monday, July 19th, 3:15pm
Alain Roy
OSG Software Coordinator
University of Wisconsin-Madison

OSG Summer School 2010
Before we begin…
Any questions on the lectures or exercises up to this point?

I hope you’re not getting too tired

BLAST
Up to now, you’ve done toy examples
  - Simple, easy to use
  - Illustrate the basics of what you need to know
  - But not a “real” application
Let’s try out a real application: BLAST
  - More complex, not so easy to use
  - A real application

First, some honesty
I am a computer scientist
I am not a biologist
My knowledge of BLAST is shallow
But it’s a way cooler application than what we’ve done so far!

BLAST Description
From the BLAST web page:
“The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.”

BLAST Description (my understanding)
Biologists have sequences:
  - Nucleotides in DNA: ACGTTGCA…
  - Amino acids in proteins: GECVASR…
They also have databases of lots of sequences
  - From lots of organisms, from tiny bacteria to humans
BLAST helps them answer questions:
  - Which bacterial species have a protein that is related in lineage to another protein?
  - What other genes encode proteins that exhibit structures or motifs such as ones that have just been determined?
  - …
BLAST is widely used and considered important.

Is this just string comparison?
It’s harder than just comparing two strings for equality, as in: is “GCTA” == “GCTA”?
BLAST can find “similar” sequences, based on metrics that biologists determine.
  - Finding “similar” sequences is more computationally expensive than plain string comparison
BLAST is a very popular program for asking these questions
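
To make the difference concrete, here is a minimal sketch that contrasts exact string equality with a similarity score. It uses a plain Smith-Waterman local alignment, which is not what BLAST itself runs (BLAST relies on faster heuristics), and the scoring values (match +2, mismatch -1, gap -2) and the function name are assumptions chosen only for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b.

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so never go below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


query = "GCTA"
for candidate in ("GCTA", "GCTT", "TTTT"):
    exact = query == candidate
    score = local_alignment_score(query, candidate)
    print(f"{candidate}: exact match = {exact}, similarity score = {score}")
```

Equality gives only yes or no, while the score ranks near matches. The cost is the point of the slide: the score fills a table of size len(a) × len(b) for every pair of sequences, whereas equality is a single pass.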

BLAST exercise
The final set of exercises has you run queries with BLAST.
They are a bit arbitrary, because we know less about the underlying biology
But it’s a real application with real data!
Your challenge: run a bunch of BLAST queries and summarize the results. Do it all within a DAG.
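
One possible shape for such a DAG, as a rough sketch: several independent BLAST query jobs, followed by one summary job that runs only after all of them finish. The script below just generates the text of a DAGMan input file using the standard JOB and PARENT … CHILD keywords. The job names and submit file names (blast0.sub, summarize.sub) are hypothetical placeholders, not the names used in the exercises.

```python
def build_dag(num_queries: int) -> str:
    """Build DAGMan input text: N query jobs feeding one summary job.

    Job names and submit file names are hypothetical placeholders.
    """
    if num_queries < 1:
        raise ValueError("need at least one BLAST query job")
    names = [f"blast{i}" for i in range(num_queries)]
    # One JOB line per query, each pointing at its own submit file.
    lines = [f"JOB {name} {name}.sub" for name in names]
    # The summary job collects the output of all query jobs.
    lines.append("JOB summarize summarize.sub")
    # Every query job is a parent of the summary job.
    lines.append(f"PARENT {' '.join(names)} CHILD summarize")
    return "\n".join(lines) + "\n"


print(build_dag(3), end="")
```

The output could be saved to a file and handed to condor_submit_dag, assuming the submit files it names actually exist.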

Time to try it out!

Questions?
Questions? Comments?
Feel free to ask me questions later: Instructor Name
Upcoming sessions
  - Now – 5:00pm
    - Hands-on exercises
      - Finish up earlier exercises
      - Try out BLAST