Sequence Comparison Bioinformatics 91-05. Why do people suggest that translated sequences be used to search for relatives in databanks? DNA vs Protein.

Sequence Comparison Bioinformatics 91-05

Why do people suggest that translated sequences be used to search for relatives in databanks? DNA vs Protein Sequence

DNA is composed of only four kinds of units (A, G, C and T), and even if gaps were not allowed, it would be anticipated that, on average, 25% of the residues of any two aligned sequences would be identical. In fact, there would be a dispersion around the mean expectation, and a predictable fraction of random cases would be as much as 35% identical. Once we decide to allow gaps in the sequences, the range of chance similarities between two unrelated sequences can exceed 50%, thereby obscuring any genuine relationships that may exist.

Nucleotide sequence alignment

    137 AGACCAACCTGGCCAACATGGTGAAATCCCATCTCTAC.AAAAATACAAA 185
        |||||| |||||||||||||||||||  ||||||||||  ||||||||||
      1 AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGAAAATACAAA 50

    Legend: | = match, blank = mismatch, . = gap
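The chance-identity argument can be made concrete with a short calculation. The sketch below is an illustration only: it assumes ungapped alignment, independent positions and equal letter frequencies (so each DNA position matches with probability 1/4, and each position of a 20-letter alphabet with probability 1/20). The function names and the length of 100 positions are made up for this example. It also rebuilds the match line for the nucleotide alignment on this slide.

```python
from math import comb


def chance_identity_tail(length, min_identical, p_match=0.25):
    """P(at least min_identical matches) for two random ungapped sequences."""
    return sum(
        comb(length, k) * p_match**k * (1 - p_match) ** (length - k)
        for k in range(min_identical, length + 1)
    )


def match_line(top, bottom, gap="."):
    """'|' where aligned residues agree, ' ' otherwise (gap columns never match)."""
    return "".join(
        "|" if a == b and a != gap else " " for a, b in zip(top, bottom)
    )


# The two aligned rows from the slide ('.' marks the gap).
TOP = "AGACCAACCTGGCCAACATGGTGAAATCCCATCTCTAC.AAAAATACAAA"
BOTTOM = "AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGAAAATACAAA"

if __name__ == "__main__":
    # Chance of reaching 35 identical positions out of 100 (assumed length).
    print("4-letter alphabet :", chance_identity_tail(100, 35, 0.25))
    print("20-letter alphabet:", chance_identity_tail(100, 35, 0.05))

    line = match_line(TOP, BOTTOM)
    print(TOP, line, BOTTOM, sep="\n")
    print("identical columns:", line.count("|"), "of", len(TOP))
```

Under these assumptions the same 35% threshold is far harder to reach by chance with 20 letters than with 4, which is the point the slide makes in words.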

Why do people suggest that translated sequences be used to search for relatives in databanks? Why Protein Sequence

Protein sequences are written in a 20-letter amino acid (aa) alphabet encoded by 61 degenerate codons. When DNA sequences are translated, the 64 codons collapse into 21 symbols (20 aa and a terminator), and the information is sharpened up considerably. The 'wrong-frame' information is discarded, and third-base degeneracies are consolidated. All in all, the signal-to-noise ratio is greatly improved for the specific purpose of identifying protein relatives.

It is accepted that convergence in aa sequences is very rare, so aa similarity almost always means homology. Furthermore, aa sequences may still show a similarity derived from the common folding pattern and function of the proteins, even when their coding DNA sequences have strongly diverged under other selective pressures acting at the genome level (e.g., G+C pressure, preferential usage of synonymous codons).

Protein evolution is governed by the constraint of maintaining a characteristic fold that enables some function. Thus, protein searches make it possible to infer relationships between proteins that last shared a common ancestor a billion years ago, doubling the lookback time obtained with DNA database searches.
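A small example shows how translation consolidates third-base degeneracy. The sketch below builds the standard genetic code from its usual compact 64-letter form and translates two short DNA strings that differ at every third position. The two strings and the helper names are invented for illustration; they are not taken from the lecture.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate frame 1 of a DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a, b):
    """Fraction of identical positions over the shorter of two sequences."""
    return sum(x == y for x, y in zip(a, b)) / min(len(a), len(b))


# Hypothetical coding sequences that differ only at third codon positions.
gene_a = "GCTGAAAAA"
gene_b = "GCCGAGAAG"

if __name__ == "__main__":
    sense = [aa for aa in CODON_TABLE.values() if aa != "*"]
    print("sense codons:", len(sense), "symbols:", len(set(CODON_TABLE.values())))
    print("DNA identity    :", identity(gene_a, gene_b))
    print("protein identity:", identity(translate(gene_a), translate(gene_b)))
```

The DNA strings are only partly identical, yet both translate to the same peptide, so a protein-level comparison sees a perfect match.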

BLAST vs FASTA

FASTA - a sensitive search engine
The early personal computers had insufficient memory and were too slow to carry out a database scan using a rigorous searching method (dynamic programming). Accordingly, Wilbur and Lipman [(1983) Proc. Nat. Acad. Sci. 80] developed a fast procedure for DNA scans that, in concept, searches for the most significant diagonals in a dot plot. FASTA shows only the top-scoring region; it does not locate all high-scoring alignments between two sequences. As a consequence, FASTA may not directly identify repeats or multiple domains that are shared between two proteins.

BLAST - a faster alternative
BLAST (Basic Local Alignment Search Tool) is a heuristic method for finding the highest-scoring locally optimal alignments between a query sequence and a database. Earlier versions of BLAST did not allow gapped alignments, but BLAST2 (from the HGMP-RC telnet and www menus) does. A gapped BLAST search allows gaps (deletions and insertions) to be introduced into the alignments that are returned. Allowing gaps means that similar regions are not broken into several segments. The scoring of these gapped alignments tends to reflect biological relationships more closely.
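The idea of looking for the most significant diagonals in a dot plot can be sketched in a few lines. The code below is a simplified illustration of the word-lookup step only, not the FASTA program itself: it indexes every word of length k in the query, scans the subject for the same words, and counts hits per diagonal (subject position minus query position). The function name, the word size and the example strings are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def best_diagonals(query, subject, k=4, top=3):
    """Return the `top` diagonals with the most shared k-letter words."""
    # Index every k-letter word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Each shared word is a dot in the dot plot; dots on the same
    # diagonal have the same offset (subject position - query position).
    hits = Counter()
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            hits[j - i] += 1
    return hits.most_common(top)


if __name__ == "__main__":
    # Made-up sequences: the subject is the query shifted right by two letters.
    for diagonal, count in best_diagonals("ACGTACGT", "TTACGTACGT"):
        print(f"diagonal {diagonal:+d}: {count} word hits")
```

A real search tool would then rescore and join the best diagonals; this sketch stops at ranking them.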

The BLAST Family of Programs

The BLAST family of programs allows all combinations of DNA or protein query sequences with searches against DNA or protein databases. (Most of the time the use of these is transparent, behind an interface.)

blastp: compares an amino acid query sequence against a protein sequence database.
blastn: compares a nucleotide query sequence against a nucleotide sequence database.
blastx: compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
tblastn: compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).
tblastx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
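The six-frame conceptual translation used by blastx, tblastn and tblastx can be illustrated as follows. This is a minimal sketch under simple assumptions (standard genetic code, upper-case A/C/G/T input, incomplete trailing codons dropped); the function names and frame labels are chosen for this example and are not BLAST's own code.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def reverse_complement(dna):
    """Complement each base, then reverse, to read the opposite strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from the first base; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translations of frames +1..+3 (given strand) and -1..-3 (other strand)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Made-up nine-base sequence used only to show the six outputs.
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

Each of the six strings would then be compared as a protein sequence, which is why a translated search costs roughly six protein comparisons per nucleotide sequence.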

The FASTA Family of Programs

FastA: uses the method of Pearson and Lipman (Proc. Natl. Acad. Sci. USA 85 (1988)) to search for similarities between one sequence (the query) and any group of sequences of the same type (nucleic acid or protein) as the query sequence.
TFastA: compares a protein query sequence against nucleotide sequences, treating each of the six reading frames of a nucleotide sequence as a separate sequence, resulting in three separate alignments for each strand.
TFastX: compares the protein query sequence to only one translated protein per strand of the nucleotide sequence, resulting in one alignment per strand.

NCBI Blast vs GCG Blast

NCBI Blast                GCG Blast
WWW system                Unix system
Larger database           Smaller database
Interlinked data          Data not interlinked
                          Build your own database
Slow                      Fast
Single search only        Supports multiple searches
                          Output file easier to parse

SEARCHING in SeqWEB/GCG

Reference Searching
1. LookUp - Identifies sequences in a sequence database (by name, accession number, author, etc.).
2. Names - Identifies sequence entries by name.
3. StringSearch - Identifies sequences by character patterns.

Sequence Searching
1. BLAST - Finds sequences in a database that are similar to a query sequence (ver. 2.0).
2. FastA - Searches for similar sequences of the same type.
3. FastX - Searches for similarity between a nucleotide sequence and a protein database, taking frameshifts into account.
4. FindPatterns - Identifies sequences that contain short sequence patterns.
5. FrameSearch - Searches protein sequences for similarity to nucleotide query sequences, or nucleotide sequences for similarity to protein query sequences.
6. Motifs - Searches proteins for the patterns defined in PROSITE.
7. MotifSearch - Uses a set of profiles to search a database for new sequences.
8. NetBLAST - Searches the databases maintained at NCBI.
9. ProfileSegments - Makes optimal alignments of the sequences found by ProfileSearch.
10. ProfileSearch - Uses a profile to search the database for new sequences.
11. Segments - Aligns and displays the segments found by WordSearch.
12. SSearch - Does a rigorous Smith-Waterman search for similarity.
13. TFastA - Searches for similarity between a protein sequence and a nucleotide database.
14. TFastX - Searches for similarity between a protein sequence and a nucleotide database, taking frameshifts into account.
15. WordSearch - Identifies sequences in the database that share large numbers of common words.
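For comparison with the heuristic tools in this list, the rigorous Smith-Waterman method mentioned under SSearch can be sketched as a small dynamic-programming routine. This is an illustration only, not the GCG implementation: it returns just the best local score, uses a linear gap penalty, and the scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions rather than a real substitution matrix.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    # Made-up sequences sharing the core "ACGT".
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Every cell of the query-by-subject matrix is filled, which is why this approach is thorough but slow for whole-database scans.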

Exercise05-01

(1) What is cdk2?
- Search UNIGENE, OMIM.

(2) How many cdk2 proteins have already been discovered in different organisms?
- Try ENTREZ protein.
- Start by searching protein for “cdk2”, then “cyclin dependent kinase 2”.
- Search again with the same keywords but limit to “protein name”.

(3) Display & save the sequences in NCBI
- DISPLAY the “cdk2” sequences (limit to protein name) in FASTA format (34 sequences).
- SAVE the first sequence in FASTA format as xp
- SAVE ALL THE SEQUENCES in FASTA with the file name cdk2-psq.fasta
- SAVE ALL THE SEQUENCES IN GENBANK with the file name cdk2-psq.gp
- Upload xp and cdk2-psq.fasta to GCG.
- Change to GCG format: fromfasta xp and fromfasta cdk2-psq.fasta (ALL SEQUENCES IN THE FILE WILL BE REFORMATTED).
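Before uploading, it can be useful to check that the saved FASTA file really contains the expected number of records. The sketch below is one possible way to do that with plain Python; the function name is invented for this example, and the two records in EXAMPLE are made-up placeholders, not real cdk2 sequences.

```python
import io


def read_fasta(handle):
    """Return a list of (header, sequence) pairs from a FASTA text stream."""
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder records (not real data), with a sequence wrapped over two lines.
EXAMPLE = """>seq1 placeholder record
MAAAKKK
LLL
>seq2 placeholder record
MGGGSSS
"""

if __name__ == "__main__":
    records = read_fasta(io.StringIO(EXAMPLE))
    print(len(records), "records")
    # For the exercise file, assuming it is in the current directory:
    # with open("cdk2-psq.fasta") as handle:
    #     print(len(read_fasta(handle)), "records")
```

If the count differs from the number shown on the NCBI results page, the download was probably incomplete.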

Build Your Own Database

Blast xp pep

gcgtoblast
    gcgtoblast combines any set of GCG sequences into a database that you can search with BLAST.
    GCGTOBLAST of what input sequence(s) ?  *.pep
    What should I call the database ?  cdk2psq

Change xp to gcg format

blast -BAT -IN2=cdk2psq
    BLAST searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST can produce gapped alignments for the matches it finds.
    Blast with what query sequence(s) ?  xp pep

Online Tutorial: Download NOW

Options: Folders - Formats - Dialogs - Trace files - Alignments

Options: Folders - Formats (Merge Sequences, Split Sequences) - Dialogs - Trace files - Alignments

ASSIGNMENT 02

Use the database searching techniques you learned today to retrieve the amino acid sequences of human (Homo sapiens) vacuolar ATP synthase.

Questions:
(1) How many human V-ATP synthase sequences are deposited in NCBI?
(2) Build a V-ATP synthase database in GCG, download this sequence, and TELL ME WHICH SEQUENCE IN YOUR DATABASE MATCHES BEST.

Send the ANSWER as attached files before 27MAR2003.

**** E-mail subject: ASS02 bioinfo – (student ID) [ vatpase.txt ]