NCBI resources II: web-based tools and ftp resources
Yanbin Yin
Fall 2014
Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/


Outline
Tools
– BLAST
– Specialized BLAST
– GEO
ftp download
Hands-on exercise

References
NG/stat_scores.pdf
ourse/notes/lecture6.pdf
NCBI discovery workshops: ftp://ftp.ncbi.nih.gov/pub/education/discovery_workshops/NLM/2012/Sept2012/

Evolution of pairwise alignment tools
Needleman-Wunsch algorithm → Smith-Waterman algorithm → FASTA → BLAST
Later tools are faster but less accurate

Basic Local Alignment Search Tool
Widely used similarity search tool
Heuristic approach based on the Smith-Waterman algorithm
Finds best local alignments
Provides statistical significance
All combinations (DNA/protein) of query and database
– DNA vs DNA
– DNA translation vs Protein
– Protein vs Protein
– Protein vs DNA translation
– DNA translation vs DNA translation
Available as www, standalone, and network client
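The exact local alignment that BLAST approximates heuristically can be computed with Smith-Waterman dynamic programming. The following is a minimal sketch, not BLAST's implementation: the function name, the match/mismatch values, and the linear gap penalty are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-3, gap=-5):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty (an illustrative simplification).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))  # shared local region "ACG"
```

The full table costs time proportional to the product of the two sequence lengths, which is why database-scale searches rely on heuristics such as FASTA and BLAST.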

Local Alignment Statistics
High scores of local alignments between two random sequences follow the Extreme Value Distribution (applies to ungapped alignments).
[Plot: number of alignments vs. score]
Expect Value: E = number of database hits you expect to find by chance
$E = K m n e^{-\lambda S}$ or $E = m n 2^{-S'}$
– E = expected number of random hits
– m n = search space (query length × size of database)
– S = your score
– K = scale for search space
– $\lambda$ = scale for scoring system
– $S'$ = bit score = $(\lambda S - \ln K)/\ln 2$
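The two forms of the Expect Value formula are equivalent once the raw score is converted to a bit score. The sketch below checks this numerically. The values of lambda, K, m, n, and S are illustrative placeholders, not numbers taken from the slides, and the function names are hypothetical.

```python
import math


def bit_score(raw_score, lam, K):
    """Convert a raw score S to a bit score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def evalue_from_raw(raw_score, m, n, lam, K):
    """E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * raw_score)


def evalue_from_bits(bits, m, n):
    """E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Placeholder parameters (assumptions for illustration only).
lam, K = 0.318, 0.134
m, n = 250, 50_000_000  # query length, database length
S = 100

e_raw = evalue_from_raw(S, m, n, lam, K)
e_bits = evalue_from_bits(bit_score(S, lam, K), m, n)
print(math.isclose(e_raw, e_bits, rel_tol=1e-9))  # both forms agree
```

Because E scales linearly with m and n, the same alignment score gives a larger E-value against a larger database.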

Local Alignment Scoring: Protein
K/K: +5
K/E: +1
Q/F: -3
Gap: -(11 + 4(1)) = -15
Number of Chance Alignments = 4 X
Scores from BLOSUM62, a position-independent matrix
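A position-independent matrix assigns the same score to a residue pair wherever it occurs in the alignment. The sketch below sums substitution scores over an ungapped segment using only the three BLOSUM62 entries quoted on the slide. The function names are hypothetical, and a real search would use the full matrix.

```python
# Only the three BLOSUM62 entries quoted above; the full matrix has many more.
BLOSUM62_SUBSET = {("K", "K"): 5, ("K", "E"): 1, ("Q", "F"): -3}


def pair_score(x, y, matrix=BLOSUM62_SUBSET):
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


def ungapped_score(query, subject, matrix=BLOSUM62_SUBSET):
    """Sum the pair scores of two equal-length, ungapped aligned segments."""
    if len(query) != len(subject):
        raise ValueError("segments must have equal length")
    return sum(pair_score(x, y, matrix) for x, y in zip(query, subject))


print(ungapped_score("KKQ", "KEF"))  # 5 + 1 - 3
```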

Local Alignment Scoring: Nucleotide
Match = +2
Mismatch = -3
Gap: -(5 + 4(2)) = -13
Number of Chance Alignments = 2 X
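The nucleotide gap arithmetic above follows an affine scheme: a fixed opening cost plus a per-position extension cost. The sketch below scores an already-aligned pair of rows with the slide's values (match +2, mismatch -3, open 5, extend 2). The function names and the example rows are hypothetical.

```python
def affine_gap_cost(length, gap_open=5, gap_extend=2):
    """Cost of one gap run: opening cost plus extension cost per gapped position."""
    return gap_open + gap_extend * length


def score_alignment(row_a, row_b, match=2, mismatch=-3, gap_open=5, gap_extend=2):
    """Score two equal-length aligned rows in which '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    run_len = 0     # length of the gap run currently being read
    run_row = None  # 0 if the run is in row_a, 1 if it is in row_b
    for x, y in zip(row_a, row_b):
        gap_row = 0 if x == "-" else 1 if y == "-" else None
        # Close the current run when the gap ends or switches rows.
        if run_len and gap_row != run_row:
            score -= affine_gap_cost(run_len, gap_open, gap_extend)
            run_len = 0
        if gap_row is None:
            score += match if x == y else mismatch
        else:
            run_len += 1
            run_row = gap_row
    if run_len:
        score -= affine_gap_cost(run_len, gap_open, gap_extend)
    return score


print(affine_gap_cost(4))  # 5 + 4 * 2
print(score_alignment("ACGTACGTAC", "ACG----TAC"))  # six matches minus one gap of length 4
```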

BLAST and BLAST-like programs
Traditional BLAST (formerly blastall): nucleotide, protein, translations
– blastn: nucleotide query vs. nucleotide database
– blastp: protein query vs. protein database
– blastx: translated nucleotide query vs. protein database
– tblastn: protein query vs. translated nucleotide database
– tblastx: translated query vs. translated database
Megablast: nucleotide only
– Contiguous megablast: nearly identical sequences
– Discontiguous megablast: cross-species comparison
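The program list above amounts to a lookup on query type, database type, and whether the comparison is made at the translated level. A small sketch of that lookup follows; the table and function name are hypothetical helpers, not part of any NCBI tool.

```python
# (query type, database type, compare as translations?) -> program name
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", False): "blastx",    # query is translated
    ("protein", "nucleotide", False): "tblastn",   # database is translated
    ("nucleotide", "nucleotide", True): "tblastx",  # both are translated
}


def choose_program(query_type, db_type, translated=False):
    """Return the traditional BLAST program for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, db_type, translated)]
    except KeyError:
        raise ValueError(
            f"no program for {query_type} query vs. {db_type} database"
        ) from None


print(choose_program("protein", "nucleotide"))  # tblastn
```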

Position-specific BLAST Programs (protein only)
Position-Specific Iterated BLAST (PSI-BLAST): automatically generates a position-specific score matrix (PSSM)
Pattern-Hit Initiated BLAST (PHI-BLAST): focuses the search around a pattern (motif)
Domain Enhanced Lookup Time Accelerated BLAST (DELTA-BLAST): uses domain PSSMs in the first round of search
Reverse PSI-BLAST (RPS-BLAST): searches a database of PSI-BLAST PSSMs (Conserved Domain Database search)
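To show what a position-specific score matrix is, the sketch below builds per-column log-odds scores from a toy ungapped alignment. It assumes a uniform background frequency and a flat pseudocount. PSI-BLAST's own sequence weighting and pseudocount scheme are more elaborate and are not reproduced here, and all names and sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (simplification)
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_with_pssm(pssm, seq):
    """Score a sequence of the same length as the PSSM, column by column."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


pssm = build_pssm(["HKPW", "HKPW", "HRPW"])  # toy alignment
print(score_with_pssm(pssm, "HKPW") > score_with_pssm(pssm, "AAAA"))
```

Unlike BLOSUM62, the score for a residue here depends on the column, so a conserved position rewards its usual residue more strongly than a variable one.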

Non-redundant protein
nr (non-redundant protein sequences)
– GenBank CDS translations
– NP_, XP_ refseq_protein
– Outside Protein: PIR, Swiss-Prot, PRF
– PDB (sequences from structures)
pat: protein patents
env_nr: metagenomes (environmental samples)
Services: blastp, blastx

Nucleotide Databases: Traditional
Services: blastn, tblastn, tblastx

Nucleotide Databases: Traditional
nr (nt)
– Traditional GenBank
– NM_ and XM_ RefSeqs
refseq_rna
NCBI Genomes
– NC_ RefSeqs
– GenBank Chromosomes
dbest
– EST Division
non-human, non-mouse ests
htgs
– HTG division
gss
– GSS division
wgs
– whole genome shotgun contigs
tsa
– transcriptome shotgun assembly
16S microbial
– Selected 16S sequences (targeted loci)
Databases are mostly non-overlapping
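The RefSeq accession prefixes named on the database slides (NP_, XP_, NM_, XM_, NC_) indicate which kind of record, and therefore which database, a sequence belongs to. A minimal lookup sketch follows. It covers only the prefixes mentioned above, and the function name and example accessions are hypothetical.

```python
# Prefix -> (molecule type, database named on the slides)
REFSEQ_PREFIXES = {
    "NP_": ("protein", "refseq_protein"),
    "XP_": ("protein", "refseq_protein"),
    "NM_": ("RNA", "refseq_rna"),
    "XM_": ("RNA", "refseq_rna"),
    "NC_": ("genomic", "NCBI Genomes"),
}


def classify_accession(accession):
    """Return (molecule type, database) for a RefSeq-style accession, or None."""
    return REFSEQ_PREFIXES.get(accession[:3].upper())


# Made-up accession strings used only to exercise the lookup.
for acc in ("NM_000000.1", "XP_000000.1", "AB000000.1"):
    print(acc, classify_accession(acc))
```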

Specialized BLAST Pages

Hands-on exercise 1: blastn and megablast

Search against human database

There are many options you may explore

Protein database searches use different programs

Go back to the blastn page
Change here to 1000
Upload a text file with the human tp53 mRNA FASTA sequence (download from the course webpage)
Question: how many ESTs match the tp53 gene?

It took ~1 minute to finish
There are many options you may explore

Search against other refseq databases

Hands-on exercise 2: Protein BLAST (blastp and tblastn)

If you do not select organisms …

You can still specify organisms …

Upload a text file with two Arabidopsis protein FASTA sequences (download from the course webpage)
Type in populus to choose Populus trichocarpa
You may submit many sequences, but expect it to take time
Question: what are the homologs in the poplar tree?
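Before uploading a multi-sequence FASTA file, it can help to confirm how many records it contains. The sketch below is a small FASTA reader. The function name and the two example records are made up for illustration and are not the course files.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the identifier.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record input.
example = """>protein_1 example description
MATWLKK
QEF
>protein_2
MKPW
"""
seqs = parse_fasta(example)
print(len(seqs), {name: len(seq) for name, seq in seqs.items()})
```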

It took ~1 minute (smaller database)
Click here to choose which query protein to view

How do you determine a good e-value cutoff for selecting homologs?
=PL8FD4CC12DABD6B39&index=6
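Once a cutoff is chosen, applying it to a downloaded hit table is straightforward. The sketch below assumes the standard 12-column tab-separated BLAST hit table, in which column 11 is the E-value and column 12 is the bit score. The 1e-5 default is only a placeholder, not a recommendation from the slides, and the example rows are made up.

```python
def filter_hits(tabular_text, max_evalue=1e-5):
    """Keep hits with E-value <= max_evalue from a 12-column tabular hit table."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bits = float(fields[10]), float(fields[11])
        if evalue <= max_evalue:
            hits.append((query, subject, evalue, bits))
    return hits


# Hypothetical rows: query, subject, %identity, length, mismatches, gap opens,
# q.start, q.end, s.start, s.end, evalue, bit score
table = (
    "q1\thitA\t85.0\t200\t30\t0\t1\t200\t5\t204\t2e-50\t180\n"
    "q1\thitB\t30.0\t60\t42\t1\t10\t69\t3\t62\t0.5\t25.4\n"
)
for hit in filter_hits(table):
    print(hit)
```

Trying several cutoffs and inspecting which hits appear or disappear is one practical way to see how sensitive the homolog list is to the threshold.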

Type in charoph to choose charophytes
Question: what are the EST homologs in charophytic algae?

Hands-on exercise 3
PHI-BLAST: query protein + short motif/pattern
PSI-BLAST (iterated BLAST): multi-round BLASTP

Example: the plant glycosyltransferase family 8 (GT8) has a signature motif
We want to search with the Arabidopsis GAUT1 protein (gi #: ) and the HXXGXXKPW motif
ProSite-style pattern: H-x(2)-G-x(2)-K-P-W
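The ProSite-style pattern can be checked against a sequence locally by translating it into a regular expression. The sketch below handles only the subset of the syntax used here (single residues, x, and (n) or (n,m) repeats). The function name and the test sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple ProSite-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"([A-Za-z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        # 'x' means any residue; anything else is a literal residue letter.
        token = "." if residue.lower() == "x" else residue.upper()
        if low is not None:
            if high is None:
                token += "{" + low + "}"
            else:
                token += "{" + low + "," + high + "}"
        parts.append(token)
    return "".join(parts)


regex = prosite_to_regex("H-x(2)-G-x(2)-K-P-W")
print(regex)

# Made-up sequence containing the motif, used only to exercise the regex.
match = re.search(regex, "AAHAAGAAKPWAA")
print(match.start(), match.group())
```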

Hands-on exercise 4: RPS-BLAST
Given protein sequences, find conserved functional domains

Next class: NCBI GEO and ftp resources (with a brief introduction to Linux skills) and practice