Sequence Search and Analysis SPE 1653 (703) 308-2923.

Presentation transcript:

Biosequence Patent Search
Mission Impossible - ?
Mission Difficult - ?

Sample Searchable Public Databases
National Center for Biotechnology Information (NCBI) Entrez
European Bioinformatics Institute (EBI)
DNA DataBank of Japan (DDBJ)
SwissProt, PIR, etc. do not cover patents

NCBI Entrez
NCBI GenBank
– In collaboration with EMBL and DDBJ
Databases from other producers
– SwissProt, TrEMBL, PDB, PIR, etc.
Bibliographic databases
– E.g., PubMed (MEDLINE)
NCBI BLAST® sequence searching

EMBL-EBI on the Web
EMBL databases
– EMBL Nucleotide Database (i.e. GenBank)
– Translated EMBL (TrEMBL)
Databases from other producers
– SwissProt, PDB, etc.
Many sequence search options: FASTA, NCBI-BLAST, WU-Blast, Smith-Waterman

DDBJ via the Web
DDBJ databases
– DNA DataBank of Japan (i.e. GenBank)
– Protein Mutant Database (PMD)
Databases from other producers
– Protein Data Bank (PDB)
Several sequence search options: FASTA, BLAST, Smith-Waterman

USPTO
Nucleic Acid Databases
– GenEmbl (GenBank)
– N-Geneseq
– ESTs
Protein Databases
– Protein Data Bank (PDB)
– SwissProt
– A-Geneseq

Searched Sequence
HIV protease
PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYDQILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT
Default parameters selected

Searched Sequence – Results - A
Database: Protein sequences derived from the Patent division of GenBank
78 Hits
|gb|AAN | Sequence 17 from patent US
Length = 1003
Score = 191 bits (486), Expect = 4e-50
Identities = 93/96 (96%), Positives = 93/96 (96%)
Query1 :   PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD 60
           PQITLWQ PLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKV QYD
Sbjct: 57  PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYD 116
Query1 :   QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT 96
           QILIEICGHKAIGTVLVGPTPVNIIG NLLTQIGCT
Sbjct: 117 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT 152
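The "Identities = 93/96" figure can be checked by hand from the two aligned rows. The following is a minimal sketch, not part of the original search output: the function name `identity_summary` is hypothetical, and truncating the percentage to a whole number is an assumption that happens to reproduce the 96% shown.

```python
# Aligned rows copied from the hit above. The alignment has no gaps,
# so position i of the query lines up with position i of the subject.
QUERY = (
    "PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT"
)
SBJCT = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT"
)


def identity_summary(query, subject):
    """Return (identical_count, aligned_length, mismatches) for two ungapped rows.

    Each mismatch is a tuple (query_position, query_residue, subject_residue),
    with positions counted from 1 in query coordinates.
    """
    if len(query) != len(subject):
        raise ValueError("rows must have the same length")
    mismatches = [
        (pos, q, s)
        for pos, (q, s) in enumerate(zip(query, subject), start=1)
        if q != s
    ]
    return len(query) - len(mismatches), len(query), mismatches


identical, length, mismatches = identity_summary(QUERY, SBJCT)
# Integer division truncates the percentage (assumed rounding convention).
print(f"Identities = {identical}/{length} ({100 * identical // length}%)")
for pos, q, s in mismatches:
    print(f"query position {pos}: {q} -> {s}")
```

The three differing positions are exactly the blanks in the middle row of the alignment, which is a quick way to see where a searched sequence departs from a database hit.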

Searched Sequence – Results - B
Database: Protein Data Bank (PDB)
75 Hits
gi|230577|pdb|2HVP| HIV-1 Protease
Length = 99
Score = 172 bits (437), Expect = 1e-44
Identities = 93/96 (96%), Positives = 93/96 (96%)
Query1 :   PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD 60
           PQITLWQ PLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKV QYD
Sbjct: 1   PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
Query1 :   QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT 96
           QILIEICGHKAIGTVLVGPTPVNIIG NLLTQIGCT
Sbjct: 61  QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT 96
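The Expect values reported with each hit depend on the size of the database searched as well as on the score. A rough sketch of that dependence uses the standard relation between a bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m * n is the effective search space. The helper names below are hypothetical, and the search tool applies length corrections that are not reproduced here, so the implied search space is only an order-of-magnitude illustration.

```python
import math


def expect_from_bits(bit_score, search_space):
    """Expected number of chance hits for a bit score in a given search space."""
    return search_space * 2.0 ** (-bit_score)


def implied_search_space(bit_score, expect):
    """Invert E = (m * n) * 2**(-S') to recover the effective m * n."""
    return expect * 2.0 ** bit_score


# Bit score and Expect pairs as reported on the two result slides.
patent_hit = implied_search_space(191, 4e-50)
pdb_hit = implied_search_space(172, 1e-44)
print(f"patent division: m*n ~ {patent_hit:.2e}")
print(f"PDB:             m*n ~ {pdb_hit:.2e}")

# Round trip: feeding the implied search space back in returns the Expect value.
assert math.isclose(expect_from_bits(172, pdb_hit), 1e-44, rel_tol=1e-9)
```

The practical point is that the same alignment can receive a different Expect value in a different database, so Expect values from separate searches are not directly comparable.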

Searched Sequence – Results - C
Title: TITLE OF YOUR APPLICATION GOES HERE
Perfect score: 521
Sequence: 1 PQITLWQRPLVTIKIGGQLK TPVNIIGRNLLTQIGCTLNF 99
Scoring table: BLOSUM62
Gapop 10.0, Gapext 0.5
Searched: seqs, residues
Total number of hits satisfying chosen parameters:
Minimum DB seq length: 0
Maximum DB seq length:
Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries
Database : A_Geneseq_101002:*

Searched Sequence – Results - D
RESULT 1
ID AAU77767 standard; Protein; 99 AA.
AC AAU77767;
DT 05-JUN-2002 (first entry)
DE Human immunodeficiency virus type 1 (HIV-1) related protein #1.
KW Human immunodeficiency virus type 1; HIV-1; protease.
OS Unidentified.
PN KR A.
PD 15-OCT
PF 28-JAN-1997; 97KR
PR 28-JAN-1997; 97KR
PA (GLDS ) LG CHEM LTD.
PI Kwon YD, Lee TG;
DR WPI; /51.
PT Mutated human immunodeficiency virus type 1 (HIV-1) protease
PT and process for preparing the same -
PS Example 3; Page 10; 18pp; Korean.
CC The invention relates to a mutated human immunodeficiency
CC virus type 1 (HIV-1) protease and a process for preparing the
CC mutants. This sequence represents a human immunodeficiency
CC virus associated protein described in the invention.
SQ Sequence 99 AA;

Searched Sequence – Results - E
Pred. No. is the number of results predicted by chance to have a score greater than or equal to the score of the result being printed, and is derived by analysis of the total score distribution.
SUMMARIES
Result No.  Score  % Query Match  Length  DB  ID  Description
AAU77767 Human immunodefici
AAR05744 HIV-1 protease gen
RESULT 1
SQ Sequence 99 AA;
Query Match 100.0%; Score 521; DB 20; Length 99;
Best Local Similarity 100.0%; Pred. No. 2.6e-58;
Matches 99; Conservative 0; Mismatches 0; Indels 0; Gaps 0;
Qy  1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
Qy 61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
      |||||||||||||||||||||||||||||||||||||||
Db 61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
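The score of 521 for this exact 99-residue match can be reproduced by summing the self-match entries of the BLOSUM62 scoring table named in the search parameters. The sketch below assumes the standard BLOSUM62 diagonal values; the function name `self_score` is hypothetical. Gap penalties play no part here because the alignment has no gaps.

```python
# Diagonal (self-match) entries of the standard BLOSUM62 matrix.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9,
    "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7,
    "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}

# Query rows (Qy) copied from the alignment above.
QUERY = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)


def self_score(sequence):
    """Score of a sequence aligned, without gaps, to an identical copy of itself."""
    return sum(BLOSUM62_DIAGONAL[residue] for residue in sequence)


print(len(QUERY), self_score(QUERY))
```

Because the self-score is the highest score any database entry can reach against this query, it serves as the reference point for the percentage figures in the result listing.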

Searched Sequence – Results - F
Pred. No. is the number of results predicted by chance to have a score greater than or equal to the score of the result being printed, and is derived by analysis of the total score distribution.
SUMMARIES
Result No.  Score  % Query Match  Length  DB  ID  Description
AAU77767 Human immunodefici
AAR05744 HIV-1 protease gen
RESULT 15
SQ Sequence 177 AA;
Query Match 99.0%; Score 516; DB 11; Length 177;
Best Local Similarity 99.0%; Pred. No. 2.3e-57;
Matches 98; Conservative 1; Mismatches 0; Indels 0; Gaps 0;
Qy   1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
       ||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||
Db  56 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD 115
Qy  61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
       |||||||||||||||||||||||||||||||||||||||
Db 116 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 154
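The figures for this hit follow from a single conservative substitution (N to S at query position 37). The sketch below rescores the alignment with standard BLOSUM62 values and compares the result with the query's self-score. Reading "Query Match" as score divided by perfect score, and "Best Local Similarity" as matches divided by aligned length, is an assumption that is consistent with the numbers shown; the helper names are hypothetical, and only the one off-diagonal matrix entry needed here is included.

```python
# Diagonal (self-match) entries of the standard BLOSUM62 matrix.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9,
    "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7,
    "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}
# Only the substitution that occurs in this alignment (BLOSUM62 N/S = 1).
BLOSUM62_OFF_DIAGONAL = {frozenset("NS"): 1}

# Aligned rows copied from the result above (no gaps).
QY = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)
DB = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)


def pair_score(a, b):
    """BLOSUM62 score for one aligned residue pair."""
    if a == b:
        return BLOSUM62_DIAGONAL[a]
    key = frozenset((a, b))
    if key not in BLOSUM62_OFF_DIAGONAL:
        raise KeyError(f"substitution {a}/{b} is not in this reduced table")
    return BLOSUM62_OFF_DIAGONAL[key]


def alignment_score(query, database):
    """Sum of pair scores over an ungapped alignment of equal-length rows."""
    if len(query) != len(database):
        raise ValueError("rows must have the same length")
    return sum(pair_score(q, d) for q, d in zip(query, database))


score = alignment_score(QY, DB)
perfect = alignment_score(QY, QY)
matches = sum(q == d for q, d in zip(QY, DB))
print(f"Score {score}; perfect score {perfect}")
print(f"Query Match {100 * score / perfect:.1f}%")
print(f"Best Local Similarity {100 * matches / len(QY):.1f}%")
```

Replacing N (self-score 6) with an N/S pair (score 1) lowers the total by 5, which is the whole difference between this hit and the exact match in Results E.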

Sample Claims
A polypeptide having HIV protease activity
An isolated polypeptide having HIV protease activity
An isolated polypeptide comprising SEQ ID NO: 1
An isolated polypeptide consisting essentially of SEQ ID NO: 1
An isolated polypeptide consisting of SEQ ID NO: 1
A peptide fragment having HIV protease activity
A peptide fragment of SEQ ID NO: 1 with HIV protease activity
An epitope of ten amino acids in length of SEQ ID NO: 1 capable of binding to an antibody to SEQ ID NO: 1
An isolated polypeptide or fragment thereof of SEQ ID NO: 1 wherein one or more amino acid residues have been substituted, deleted, or inserted and which polypeptide retains HIV protease enzymatic activity

Acknowledgements
STIC / Toby Port and David Schreiber
TC 1600 / James Martinell