Sequence Search and Analysis SPE 1653 (703)
Biosequence Patent Search Mission Impossible - ? Mission Difficult - ?
Sample Searchable Public Databases
National Center for Biotechnology Information (NCBI) Entrez
European Bioinformatics Institute (EBI)
DNA DataBank of Japan (DDBJ)
SwissProt, PIR, etc. do not cover patents
NCBI Entrez
NCBI GenBank
  – In collaboration with EMBL and DDBJ
Databases from other producers
  – SwissProt, TrEMBL, PDB, PIR, etc.
Bibliographic databases
  – E.g., PubMed (MEDLINE)
NCBI BLAST® sequence searching
EMBL-EBI on the Web
EMBL databases
  – EMBL Nucleotide Database (content equivalent to GenBank)
  – Translated EMBL (TrEMBL)
Databases from other producers
  – SwissProt, PDB, etc.
Many sequence search options: FASTA, NCBI-BLAST, WU-BLAST, Smith-Waterman
DDBJ via the Web
DDBJ databases
  – DNA DataBank of Japan (content equivalent to GenBank)
  – Protein Mutant Database (PMD)
Databases from other producers
  – Protein Data Bank (PDB)
Several sequence search options: FASTA, BLAST, Smith-Waterman
USPTO
Nucleic Acid Databases
  – GenEmbl (GenBank)
  – N-Geneseq
  – ESTs
Protein Databases
  – Protein Data Bank (PDB)
  – SwissProt
  – A-Geneseq
Searched Sequence
HIV protease
  PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD
  QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT
Default parameters selected
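Most of the search services listed earlier accept a query in FASTA format. As a minimal sketch, the line-wrapped fragments from the slide can be joined into one record as follows. The header text `query_hiv_protease` and the helper name `to_fasta` are illustrative assumptions, not part of the original search.

```python
# Fragments exactly as they appear wrapped on the slide.
FRAGMENTS = [
    "PQITLWQAPLVTIKIGGQLKEALLDT",
    "GADDTVLEEMNLPGRWKPKMIGGIG",
    "GFIKVAQYDQILIEICGHKAIGTVLVG",
    "PTPVNIIGANLLTQIGCT",
]

# The 20 standard amino-acid one-letter codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(fragments, header, width=60):
    """Join wrapped fragments into one FASTA record (hypothetical helper)."""
    # Drop any whitespace inside fragments and normalise case.
    seq = "".join("".join(f.split()).upper() for f in fragments)
    unexpected = sorted(set(seq) - VALID_RESIDUES)
    if unexpected:
        raise ValueError(f"non-standard residue codes: {unexpected}")
    # Re-wrap the sequence at a fixed width, as FASTA files conventionally do.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{header}\n" + "\n".join(lines) + "\n"


record = to_fasta(FRAGMENTS, "query_hiv_protease")
print(record)
```

The joined query is 96 residues long, which matches the query coordinates reported in the BLAST results.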
Searched Sequence – Results - A
Database: Protein sequences derived from the Patent division of GenBank
78 Hits
|gb|AAN | Sequence 17 from patent US
Length = 1003
Score = 191 bits (486), Expect = 4e-50
Identities = 93/96 (96%), Positives = 93/96 (96%)

Query1:   1 PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD 60
            PQITLWQ PLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKV QYD
Sbjct:   57 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYD 116

Query1:  61 QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT 96
            QILIEICGHKAIGTVLVGPTPVNIIG NLLTQIGCT
Sbjct:  117 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT 152
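The "Identities = 93/96" figure for the patent-division hit can be checked by comparing the aligned query and subject rows column by column. The sketch below assumes an ungapped alignment, which is what this hit shows; the function name `identity_stats` is illustrative.

```python
# Aligned rows of the patent-division hit, with the two blocks concatenated.
QUERY = (
    "PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT"
)
SBJCT = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT"
)


def identity_stats(query, sbjct):
    """Count identical columns in an ungapped pairwise alignment."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    # Record each differing column as (1-based query position, query, subject).
    mismatches = [
        (pos, q, s)
        for pos, (q, s) in enumerate(zip(query, sbjct), start=1)
        if q != s
    ]
    identical = len(query) - len(mismatches)
    return identical, len(query), mismatches


identical, aligned, mismatches = identity_stats(QUERY, SBJCT)
print(f"Identities = {identical}/{aligned} ({100 * identical // aligned}%)")
print(mismatches)
```

The three differing columns fall at query positions 8, 57, and 87, which are the blanks in the middle line of the alignment.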
Searched Sequence – Results - B
Database: Protein Data Bank (PDB)
75 Hits
gi|230577|pdb|2HVP| HIV-1 Protease
Length = 99
Score = 172 bits (437), Expect = 1e-44
Identities = 93/96 (96%), Positives = 93/96 (96%)

Query1:   1 PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD 60
            PQITLWQ PLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKV QYD
Sbjct:    1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60

Query1:  61 QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT 96
            QILIEICGHKAIGTVLVGPTPVNIIG NLLTQIGCT
Sbjct:   61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT 96
Searched Sequence – Results - C
Title: TITLE OF YOUR APPLICATION GOES HERE
Perfect score: 521
Sequence: 1 PQITLWQRPLVTIKIGGQLK TPVNIIGRNLLTQIGCTLNF 99
Scoring table: BLOSUM62
Gapop 10.0, Gapext 0.5
Searched: seqs, residues
Total number of hits satisfying chosen parameters:
Minimum DB seq length: 0
Maximum DB seq length:
Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries
Database : A_Geneseq_101002:*
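The "Perfect score: 521" line is consistent with the query being scored against itself under BLOSUM62, i.e. the sum of the matrix's diagonal entries over all 99 residues. The sketch below checks that reading. It uses the full 99-residue query as it appears in the alignments on the later result slides; treating the perfect score as a self-score is an inference from the numbers, not something the search report states.

```python
# BLOSUM62 diagonal: the score of each amino acid aligned with itself.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6,
    "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4,
    "T": 5, "W": 11, "Y": 7, "V": 4,
}

# Full 99-residue query, taken from the Qy rows of the result alignments.
QUERY = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)


def perfect_score(seq):
    """Self-alignment score: every residue matched to itself, no gaps."""
    return sum(BLOSUM62_DIAGONAL[residue] for residue in seq)


print(len(QUERY), perfect_score(QUERY))  # 99 521
```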
Searched Sequence – Results - D
RESULT 1
ID   AAU77767 standard; Protein; 99 AA.
AC   AAU77767;
DT   05-JUN-2002 (first entry)
DE   Human immunodeficiency virus type 1 (HIV-1) related protein #1.
KW   Human immunodeficiency virus type 1; HIV-1; protease.
OS   Unidentified.
PN   KR A.
PD   15-OCT
PF   28-JAN-1997; 97KR
PR   28-JAN-1997; 97KR
PA   (GLDS ) LG CHEM LTD.
PI   Kwon YD, Lee TG;
DR   WPI; /51.
PT   Mutated human immunodeficiency virus type 1 (HIV-1) protease
PT   and process for preparing the same -
PS   Example 3; Page 10; 18pp; Korean.
CC   The invention relates to a mutated human immunodeficiency
CC   virus type 1 (HIV-1) protease and a process for preparing the
CC   mutants. This sequence represents a human immunodeficiency
CC   virus associated protein described in the invention.
SQ   Sequence 99 AA;
Searched Sequence – Results - E
Pred. No. is the number of results predicted by chance to have a score greater than or equal to the score of the result being printed, and is derived by analysis of the total score distribution.

SUMMARIES
Columns: Result No., Score, % Query Match, Length, DB, ID, Description
  AAU77767  Human immunodefici
  AAR05744  HIV-1 protease gen

RESULT 1
SQ   Sequence 99 AA;
  Query Match 100.0%; Score 521; DB 20; Length 99;
  Best Local Similarity 100.0%; Pred. No. 2.6e-58;
  Matches 99; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy   1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60

Qy  61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
       |||||||||||||||||||||||||||||||||||||||
Db  61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
Searched Sequence – Results - F
RESULT 15
SQ   Sequence 177 AA;
  Query Match 99.0%; Score 516; DB 11; Length 177;
  Best Local Similarity 99.0%; Pred. No. 2.3e-57;
  Matches 98; Conservative 1; Mismatches 0; Indels 0; Gaps 0;

Qy   1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
       ||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||
Db  56 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD 115

Qy  61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
       |||||||||||||||||||||||||||||||||||||||
Db 116 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 154
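The figures for RESULT 15 can be reproduced from its alignment. The hit differs from the query by a single conservative substitution (N to S at query position 37), and the reported "Query Match" appears to be the alignment score divided by the perfect score of 521. The sketch below assumes that reading. It carries only the one off-diagonal BLOSUM62 entry this alignment needs (N aligned with S scores +1), and the function name `ungapped_score` is illustrative.

```python
# BLOSUM62 diagonal: the score of each amino acid aligned with itself.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6,
    "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4,
    "T": 5, "W": 11, "Y": 7, "V": 4,
}
# Only the off-diagonal pair needed for this alignment (assumption: a full
# matrix would be used in practice).
BLOSUM62_OFF_DIAGONAL = {frozenset("NS"): 1}

QY = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)
DB = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)


def ungapped_score(qy, db):
    """Sum substitution scores over an ungapped alignment."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(qy, db):
        if a == b:
            total += BLOSUM62_DIAGONAL[a]
        else:
            total += BLOSUM62_OFF_DIAGONAL[frozenset((a, b))]
    return total


score = ungapped_score(QY, DB)
perfect = ungapped_score(QY, QY)
matches = sum(a == b for a, b in zip(QY, DB))
print(score, perfect)                      # 516 521
print(round(100 * score / perfect, 1))     # 99.0 (Query Match)
print(round(100 * matches / len(QY), 1))   # 99.0 (Best Local Similarity)
```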
Sample Claims
A polypeptide having HIV protease activity
An isolated polypeptide having HIV protease activity
An isolated polypeptide comprising SEQ ID NO: 1
An isolated polypeptide consisting essentially of SEQ ID NO: 1
An isolated polypeptide consisting of SEQ ID NO: 1
A peptide fragment having HIV protease activity
A peptide fragment of SEQ ID NO: 1 with HIV protease activity
An epitope of ten amino acids in length of SEQ ID NO: 1 capable of binding to an antibody to SEQ ID NO: 1
An isolated polypeptide or fragment thereof of SEQ ID NO: 1 wherein one or more amino acid residues have been substituted, deleted, or inserted and which polypeptide retains HIV protease enzymatic activity
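The ten-amino-acid epitope claim covers many distinct sequences, which is part of what makes fragment claims harder to search than a full-length sequence. As a rough illustration only, the sketch below enumerates every contiguous ten-residue window, assuming (the slides do not say so) that SEQ ID NO: 1 is the 99-residue protease sequence used as the query in the result slides. The function name `windows` is illustrative.

```python
# Assumption: SEQ ID NO: 1 is the 99-residue query from the result slides.
SEQ_ID_NO_1 = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)


def windows(seq, k):
    """Return every contiguous k-residue fragment of seq, in order."""
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


decamers = windows(SEQ_ID_NO_1, 10)
print(len(decamers))   # 90
print(decamers[0])     # PQITLWQRPL
print(decamers[-1])    # LTQIGCTLNF
```

Each of the 90 windows is a candidate fragment under that claim, before the antibody-binding limitation is considered.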
Acknowledgements
STIC / Toby Port and David Schreiber
TC 1600 / James Martinell