1 Searching in Applications Containing Bio-Sequences Ram R. Shukla Supervisory Patent Examiner Art Unit 1634 571 272 0735

Slides:



Advertisements
Similar presentations
Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
Advertisements

1 Homology Language Brian R. Stanton Quality Assurance Specialist Technology Center 1600 U.S. Patent and Trademark Office (703)
Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
BLAST Sequence alignment, E-value & Extreme value distribution.
Gapped Blast and PSI BLAST Basic Local Alignment Search Tool ~Sean Boyle Basic Local Alignment Search Tool ~Sean Boyle.
1 Single Nucleotide Polymorphisms (SNP) Gary Jones SPE, Technology Center 1600 (703)
Bioinformatics Unit 1: Data Bases and Alignments Lecture 2: “Homology” Searches and Sequence Alignments.
Lecture 8 Alignment of pairs of sequence Local and global alignment
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
Sequence Alignment Storing, retrieving and comparing DNA sequences in Databases. Comparing two or more sequences for similarities. Searching databases.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2005.
Overview of sequence database searching techniques and multiple alignment May 1, 2001 Quiz on May 3-Dynamic programming- Needleman-Wunsch method Learning.
Reminder -Structure of a genome Human 3x10 9 bp Genome: ~30,000 genes ~200,000 exons ~23 Mb coding ~15 Mb noncoding pre-mRNA transcription splicing translation.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Introduction to Bioinformatics - Tutorial no. 2 Global Alignment Local Alignment FASTA BLAST.
Sequence similarity.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
1 Automated Searching of Polynucleotide Sequences Michael P. Woodward Supervisory Patent Examiner - Art Unit
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Sequence alignment, E-value & Extreme value distribution
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
LCS and Extensions to Global and Local Alignment Dr. Nancy Warter-Perez June 26, 2003.
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Sequence comparison: Local alignment
1 Unity of Invention: Biotech Examples TC1600 Special Program Examiner Julie Burke (571)
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Developing Pairwise Sequence Alignment Algorithms
Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch , Ch 5.1, get what you can.
Protein Sequence Alignment and Database Searching.
Computational Biology, Part 3 Sequence Alignment Robert F. Murphy Copyright  1996, All rights reserved.
U.S. Patent and Trademark Office Technology Center 1600 Michael P. Woodward Unity of Invention: Biotech Examples.
Content of the previous class Introduction The evolutionary basis of sequence alignment The Modular Nature of proteins.
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Alignment methods April 26, 2011 Return Quiz 1 today Return homework #4 today. Next homework due Tues, May 3 Learning objectives- Understand the Smith-Waterman.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Scoring Matrices April 23, 2009 Learning objectives- 1) Last word on Global Alignment 2) Understand how the Smith-Waterman algorithm can be applied to.
DNA alphabet DNA is the principal constituent of the genome. It may be regarded as a complex set of instructions for creating an organism. Four different.
BLAST Anders Gorm Pedersen & Rasmus Wernersson. Database searching Using pairwise alignments to search databases for similar sequences Database Query.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Chapter 3 Computational Molecular Biology Michael Smith
Sequence Search and Analysis SPE 1653 (703)
Patentability Considerations in the 3-D Structure Arts Patentability Considerations in the 3-D Structure Arts Michael P. Woodward Supervisory Patent Examiner.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Trilateral Project WM4 Report on comparative study on Examination Practice Relating to Single Nucleotide Polymorphisms (SNPs) and Haplotypes. Linda S.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Pairwise Local Alignment and Database Search Csc 487/687 Computing for Bioinformatics.
Applied Bioinformatics Week 3. Theory I Similarity Dot plot.
Sequence Alignments with Indels Evolution produces insertions and deletions (indels) – In addition to substitutions Good example: MHHNALQRRTVWVNAY MHHALQRRTVWVNAY-
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Database Similarity Search. 2 Sequences that are similar probably have the same function Why do we care to align sequences?
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Step 3: Tools Database Searching
Dynamic programming with more complex models When gaps do occur, they are often longer than one residue.(biology) We can still use all the dynamic programming.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
BLAST: Database Search Heuristic Algorithm Some slides courtesy of Dr. Pevsner and Dr. Dirk Husmeier.
1 Utility Guidelines, Homology Claims and Anti-Sense Molecule Claims Drew Hissong, Ph.D. dhissong*sughrue.com Sughrue Mion, PLLC
9/6/07BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST1 BCB 444/544 Lab 3 BLAST Scoring Matrices & Alignment Statistics Sept6.
Database Scanning/Searching FASTA/BLAST/PSIBLAST G P S Raghava.
BLAST Anders Gorm Pedersen & Rasmus Wernersson.
Sequence comparison: Local alignment
Automated Searching of Polynucleotide Sequences
Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool
Sequence alignment, E-value & Extreme value distribution
Presentation transcript:

1 Searching in Applications Containing Bio-Sequences Ram R. Shukla Supervisory Patent Examiner Art Unit

2 Types of Molecules Claimed: Nucleic Acids Sequence Structure A polynucleotide sequence that encodes a polypeptide (cDNA/ Genomic) Oligomers Probes/Primers Fragments

3 Types of Molecules Claimed: Nucleic Acids Function Antisense/Complements RNAi/Ribozymes/Triplex Aptamers Amino Acid Binding Domains Immunostimulatory CpG Sequences Transgene Regulatory Sequences

4 Types of Molecules Claimed: Nucleic Acids Other Accession Number Single Polynucleotide Polymorphism rs (Reference SNP) number Biological Deposit

5 Types of Molecules Claimed: Amino Acid Sequences Structure An amino acid sequence Oligopeptide Specifically Identified Fragments Accession Number A polypeptide encoded by a polynucleotide sequence Function Nucleic Acid Binding Domains Antibody Dominant Negative Mutant

6 Types of Sequences Claimed: Sequence Disclosure and Compliance IF an application discloses a nucleotide or an amino acid sequence and (Sequences may be anywhere in the application including specification, drawings, abstract) the nucleic acid sequence is a specific unbranched sequence of 10 or more nucleotides; and/or the amino acid sequence is a specific unbranched sequence of 4 or more amino acids, THEN the application must be analyzed for compliance with the sequence rules (37 CFR ). See MPEP 2422 for Sequence Compliance Requirements

7 Types of Sequences Claimed: Accession Number If a sequence is claimed by an Accession No, the specific sequence has to be disclosed in the specification. If the specification does not disclose the sequence of the Accession No, the office may object to the specification.

8 Types of Nucleotide Searching: Accession No If the sequence is added to the Specification It must be determined if the sequence has been properly incorporated by reference and adds no new matter. The sequence must be uniquely identified. For discussion of incorporation by reference of a sequence, see the BCP presentations by Jean Witz at the Sept 2008 BCP meeting ( and Julie Burke at the June 2008 BCP meeting (

9 Search Strategy The sequence recited in the Claim is used as the search query. The interpretation of a claim requires a sequence to be present and used as a query for a search of sequence databases.

10 What is searched? Claim interpretation Complementary/Antisense sequences Reverse Transcription/Translation RNA reverse transcribed to DNA Protein back translated to DNA cDNA, genomic DNA Oligomer/primers/probes

11 Smith-Waterman Finds an optimal local alignment between two protein (p2p) or two nucleic (n2n) sequences. Uses a two-dimensional matrix to look for the highest scoring alignment Similarity score is calculated based on: Comparison matrix: provides probability scores for all substitutions between pairs of residues Gap penalties: cost of inserting or deleting residues in the alignment There are two gap penalty models: Non-affine: a single gap penalty value is applied to any unmatched residue Affine: a penalty for a gap is calculated as gapop+gapext*l, where gapop is the penalty for opening a gap, gapext is the penalty for extending the gap, and l is the length of the gap. Smith and Waterman, Advances in Applied Mathematics, 2: (1981)

12 Search Request Considerations: How much substitution does the claim allow ?

Search Request Considerations: Does the claim allow for gaps?

14 Smith-Waterman (cont’d) In each cell, the algorithm stores the highest score of all possible paths leading to the cell Each path can be described as a traversal of an automaton consisting of three states: Match: two residues are matched Insert: the query matches a gap to a database residue Delete: the query matches a residue with a gap in the database sequence The path leading to the highest score can be of any length, and, by definition of a local alignment, doesn’t have to start at the beginning or end at the end of both sequences.

15 Translated Smith-Waterman First translate the nucleic sequence into three or six reading frames, then align each frame independently to the protein sequence; in the results, indicate the frame that produced each high-scoring hit.

16 Types of Nucleotide Sequence Searching: Standard Search Query using the full length of the SEQ ID NO (up to 10 Kb in size) useful for finding full length hits. hit size could be limited to a size range by requesting a “length- limited” search (range provided by the examiner). the search parameters are the default parameters-Gap Opening Penalty 10 & Gap Extension Penalty of 1.

17 Types of Nucleotide Sequence Searching: Standard Search Interpretation of the search results is needed to find fragments and genomic sequences. Fragments are buried in the hit list. The presence of introns in the database sequence results in low scores.

18 Types of Nucleotide Sequence Searching: Standard Search For a large sequence, 10 kb or greater, multiple large subsections of the sequence are used as a query to search the databases. For a genomic sequence, If exons and their boundaries are known, several exons are searched. If exons are not known, multiple large subsections of the sequence are used as a query to search the database.

19 Impact of Sequence Identity and Length Adjustment of search parameters (e.g. Smith-Waterman Gap values) influences % Query Match value. % Query Match value approximates overall identity Mismatches - Varying Degrees of Percent Identity Gaps - Insertion or Deletions - Gap Extensions Wild Cards Complements/Matches

20 Types of Nucleotide Sequence Searching Standard Oligomer: Prioritizes the longest uninterrupted hits. Accomplished by significantly increasing the gap penalty. The hit size could be limited to a size range by requesting a “length-limited” search (range provided by the examiner). Not optimal for finding small sequences that are 100% identical or complementary.

21 Score Over Length Searching Optimal for finding small sequences that are 100% identical or complementary. Calculated by dividing the hit score by the hit length hit “score” represents the number of perfect matches between query and hit. The number of perfect matches relative to a hit’s length is calculated (Score/Length). hits then sorted by Score/Length value. Hits with a Score/Length value closer to “1” are prioritized.

22 Publicly Available Databases: Nucleic Acids GenEMBL N_Genseq Issued_Patents_NA EST Published_Applications_NA

23 Publicly Available Databases: Proteins A-Geneseq UniProt PIR Published_Applications_AA Issued_AA

24 USPTO Databases Searched at the Time of Allowability Published_Applications_NA Issued_NA Pending_Applications_NA Published_Applications_AA Issued_AA Pending_Applications_AA

25 Search Results : Query length/Perfect Score; 2: Gap Parameters; 3: Minimum and Maximum Length Limitations

26 Search Results 1 2 1:Score; 2:% Query Match

27 Standard Search GenBank Alignments Against cDNA

28 Oligomer Search GenBank Hit Table Against cDNA Gap Penalties

29 Oligomer Search GenBank Hit Table Against cDNA

30 Oligomer Search GenBank Alignments Against cDNA

31 Length-Limited (8 to 20) Oligomer Search GenBank Hit Table cDNA Gap Penalties Minimum & Maximum Length Limitations

32 Length-Limited (8 to 20) Oligomer Search GenBank Alignments cDNA

33 Claim 1 Claim: An isolated polynucleotide comprising SEQ ID NO:1. Claim Interpretation : Comprising: must have all of SEQ ID NO:1, may include any flanking sequences, as in the claim above. Consisting of: limited to only SEQ ID NO:1, with No flanking sequences. Search Strategy: A standard search looking for full length hits is performed.

34 Claim 2 Claim: An isolated polypeptide comprising SEQ ID NO: 2. Claim Interpretation: Comprising: must have all of SEQ ID NO:2, may include any flanking sequences, as in the claim above. Consisting of: limited to only SEQ ID NO:2, with No flanking sequences. Search Strategy: A standard search looking for full length hits is performed in all the amino acid databases.

35 Claim 3 Claim: An isolated polynucleotide comprising a nucleotide sequence of SEQ ID NO:1 Claim Interpretation: This claim embraces any fragment of SEQ ID NO:1 due to the language “--a nucleotide sequence of--.” This could be obviated by amending to read “--the nucleotide sequence of--.” Search Strategy: A standard nucleotide sequence search as well as a standard oligomer search is performed using SEQ ID NO:1 as a query.

36 Claim 4 Claim: An isolated polynucleotide comprising a polynucleotide with at least 90% identity over its entire length to SEQ ID NO:1. Claim Interpretation: This claim encompasses any sequence that has 90% or higher sequence identity over its entire length to SEQ ID NO:1. Search Strategy A standard search looking for full length hits is performed. Hits having at least 90% identity will appear in the results.

37 Claim 5 Claim: An isolated polynucleotide comprising a polynucleotide encoding the amino acid sequence of SEQ ID NO:2. Claim Interpretation: The claim encompasses any polynucleotide that encodes the polypeptide of SEQ ID NO:2. Search Strategy: SEQ ID NO:2 is “back translated” into a nucleic acid sequence, which is used as a query to search the nucleic acid databases.

38 Claim 6 Claim: An isolated polynucleotide comprising a polynucleotide which hybridizes under stringent conditions to SEQ ID NO: 1. Claim Interpretation: Claim is interpreted as embracing any sequence with less than 100% complementarity or identity to SEQ ID NO:1. Search Strategy A standard oligomer search as well as a standard search is perfomed.

39 Claim 7 Claim: An isolated polynucleotide comprising at least 15 contiguous nucleotides of SEQ ID NO:1. Claim Interpretation: The claim embraces any fragment of 15 nucleotides or greater of SEQ ID NO:1. Search Strategy: A standard oligomer search is performed with a length of 15 nucleotides set as the lower limit for a hit.

40 Claim 8 Claim: An isolated polypepide comprising at least 15 contiguous amino acids of SEQ ID No:2. Claim Interpretation: The claim embraces any fragment of 15 amino acids or greater of SEQ ID NO:2. Search Strategy: A standard oligomer search is performed with a length of 15 amino acids set as the lower limit for a hit.

41 Claim 9 Claim: An oligonucleotide consisting of 8 to 20 nucleotides which specifically hybridizes to the nucleic acid sequence of SEQ ID NO:1. Claim Interpretation: The specification teaches that oligonucleotides which specifically hybridize need not have 100% sequence correspondence. Search Strategy A Score/Length search is performed with 8 and 20 as lower and upper limits respectively.

42 Claim 10: Searching a SNP Claim: A nucleic acid comprising SEQ ID NO:1 where the nucleotide at position 101 is a T. Claim Interpretation: The Claim encompasses any sequence comprising SEQ ID NO:1, with a T at position 101. Search Strategy: A standard nucleotide sequence search is performed for SEQ ID NO:1. The examiner manually searches for any changes at position 101.

43 Thanks! James (Doug) Schultz Supervisory Patent Examiner Art Unit Barb O’Bryen Scientific and Technical Information Center (STIC)

44 Questions? Ram R. Shukla Supervisory Patent Examiner Art Unit