Automated Searching of Polynucleotide Sequences


Michael P. Woodward, Supervisory Patent Examiner, Art Unit 1631, 571 272 0722, michael.woodward@uspto.gov
John L. LeGuyader, Supervisory Patent Examiner, Art Unit 1635, 571 272 0760, john.leguyader@uspto.gov

Standard Databases
- GenEMBL: .rge
- N_Genseq: .rng
- Issued_Patents_NA: .rni
- EST: .rst
- Published_Applications_NA: .rnpb

Databases at Time of Allowability
- Pending_Patents_NA_Main: .rnpm
- Pending_Patents_NA_New: .rnpn

Types of Nucleotide Sequence Searching
- Standard (cDNA)
- Oligomer
- Length Limited Oligomer
- Score over Length
(The example where no specific search is requested is purposefully left out.)

Types of Nucleotide Sequence Searching: Standard (cDNA)
- Useful for finding full-length hits.
- The query sequence is typically the full length of the SEQ ID NO.
- The search uses the default parameters (Gap Opening Penalty and Gap Extension Penalty of 10).
- The standard suite of NA databases is searched.
- Normally 45 results and the top fifteen alignments are provided; additional results and alignments can be provided.

Standard (cDNA) search
- Fragments and genomic sequences are often difficult to find.
- Fragments are buried in the hit list.
- The presence of introns in the database sequence results in low scores.

Types of Nucleotide Sequence Searching
- Standard Oligomer: finds the longest matching hits; mismatches are not tolerated in the region of the hit match.
- Length Limited Oligomer: returns database hits within the requested length range; mismatches are not tolerated in the region of the hit match.

Standard Oligomer Searching
- Only provides the longest oligomer present in the sequence.
- A thorough search of fragments requires multiple searches.
- Can be an effective way of finding genomic sequences.

Standard Oligomer Searching
- The search uses the default parameters (Gap Opening Penalty and Gap Extension Penalty of 60), so mismatches are not tolerated.
- Consequently, it is an inefficient means of finding small sequences and hits with <100% correspondence.
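The behaviour described for a standard oligomer search (only the longest run of exactly matching residues is reported, with no mismatches tolerated) can be illustrated with a longest-common-substring computation. This is a minimal sketch, not the search system's actual implementation; the function name and the toy sequences are hypothetical.

```python
def longest_exact_match(query, subject):
    """Return (length, query_start, subject_start) of the longest exact shared run.

    Positions are 0-based. On ties, the first run found while scanning the
    query from left to right is kept.
    """
    best_len, best_q, best_s = 0, 0, 0
    # prev[j] holds the length of the exact run ending at query[i-2], subject[j-1]
    prev = [0] * (len(subject) + 1)
    for i in range(1, len(query) + 1):
        curr = [0] * (len(subject) + 1)
        for j in range(1, len(subject) + 1):
            if query[i - 1] == subject[j - 1]:
                # Extend the run that ended on the previous diagonal cell
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_q = i - best_len
                    best_s = j - best_len
        prev = curr
    return best_len, best_q, best_s


# Hypothetical toy sequences: the shared run "ACGTA" starts at index 3 in both
print(longest_exact_match("GGGACGTACCC", "TTTACGTATTT"))
```

A single mismatch splits a run in two, which is why a search of this kind reports a shorter hit, or none above the length threshold, when the database sequence has <100% correspondence.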

Claim 1 An isolated polynucleotide comprising SEQ. ID. No: 1. The claim is interpreted as reading on an isolated polynucleotide which contains within it the entirety of SEQ ID No: 1.

Searching Claim 1 A standard search looking for full length hits is performed.

Standard (cDNA) search result
0001 CGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGATGG 0060
2031 CGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGG---CAGATGG 2090

Claim 2 An isolated polynucleotide comprising at least 15 contiguous nucleotides of SEQ. ID. No: 1.

Searching Claim 2 A standard oligomer search is performed with an oligomer length of 15 nucleotides set as the lower limit for a hit.

Oligomer Search Results
Standard Oligomer
CAAATGCAGGCCCCCGGACCTCCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG
Query    CCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG 0060
Database CCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG 2500
Length Limited Oligomer
CAAATGCAGGCCCCCGGACCTCCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG
Database CCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG 0039

Claim 3 An isolated polynucleotide comprising a polynucleotide encoding a polypeptide of SEQ ID No: 2. (SEQ ID No: 2 is an Amino Acid (AA) sequence)

Searching Claim 3 SEQ ID No: 2 is searched against the polypeptide databases; it is also “back translated” and searched against the polynucleotide databases.
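Back translation can be sketched as follows. The slides do not say how the search system back translates, so this is only one common approach, assumed here for illustration: each amino acid is replaced by a degenerate codon written in IUPAC ambiguity codes, derived from the standard genetic code. The function and variable names are hypothetical. Collapsing each codon position independently over-generates for amino acids with six codons (for example, leucine becomes YTN, which also covers the phenylalanine codons TTT and TTC).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for
IUPAC = {
    frozenset(bases): code
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}


def back_translate(protein):
    """Back translate a protein into a degenerate DNA sequence (IUPAC codes)."""
    out = []
    for aa in protein.upper():
        codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        # Collapse each of the three codon positions to one ambiguity code
        out.append("".join(
            IUPAC[frozenset(codon[pos] for codon in codons)] for pos in range(3)
        ))
    return "".join(out)


print(back_translate("MFK"))
```

Because the result is degenerate, a search against the polynucleotide databases has to treat the ambiguity codes as wild cards rather than as literal residues.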

Claim 4 An isolated polynucleotide comprising a polynucleotide with at least 90% identity to SEQ ID No: 1.

Searching Claim 4 A standard search looking for full length hits is performed. Hits having at least 90% identity will appear in the results.
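Checking a hit against a "at least 90% identity" limitation comes down to counting identical columns in the alignment. The sketch below assumes identity is defined as identical columns divided by total alignment columns, gaps included; other conventions exist (for example, dividing by the query length), and the slides do not specify which one applies. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over all alignment columns (gap columns count as non-identical)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column is identical when both rows carry the same residue and it is not a gap
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


# Toy alignment: 3 identical columns out of 4
print(percent_identity("ACGT", "AC-T"))
```

Applied to the alignment on the standard (cDNA) search result slide, which has 59 columns of which 56 are identical and 3 are gaps, this definition gives about 94.9%, so that hit would satisfy a 90% threshold.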

Claim 5 An isolated polynucleotide comprising a polynucleotide which hybridizes under stringent conditions to SEQ ID No: 1.

Searching Claim 5 A standard oligomer search is performed as well as a standard search.

Searching Small Nucleotide Sequences John L. LeGuyader

Types of Small Nucleotide Sequences Claimed
- Fragments
- Complements/Antisense
- Primers/Probes
- Oligonucleotides/Oligomers
- Antisense/RNAi/Triplex/Ribozymes (inhibitory)
- Accessible Target/Region within Nucleic Acids
- Aptamers
- Nucleic Acid Binding Domains
- Immunostimulatory CpG Sequences
These are non-limiting examples; others are possible.

Small Nucleotide Sequences Claimed as Sense or Antisense?
What is being claimed? Requesting the correct sequence search starts with interpreting what is being claimed.
- Complementary sequences: DNA to DNA (C to G); DNA to RNA (A to U)
- Matching sequences: A to A; U to U
- DNA, RNA, Chimeric
- cDNA, Message (mRNA), Genomic DNA
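The difference between a matching and a complementary sequence determines which strand has to be searched. A minimal sketch of the base-pairing rules follows; the helper names are hypothetical, and only unambiguous residues (A, C, G, T/U) are handled.

```python
# Watson-Crick pairing tables: A pairs with T (DNA) or U (RNA), C pairs with G
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq, rna=False):
    """Return the complementary strand, written 5' to 3'."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.upper().translate(table)[::-1]


def antisense_rna_for_dna(sense_dna):
    """RNA strand complementary to a DNA strand (DNA A pairs with RNA U)."""
    return reverse_complement(sense_dna).replace("T", "U")


print(reverse_complement("ATGC"))
print(antisense_rna_for_dna("ATGC"))
```

If a claim recites an antisense oligonucleotide but the search request supplies the sense strand (or the reverse), the search may need to be run on the reverse complement to retrieve the claimed sequence.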

Impact of Sequence Identity and Length
Size and identity matter:
- Complements/matches: 100% correspondence
- Mismatches: varying degrees of percent identity
- Gaps: insertions or deletions, gap extensions
- Wild cards
- The % Query Match value approximates identity.
- Adjustment of search parameters (e.g. Smith-Waterman gap values) influences the % Query Match value.
Hit length, mismatches and gaps affect the score and the % Query Match. This affects how the search orders the hits and which hits are actually provided to, and reviewed by, the examiner. Wild cards need to be specifically dealt with by the search preparer. The % Query Match value is not necessarily a reliable identity value.
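The effect of gap values on the score can be seen in a small Smith-Waterman local alignment with affine gap penalties. This is a sketch under stated assumptions, not the scoring scheme of the search system: the match and mismatch values (+1 and -1) are hypothetical, the first residue of a gap is charged `gap_open` and each further residue `gap_extend`, and only the best score is returned.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap_open=10, gap_extend=10):
    """Best Smith-Waterman local alignment score with affine gap penalties."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]        # best score ending at (i, j)
    E = [[neg_inf] * cols for _ in range(rows)]  # alignment ends with a gap in a
    F = [[neg_inf] * cols for _ in range(rows)]  # alignment ends with a gap in b
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either extend an existing gap or open a new one
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a score never drops below zero
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Hypothetical pair differing by one inserted residue
query, subject = "AAAACCCC", "AAAAGCCCC"
print(local_alignment_score(query, subject, gap_open=1, gap_extend=1))
print(local_alignment_score(query, subject, gap_open=10, gap_extend=10))
```

With a cheap gap the best alignment spans the insertion; with an expensive gap the algorithm prefers an ungapped alignment containing a mismatch, and the score drops. The same pair of sequences can therefore rank differently on a hit list depending on the gap values chosen.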

Types of Nucleotide Sequence Searching
- Standard Search (cDNA)
- Oligomer: finds database hits with the longest regions of matching residues; mismatches are not tolerated in the region of the hit match.
- Length Limited Oligomer: returns database hits within the requested length range; mismatches are not tolerated in the region of the hit match.
- Score Over Length: finds mismatched sequence database hits based on the requested length and identity range.

Why doesn’t a standard search of the cDNA provide an adequate search of fragments?
- Long sequence hits with many matches and mismatches score higher, and appear earlier on the hit list, than short sequences with high correspondence.
- Extensive regional local similarity in a long sequence scores higher than a 10-mer with 100% identity.
Consequence:
- Small sequences, of 100% identity or less, are buried tens of thousands of hits down the hit list.
- Most small sequence hits are effectively lost, especially hits with <100% correspondence.

Fragments and types of sequence searches
- Standard Search (cDNA): fragment hits buried
- Oligomer: fragment hits buried
- Searching multiple fragments: millions of hits and alignments to consider
- Each fragment of a specified sequence and length requires a separate search

Standard Oligomer Searching
- Won’t provide a thorough search of fragments, since longer hits score higher on the hit table.
- Smaller hits are lost, effectively not seen.
- Does not tolerate mismatches in the region of matches.
- Consequently, it is an inefficient means of finding small sequences and hits with <100% correspondence.
- Better suited to finding long sequences.

Length Limited Oligomer Searching
- The sequence request needs to set a size limit consistent with the size range being claimed.
- Does not tolerate mismatches in the region of matches.
- Consequently, it is an inefficient means of finding small sequences with <100% correspondence.
- Better suited to finding small sequences with 100% correspondence.

Score Over Length Searching
- Finds small oligos with <100% correspondence within the requested length and identity (>60%) range.
- Manual manipulation of the first 65,000 hits requires 2+ additional hours of the searcher’s time, not including computer search time.
- Calculation: Hit Score divided by Hit Length for the first 65,000 hits of the table.
- Hits are then sorted by the Score/Length value.
- The first 65,000 hits are likely to contain small sequence hits down to 60% identity.
- The additional hours needed for post-processing are a resource issue.
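The post-processing step (divide each hit score by the hit length, then sort) can be sketched as below. The `Hit` record, the function name and the example hits are hypothetical, and the scores assume roughly one point per matching residue purely for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    hit_id: str
    score: float
    length: int


def rank_by_score_over_length(hits, max_hits=65000):
    """Re-rank the first `max_hits` entries of a hit table by score divided by length."""
    considered = hits[:max_hits]
    return sorted(considered, key=lambda h: h.score / h.length, reverse=True)


# Hit table as a standard search might order it: highest raw score first
hit_table = [
    Hit("long_partial", 300.0, 600),  # long hit, about half the positions match
    Hit("short_80pct", 16.0, 20),     # 20-mer with about 80% correspondence
    Hit("short_exact", 10.0, 10),     # 10-mer with 100% correspondence
]

for hit in rank_by_score_over_length(hit_table):
    print(hit.hit_id, round(hit.score / hit.length, 2))
```

Sorting by raw score leaves the two short hits at the bottom of the table; sorting by score over length moves them to the top, which is the point of this search type.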

Searching Small Sequences: Example Consider the following claim: An oligonucleotide consisting of 8 to 20 nucleotides which specifically hybridizes to a nucleic acid coding for mud loach growth hormone (Seq. Id. No. X). The specification teaches that oligonucleotides which specifically hybridize need not have 100% sequence correspondence.

Mud Loach Growth Hormone cDNA
- 670 nucleotides long
- 630 nucleotides in the coding region
- 210 amino acids
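To see why searching every fragment separately is impractical, one can count the window positions that an 8- to 20-nucleotide oligonucleotide could occupy on a 670-nucleotide cDNA. This is a back-of-the-envelope sketch with a hypothetical function name; it counts positions on one strand only and ignores mismatched variants, so it understates the real search space.

```python
def count_windows(seq_len, min_len, max_len):
    """Number of contiguous windows of each length from min_len to max_len."""
    return sum(
        seq_len - k + 1                    # windows of length k
        for k in range(min_len, max_len + 1)
        if k <= seq_len
    )


# 8- to 20-mers along the 670-nucleotide mud loach growth hormone cDNA
print(count_windows(670, 8, 20))
```

For the 670-nucleotide cDNA this gives 8,541 window positions, each of which would need its own oligomer search if fragments were searched one at a time.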

Standard Search GenBank Hit Table Against cDNA

Standard Search GenBank Alignments Against cDNA

Oligomer Search GenBank Hit Table Against cDNA

Oligomer Search GenBank Alignments Against cDNA

Length-Limited (8 to 20) Oligomer Search GenBank Hit Table cDNA

Length-Limited (8 to 20) Oligomer Search GenBank Alignments cDNA

Score/Length GenBank Hit Table Against cDNA: 8-20-mers down to 80%

Score/Length Alignments Against cDNA: 8-20-mers down to 80%

QUESTIONS?
Michael P. Woodward, Supervisory Patent Examiner, Art Unit 1631, 571 272 0722, michael.woodward@uspto.gov
John L. LeGuyader, Supervisory Patent Examiner, Art Unit 1635, 571 272 0760, john.leguyader@uspto.gov