EBI is an Outstation of the European Molecular Biology Laboratory. EBI patent related services. Jennifer McDowall, Senior Scientist, EMBL-EBI. 3rd Annual Forum for SMEs, September 3-4th 2009
Overview: Databases available; Sequence archives; Searching the databases
Databases available…
Sequence data from patent literature. September 2009: nucleotide > 9.4m sequences; protein > 2.5m sequences. Patent offices: EPO, USPTO, JPO. Databases: GenBank, EMBL, DDBJ. EPO policy: data released to the public (and to EMBL) 18 months after the patent application date, independent of whether the patent has been granted.
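The 18-month rule above is simple date arithmetic. A minimal sketch, assuming calendar months and clamping to the end of the month where needed; the function names are illustrative and the exact day-counting convention used by the EPO is not stated on the slide:

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month,
    so 31 August plus 18 months gives 28 (or 29) February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_public_release(application_date: date) -> date:
    # 18 months is the figure quoted on the slide for EPO data.
    return add_months(application_date, 18)
```

For example, `earliest_public_release(date(2008, 3, 3))` gives 3 September 2009 under these assumptions.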
EMBL: Know the Data… Nucleotides. Release and updates
EMBL: Know the Data… Nucleotides. Divided into classes and divisions... Release and updates. Classes: ANN – Annotated Constructed Sequence; CON – Constructed Sequence; EST – Expressed Sequence Tag; GSS – Genome Survey Sequence; HTC – High Throughput cDNA; HTG – High Throughput Genome; PAT – Patent; STS – Sequence Tagged Site; STD – Standard; TPA – Third Party Annotation; TSA – Transcriptome Shotgun Assembly; WGS – Whole Genome Shotgun
EMBL: Know the Data… Nucleotides. Divided into classes and divisions... Release and updates. Divisions: HUM – Human; MUS – Mouse; ROD – Rodent (excluding mouse); MAM – Mammal (excluding human, mouse, rodent); VRT – Vertebrate (excluding human, mouse, rodent, mammal); FUN – Fungi; PRO – Prokaryote; ENV – Environment; INV – Invertebrate; PHG – Phage; SYN – Synthetic; PLN – Plant; VIR – Viral; TGN – Transgenic; UNC – Unclassified
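The class and division codes appear in the ID line of each EMBL flat-file entry, so they can be read back programmatically. A minimal sketch, assuming the semicolon-separated ID line layout (accession; version; topology; molecule type; class; division; length); the sample line in the usage note is hypothetical and only a few codes from the slides are included:

```python
# Code tables copied from the slides (subset; extend as needed).
DATA_CLASSES = {
    "PAT": "Patent",
    "STD": "Standard",
    "EST": "Expressed Sequence Tag",
    "WGS": "Whole Genome Shotgun",
}
DIVISIONS = {
    "HUM": "Human",
    "MUS": "Mouse",
    "PLN": "Plant",
    "PRO": "Prokaryote",
    "SYN": "Synthetic",
    "UNC": "Unclassified",
}


def parse_id_line(line: str) -> dict:
    """Split an EMBL-style ID line into named fields."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, found {len(fields)}")
    accession, version, topology, molecule, data_class, division, length = fields
    return {
        "accession": accession,
        "version": version,
        "topology": topology,
        "molecule": molecule,
        "data_class": data_class,
        "class_name": DATA_CLASSES.get(data_class, "unknown"),
        "division": division,
        "division_name": DIVISIONS.get(division, "unknown"),
        "length": int(length.split()[0]),  # "120 BP" -> 120
    }
```

Calling `parse_id_line("ID   AX000001; SV 1; linear; unassigned DNA; PAT; SYN; 120 BP.")` on that made-up line would report class `PAT` (Patent) and division `SYN` (Synthetic).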
EMBL: Know the Data… Nucleotides. Divided into classes and divisions... Release and updates. Supplementary sets: EMBL-CDS, EMBL-MGA. Specialist databases: Immunoglobulins (IMGT/HLA, IMGT/LIGM); Alternative splicing (ASTD); Completed proteomes (Ensembl, Integr8); Variation (HGVBase, dbSNP)
EMBL Patent Sequence Entry: version, dates, archive; patent number, title, link to patent
UniProt: Know the Data… Proteins. Release and updates
UniProt: Know the Data… Proteins. Divided into 3 sections. Release and updates. UniProtKB: taxonomic info, annotated sequence; Swiss-Prot (manual annotation) and TrEMBL (automatic annotation). UniRef: combines sequences by % ID (UniRef100, 90, 50). UniParc: protein archive, covers ALL proteins (including UniMES)
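To make "combines sequences by % ID" concrete, here is a rough sketch of a percent-identity calculation over two sequences that have already been aligned. It is an illustration only: the function names are hypothetical, the choice of denominator (gap-free columns) is one convention among several, and the real UniRef clustering applies further criteria not shown here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def uniref_levels(identity: float) -> list:
    """List the UniRef identity thresholds that a pairwise identity reaches."""
    thresholds = (("UniRef100", 100.0), ("UniRef90", 90.0), ("UniRef50", 50.0))
    return [name for name, cutoff in thresholds if identity >= cutoff]
```

Two ten-residue sequences differing at one position score 90%, which reaches the UniRef90 and UniRef50 thresholds but not UniRef100.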
UniProt: Know the Data… Proteins. Divided into 3 sections. Release and updates. Specialist databases linked to UniProt: Structure (PDBe, SGT); Immunoglobulins (IMGT/HLA); Alternative splicing (ASTD); Completed proteomes (Ensembl, Integr8); Protein interactions (IntAct); Protein signatures (InterPro); Patent proteins (EPO, USPTO, JPO, KIPO)
Bulk download: nucleotide sequences; protein sequences
Bulk download: ftp.ebi.ac.uk/pub/databases/embl/patent/
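A directory listing of the bulk-download location can be scripted with Python's standard `ftplib`. A minimal sketch, assuming the server still accepts anonymous FTP at the address given on the slide (this may have changed since 2009); the helper names are illustrative:

```python
from ftplib import FTP

# Location exactly as given on the slide.
PATENT_LOCATION = "ftp.ebi.ac.uk/pub/databases/embl/patent/"


def split_location(location: str) -> tuple:
    """Split 'host/path/' (with or without 'ftp://') into (host, '/path/')."""
    if location.startswith("ftp://"):
        location = location[len("ftp://"):]
    host, _, path = location.partition("/")
    return host, "/" + path


def list_patent_files(location: str = PATENT_LOCATION, timeout: int = 30) -> list:
    """Return the file names in the patent directory (requires network access)."""
    host, path = split_location(location)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # anonymous login
        ftp.cwd(path)
        return ftp.nlst()
```

Calling `list_patent_files()` would print nothing by itself; iterate over the returned names and fetch individual files with `FTP.retrbinary` as needed.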
Sequence archives…
Sequence archives: EMBL nucleotide sequence version archive (SVA); UniSave – UniProt sequence/annotation version archive
EMBL sequence version archive (SVA): enter accession #; view old entries
Sequence record from EMBL SVA
Comparing versions in EMBL SVA: select and compare versions
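The archive offers its own version comparison; the same idea can be reproduced locally on two saved copies of an entry. A minimal sketch using the standard `difflib` module, with hypothetical entry text standing in for real records:

```python
import difflib


def compare_versions(old_text: str, new_text: str,
                     old_label: str = "version 1",
                     new_label: str = "version 2") -> list:
    """Return a unified diff (list of lines) between two saved entry versions."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Hypothetical flat-file fragments, for illustration only.
OLD_ENTRY = "ID   AX000001; SV 1\nDE   Sequence 1 from Patent."
NEW_ENTRY = "ID   AX000001; SV 2\nDE   Sequence 1 from Patent."
```

Printing `"\n".join(compare_versions(OLD_ENTRY, NEW_ENTRY))` shows the changed ID line prefixed with `-` and `+`; identical versions give an empty list.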
UniProtKB sequence annotation version server (UniSave): enter accession #
UniSave results: select and compare versions; view old entries
Searching the databases…
EB-eye search by patent number: search for patent WO
EB-eye search by patent number
EB-eye nucleotide sequences from WO
Sequence Similarity Search Tools. Toolbox: BLAST (NCBI-BLAST, WU-BLAST); FASTA (FASTA suite); Smith-Waterman (MPsrch, ScanPS, SSEARCH); PSI search (PSI-SEARCH, PSI-BLAST)
BLAST v. patent nucleotide sequences
FASTA v. patent protein sequences
Tools: Genomes & Proteomes FASTA
When to use which search? [Figure: FASTA, WU-BLAST, NCBI BLAST and PSI-SEARCH placed by database size and query length]
When to use which search? [Figure: time to search with FASTA, WU-BLAST, NCBI BLAST and PSI-SEARCH against PDB, Swiss-Prot, UniRef50, UniRef90, UniRef100, UniProtKB and UniParc]
InterProScan protein signature search
InterPro signature database
Some search guidelines…
Search Guidelines. #1 Use the most appropriate tool for your search - don’t assume one tool will cater to all your search needs. [Figure: FASTA, WU-BLAST, NCBI BLAST and PSI-SEARCH placed by database size and query length]
Search Guidelines. #1 Use the most appropriate tool for your search. #2 Search options, ranked: best, protein seq v. protein DB; 2nd, translated DNA seq v. protein DB; 3rd, DNA seq v. DNA DB; worst, protein seq v. translated DNA DB
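The second-ranked option, translated DNA against a protein database, relies on translating the query in all six reading frames. Translated-search programs do this internally; the sketch below only illustrates the step, assuming the standard genetic code and an unambiguous A/C/G/T sequence (codons containing any other symbol are translated as `X`):

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_translations(dna: str) -> dict:
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six resulting peptide strings could then be compared with a protein database, which is why this option retains most of the sensitivity of a protein-to-protein search.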
Search Guidelines. #1 Use the most appropriate tool for your search. #2 Best search option: protein seq v. protein DB. #3 Search the smallest DB likely to have your sequence. #4 Check statistics – histograms... #5 Change parameters when necessary (gap penalties, scoring matrices...). #6 Don’t assume homologues have the same function: orthologs have similar functions; paralogs acquire different functions
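Guidelines #3 and #4 are linked: for a fixed alignment score, the expected number of chance hits grows with the size of the database searched. A minimal sketch using the BLAST-style approximation E ≈ m · n · 2^(−S′), where m is the query length, n the database length and S′ the bit score; edge-effect corrections are ignored, and FASTA estimates its statistics differently, so treat this as an illustration of the trend only:

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value for a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def compare_database_sizes(bit_score: float, query_length: int,
                           small_db: int, large_db: int) -> float:
    """Return how many times larger the E-value is in the larger database."""
    small = expect_value(bit_score, query_length, small_db)
    large = expect_value(bit_score, query_length, large_db)
    return large / small
```

Under this approximation the same hit is a thousand times less significant in a database a thousand times larger, which is the reasoning behind searching the smallest database likely to contain the sequence.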
Search Guidelines. #1 Use the most appropriate tool for your search. #2 Best search option: protein seq v. protein DB. #3 Search the smallest DB likely to have your sequence. #4 Check statistics – histograms... #5 Change parameters when necessary (gap penalties, scoring matrices...). #6 Don’t assume homologues have the same function. #7 Use multiple sequence alignments to validate relatedness. #8 Consider filtering low complexity regions
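Guideline #8 can be illustrated with a toy filter that masks windows of low Shannon entropy. This is a simplified stand-in, not the SEG or DUST algorithms that search tools typically offer; the window size, threshold and function names are assumptions chosen for the example:

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residues in a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(sequence: str, window: int = 12,
                        threshold: float = 1.5, mask_char: str = "X") -> str:
    """Replace every residue lying in a low-entropy window with mask_char."""
    masked = list(sequence)
    for start in range(len(sequence) - window + 1):
        # Entropy is always computed on the original, unmasked sequence.
        if shannon_entropy(sequence[start:start + window]) < threshold:
            masked[start:start + window] = mask_char * window
    return "".join(masked)
```

A run of twelve identical residues has zero entropy and is masked in full, along with any flanking residues that fall inside another low-entropy window, while a window of twelve distinct residues (about 3.6 bits) is left untouched.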
Typical workflow: search; review; check stats; compare; evolution; function
EBI is an Outstation of the European Molecular Biology Laboratory. Contacts: