UniProt Non-redundant Reference Cluster (UniRef) Databases www.uniprot.org Swiss Institute of Bioinformatics (SIB) European Bioinformatics Institute (EMBL-EBI)

Presentation transcript:

UniProt Non-redundant Reference Cluster (UniRef) Databases

Swiss Institute of Bioinformatics (SIB)
European Bioinformatics Institute (EMBL-EBI)
Protein Information Resource (PIR)

Contact

UniProt is mainly supported by the National Institutes of Health (NIH) grant 2 U01 HG. Additional support for the EBI's involvement in UniProt comes from the European Commission contract FELICS (021902) and from the NIH grant 5 P41 HG. UniProtKB/Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science. PIR activities are also supported by the NIH grants for NIAID proteomic resource (HHSN C) and grid enablement (NCI-caBIG-ICR), and by National Science Foundation grants for protein ontology (ITR ) and BioTagger (IIS ).

UniProtKB Sequences
UniProtKB Isoform Sequences
Selected UniParc Sequences from ENSEMBL, RefSeq and PDB databases

String Comparison: Identifying sub-fragments and identical sequences
CD-HIT computation: Clustering UniRef100 representative sequences at 90% level
CD-HIT computation: Clustering UniRef90 representative sequences at 50% level
Generating data files for distribution
UniRef Release

UniRef100
Identical sequences and sub-fragments with 11 or more residues are placed into a single record.

UniRef90
Members of related UniRef100s at the 90% sequence identity level form a UniRef90 cluster. The representative is selected based on the quality of the entry, name, organism and sequence length. Title and identifier are derived from the representative sequence.

UniRef50
Members of related UniRef90s at the 50% sequence identity level form a UniRef50 cluster. The representative is selected based on the quality of the entry, name, organism and sequence length. Title and identifier are derived from the representative sequence.

The UniProt Reference Clusters (UniRef) databases, UniRef100, UniRef90 and UniRef50, are automatically generated from the UniProt Knowledgebase and selected UniParc records.
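The string-comparison step that produces UniRef100 records can be illustrated with a small sketch. This is not the production UniRef pipeline: the function name, the dictionary layout and the choice of the longest sequence as the record's representative are all assumptions made for illustration, as is the reading that the 11-residue threshold applies to sub-fragments while identical sequences always merge.

```python
from typing import Dict, List

# From the text: sub-fragments need 11 or more residues to be merged.
MIN_FRAGMENT_LENGTH = 11


def cluster_identical_and_fragments(seqs: Dict[str, str]) -> List[dict]:
    """Group identical sequences and sub-fragments into single records.

    Hypothetical helper: the longest sequence of each group is used as
    the representative, which is an assumption of this sketch.
    """
    # Process longest sequences first so that a representative exists
    # before any of its fragments; ties are broken by identifier.
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    clusters: List[dict] = []
    for seq_id, seq in ordered:
        for cluster in clusters:
            rep = cluster["sequence"]
            identical = seq == rep
            fragment = len(seq) >= MIN_FRAGMENT_LENGTH and seq in rep
            if identical or fragment:
                cluster["members"].append(seq_id)
                break
        else:
            # No existing record contains this sequence: start a new one.
            clusters.append(
                {"representative": seq_id, "sequence": seq, "members": [seq_id]}
            )
    return clusters
```

The nested loop is quadratic in the number of sequences, which is fine for a demonstration but would not scale to a full protein database; the 90% and 50% levels additionally need a sequence-identity clustering tool such as CD-HIT, which plain substring matching cannot replace.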
The databases provide complete coverage of sequence space while hiding redundant sequences from view. The non-redundancy allows faster sequence similarity searches using UniRef90 and UniRef50.

UniRef90: 40% size reduction
UniRef50: 65% size reduction

FASTA file

>UniRef90_P00439 Phenylalanine-4-hydroxylase related cluster
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDV
NLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPW
FPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYM
EEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFA
QFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSE
KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQR
IEVLDNTQQLKILADSINSEIGILCSALQKIK

XML file

<UniRef90 xmlns=" Phenylalanine-4-hydroxylase related cluster
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDV
NLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPW
FPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYM
EEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFA
QFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSE
KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQR
IEVLDNTQQLKILADSINSEIGILCSALQKIK

UniRef Usages
● Speeding up similarity search
● Reducing bias in homology searches by providing more even sequence space
● Using the clusters for family classification
● Using the clusters to annotate EST and other sequence databases
● Using the clusters to check the consistency of UniProtKB annotations
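A UniRef FASTA record like the UniRef90_P00439 entry above can be read with a few lines of Python. The sketch below assumes the header layout seen in that one example: a database prefix, an underscore, the identifier derived from the representative sequence, then the cluster title. The function names and the returned dictionary keys are hypothetical, and the pattern may need adjusting for headers from other releases.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumed header layout, based on the single example in the text:
# >UniRef90_P00439 Phenylalanine-4-hydroxylase related cluster
HEADER = re.compile(r"^>(UniRef(100|90|50))_(\S+)\s*(.*)$")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_uniref_header(header: str) -> Dict[str, object]:
    """Split a UniRef FASTA header into its parts (hypothetical helper)."""
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"not a UniRef header: {header!r}")
    return {
        "database": match.group(1),             # e.g. "UniRef90"
        "identity_level": int(match.group(2)),  # e.g. 90
        "representative": match.group(3),       # e.g. "P00439"
        "title": match.group(4),
    }
```

Passing an open file handle to `read_fasta` works as well as a list of strings, since the function only iterates over lines.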