Proteins and Protein Function Charles Yan Spring 2006.

Presentation transcript:

Proteins and Protein Function Charles Yan Spring 2006

2 Amino Acids
General structure of an amino acid.
There are 20 standard amino acids, each with a different R group.

3 Amino Acids
Table: standard amino acids

Amino Acid      3-letter code   1-letter code
Alanine         Ala             A
Arginine        Arg             R
Asparagine      Asn             N
Aspartate       Asp             D
Cysteine        Cys             C
Glutamine       Gln             Q
Glutamate       Glu             E
Glycine         Gly             G
Histidine       His             H
Isoleucine      Ile             I

4 Amino Acids
Table: standard amino acids (Cont.)

Amino Acid      3-letter code   1-letter code
Leucine         Leu             L
Lysine          Lys             K
Methionine      Met             M
Phenylalanine   Phe             F
Proline         Pro             P
Serine          Ser             S
Threonine       Thr             T
Tryptophan      Trp             W
Tyrosine        Tyr             Y
Valine          Val             V

5 Amino Acids
Amino Acid Abbreviations (IUPAC)

Amino Acid                        3-letter code   1-letter code
Asparagine (N) or aspartate (D)   Asx             B
Glutamine (Q) or glutamate (E)    Glx             Z
Any amino acid                    Xaa             X

Authority: IUPAC-IUB Joint Commission on Biochemical Nomenclature.
Reference: IUPAC-IUB Joint Commission on Biochemical Nomenclature. Nomenclature and Symbolism for Amino Acids and Peptides. Eur. J. Biochem. 138:9-37 (1984).

6 Proteins
Two separate amino acids can be linked together by a peptide bond.
A chain of amino acids linked by peptide bonds is called a polypeptide.
A protein is made up of one or more polypeptide chains.
For simplicity, in this course, a protein is a chain of amino acids linked by peptide bonds, e.g.
VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI
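The one-letter notation above can be checked and spelled out with a few lines of code. This is a minimal sketch, not part of the original slides: the names `ONE_TO_THREE` and `to_three_letter` are made up for illustration, and the ambiguity codes B, Z and X are deliberately treated as errors.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence):
    """Spell out a one-letter protein sequence with three-letter codes.

    Raises ValueError if the sequence contains anything other than the
    20 standard one-letter codes (B, Z and X are rejected on purpose).
    """
    unknown = sorted(set(sequence) - set(ONE_TO_THREE))
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    return "-".join(ONE_TO_THREE[residue] for residue in sequence)


# The example protein from the slide, treated as a plain string.
example = "VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI"
print(to_three_letter(example[:5]))
```

Treating a protein as a string over a 20-letter alphabet is exactly the simplification the slide adopts for this course.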

7 Protein Database
UniProt (Universal Protein Resource) is the world's most comprehensive catalog of information on proteins.
It is a collaboration between:
- Swiss Institute of Bioinformatics (SIB)
- Department of Bioinformatics and Structural Biology of the Geneva University
- European Bioinformatics Institute (EBI)
- Georgetown University Medical Center's Protein Information Resource (PIR)
It includes three components:

8 Protein Database
- UniProt Knowledgebase (UniProtKB): the central access point for extensive curated protein information.
  - UniProtKB/Swiss-Prot: a manually annotated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. UniProtKB/Swiss-Prot Release 48.7 of 20-Dec-2005: 204,086 entries.
  - UniProtKB/TrEMBL: a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot. UniProtKB/TrEMBL Release 31.7 of 20-Dec-2005: 2,506,886 entries.
- UniProt Reference Clusters (UniRef): databases that combine closely related sequences into a single record to speed searches.
- UniProt Archive (UniParc): a comprehensive repository, reflecting the history of all protein sequences.

9 Protein Database

10 Protein Database

11 Protein Database

12 Protein Database

13 Protein Database

14

15 Gene Ontology
"Protein synthesis" and "Translation": two names for the same process.
Goal: find all the proteins that are involved in protein synthesis.

16 Gene Ontology
Volkswagen Golf. Golf.
"I like golf." "Me too!"

17 Gene Ontology
Ontology n. the branch of metaphysics dealing with the nature of being. (The New Oxford American Dictionary, edited by Elizabeth J. Jewell and Frank Abate, Oxford University Press, 2001, p. 1197.)
Metaphysics n. the branch of philosophy that deals with the first principles of things, including abstract concepts such as being, knowing, substance, cause, identity, time, and space. (The New Oxford American Dictionary, edited by Elizabeth J. Jewell and Frank Abate, Oxford University Press, 2001, p. 1074.)

18 Gene Ontology
The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases.
The project began as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD).
Since then, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes.

19 Gene Ontology
- Develop structured, controlled vocabularies (ontologies) that describe gene products.
- Make associations between the ontologies and the genes and gene products in the collaborating databases.
- Develop tools that facilitate the creation, maintenance and use of ontologies.
The use of GO terms facilitates uniform queries across databases.

20 Gene Ontology
The three components of GO are molecular function, biological process and cellular component.
GO terms are organized in structures called directed acyclic graphs (DAGs), which differ from hierarchies in that a child, or more specialized, term can have many parent, or less specialized, terms.
Example: hexose biosynthesis has two parents, monosaccharide biosynthesis and hexose metabolism.
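The difference between a DAG and a strict hierarchy can be made concrete with a small parent map. The sketch below uses only the three hexose terms named on this slide. The names `PARENTS` and `ancestors` are illustrative, and the real ontology contains many more terms and parent links than are shown here.

```python
# Each term maps to its direct parents (its less specialized terms).
# In a strict hierarchy every list would hold at most one parent;
# in a DAG a term may have several.
PARENTS = {
    "hexose biosynthesis": ["monosaccharide biosynthesis", "hexose metabolism"],
    "monosaccharide biosynthesis": [],
    "hexose metabolism": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("hexose biosynthesis")))
```

Storing parents as a list rather than a single value is the whole design point: it is what lets one specialized term sit under two more general ones.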

21 Gene Ontology
The controlled vocabularies are structured so that you can query them at different levels.
GO browser: AmiGO ( bin/amigo/go.cgi)
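Querying "at different levels" means that a search on a general term also returns gene products annotated to its more specialized terms. The following is a rough sketch under stated assumptions: it reuses the three hexose terms from the DAG slide, the protein identifiers (`protA`, `protB`, `protC`) are invented for illustration, and `proteins_for` is a hypothetical helper, not how AmiGO is implemented.

```python
# Direct children (more specialized terms) of each term.
CHILDREN = {
    "monosaccharide biosynthesis": ["hexose biosynthesis"],
    "hexose metabolism": ["hexose biosynthesis"],
    "hexose biosynthesis": [],
}

# Hypothetical annotations: these protein identifiers are made up.
ANNOTATIONS = {
    "monosaccharide biosynthesis": {"protA"},
    "hexose metabolism": {"protB"},
    "hexose biosynthesis": {"protC"},
}


def proteins_for(term):
    """Collect proteins annotated to `term` or to any more specialized term."""
    found = set()
    visited = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        found |= ANNOTATIONS.get(current, set())
        stack.extend(CHILDREN.get(current, []))
    return found


# A general query picks up the specialized annotations as well.
print(sorted(proteins_for("hexose metabolism")))
# A specialized query returns only its own annotations.
print(sorted(proteins_for("hexose biosynthesis")))
```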

22

23 Protein function
Three steps to get a set of proteins that have a certain function:
1. Search for the GO term.
2. Search for the proteins that belong to that GO term.
3. Save the sequences in FASTA format.

24 Search for the GO term

25 Search for the proteins that belong to a certain GO term

26 Save sequences in FASTA format
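For reference, a FASTA record is a header line starting with ">" followed by the sequence, usually wrapped to a fixed line width. The sketch below shows the format only. The function name `to_fasta`, the 60-character width and the header text are assumptions for illustration, and the sequence is the example protein from the Proteins slide.

```python
def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence
    split into lines of at most `width` characters.
    """
    lines = []
    for header, sequence in records:
        lines.append(f">{header}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical header; the sequence is the example from the Proteins slide.
records = [
    ("example_protein", "VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI"),
]
print(to_fasta(records), end="")
```

Writing the returned text to a file gives one file holding every protein found for the chosen GO term, which is the form most sequence analysis tools expect as input.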