1 Other biological databases and ontologies

2 Biological systems
Taxonomic data
Literature
Protein folding and 3D structure
Small molecules
Pathways and networks
Protein families and domains
Whole genome data
Sequence data
Ontologies (GO)

3 Ontologies
An ontology is a formal specification of terms and the relationships between them
– widely used in biology and bioinformatics (e.g. taxonomy)
The relationships are important and are represented as graphs
Ontology terms should have definitions
Ontologies are machine-readable
They are needed for ordering and comparing large data sets

4 What is a cell? What’s in a name?

5 What is a cell?

6 Ambiguities in naming
Different names can be used to describe the same concept, e.g.:
– Glucose synthesis
– Glucose biosynthesis
– Glucose formation
– Glucose anabolism
– Gluconeogenesis
All refer to the process of making glucose
This makes it difficult to compare information
Solution: use ontologies and data standards
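
The naming problem on this slide can be illustrated with a minimal Python sketch that maps each free-text label to one term identifier. The table `SYNONYMS` and the function `normalise` are hypothetical names invented for this example, and the identifier is used purely for illustration; a real project would load synonyms from the ontology file rather than hard-code them.

```python
from typing import Optional

# Illustrative synonym table (an assumption for this sketch, not an official mapping).
# Every label from the slide points at the same term identifier.
SYNONYMS = {
    "glucose synthesis": "GO:0006094",
    "glucose biosynthesis": "GO:0006094",
    "glucose formation": "GO:0006094",
    "glucose anabolism": "GO:0006094",
    "gluconeogenesis": "GO:0006094",
}


def normalise(label: str) -> Optional[str]:
    """Return the term identifier for a free-text label, or None if unknown."""
    return SYNONYMS.get(label.strip().lower())


# Three differently worded labels collapse to a single identifier,
# which is what makes data sets comparable.
labels = ["Glucose synthesis", "gluconeogenesis ", "Glucose anabolism"]
term_ids = {normalise(label) for label in labels}
```

Once every data set stores the identifier instead of the label, comparing annotations becomes a simple equality check.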

7 Gene Ontology (GO)
http://www.geneontology.org
Controlled vocabulary/ontology
Introduced to provide a standardised way of annotating gene products
Used for functional annotation of genes or proteins

8 GO ontologies
Molecular function:
– tasks performed by a gene product
– e.g. G-protein coupled receptor
Biological process:
– broad biological goals accomplished by one or more gene products
– e.g. G-protein signaling pathway
Cellular component:
– part(s) of a cell of which a gene product is a component; includes the extracellular environment of cells
– e.g. nucleus, membrane etc.

9 GO term examples
GO terms are arranged in a directed acyclic graph (DAG)
Relationships between terms
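
To make the DAG idea concrete, here is a small Python sketch that stores each term with its typed links to parent terms and collects all ancestors of a term. The miniature ontology in `PARENTS` is hypothetical and simplified for illustration; it is not taken from the GO files. Because a DAG allows several parents per term, the traversal keeps a `seen` set so that shared ancestors are visited only once.

```python
from collections import deque

# Hypothetical miniature ontology: child term -> list of (relationship, parent term).
# A term may have more than one parent, which is what distinguishes a DAG from a tree.
PARENTS = {
    "nuclear membrane": [("part_of", "nucleus"), ("is_a", "membrane")],
    "nucleus": [("is_a", "organelle")],
    "membrane": [("is_a", "cellular component")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


nuclear_membrane_ancestors = ancestors("nuclear membrane")
```

An annotation to a specific term implies annotation to all of its ancestors, so this kind of traversal underlies most grouping and comparison by GO term.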

10 How to annotate to GO
See if the gene product is annotated already, e.g. by a model organism database (MOD) or GOA
Manual annotation
– needs evidence codes
Blast2GO
Using GO mapping files (e.g. InterPro, EC, Swiss-Prot keyword)

11 Multiple GO terms
Process mappings:
– Cell communication (IPR2GO)
– GPCR pathways (SPKW2GO)
– GPCR pathways (IDA)
Select the most manual annotation first, then the most specific
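
The selection rule on this slide can be sketched as a two-part sort key in Python. Everything here is an assumption made for illustration: the `Annotation` class, the `depth` values standing in for "specificity", and the choice to treat only IDA as manual in this toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    term: str
    source: str  # evidence code or mapping name, e.g. "IDA", "SPKW2GO", "IPR2GO"
    depth: int   # assumed depth of the term in the DAG; deeper means more specific


# Assumption for this toy example: only IDA counts as a manual annotation.
MANUAL_SOURCES = {"IDA"}


def pick_best(annotations):
    """Prefer manual annotations; among equals, prefer the most specific term."""
    return max(annotations, key=lambda a: (a.source in MANUAL_SOURCES, a.depth))


candidates = [
    Annotation("Cell communication", "IPR2GO", depth=2),  # depth values are made up
    Annotation("GPCR pathways", "SPKW2GO", depth=5),
    Annotation("GPCR pathways", "IDA", depth=5),
]
best = pick_best(candidates)
```

With the manual annotation removed, the same key falls back to the more specific of the two mapped terms.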

12 Finding existing GO annotation
Small-scale:
– QuickGO or AmiGO browsers
Large-scale:
– GOA FTP site
  GOA proteomes (>25% coverage)
  GOA human, mouse, rat, cow, zebrafish, Arabidopsis, etc.
  GOA UniProt
– Proteome Analysis
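
For large-scale work, annotation files are typically distributed as tab-separated gene association (GAF) files. The sketch below assumes that format, reads only the first nine columns, and groups GO identifiers by protein. The sample lines, accessions and identifiers are placeholders invented for the example, not real annotations.

```python
import io
from collections import defaultdict

# Names for the first nine GAF columns (later columns are ignored in this sketch).
GAF_FIELDS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
]

# Made-up sample content; accessions and identifiers are placeholders.
SAMPLE = (
    "!gaf-version: 2.0\n"
    "UniProtKB\tP00001\tEXAMPLE1\t\tGO:0000001\tPMID:0000000\tIDA\t\tP\n"
    "UniProtKB\tP00001\tEXAMPLE1\t\tGO:0000002\tGO_REF:0000000\tIEA\t\tF\n"
    "UniProtKB\tP00002\tEXAMPLE2\t\tGO:0000001\tGO_REF:0000000\tIEA\t\tP\n"
)


def read_gaf(handle):
    """Group (GO id, evidence code) pairs by protein accession."""
    by_protein = defaultdict(list)
    for line in handle:
        # Lines starting with "!" are comments or header lines.
        if line.startswith("!") or not line.strip():
            continue
        record = dict(zip(GAF_FIELDS, line.rstrip("\n").split("\t")))
        by_protein[record["db_object_id"]].append(
            (record["go_id"], record["evidence_code"])
        )
    return dict(by_protein)


annotations = read_gaf(io.StringIO(SAMPLE))
```

To use it on a downloaded file, pass an open file handle instead of the `io.StringIO` wrapper.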

13 Searching GOA in QuickGO http://www.ebi.ac.uk/ego

14 Uses of GO annotation
GO classification
Analysis of high-throughput data
– Microarray data analysis
– Proteomics data analysis
Larkin JE et al, Physiol Genomics, 2004
Cunliffe HE et al, Cancer Res, 2003
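
One common way to use GO annotation on high-throughput results is an over-representation test: is a GO term found more often in a gene list than expected by chance? The sketch below uses a hypergeometric tail probability. This is a generic illustration, not necessarily the method used in the studies cited on the slide, and the function name and the numbers are assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) under the hypergeometric distribution.

    population: number of genes on the array (or in the proteome)
    annotated:  genes in the population that carry the GO term
    sample:     size of the gene list of interest
    hits:       genes in the list that carry the GO term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Made-up numbers: 20 of 1000 genes carry the term; 5 of a 50-gene list do.
p_value = enrichment_p_value(population=1000, annotated=20, sample=50, hits=5)
```

In practice many terms are tested at once, so the resulting p-values would need a multiple-testing correction.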

15 Open Biomedical Ontologies (OBO)
http://obo.sourceforge.net
Central web location for accessing well-structured controlled vocabularies (CVs) and ontologies for use in the biological and medical sciences
Provides a simple format for ontologies that encodes terms, relationships between terms and definitions of terms (not all OBO ontologies use this format, however)
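
The OBO flat-file format groups tag-value lines into stanzas such as `[Term]`. The following Python sketch reads term identifiers, names and `is_a` links from a tiny example. The `EX:` identifiers and the sample text are invented for illustration, and the parser handles only the few tags shown here, not the full format.

```python
# Invented example in the style of an OBO file; the EX: identifiers are placeholders.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: cellular component
def: "Example definition." [EX:ref]

[Term]
id: EX:0000002
name: nucleus
is_a: EX:0000001 ! cellular component
"""


def parse_obo(text):
    """Return {term id: {tag: value, "is_a": [parent ids]}} for [Term] stanzas."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Start a new stanza; stanza types other than [Term] are skipped.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if not line or current is None or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop the trailing "! comment"
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "is_a":
            current["is_a"].append(value)
        else:
            current[tag] = value
    return terms


terms = parse_obo(SAMPLE_OBO)
```

The `is_a` lists recovered here are exactly the parent links needed to build the term graph.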

16 Scope of OBO
Anatomy
Animal natural history and life history
Chemical
Development
Ethology
Evidence codes
Experimental conditions
Genomic and proteomic
Metabolomics
OBO relationship types
Phenotype
Taxonomic classification
Vocabularies

17 Other Biological Databases
Transcription factor binding sites: TRANSFAC
Protein structure databases: PDB, SCOP, CATH
Protein family databases: Pfam, Prints, PROSITE etc.
Chemicals and small molecules: ChEBI
Gene expression databases: GEO, ArrayExpress
Metabolic pathways: Reactome, KEGG
Genome databases: Ensembl, FlyBase, WormBase etc.

18 Transcription factor binding sites
TRANSFAC
– database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac
TESS
– Transcription Element Search System
– for predicting transcription factor binding sites, uses TRANSFAC: http://www.cbi.upenn.edu/tess
TFsearch
– for searching transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html

19 Protein structure databases
Main resource is the Protein Data Bank (PDB): http://www.rcsb.org/pdb/
Repository for solved structures
Can search by PDB code
Structural family databases based on PDB
– SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)

20 Searching MSD http://www.ebi.ac.uk/msd -Search by PDB code

21 Link to CATH

22 Protein family databases
Databases that produce signatures for identifying protein families or domains
Used for functional classification of proteins
E.g. Pfam, PROSITE, Prints, SMART, TIGRFAMs etc.
Integrated into a single resource, InterPro (http://www.ebi.ac.uk/interpro)

23 InterProScan sequence search Stand-alone version available

24 Results for protein acc

25 Example InterPro entry

26 Chemicals and small molecules
Chemical Abstracts: http://www.cas.org/
ChEBI: http://www.ebi.ac.uk/chebi
KEGG
– part of it includes chemicals: http://www.genome.jp/kegg
ChemIDplus
– chemicals cited in NLM databases: http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
MSD-Chem
– ligands and chemicals in MSD

27 ChEBI example entry

28 Hierarchy for chemicals

29 Gene expression databases
NCBI Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
ArrayExpress: http://www.ebi.ac.uk/arrayexpress
Stanford Microarray Database: http://genome-www5.stanford.edu/
Can usually search for experiments or particular expression profiles

30 GEO search page

31 Profiles search results

32 Specific entry and experiment info

33 ArrayExpress search results

34 Metabolic Pathways
PATHGUIDE
– >200 pathways
KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg
– includes a database of chemicals, genes and networks (metabolic, regulatory etc.)
– well-curated and quite specific
EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org
– curation of the entire genome
Reactome
– curated biological pathways: http://www.reactome.org/
GenMAPP
– pathways contributed by users

35 Pathway in Reactome

36 Example of a pathway in BioCyc

37 Protein-protein interaction databases
Protein-protein interaction databases store pairwise interactions or complexes
IntAct: http://www.ebi.ac.uk/intact
DIP (Database of Interacting Proteins): http://dip.doe-mbi.ucla.edu/
BIND (Biomolecular Interaction Network Database): http://submit.bind.ca:8080/bind/
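
Pairwise interaction records map naturally onto an undirected graph. The Python sketch below builds one from a list of pairs and counts interaction partners per protein. The protein names and pairs are hypothetical, and the code does not reflect the export format of any of the databases listed.

```python
from collections import defaultdict

# Hypothetical pairwise interactions (protein names are placeholders).
PAIRS = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]


def build_network(pairs):
    """Build an undirected adjacency map: protein -> set of interaction partners."""
    network = defaultdict(set)
    for first, second in pairs:
        network[first].add(second)
        network[second].add(first)
    return dict(network)


network = build_network(PAIRS)

# Number of partners per protein; highly connected proteins are often called hubs.
degree = {protein: len(partners) for protein, partners in network.items()}
```

A complex with more than two members could be stored the same way by adding one pair for each combination of its members, although that loses the information that they were observed together.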

38 Protein-protein interactions

39 Genome browsers
Integrate sequence and functional data for a genome
Ensembl
– genome browser for major eukaryotic genomes, e.g. human, mouse etc.: http://www.ensembl.org
UCSC browser: http://genome.ucsc.edu/
FlyBase
– Drosophila genome database: http://www.ebi.ac.uk/flybase
WormBase
– C. elegans: http://www.wormbase.org
PlasmoDB
– Plasmodium (malaria): http://plasmodb.org
Etc.

40 Ensembl genome browser

41 Ensembl gene view 1

42 Ensembl gene view 2

43 Gene within context on chromosome

