Describing Bioinformatic Metadata at EBI James Malone

Cross-Domain Data available from EBI
  Genomes
  DNA & RNA sequence
  Gene expression
  Protein sequence
  Protein families, motifs and domains
  Protein structure
  Protein interactions
  Chemical entities
  Pathways
  Systems
  Literature and ontologies

The Sorts of Data we Serve
  We manage databases of biological data such as nucleic acid and protein sequences and macromolecular structures
  ENA: nucleotide sequencing information
  UniProt: protein sequence and functional information
  ArrayExpress: functional genomics data repository
  Ensembl: genome information for vertebrates and other eukaryotes
  InterPro: database of predictive protein "signatures"
  PDBe: data resource on biological macromolecular structures

Sorts of Metadata we need
  Low complexity – high volume (genome sequencing)
  High complexity – low volume (mouse phenotyping)
  1000 Genomes: data volume within an order of magnitude of physics data
  Provenance models
  Experimental variables
  Publication details
  Synonyms and domain-specific language
  Cross-domain mappings
  Metadata has existed and been captured for a while, e.g. InterPro IDs

Metadata: Minimum Information Standards
  Minimum Information Standards specify the minimum amount of metadata (and data) required to meet a specific aim (usually reporting data or submitting to a public repository)
  MIAME: Minimum Information About a Microarray Experiment
  MIARE: Minimum Information About an RNAi Experiment
  MIAPE: Minimum Information About a Proteomics Experiment
  MIFlowCyt: Minimum Information about a Flow Cytometry Experiment
  ISA: cross-domain experiment reporting
  Some public repositories require some degree of conformance, e.g. ArrayExpress – MIAME scoring
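A minimum information standard can be read as a checklist that a submission either satisfies or does not. The sketch below illustrates that idea only: the field names and the scoring rule are hypothetical and are not taken from the MIAME specification or from how ArrayExpress actually scores submissions.

```python
# Hypothetical checklist: these field names are illustrative placeholders,
# not the real MIAME requirements.
REQUIRED_FIELDS = {"experiment_design", "samples", "array_design", "protocols"}


def missing_fields(record: dict) -> set:
    """Return the required fields that are absent or empty in a record."""
    return {field for field in REQUIRED_FIELDS if not record.get(field)}


def compliance_score(record: dict) -> float:
    """Fraction of required fields that are filled in (0.0 to 1.0)."""
    return 1 - len(missing_fields(record)) / len(REQUIRED_FIELDS)


# A made-up submission with one empty field and one omitted field.
submission = {
    "experiment_design": "time course",
    "samples": ["liver", "kidney"],
    "array_design": "",
}

print(sorted(missing_fields(submission)))
print(compliance_score(submission))
```

A repository could use a score like this to flag incomplete submissions before curation, although a real checklist would be far more detailed.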

Ontologies
  A method of representing knowledge in which concepts are described both by their meaning and by their relationships to each other
  An increasingly important component for formalising metadata
  Thriving bio-ontology community
    e.g. Gene Ontology: 'project to standardise the representation of gene and gene product attributes'
    e.g. ChEBI: 'ontology of molecular entities focused on small chemical compounds'
    e.g. Ontology for Biomedical Investigations: 'ontology to describe experimental protocols from inception to analysis'

Metadata that is Interoperable
  Goal: a community-wide, interoperable set of reference ontologies
  Consumed by application ontologies for specific needs
  E.g. Experimental Factor Ontology (EFO), alongside reference ontologies:
    Anatomy Reference Ontology
    Cell Type Ontology
    Chemical Entities of Biological Interest (ChEBI)
    Various Species Anatomy Ontologies
    Relation Ontology
    Disease Ontology

Applying Ontologies in Data
  Query for cell adhesion genes in all 'organism parts'
  'View on EFO'
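A query over all 'organism parts' relies on the ontology hierarchy: the query term is expanded to every term beneath it before samples are matched. The following is a minimal sketch of that expansion, assuming a simple child-to-parent is_a map. The terms, gene names and annotations are invented placeholders, not real EFO identifiers or Atlas data.

```python
from collections import defaultdict

# Toy is_a hierarchy (child -> parent); placeholder terms, not real EFO IDs.
IS_A = {
    "liver": "organism part",
    "kidney": "organism part",
    "hepatic lobe": "liver",
}

# Hypothetical (gene, sample annotation term) pairs.
ANNOTATIONS = [
    ("GENE_A", "hepatic lobe"),
    ("GENE_B", "kidney"),
    ("GENE_C", "HeLa"),
]


def descendants(term, is_a):
    """Collect every term that is a direct or indirect subclass of `term`."""
    children = defaultdict(set)
    for child, parent in is_a.items():
        children[parent].add(child)
    found, stack = set(), [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def genes_for(term, is_a=IS_A, annotations=ANNOTATIONS):
    """Return genes annotated with `term` or any of its subclasses."""
    wanted = descendants(term, is_a) | {term}
    return sorted(gene for gene, annotated in annotations if annotated in wanted)


print(genes_for("organism part"))
print(genes_for("liver"))
```

Without the hierarchy, a search for "organism part" would miss samples annotated only with a more specific term such as "hepatic lobe".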

Strategies for Integrating Multi-Domain Data
  Consuming reference ontologies, and mapping to multiple ontologies where overlap exists, offers us maximum interoperability
  Diagram labels: RDF triple, query, Atlas, Swiss-Prot, Amino Acid Ontology
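The diagram labels suggest data from several resources expressed as RDF triples and queried together. The sketch below shows the general idea using plain Python tuples rather than an RDF library. Every identifier and predicate in it is a made-up placeholder, not a real Atlas, Swiss-Prot or Amino Acid Ontology IRI.

```python
# Triples are (subject, predicate, object). All identifiers are invented
# placeholders standing in for three separate data sources.
TRIPLES = {
    ("atlas:gene1", "ex:encodes", "sprot:P00001"),
    ("sprot:P00001", "ex:containsResidue", "aao:Lysine"),
    ("atlas:gene2", "ex:encodes", "sprot:P00002"),
    ("sprot:P00002", "ex:containsResidue", "aao:Glycine"),
}


def match(pattern, triples=TRIPLES):
    """Yield triples matching a pattern; None acts as a wildcard."""
    for triple in triples:
        if all(p is None or p == v for p, v in zip(pattern, triple)):
            yield triple


def genes_with_residue(residue):
    """Join two triple patterns: gene -> protein -> residue."""
    proteins = {s for s, _, _ in match((None, "ex:containsResidue", residue))}
    return sorted(
        gene for gene, _, protein in match((None, "ex:encodes", None))
        if protein in proteins
    )


print(genes_with_residue("aao:Lysine"))
```

The join works only because both sources refer to the protein by the same identifier, which is the role that shared reference ontologies and mappings play in the slide.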

ELIXIR Report
  Data Integration & Interoperability Recommendations – Jul 2009
  ELIXIR should build a distributed data infrastructure based on a Service Oriented Architecture using web services (WS) technology
  Ontologies needed in the areas of disease, anatomy and taxon
  Annotation systems for associating data with metadata
  Pan-domain coordination and funding for reporting standards

Current Challenges
  Literature – data gap
  Curation is relatively slow; more advanced tooling is required
  Ontologies are not yet interoperable, and more are needed
  Bio-ontology funding
  New high-throughput methods
    Assays
    Experiments

Challenges: Scaling
  World-wide sequencing data production is now just an order of magnitude behind CERN
  The Large Hadron Collider produces 15 petabytes per year from a single point source
  The LHC grid is 140 computer centres in 33 countries, centred at CERN (Tier 0)
  Sequencing is producing data in hundreds of centres in dozens of countries, with Tier 0 sites at EBI & NCBI
  More than 150 terabytes of 1000 Genomes data are in the Short Read Archive, representing more than half of all the data in the archive
  Slide: Laura Clarke, EBI

Summary
  EBI uses a combination of metadata strategies
  Minimum Information standards are useful for reporting
  Ontologies provide a powerful method for describing domain knowledge
  Ontologies also allow community consensus to be built, as well as strategies for data integration
  ELIXIR suggests:
    Infrastructures should be WS compatible
    Annotation tools are required
    Pan-domain coordination is essential

Acknowledgements
  Ontology creation: James Malone, Tomasz Adamusiak, Ele Holloway, Helen Parkinson, Jie Zheng (U Penn)
  Atlas GUI development: Misha Kapushesky, Pasha Kurnosov, Anna Zhukova, Nikolay Kolesinkov
  External review and anatomy: Jonathan Bard, Jie Zheng
  ArrayExpress production staff
  EBI Rebholz Group (Whatizit text mining tool)
  Many source ontologies for terms and definitions, esp. Disease Ontology, Cell Type Ontology, FMA, NCIT, OBI
  Funders: EC (Gen2Phen, FELICS, MUGEN, EMERALD, ENGAGE, SLING), EMBL, NIH
  Eric Neumann, Joanne Luciano and Alan Ruttenberg
  HCLS Group: Eric Prud'hommeaux and Scott Marshall