Data Curation and Management activities within the UCT Computational Biology Group Dr Nicky Mulder.

Presentation transcript:

Data Curation and Management activities within the UCT Computational Biology Group Dr Nicky Mulder

Outline
- Activities at UCT:
  - High-throughput biology data
  - Sequence annotation
  - DAS annotation development
- Issues we face
- A note on standards and ontologies

High-throughput biology data
- Close ties with CPGR
- Microarray data storage: BASE
- Proteomics data:
  - Annotation: pipeline required
  - Storage: LIMS required

BASE
- BioArray Software Environment
- Open source database for storage of array-type data
- Manages raw data (images) and annotations
- Has limited LIMS options
- Can include specifications for MIAME compliance
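As a rough illustration of what a MIAME compliance check could look like, the sketch below tests a record against the six elements of the MIAME checklist. The dictionary keys, the `missing_miame_elements` helper, and the example record are hypothetical; this is not BASE's API or data model.

```python
# The six MIAME elements, paraphrased. The keys are hypothetical names
# chosen for this sketch, not field names from BASE or any other tool.
MIAME_ELEMENTS = {
    "raw_data": "raw data for each hybridisation",
    "processed_data": "final processed (normalised) data",
    "sample_annotation": "sample annotation and experimental variables",
    "experimental_design": "experimental design and sample-data relationships",
    "array_annotation": "annotation of the array design",
    "protocols": "laboratory and data processing protocols",
}


def missing_miame_elements(record):
    """Return the MIAME element keys that are absent or empty in record."""
    return [key for key in MIAME_ELEMENTS if not record.get(key)]


# Made-up experiment record, for illustration only.
experiment = {
    "raw_data": ["scan_01.tif"],
    "processed_data": [],
    "sample_annotation": {"organism": "example organism"},
    "protocols": ["labelling", "hybridisation"],
}

missing = missing_miame_elements(experiment)
for key in missing:
    print(f"missing: {MIAME_ELEMENTS[key]}")
```

A check of this kind is cheapest to run at data entry, while the researcher who generated the data can still supply the missing details.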

BASE Sample Information

BASE experimental info

Proteomics Data
- Still in progress
- Peptide identification programs
- Additional cross-linking from results to public database annotations
- Storage of experimental data and resulting identifications
- Include MIAPE compliance
- Linking to genomics data: standards required

Sequence Annotation 1
- Paeano pipeline for annotation of cDNAs from non-model organisms
- Uses collection of publicly available and custom software
- Results are stored under projects
- Links provided to array data in BASE

Sequence Annotation 2
- Glossina (Tsetse) EST annotation project
- Held annotation jamboree at UWC
- Worked with Twiki tool developed by JBIRC
- Data to be submitted to public databases

Twiki system

DAS Annotation Tool
- Distributed Annotation System: allows viewing of annotation from different sources
- Can overlay your own data/annotation
- Facilitates information sharing without issue of updates
- Repositories distributed in different geographical locations
- Extension of DASTy2 - developed at NBN
- Development of DAS annotation tool underway
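The overlay idea can be sketched without any network code: each source contributes its own features for a sequence, and a client merges those that fall within the region being viewed. The `Feature` class, the `overlay` function, the source names, and the coordinates below are all hypothetical; this is not the DAS protocol or the DASTy2 code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    source: str  # which annotation source supplied the feature
    label: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def overlay(sources, start, end):
    """Collect features from every source that overlap [start, end]."""
    hits = [
        feature
        for features in sources.values()
        for feature in features
        if feature.start <= end and feature.end >= start
    ]
    return sorted(hits, key=lambda f: (f.start, f.end, f.source))


# Made-up annotations from a public server and from a local lab.
sources = {
    "public_server": [
        Feature("public_server", "domain A", 10, 80),
        Feature("public_server", "signal peptide", 1, 20),
    ],
    "local_lab": [
        Feature("local_lab", "candidate site", 60, 65),
        Feature("local_lab", "outside", 200, 240),
    ],
}

view = overlay(sources, 1, 100)
for feature in view:
    print(feature.start, feature.end, feature.label, feature.source)
```

Because each source is queried at view time rather than copied, local annotation can sit alongside public annotation without either side having to track the other's updates.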

DASTy

Links to other DAS viewers

DAS annotation tool
Collaborative visual annotation tool
- Annotation
- Comments
- Sequences
- Features
- Non-positional features
- Methodology of trust on a collaborative annotation process

Data curation and management issues
- HTB software licenses are expensive
- Open Source not always maintained
- Ensuring regular backups (data size)
- Keeping data up to date
- Researchers leave data after project: not updated to new versions
- Privacy: researchers share data only with collaborators, patient data is private
- Sharing and linking data

Standards and ontologies
- Use a controlled vocabulary (controlled list of terms) or ontology (set of terms with relations)
- Enables easy data retrieval and sharing
- Easy comparison of results from different labs
- Compatibility with other labs/databases worldwide
- Ease of uploading data into public databases
- Unambiguous report of research
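The difference between a controlled vocabulary and an ontology can be shown with a toy example. A flat term list can only accept or reject an annotation, whereas relations between terms also let a query for a general term find records annotated with a more specific one. The terms, the `IS_A` table, and the sample annotations below are made up for illustration and are not taken from any published ontology.

```python
# Toy ontology: each term maps to its parent via an "is_a" relation.
IS_A = {
    "hepatocyte": "epithelial cell",
    "epithelial cell": "cell",
    "neuron": "cell",
}

# Controlled vocabulary: simply the list of allowed terms.
CONTROLLED_VOCABULARY = set(IS_A) | set(IS_A.values()) | {"liver"}

# Hypothetical sample annotations.
SAMPLES = {"s1": "hepatocyte", "s2": "neuron", "s3": "liver"}


def is_valid(term):
    """A controlled vocabulary can only say whether a term is allowed."""
    return term in CONTROLLED_VOCABULARY


def ancestors(term):
    """Follow is_a links upwards and return every more general term."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def annotated_with(term):
    """Find samples annotated with term or with any more specific term."""
    return sorted(
        sample
        for sample, annotation in SAMPLES.items()
        if annotation == term or term in ancestors(annotation)
    )


print(is_valid("hepatocyte"), is_valid("Hepatocytes"))
print(annotated_with("cell"))
```

Here a search for "cell" retrieves the hepatocyte and neuron samples even though neither was labelled "cell" directly, which is what makes results from different labs comparable.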

Open Biomedical Ontologies
- Central location for accessing well-structured controlled vocabularies and ontologies for use in the biological and medical sciences
- Provides simple format for ontologies
- Scope includes anatomy, phenotype, development, disease, "omics", experiment, etc.

Data exchange standards
- Microarray standards: MIAME and MAGE
- Proteomics Standards Initiative (PSI)
- Systems Biology Markup Language (SBML): computer-readable format for representing models of networks
- Biological Pathways Exchange (BioPAX): format for representing pathways

Conclusions
- Some tools in place for curation and management of different data types
- Need better education of researchers to encourage this
- Ontologies and standards are important in digital data curation and management; need to encourage compliance with international standards

Acknowledgements
- Funding:
- Collaborations:
  - CPGR
  - Researchers at UCT