Data provenance in biomedical discovery Donald Dunbar Queen’s Medical Research Institute University of Edinburgh Workshop on Principles of Provenance in.

Presentation transcript:

Data provenance in biomedical discovery
Donald Dunbar
Queen’s Medical Research Institute, University of Edinburgh
Workshop on Principles of Provenance in Databases, May 21st 2008

Background
biomedical research
basic & clinical science
animal, cell models, patients
genes, proteins, pathways
data analysis & mining
publication

Biomedical discovery
Looking for contribution to
– human health and disease
In-house experiments
– data workflows
– knowledge capture
Use public databases
– many data types
– integration is a problem

Databases we use
sequence, structure, function, expression, domain-specific

Data workflows
[Diagram labels: experiment 2, spreadsheet, raw data, calculations, publication, database, processed data, experiment 1, database]

Data workflows
IN: copy and paste, open from file
‘algorithm’
OUT: copy and paste, save to file
BUT:
– web services
– automated tools & databases
– bioinformatics workflows

Bioinformatics workflows

Is our field changing?
[Diagram labels: databases, experiments, knowledge, knowledgebase]

Knowledge capture

What provenance do we need?
Example: Gene expression in a transgenic animal
gene annotation, gene expression measurements
public databases, output from machine, processing, integration
where, when; which identifiers; how; when, what, how
data mining: what and how did we select genes …

What provenance do we need?
Example: Curated protein database
expert data, database links, curator input, archive
contributor, date
verify, add, delete, modify
source, identifiers, dates
Curated database: versions, dates
development: schema & interface changes
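
One way to picture the provenance items on this slide (contributor, date, the verify/add/delete/modify actions, and source) is an append-only log of curator actions. The sketch below is illustrative only: the class names, field names, and the idea of a log are assumptions, and the talk does not describe an implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Action(Enum):
    # The four curator actions named on the slide.
    VERIFY = "verify"
    ADD = "add"
    DELETE = "delete"
    MODIFY = "modify"


def _utc_now() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class CurationEvent:
    # All field names are hypothetical; they mirror the slide's items.
    entry_id: str      # identifier of the curated entry
    action: Action     # what the curator did
    contributor: str   # who did it
    source: str        # where the information came from
    date: str = field(default_factory=_utc_now)  # when it happened


class CurationLog:
    """Append-only record of curation events (nothing is overwritten)."""

    def __init__(self) -> None:
        self._events: list[CurationEvent] = []

    def record(self, entry_id: str, action: Action,
               contributor: str, source: str) -> CurationEvent:
        event = CurationEvent(entry_id, action, contributor, source)
        self._events.append(event)
        return event

    def history(self, entry_id: str) -> list[CurationEvent]:
        """Return all events for one entry, oldest first."""
        return [e for e in self._events if e.entry_id == entry_id]
```

Keeping the log append-only means a deletion or modification is itself recorded as an event, so the history of an entry can still be reconstructed later.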

What do we do now (for provenance)?
We trust the main data providers a lot!
– a pragmatic approach
We use tools and note the settings
– rarely fully
We put extra fields in our databases
– source, modify date
We deposit our data in public repositories
– but only when we need to
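
The "extra fields" approach mentioned on this slide can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not the group's actual schema: the table name, the gene_id and expression columns, and the helper function are hypothetical, and only the source and modify date fields come from the slide.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gene_expression (
        gene_id     TEXT PRIMARY KEY,
        expression  REAL NOT NULL,
        source      TEXT NOT NULL,  -- where the value came from
        modify_date TEXT NOT NULL   -- ISO 8601 timestamp of the last change
    )
    """
)


def save_measurement(conn, gene_id, expression, source):
    """Insert or overwrite a row, stamping its source and modify date."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO gene_expression "
        "(gene_id, expression, source, modify_date) VALUES (?, ?, ?, ?)",
        (gene_id, expression, source, now),
    )
    conn.commit()
    return now
```

Two extra columns record only the most recent source and change date; earlier values are overwritten, which is one reason this kind of provenance is rudimentary.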

What might we do next?
Use workflow tools like Taverna
– capture workflow provenance
Build provenance tool & database
– widely applicable
Make provenance more visible to biologists
– so they value and use it

Conclusions
In biology we don’t do provenance well (yet)
We use databases and manual workflows
We implement rudimentary provenance
We should build useful provenance tools
We need to make provenance visible