The Imperial College Tissue Bank A searchable catalogue for tissues, research projects and data outcomes Prof Gerry Thomas - Dept. Surgery & Cancer The.

Presentation transcript:

The Imperial College Tissue Bank
A searchable catalogue for tissues, research projects and data outcomes
Prof Gerry Thomas - Dept. Surgery & Cancer
The Bioinformatics Support Service - Dept. Life Sciences
Contact:

Background #1
- Tissue bank infrastructure (ICHTB) allows staff to collect diverse biological material from patients treated within Imperial NHS Trust
- Also hosts a large number of epidemiological cohort studies that are of international importance, e.g. Chernobyl Tissue Bank
- Contains over 58,000 samples from over 15,000 different patients
- To date, 23,000 samples issued to researchers
- Used in 433 different research studies

Background #2
- All studies have appropriate ethics approvals and, where appropriate, Caldicott Guardian agreement
- Information held about the donors, who have to give consent for their tissue to be held, is anonymized at source in accordance with the Human Tissue Act
- ICHTB is the first UK tissue bank to be granted approval to link its data on specimens with data from the English National Cancer Registry (patient outcomes)
- Role-based access to underlying database with secure web-based interfaces
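A minimal sketch of what role-based access could look like. The role names and permitted actions below are assumptions made for illustration; the slides do not list the actual ICHTB roles or rights.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical roles, loosely based on the two user groups the slides mention.
    BANK_STAFF = "bank_staff"
    RESEARCHER = "researcher"


# Hypothetical permission table; the real rights are not given in the slides.
PERMISSIONS = {
    Role.BANK_STAFF: {"record_specimen", "search_samples", "issue_samples"},
    Role.RESEARCHER: {"search_samples", "request_samples"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the given role is permitted to perform the action."""
    return action in PERMISSIONS.get(role, set())
```

With this table, a researcher can search for samples but cannot record specimens.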

Existing Workflow
- Tissue bank staff record information on specimens, operations and donors
- Separate interface allows researchers to search database for biological samples useful for their research
- Request made to use specific samples in a research study (funded by diverse means incl. RCUK, charities)
- Once approved, samples issued for research
- Data are generated from the samples in a variety of ways, and are subject to funders' data sharing policies
- Data are also of use to future research, particularly since samples are irreplaceable
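The request part of this workflow can be sketched as a small state machine. The state names and the transition table are assumptions chosen to mirror the steps listed on the slide, not a description of the actual ICHTB software.

```python
from enum import Enum, auto


class RequestState(Enum):
    # Hypothetical states mirroring the workflow steps on the slide.
    REQUESTED = auto()
    APPROVED = auto()
    ISSUED = auto()
    DATA_GENERATED = auto()


# Each state maps to the set of states it may move to next.
ALLOWED = {
    RequestState.REQUESTED: {RequestState.APPROVED},
    RequestState.APPROVED: {RequestState.ISSUED},
    RequestState.ISSUED: {RequestState.DATA_GENERATED},
    RequestState.DATA_GENERATED: set(),
}


def advance(current: RequestState, target: RequestState) -> RequestState:
    """Move a sample request to the target state, rejecting skipped steps."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

Samples cannot be issued before approval: `advance(RequestState.REQUESTED, RequestState.ISSUED)` raises `ValueError`.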

Objectives
- Extend the tissue bank infrastructure to offer a searchable data catalogue for research data arising from tissue bank samples
- Data repository for key datasets not already submitted to public repositories, also derived/analysed data formats of particular interest
- Tie to funding information: grant codes, project title
- Bring together stable accession numbers for data stored in public repositories, access to 'locally stored dark data', publications, summaries, SOPs
- Deposition to data catalogue becomes requirement of accessing any tissue samples
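One way to picture a catalogue entry that ties these items together is sketched below. All field names are hypothetical, and the deposit check is only one possible reading of the "deposition becomes a requirement" objective.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogueEntry:
    """Hypothetical catalogue record; field names are illustrative only."""
    project_title: str
    grant_codes: List[str]
    sample_ids: List[str]
    # Repository name -> stable accession number held in that repository.
    public_accessions: Dict[str, str] = field(default_factory=dict)
    # Identifiers of locally stored ("dark") datasets.
    local_datasets: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)

    def has_deposit(self) -> bool:
        """True once any data, public or local, is linked to the entry."""
        return bool(self.public_accessions or self.local_datasets)


def may_request_samples(previous_entries: List[CatalogueEntry]) -> bool:
    """Assumed policy: every earlier project must have deposited data.

    A researcher with no earlier projects passes the check.
    """
    return all(entry.has_deposit() for entry in previous_entries)
```

An entry with no linked data fails `has_deposit()` until an accession number or a local dataset is added.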

- Exemplar exists in the smaller Chernobyl Tissue Bank
- Some of this software infrastructure can be repurposed
- Uses community metadata standards where they exist (see
- Also link to associated publications
- We maintain a number of existing specialised data repositories, e.g. OMERO for imaging, that could be linked
- We frequently work with the common public data repositories and are very familiar with their requirements/formats/metadata

Challenges
- MULTIPLE - sample types, study types per sample, data types per study
- MULTIPLE - bio-data file formats, metadata standards, public repositories
- Data in public repositories may itself be held behind ethics/privacy panel
- So this prototype will initially provide specialised data upload templates for key data types
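A data upload template can be thought of as a list of required metadata fields per data type. The sketch below assumes two made-up templates; the data types and field names are illustrative and are not taken from the prototype.

```python
from typing import Dict, List, Set

# Hypothetical templates: data type -> metadata fields an upload must supply.
TEMPLATES: Dict[str, Set[str]] = {
    "rna_profile": {"sample_id", "platform", "file_format"},
    "imaging": {"sample_id", "modality", "file_format"},
}


def missing_fields(data_type: str, metadata: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or empty, sorted by name."""
    if data_type not in TEMPLATES:
        raise KeyError(f"no upload template for data type: {data_type}")
    required = TEMPLATES[data_type]
    return sorted(name for name in required if not metadata.get(name))
```

An upload is accepted when `missing_fields` returns an empty list; data types without a template are rejected outright.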

Many Data Areas
- Genome sequencing
- Imaging
- RNA profiles
- Protein profiles
- Metabolic profiles
- Protein interaction studies
- Large-scale field studies
- GWAS
Improved understanding of complex biological system
Challenges in primary analyses (smaller) AND in meaningful integration (huge)

Bio-Data Standards
- 30+ minimum reporting guidelines for diverse areas of biological and biomedical data
- Few cross experimental types, leading to confusion and fragmentation
- Differing levels of use and maturity
- 'Minimum' can still be huge ('just enough' movement)
- Multiple standard formats for reporting, e.g. MAGE-ML
- Not always easy to find associated tools to help use

Data Formats
Even for one experimental type, there are many file formats. They may be human readable, require specific software, be proprietary or open source... and Excel spreadsheets

Public Repositories
- NAR online Molecular Biology Database Collection: currently 1552 databases
- Limited by data domain or origin or both
- One project may require data submission to >1
- May cross-reference data-sets across databases
- Each has its own format and metadata requirements
- Some are manually curated, many are not
- Data submission may be a requirement for journal publication

Example -  since
- Genes, genomes (assembled sequences), raw DNA sequence, annotations
- 3 reporting standards of its own, 5 community-based minimum reporting standards
- Has own XML-based submission system
- Large datasets can take weeks to prepare/validate and generate hundreds of thousands of lines of XML and TB of data
- Stable accession numbers and versioning
- Can protect submissions behind embargo and/or ethics panels (EGPA) (Example)

Specialised Local Repositories
- Chernobyl Tissue Bank
- IC Tissue Bank
- MRIdb
- OMERO