Data, data standards and sharing Dr Daniel Swan Bioinformatics Support Unit

Science used to be like this..

But now it’s something like this..

Problem.. Data doesn’t fit in:

So where do we store it instead?

Why is this a problem?
Hard drives explode (more often than you think..)
How is *your* “My Documents” filing system?
–Most of us live in folder chaos!
How well does your hard drive integrate with your lab book?
–Well, generally not at all… you might be able to match things on dates if you’re lucky!
Big data is EXPENSIVE to generate
It makes sense to get the most value out of it
Your funding bodies know this!

MRC “The MRC expects valuable data arising from MRC-funded research to be made available to the scientific community with as few restrictions as possible. Such data must be shared in a timely and responsible manner.”

BBSRC “BBSRC expects research data generated as a result of BBSRC support to be made available with as few restrictions as possible in a timely and responsible manner to the scientific community for examination and use.” (even more pointedly, they also suggest that IP and commercialisation concerns should NOT preclude you from releasing data in a timely fashion)

Opens up a new problem
How do we make sure that we can exchange, and understand, the data that we share with other researchers?
Standardised formats for reporting certain experimental data types have been developed
These were pre-dated by massive open access biological sequence databases (GenBank, DDBJ, EMBL, PDB, UniProtKB etc.), but those suffer from the fact that there are 20+ ‘standards’ for representing DNA or protein sequence data
A new set of data standards has emerged for modern biological data
Often called ‘MI’ data standards
They capture the ‘minimum information’ metadata (data about data) required to comprehend and share scientific data

Particularly for high throughput data
It all started with MIAME (Minimum Information About a Microarray Experiment)
It now extends to proteomics, neurophysiology, genome sequences – even gel electrophoresis
If you are going to publish a microarray experiment, it is very likely that the journal you publish in will MANDATE that the data is annotated to MIAME standards AND deposited in a recognised repository for that data
–GEO
–ArrayExpress
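
The ‘minimum information’ idea can be sketched as a checklist of required metadata fields that a record must fill in before it is shared. This is a minimal illustration only: the field names and the `missing_fields` helper are hypothetical, and they are not the official MIAME elements or the submission format of any repository.

```python
import json

# Hypothetical checklist: these field names are illustrative only and are
# NOT the official element names of MIAME or any other MI standard.
REQUIRED_FIELDS = (
    "experiment_design",
    "samples",
    "protocols",
    "platform",
    "raw_data_files",
)


def missing_fields(metadata: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# An example record with one blank field and one field left out entirely.
record = {
    "experiment_design": "treated vs control, 3 biological replicates",
    "samples": ["ctrl_1", "ctrl_2", "ctrl_3", "trt_1", "trt_2", "trt_3"],
    "protocols": "",  # left blank, so it is reported as missing
    "platform": "example array platform",
}

print(missing_fields(record))        # ['protocols', 'raw_data_files']
print(json.dumps(record, indent=2))  # the record as it might be stored next to the data
```

Running a check like this when the data is generated, rather than at submission time, means the gaps are spotted while the details are still fresh.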

Why stop at data?
Whilst the UK research councils (RCUK) are moving to policies where data is openly deposited, other scientific information is also being openly released
Open Access publication – a new paradigm for journals (they charge no subscription fees)
Scientists are beginning to really utilise the internet to share data and ideas, and to foster collaborations
But why?
–The realisation that the data in your lab books is entombed. Unless you’re going to commercialise it, or it’s going to win you a Nobel… why not share?

But still people argue about sharing
I don’t want to be scooped!
My data isn’t very good
I am hoping to commercialise this some day

Open notebook science
A new concept being pioneered by some scientists
Using ‘Web 2.0’ tools (i.e. user generated content)
A combination of
–Blogs
  Even if you’re not sharing data, why not share some ideas?
–Wikis
  Wikis are like lab books on steroids, and you can link them to all kinds of external resources, open them up to the world
–Other collaborative tools

UsefulChem

Forging 21st century collaborations
You’re not limited to talking to your peers in the coffee room
But where can you start interacting with other scientists?

To sum up
Be aware of what your funding body expects when it comes to releasing your data to the public
The more metadata you capture about your work, the easier it will be to comply with data standards regulations later
Don’t be afraid to use technology: keeping track of science is hard, and there’s no way to Google a lab book!
Engage with online communities – many a collaboration has been formed via a blog post!
Want to talk about how best to analyse and store your digital data? Come talk to us!