
1 Life science requirements from e-infrastructure: initial results from a joint BioMedBridges workshop
Rafael Jimenez, ELIXIR CTO
Stephanie Suhr, BioMedBridges Project Manager

2 BioMedBridges: Biomedical sciences research infrastructures stronger through common links
• FP7-funded cluster project
• 21 project partners in 9 countries
• Computational ‘data and service’ bridges between the BMS RIs
• Interoperability between data and services in the biological, medical, translational and clinical domains

3 Second BioMedBridges AGM
e-infrastructure advisory board meeting with BMS RI technical representatives
10-11 March 2014, Florence, Italy

4 BioMedBridges workshop
E-Infrastructure support for the life sciences: Preparing for the data deluge
15 May 2014, Genome Campus, Hinxton, UK

5 Knowledge exchange workshop
• Discussion of big data challenges in life sciences
• Focus on a few representative domains
• Looking 5 years ahead
• Jointly identify potential solutions to our problems
[Diagram labels: Data; ICT e-infrastructures; LS life sciences; Physical facilities; Scientific information; Transfer; Computation; Storage]

6 How does it affect data sharing in life sciences?

7 Large-scale data sharing in the life sciences http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552

8 How does big data affect data sharing?
http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
[Diagram labels: What, How, Where; Compute, Storage, Transfer]

9

10 Growing data

11 Cost of DNA sequencing

12 Data generation vs. data transfer
Data sources: DNA sequencing, mass spectrometry, microscopy
Data generated in 24 hours: ~100 GB, ~4 TB
Network file transfer time:
Network speed   ~100 GB     ~4 TB
1 Gb/s          ~30 min     ~9 hours
100 Mb/s        ~5 hours    ~4 days
10 Mb/s         ~2 days     ~5 weeks
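
The relationship between data volume, link speed and transfer time on this slide can be checked with a short calculation. The following is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes, 1 Gb/s = 10^9 bits per second) and an ideal link with no protocol overhead or contention. The function names are illustrative and do not come from the presentation.

```python
def ideal_transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Lower bound on transfer time: total bits divided by link speed."""
    return size_bytes * 8 / link_bits_per_second


def humanize(seconds: float) -> str:
    """Format a duration in minutes, hours or days."""
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} h"
    return f"{seconds / 86400:.1f} d"


# Data volumes and link speeds taken from the slide (decimal units assumed).
SIZES = {"~100 GB": 100e9, "~4 TB": 4e12}
LINKS = {"1 Gb/s": 1e9, "100 Mb/s": 100e6, "10 Mb/s": 10e6}

if __name__ == "__main__":
    for link_name, bps in LINKS.items():
        for size_name, size in SIZES.items():
            ideal = humanize(ideal_transfer_seconds(size, bps))
            print(f"{size_name} over {link_name}: at least {ideal}")
```

These ideal figures are lower bounds, so real transfers can be expected to take longer; several of the slide's estimates are indeed somewhat larger than the ideal values.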

13 Potential bottlenecks in life sciences
• Data production grows faster than storage
• Cost of data production technologies declines faster than cost of storage
• It takes longer to transfer data than to produce it

14 Data growth: how to reduce the IT budget shortfall?
http://www.eweek.com/

15 Data growth: how to reduce the IT budget shortfall?
http://www.eweek.com/
Optimization
• Using technology more effectively
• Selecting relevant data

16 Potential solutions
• Storage
  - Data compression
  - Select what we store
  - Evaluate data reproducibility & value of data
• Network
  - Faster protocols
  - Partitioning
  - Network upgrade
• Computation
  - Clouds
  - Data close to computation

17 Data compression
• Efficient representation
• Capacity for controlled data reduction
• Efficient transformations
• Tool chain
[Figure labels: Precision, Compression]
CRAM
• Fritz, M.H., Leinonen, R., et al. (2011) Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res. 21 (5), 734-40
• Cochrane G., Cook C.E. and Birney E. (2012) The future of DNA sequence archiving. GigaScience 2012, 1:2
• http://www.ebi.ac.uk/ena/about/cram_toolkit
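
The idea behind reference-based compression, as named in the Fritz et al. title cited on this slide, is to store an aligned read as its position on a reference sequence plus only the bases that differ. The following is a toy sketch of that idea, assuming reads are already aligned, have no insertions or deletions, and carry no quality scores. It is not the CRAM format or the CRAM toolkit API; all names here are hypothetical.

```python
from typing import List, Tuple

Mismatch = Tuple[int, str]                   # (offset within the read, observed base)
EncodedRead = Tuple[int, int, List[Mismatch]]  # (start position, read length, mismatches)


def encode_read(reference: str, read: str, position: int) -> EncodedRead:
    """Store a read as its reference position plus the bases that differ."""
    window = reference[position:position + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    mismatches = [
        (offset, base)
        for offset, (ref_base, base) in enumerate(zip(window, read))
        if ref_base != base
    ]
    return position, len(read), mismatches


def decode_read(reference: str, encoded: EncodedRead) -> str:
    """Rebuild the original read from the reference and the stored differences."""
    position, length, mismatches = encoded
    bases = list(reference[position:position + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTACGT"
    read = "GTACCTAC"  # differs from reference[2:10] at one base
    encoded = encode_read(reference, read, position=2)
    print(encoded)
    print(decode_read(reference, encoded) == read)
```

A read that matches the reference exactly costs only a position and a length, which is why this kind of representation becomes more efficient as reads agree more closely with the reference.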

18 Data transfer optimization
• e.g. getting more from available bandwidth

19 Data partitioning
• Organisation of data around biological concepts
• Indexing system around these concepts
• Support for requests for partitions along this index
Reference-oriented indexing
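
One way to picture reference-oriented indexing is an index keyed by reference sequence and position, so that a client can request only the partition covering a region of interest rather than the whole dataset. The following is a minimal in-memory sketch under that assumption. The class and method names are hypothetical and do not describe any specific archive's implementation.

```python
import bisect
from collections import defaultdict
from typing import Dict, List


class RegionIndex:
    """Toy index of records keyed by (reference name, position)."""

    def __init__(self) -> None:
        self._positions: Dict[str, List[int]] = defaultdict(list)
        self._payloads: Dict[str, List[str]] = defaultdict(list)

    def add(self, ref_name: str, position: int, payload: str) -> None:
        """Insert a record, keeping each reference's records sorted by position."""
        i = bisect.bisect_right(self._positions[ref_name], position)
        self._positions[ref_name].insert(i, position)
        self._payloads[ref_name].insert(i, payload)

    def fetch(self, ref_name: str, start: int, end: int) -> List[str]:
        """Return payloads with start <= position < end on one reference."""
        positions = self._positions.get(ref_name, [])
        payloads = self._payloads.get(ref_name, [])
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_left(positions, end)
        return payloads[lo:hi]


if __name__ == "__main__":
    index = RegionIndex()
    index.add("chr1", 100, "a")
    index.add("chr1", 250, "b")
    index.add("chr2", 50, "c")
    index.add("chr1", 175, "d")
    print(index.fetch("chr1", 100, 200))
```

Because only the requested slice is returned, a request for a small region transfers a small fraction of the data, which ties partitioning back to the network bottleneck discussed earlier.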

20 What data is relevant?

21 Life sciences diversity
[Diagram labels: Genomes; Nucleotides; Transcripts; Proteins; Complexes; Pathways; Small molecules; Structures; Domains; Cells; Biobanks; Tissues and organs; Human populations; Therapies; Disease prevention; Early diagnosis; Human individuals]

22 Life sciences diversity
• Different communities
• Some similar requirements
• Not always the same solutions
Communities: Proteomics, Metabolomics, Clinical data, Genomics, Imaging

23 Factors that can influence data availability
• scientific (e.g. data reproducibility, uniqueness, value of processed and/or raw data)
• financial (cost of data storage, transfer, reproduction)
• technical (storage, network, computation…)
• political (drivers e.g. from funding bodies/large organisations/national interests)
• social (data sharing mentality of the community in question)
• legal/ethical/formal (requirements/constraints for data storage/transfer/access, e.g. need to store data on German citizens in Germany; requirements from journal publishers, data management plans, etc.)

24 Some conclusions
• Opportunity for e-infrastructures to better understand BMS RI problems
• Identification of bottlenecks
• Discussion of some potential solutions
• Data growth will change how we do things today
• Different communities -> different models -> some common solutions
• Solutions have to come from use cases
• BMS RIs need to define their requirements better
• We need to use technology more efficiently
• BMS community has to evaluate the practicality of storing everything
• Privacy issues make big data more challenging
• Difficult to separate big data from computation
• Shortage of expertise in dealing with both scientific data and IT services

25 Thank you! Questions?
Special thanks to:
• Stephanie Suhr, BioMedBridges
• Tom Hancocks, EMBL-EBI
• Cath Brooksbank, EMBL-EBI

26 Data deposition

27 Data submission
[Diagram labels: Submissions (raw data, processed data, metadata); Centralized database]

28 Data sharing
• The casual approach: ‘data on my disk and available to anyone who requests it’
• Submission to data repositories

29 Data submissions
[Diagram labels: Data repository; Journal submission; Journal request; Curator; reads; Data Management Plan; Data management; submission]

30 Data sharing
• The casual approach: ‘data on my disk and available to anyone who requests it’
• Submission to data repositories
Will big data affect data deposition?

31 Data submissions
• How much data?
• How much available data?

