
1 Curation of Chemistry Data from the Laboratory to Publication
Jeremy Frey & Simon Coles, School of Chemistry, University of Southampton

2 The CombeChem Project (AHM2006 Data Curation Workshop)
• End-to-end linking of data and information
• Laboratory to publication and back again
• Very long data chains can be involved, e.g. from a chemistry lab to mouse genetic expression
• The exponential world of combinatorial synthesis and high-throughput analysis meets the exponentially growing power of computing
• “Automation, Semantics & the Grid”

3 Diagram labels: Plan & COSHH; Digital Model; Information Integration; Report; Knowledge; Goal; Literature; Synthesis; Analysis; Smart Laboratory; Smart Storage; Smart Dissemination; Smart HCI
Not just one laboratory but many co-laboratories working together

4 Problems with ‘Small Laboratory’ Working Practice
• “Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant”
• “Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits”
• “To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data”
• “Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”
Source: ‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)

5 The concept of Publication@Source
• Trace all the way back from publication to the original data: provenance
• The data is the key: DataGrid
• Start as you mean to go on: ELNs are a necessity
• Curation of subsequently produced data
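
The provenance idea on this slide can be illustrated with a minimal sketch: if every derived item records what it was derived from, a published figure can be followed back to the raw data behind it. The item names and the `derived_from` mapping below are hypothetical and are not taken from the CombeChem software.

```python
# Hypothetical provenance links: each item maps to the items it was derived from.
derived_from = {
    "paper:figure-2": ["analysis:fitted-spectrum"],
    "analysis:fitted-spectrum": ["raw:spectrum-001", "raw:calibration-007"],
    "raw:spectrum-001": [],
    "raw:calibration-007": [],
}


def trace_to_source(item, links):
    """Return the underived (raw) items that `item` ultimately depends on."""
    sources = []
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting shared ancestors
        seen.add(current)
        parents = links.get(current, [])
        if parents:
            stack.extend(parents)  # keep walking back towards the lab
        else:
            sources.append(current)  # nothing upstream: this is original data
    return sorted(sources)


raw_items = trace_to_source("paper:figure-2", derived_from)
```

Here `raw_items` holds the two raw records behind the hypothetical figure. Keeping the links alongside the data is what makes the walk possible at all.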

6 Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook.
If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA.

7 This is where it all starts: The Lab & The Lab Book
• Lab books are a big block to Publication@Source: if it’s not digital, it is more difficult to share
• Need a usable digital lab book
• Design by analogy to help chemists and computer scientists work together
• Only some equipment is networked

8 COSHH: leverage off things we already have to do

9 Plan; Process; Record


11 getRecord()
There is a potential containment problem in pulling back partial RDF graphs from the triple store. This was solved by using multiple triple stores, but boundaries are a major issue for the future.
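
The containment problem can be shown with a small sketch: when a record is pulled out of a larger graph, some rule has to decide where the record stops. The version below follows blank nodes (written with a `_:` prefix) and stops at named resources, in the style of a concise bounded description. It is an illustration only, not the project's `getRecord()` implementation, and the `ex:` names are hypothetical.

```python
def get_record(triples, subject):
    """Collect triples describing `subject`, following blank nodes ("_:" prefix)
    but stopping at named resources, which mark the boundary of the record."""
    record = []
    seen = set()
    pending = [subject]
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                record.append((s, p, o))
                # Blank nodes have no identity outside this graph, so their
                # description has to travel with the record.
                if isinstance(o, str) and o.startswith("_:"):
                    pending.append(o)
    return record


# Hypothetical graph: an experiment, an anonymous sample, and a named person.
triples = [
    ("ex:experiment1", "ex:hasSample", "_:b1"),
    ("_:b1", "ex:mass", "1.25 g"),
    ("ex:experiment1", "ex:performedBy", "ex:alice"),
    ("ex:alice", "ex:memberOf", "ex:group1"),
]

record = get_record(triples, "ex:experiment1")
```

The record includes the sample's mass but not the researcher's group membership. Whether that boundary is the right one depends on the use, which is the issue the slide raises.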

12 Architecture
Diagram labels: SURIG; Data stores; Semantic Data; Other services; Weights & Measures; Bench; Planner0; Viewer0; PHP; Java; “Client” Libraries; SOAP; Jena; SURIG; Applications; Institutional archives and metadata publication

13 The Analytical Laboratory
• Capture information from places you would not want to put your eyes
• Capture environmental data automatically
• Capture people and movements
• Provide this information in real time as well as for the laboratory record

14 Pub-Sub systems provide the flexible & extensible approach to distribution
Diagram labels: Data Source (x2); Message Broker; Translator Service; Archive Client; Web Client; Mobile phone; PDA; BLOG
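
A minimal in-process sketch of the publish-subscribe pattern on this slide: data sources publish readings to a topic, and any number of clients (an archive, an alerting client) subscribe without the source knowing about them. The class, topic name and threshold are hypothetical. The transcript does not name the broker actually used, and a real deployment would use a networked one.

```python
from collections import defaultdict


class MessageBroker:
    """Toy broker: routes each published message to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(topic, message)


archive = []  # stands in for the archive client
alerts = []   # stands in for a mobile phone or PDA client


def archive_client(topic, reading):
    archive.append((topic, reading))


def alert_client(topic, reading):
    # Hypothetical threshold, e.g. to notice failed air conditioning.
    if reading > 30.0:
        alerts.append(reading)


broker = MessageBroker()
broker.subscribe("lab/temperature", archive_client)
broker.subscribe("lab/temperature", alert_client)

# The data source only knows the topic, not who is listening.
broker.publish("lab/temperature", 21.5)
broker.publish("lab/temperature", 35.0)
```

Adding a further client, such as a blog feed, is one more `subscribe` call and needs no change to the data source. That is the extensibility the slide refers to.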

15 Temperature (room, laser); Door & interlock; Motion Sensors
Air conditioning failed

16 Databases: Our experience
• What do you do when the actual users keep changing their mind?
• Is a traditional relational database suitable?
• Danger of reinforcing scientific bias against relational databases for laboratory data
• RDF & triple stores were again the solution
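
One way to see why triples suit users who keep changing their mind: a new kind of property is just another (subject, predicate, object) statement, so nothing equivalent to a table schema has to be migrated. The sketch below uses a plain Python list as a stand-in for a triple store, with hypothetical `ex:` names. It is not the project's data model.

```python
triples = []  # stand-in for a triple store


def add(subject, predicate, obj):
    """Record one statement; no predicate needs to be declared in advance."""
    triples.append((subject, predicate, obj))


def describe(subject):
    """Group everything known about `subject` by predicate."""
    description = {}
    for s, p, o in triples:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


# Initial requirement: record melting points.
add("ex:sample42", "ex:meltingPoint", "121 C")

# Later the users also want the solvent recorded. In a relational table this
# would typically mean adding a column; here it is one more statement.
add("ex:sample42", "ex:solvent", "ethanol")

sample = describe("ex:sample42")
```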

17 RDF/RDFS: high-level schema for chemical properties


19 Triple Stores: The Heart of the Semantic Web
Scaling: 3Store response
Memory leak in testing program!

20 Scaling the triplestores
Moved from…
• A model of harvesting data from multiple sources into one scalable store
to
• A model of distributed RDF sources and caching what is needed for the task at hand into multiple stores fit-for-purpose
The Semantic Web!
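
The second model can be sketched as follows: rather than loading everything into one store, a task pulls only the statements it needs from several sources into a small local cache. The sources, predicates and function name are hypothetical. Real sources would be remote RDF endpoints rather than Python lists.

```python
def build_task_store(sources, wanted_predicates):
    """Copy only the triples a task needs from several sources into a local store."""
    store = []
    for source in sources:
        for triple in source:
            subject, predicate, obj = triple
            # Keep what the task asked for, skipping statements already cached.
            if predicate in wanted_predicates and triple not in store:
                store.append(triple)
    return store


# Two hypothetical distributed sources describing the same compound.
crystallography_source = [
    ("ex:compound1", "ex:unitCellVolume", "1234.5"),
    ("ex:compound1", "ex:spaceGroup", "P21/c"),
]
synthesis_source = [
    ("ex:compound1", "ex:yield", "78%"),
    ("ex:compound1", "ex:spaceGroup", "P21/c"),
]

task_store = build_task_store(
    [crystallography_source, synthesis_source],
    {"ex:spaceGroup", "ex:yield"},
)
```

The resulting store holds two statements: the space group (once, although both sources state it) and the yield. The unit cell volume stays at its source because this task did not ask for it.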

21 Experiments on the Grid: The NCS Service
Diagram label: HTTPS

22 Binary raw data archived in Atlas Datastore
Figure labels: x300; ADS; £’s

23 A Data-Rich Subject: the Crystallography Problem
Figure values: 30,000,000; 1.5,000,000; 450,000

24 The eCrystals Digital Repository
http://ecrystals.chem.soton.ac.uk

25 Access to the underlying data

26 The eCrystals ‘Global’ Model
Diagram labels: Laboratory repository; Deposit; Institutional data repositories; Validation; Deposit; Aggregator services; Search, harvest; Data discovery, linking, citation; Presentation services / portals; Data analysis, transformation, mining, modelling; Publishers: peer-review journals, conference proceedings, etc; Publication; Validation; Preservation and curation

27 Laboratory Repositories and Information Management

28 Need for a data archive in the laboratory: not just the published spectra!

29 The R4L Repository
Labels: Deposit; Search / Browse; Create new compound; Add experiment data and metadata

30 Administrative Domains
Several groups making and analysing the library; transfer or share the data
Diagram labels: Researcher; Research Group (x2); Institution; National Archive; International Database

31 Paper organized using RDF
• SVG “active” graphics
• Link to data; follow links back to the raw data archive
• Link to simulation; full simulation data archived in BioSimGrid
• R4L

32 Summary
• Making sure other people can find, understand and re-use your data easily and with confidence (even when there is a huge amount of it!)
• Make use of Plans to inform the digital context: metadata in advance
• Have concern for the “End-to-End life cycle” of chemistry information from the start
• Understanding Usability and Human Computer Interaction is vital for adoption

