More speed, more data, more automation, more work? Alun Ashton.

Thanks to organisers.

Diamond Light Source
– 1.75+ million man-hours
– 2,100 tons of steel
– 35,000 m³ of concrete
– 33,000 m² of roofing
– Joint venture company between CCLRC (86%) and Wellcome Trust (14%)
– Electron beam energy: 3 GeV
– Circumference: 561.6 m
– Diameter of outer wall: 235 m
– Beam current: 300 mA (500 mA)
– Start: March 2003; users: January 2007

Beamlines

Computing at Diamond
– Data Acquisition and Scientific Computing
– Controls
– IT support
– External groups

Scientific Computing
– Data Analysis
– Data Visualisation
– eScience
– Data Curation
– Data Acquisition
– Automation
– Simulation and Theory

Macromolecular Crystallography computing at Diamond
Phase I (2007)
– 3 MX beamlines (0.5 – 2.5 Å, optimised for 0.98 Å) with double crystal monochromator and Kirkpatrick-Baez horizontal and vertical focusing mirrors
– Focal spot size ~ 94 µm (h) x 17 µm (v) (FWHM)
– Estimated flux at 12.6 keV: 3.5 x ph/s
– Fully automated sample handler
– Cryo cooling
– CCD detector
– One station will have a containment three facility for pathogenic samples
Phase II
– Microfocus beamline
– Fixed wavelength side station (0.96 Å) (MR & ligand binding studies)
– Long wavelength side station for sulphur anomalous (1.5 – 2.5 Å)

More speed

MX computing at Diamond on the beamline
On each of the 3 beamlines:
– 2 CPU server for Data Acquisition
– 2 CPU server for Data Analysis
– 20 TB (raw) beamline storage with 1 read and 1 write server (approx. 1 month data storage)
– 4 beamline user workstations: 3 RedHat Linux (2 with dual monitors), 1 Windows XP
– 1 in-hutch computer, similar to a tablet PC with touch screen
– Networking is 1 Gbit on the beamline and 10 Gbit between MX beamlines and MX “near” beamline computers

MX computing at Diamond “near” the beamline
– 180 TB (raw) secondary MX storage (shared between 3 Phase 1 beamlines, approx. 3 months data storage), administered by 8 servers
– 24 dual dual-core (2x2) CPU cluster (50% with InfiniBand fast interconnects), running the Sun Grid Engine queuing system
– Local user backup via USB and FireWire drives (small scale CD and DVD writing facilities available)
– Long term data storage and backup: CCLRC Atlas Data Store (petabyte data storage)
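The two storage figures can be cross-checked with a little arithmetic. This is only a sketch using the approximate capacities and retention times quoted on the slides; the function name is made up for illustration and the result is an implied average, not a measured data rate.

```python
# Figures quoted on the slides (raw capacity, approximate retention time).
BEAMLINES = 3
BEAMLINE_STORAGE_TB = 20     # per beamline, holds roughly 1 month of data
SECONDARY_STORAGE_TB = 180   # shared by all 3 beamlines, roughly 3 months


def implied_rate_tb_per_month(capacity_tb, months, beamlines=1):
    """Average data volume per beamline per month that would fill the store."""
    return capacity_tb / (months * beamlines)


beamline_rate = implied_rate_tb_per_month(BEAMLINE_STORAGE_TB, months=1)
secondary_rate = implied_rate_tb_per_month(
    SECONDARY_STORAGE_TB, months=3, beamlines=BEAMLINES
)

print(f"beamline store implies  {beamline_rate:.0f} TB/month per beamline")
print(f"secondary store implies {secondary_rate:.0f} TB/month per beamline")
```

Both stores imply the same planning figure of about 20 TB per beamline per month, so the two slides are consistent with each other.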

Near beamline computing: Crunchie the cluster

Where does everything fit? Synchrotron Crystallization PIMS (Protein Production) Data Processing & Structure Solution Pipelines CollectionDB e-HTPX

More data

PiMS Thanks to Chris Morris and PiMS developers

General Introduction

Why is Data Modelling Important?
■ A Data Model is a plan for building a database
■ detailed enough to be used to create the physical structure
■ simple enough to communicate to the end user the data structure
■ The Unified Modelling Language (UML)

Database
■ Record keeping is an important aspect of most businesses today
■ A stable and clean repository of data
■ Constraints to enforce data integrity
■ Open interface
■ Allow users to access, search and retrieve data easily
■ Multiple concurrent access
■ Extensible
■ New data added
■ Maintainable
■ Database provides maintenance tools, plus industry standards to ensure long-term compatibility
■ Robust
■ “industrial strength”
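To make "constraints to enforce data integrity" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical and are not taken from the PiMS data model; the point is only that the database itself rejects bad records.

```python
import sqlite3

# In-memory database; the schema below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE sample (
    id      INTEGER PRIMARY KEY,
    barcode TEXT NOT NULL UNIQUE
);
CREATE TABLE experiment (
    id            INTEGER PRIMARY KEY,
    sample_id     INTEGER NOT NULL REFERENCES sample(id),
    temperature_k REAL CHECK (temperature_k > 0)
);
""")
conn.execute("INSERT INTO sample (id, barcode) VALUES (1, 'S-0001')")
conn.execute("INSERT INTO experiment (sample_id, temperature_k) VALUES (1, 100.0)")


def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Each of these violates one constraint and should be refused by the database.
orphan_ok = try_insert(
    "INSERT INTO experiment (sample_id, temperature_k) VALUES (99, 100.0)")
negative_ok = try_insert(
    "INSERT INTO experiment (sample_id, temperature_k) VALUES (1, -5.0)")
duplicate_ok = try_insert("INSERT INTO sample (barcode) VALUES ('S-0001')")

experiment_rows = conn.execute("SELECT COUNT(*) FROM experiment").fetchone()[0]
print(orphan_ok, negative_ok, duplicate_ok, experiment_rows)
```

An experiment pointing at a non-existent sample, a non-physical temperature and a duplicate barcode are all refused, so only the one valid experiment row is stored.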

Scientific goals
■ Recording laboratory information
■ A lot of data keeping
■ 10,000s of experiments
■ 1,000,000s of samples
■ Data interchange and interoperation
■ Collaboration in protein production
■ Share data between stages and sites
■ Data transfer to beamline or NMR ops
■ Data mining and reporting
■ Analysis
■ Negative results can be mined to improve methods
■ Scientific publications
■ Data deposition
■ All made feasible by data model
■ … plus common understanding of it

Acknowledgements
■ PiMS developers
■ Chris Morris (CCP4)
■ Ed Daniel (Daresbury)
■ Peter Troshin (MPSI)
■ Bill Lin (CCP4)
■ Jo van Niekerk (SSPF)
■ Susy Griffiths (YSBL)
■ Jon Diprose (OPPF)
■ Marc Savitsky (OPPF)
■ Anne Pajon (EBI)
■ Crystallization developers
■ Ian Berry (OPPF)
■ Gael Seroul (EMBL-Grenoble)
■ Diederick de Vries (NKI-Amsterdam)
■ Sabrina Haquin (Paris)
■ CCPN developers
■ Wayne Boucher
■ Rasmus Fogh
■ Tim Stevens
■ Wim Vranken

What does ‘PiMS’ mean for Diamond and Diamond users?

Synchrotron data

Image format

Images off the beamlines
ADSC Q315
– ADSC image size: 20-80 MB
– ADSC image rate: <>60 MB/second
ImgCIF/CBF
– 30% size of ADSC uncompressed images
NeXus

imgCIF/CBF ADSC header
HEADER_BYTES= 512;
DIM=2;
BYTE_ORDER=little_endian;
TYPE=unsigned_short;
PIXEL_SIZE=0.1026;
BIN=2x2;
ADC=fast;
DETECTOR_SN=922;
DATE=Fri Sep 15 10:07: ;
TIME=1.00;
DISTANCE= ;
OSC_RANGE=1.000;
PHI=0.000;
OSC_START=0.000;
TWOTHETA=0.000;
AXIS=phi;
WAVELENGTH=1.0000;
BEAM_CENTER_X=10.000;
BEAM_CENTER_Y=20.000;
CREV=1;
CCD=TH7899;
BIN_TYPE=HW;
ACC_TIME=1781;
UNIF_PED=1500;
IMAGE_PEDESTAL=40;
SIZE1=3072;
SIZE2=3072;
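The header fields are enough to work out how big one uncompressed image is. The sketch below assumes the header is plain "KEY=VALUE;" text as shown on the slide and uses only a subset of the fields; the function names are made up for illustration, and this is not the parser used on the beamlines.

```python
# Subset of the header fields from the slide, in "KEY=VALUE;" form.
HEADER_TEXT = (
    "HEADER_BYTES= 512; DIM=2; BYTE_ORDER=little_endian; "
    "TYPE=unsigned_short; BIN=2x2; SIZE1=3072; SIZE2=3072;"
)

# Bytes per pixel for the pixel types this sketch knows about.
BYTES_PER_PIXEL = {"unsigned_short": 2}


def parse_header(text):
    """Split 'KEY=VALUE;' items into a dict of strings."""
    fields = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip empty fragments left by the trailing ';'
            fields[key.strip()] = value.strip()
    return fields


def image_bytes(fields):
    """Header size plus SIZE1 x SIZE2 pixels at the declared pixel width."""
    pixels = int(fields["SIZE1"]) * int(fields["SIZE2"])
    return int(fields["HEADER_BYTES"]) + pixels * BYTES_PER_PIXEL[fields["TYPE"]]


fields = parse_header(HEADER_TEXT)
size = image_bytes(fields)
print(f"{size} bytes, about {size / 1e6:.1f} MB")
```

For the 2x2 binned 3072 x 3072 image in this header the result is 18,874,880 bytes, roughly 19 MB, which sits at the lower end of the 20-80 MB range quoted for ADSC images.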

Synchrotron and Beamline
– Beam conditions: ring energy and current
– Beam size
– Attenuation
– If available, estimate of photon flux coming out of the collimator
– Backstop type, size and position with respect to sample
– Date and time
– Detector type and serial number
– Goniostat (manufacturer and model)
– Method of sample mounting (by hand, arcs/tongs or by robotics (type))
– Temperature of sample
– Sample code (barcode?)
– Text field to allow any special comments relevant to this experiment to be stored, e.g. if the crystal has been annealed and, if so, what the conditions were; whether the crystal has been cryocooled in a capillary, etc.

– Record the mode the synchrotron is running in
– Attenuation: this should be a calculated factor
– Photon flux + error; maybe an intensity reading
– A record of an experiment number; this would give us the link back to everything else, e.g. user etc.
– An image of the crystal, with the cross hairs marking the beam and beam size?
– Beam size at sample and beam size on detector
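One way to picture this wish list is as a single record per data collection. The sketch below is an assumption for illustration only: the class, field names and units are hypothetical, it covers just part of the list, and it is not the schema of any Diamond system. The ring, detector and beam size values come from earlier slides; the experiment number, ring mode, temperature and mounting method are placeholders.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class BeamlineMetadata:
    """Hypothetical per-collection record based on the wish list above."""
    experiment_number: str              # link back to user, proposal, etc.
    ring_energy_gev: float
    ring_current_ma: float
    ring_mode: str
    beam_size_um: Tuple[float, float]   # (horizontal, vertical) at sample
    attenuation_factor: float           # calculated factor
    detector_type: str
    detector_serial: str
    sample_temperature_k: float
    mounting_method: str                # by hand, arcs/tongs or robot
    sample_code: Optional[str] = None   # barcode, if there is one
    photon_flux: Optional[float] = None
    photon_flux_error: Optional[float] = None
    comments: str = ""

    def missing_optional(self):
        """Names of optional fields that were not recorded."""
        return [name for name, value in asdict(self).items() if value is None]


record = BeamlineMetadata(
    experiment_number="EXAMPLE-0001",   # placeholder
    ring_energy_gev=3.0,
    ring_current_ma=300.0,
    ring_mode="user",                   # placeholder
    beam_size_um=(94.0, 17.0),
    attenuation_factor=1.0,
    detector_type="ADSC Q315",
    detector_serial="922",
    sample_temperature_k=100.0,         # placeholder
    mounting_method="robot",            # placeholder
)
print(record.missing_optional())
```

Keeping the "if available" items as optional fields makes it easy to report which metadata a given collection is still missing.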

NeXus
– All Diamond data collection runs will produce NeXus files
– NeXus will serve as a longer term data storage format

More automation

Generic Data Acquisition (GDA)
– Joint collaboration between Daresbury SRD and Diamond
– GDA sits ‘above’ EPICS, which does the majority of low level/component/compound motion control

Design considerations
A single software framework which can be applied to all beamlines
Must be flexible / adaptable: “plug and play”
– must work with both EPICS and non-EPICS hardware
– highly configurable system: different GUIs and hardware on different beamlines, but all work within the same overall architecture
Similar look and feel across all beamlines
– users can visit different beamlines without learning new software every time
A single window to operate the beamline
Framework defines more than just code: includes programming methodologies, coding conventions etc.
Result is a system which is simpler and easier to maintain
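The "plug and play" requirement can be sketched as one device interface with interchangeable backends chosen by configuration. GDA itself is a Java framework, so the Python below is only an illustration of the idea; every class, function and address name is hypothetical and none of it is GDA or EPICS API. Both backends are simulations that talk to no hardware.

```python
from abc import ABC, abstractmethod


class Motor(ABC):
    """What scan code is allowed to know about any motor."""

    @abstractmethod
    def move_to(self, position: float) -> None: ...

    @abstractmethod
    def position(self) -> float: ...


class SimulatedEpicsMotor(Motor):
    """Stand-in for a motor reached through an EPICS record."""

    def __init__(self, address):
        self.address = address
        self._position = 0.0

    def move_to(self, position):
        self._position = float(position)

    def position(self):
        return self._position


class SimulatedSerialMotor(Motor):
    """Stand-in for a non-EPICS controller that counts whole steps."""

    STEPS_PER_UNIT = 1000

    def __init__(self, address):
        self.address = address
        self._steps = 0

    def move_to(self, position):
        self._steps = round(position * self.STEPS_PER_UNIT)

    def position(self):
        return self._steps / self.STEPS_PER_UNIT


BACKENDS = {"epics": SimulatedEpicsMotor, "serial": SimulatedSerialMotor}


def build_beamline(config):
    """Create devices from a per-beamline configuration mapping."""
    return {name: BACKENDS[kind](address) for name, (kind, address) in config.items()}


def scan(motor, start, stop, steps):
    """Visit evenly spaced positions; works with any Motor backend."""
    visited = []
    for i in range(steps):
        motor.move_to(start + i * (stop - start) / (steps - 1))
        visited.append(motor.position())
    return visited


beamline = build_beamline({
    "phi": ("epics", "EXAMPLE:PHI"),      # placeholder addresses
    "kappa": ("serial", "/dev/example0"),
})
print(scan(beamline["phi"], 0.0, 1.0, 3))
print(scan(beamline["kappa"], 0.0, 1.0, 3))
```

The scan routine never needs to know which backend it is driving, which is what lets different beamlines use different hardware inside the same overall architecture.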

Experiment automation
automateD collectioN of datA – DNA
– Automated strategy calculation using BEST
– Multi crystal ranking and data collection
– Automated autoindex with Mosflm
– Automated integration with Mosflm
– Quick Scaling results for data quality
– Basic radiation damage consideration
– Data reading and writing into beamline database
– MiniKappa incorporation with STAC
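The automation steps chain together: each stage consumes what the previous one produced, and the chain stops at the first failure. The sketch below shows that control flow only. The stage functions are stubs that call neither Mosflm nor BEST, the stage order is an assumption, and all names are hypothetical rather than taken from DNA.

```python
class StageError(Exception):
    """Raised by a stage that cannot continue."""


def autoindex(ctx):
    # Placeholder check; a real stage would run the indexing program.
    if not ctx.get("images"):
        raise StageError("no images to index")
    return {**ctx, "indexed": True}


def strategy(ctx):
    return {**ctx, "strategy": "placeholder"}


def integrate(ctx):
    return {**ctx, "integrated": True}


def quick_scale(ctx):
    return {**ctx, "scaled": True}


STAGES = [
    ("autoindex", autoindex),
    ("strategy", strategy),
    ("integrate", integrate),
    ("quick_scale", quick_scale),
]


def run_pipeline(stages, ctx):
    """Run stages in order; stop at the first failure and keep a log."""
    log = []
    for name, stage in stages:
        try:
            ctx = stage(ctx)
        except StageError as exc:
            log.append((name, f"failed: {exc}"))
            break
        log.append((name, "ok"))
    return ctx, log


good_ctx, good_log = run_pipeline(STAGES, {"images": ["img_001", "img_002"]})
bad_ctx, bad_log = run_pipeline(STAGES, {"images": []})
print(good_log)
print(bad_log)
```

The log is the kind of per-step record that could be written to a beamline database so that users can see where an automated run stopped.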

DNA Acknowledgements
– Cambridge - MRC
– Diamond
– EMBL Grenoble
– EMBL Hamburg
– ESRF
– GlobalPhasing
– Soleil
– SRD Daresbury
– Brookhaven
– Users
DNA 2.0…..

ISPyB
Management of experimental data produced in protein crystallography
Management of experiment related information (shipping of samples, beam time allocation, safety information…)
Tracking your progress through the experimental process:
– Retrieves information from DataCollection automatically
– Stores both Beamline and Experimental information
– Allows disparate groups to monitor projects
– Communicates with other systems (Sample Changer, DNA, …)
– Portable interface (using PDA + wireless DataMatrix reader) to track samples
– User friendly web interface
– Custom interface and access restricted based on privileges
– Generates reports

23/11/ ISPyB: Webservice or web based user interface …
Webservices available for:
– Crystal details
– Shipment
– Diffraction and Screening plan
– Diffraction results

Solange Delageniere Ricardo Leal Darren Spruce Dominique Porte & MIS Group Lilian Cardonne Matias Guijarro Olof Svensson Jose Gabadinho Collaboration to develop joint system ISPyB & associated BM14 eHTPX eHTPX members and associated collaborations Ludovic Launer Martin Walsh Hugo Caserotto Max Nanao Jean-Baptiste Reiser Hassan Belrhali Laurent Geoffroy (Maatel) David Stuart, Robert Esnouf Oxford, Colin Nave, Rob Allan, Martyn Winn, Daresbury, Kim Henrick EBI, Kevin Cowtan York, Martin Walsh Grenoble DEVELOPERS: Chris Mayo, Ian Berry (Oxford) Graeme Winter, Ronan Keegan, David Meredith (Daresbury) Joel Fillon (EBI), Paul Young (York), Ludovic Launer (Grenoble) Florent Cipriani Franck Felisaz Jean-Sebastien Aksoy Bernard Lavault Arnaud Clere Julien Huet S. Cusack

Where does everything fit? Synchrotron Crystallization PIMS (Protein Production) Data Processing & Structure Solution Pipelines CollectionDB e-HTPX

Remote data collection
Remote data monitoring
– ISPyB
Remote experiment monitoring
– ISPyB
Remote experiment control
– GDA
– VNC
eInfrastructure!

10 second pause

How do MX ‘legacy’ projects and bespoke solutions fit into a bigger picture? More work!

e-Science Infrastructure for Diamond Light Source

Phase 1
– Single Sign On
– Automatic cataloguing of data and metadata relating to a scientific experiment
– Back up all Diamond’s data to the Atlas Data Centre for long term storage
– Be able to view and retrieve your data
– Works in conjunction with Diamond’s current computing infrastructure
– Backbone for further e-Science work

Single Sign On

GDA DDH StorageD Data / metadata Nexus File & Data DUO DUO Desk IKitten DLS ICAT SRB People DB Active Directory Diamond, CICT Modified by e-Science DataPortal Diamond Proposal Web pages Atlas Data Store

SRB in practice

What Next?
– Work towards live collection of data on beamlines
– Gain operational experience
– Have a consultation period with scientists to get feedback on the work and input into what metadata to collect
– Work more closely with the science community to understand what metadata best describes the experiments
– Add analytical framework

What's really next? More work! Plenty of software to demonstrate

Acknowledgements