1
Data Analysis Systems
Graeme Winter, STFC Computational Science & Engineering
2
Caveat
Methods developer for macromolecular crystallography (MX)
Synchrotron as a user facility
Broad view
3
Context
Online Proposal System
User Office System incl.: User Database, Scheduling, Health and Safety, Proposal Management, Single Sign On, Account Creation and Management
Diagnostics
Metadata Catalogue
Data Acquisition System
Data Analysis & Feedback
DataPortal
Storage Management System
Data Pre-Reduction
4
Overview
Data rates
Users & expectations
From the outside: automation in MX
Illustrative experiments
Analysis methods & hardware
Conclusions
5
Data Rates: XFEL guess
4 TB = ~7 minutes (proto 1k) = ~30 seconds (full 4k) @ 5120 frames/second (a back-of-envelope check follows below)
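A back-of-envelope check of these figures, as a sketch only: it assumes square 1k × 1k and 4k × 4k detectors and a 16-bit pixel depth (the pixel depth is an assumption, not stated above).

```python
# Back-of-envelope check of the data-rate guess above.
# Assumption: 16-bit (2-byte) pixels; square 1k and 4k detectors.

FRAME_RATE = 5120       # frames per second
BUDGET_TB = 4           # storage / buffer budget in TB
BYTES_PER_PIXEL = 2     # assumed 16-bit pixel depth

def seconds_to_fill(side_pixels):
    """Time until BUDGET_TB is filled for a square detector of the given side."""
    frame_bytes = side_pixels * side_pixels * BYTES_PER_PIXEL
    return BUDGET_TB * 1e12 / (frame_bytes * FRAME_RATE)

print(f"proto 1k: {seconds_to_fill(1024) / 60:.1f} minutes")  # ~ 6-7 minutes
print(f"full 4k:  {seconds_to_fill(4096):.0f} seconds")       # ~ 23-30 seconds
```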
6
Aside: Problem in MX
DAQ rates ~20% sustained @ 70 MB/s
Typical data set ~ GB
Data reduction memory / disk bandwidth limited
Computer architecture (e.g. clusters / NFS) for data reduction... is no good!
7
User Questions
Who are the users? What do they expect?
8
Expectations
XFEL = Tool = Measurements, or
XFEL = Experiment = Discoveries
The latter is fine (somebody else’s problem); the former presents us with problems.
9
Why compare to MX?
Area detectors
Users awkward!
Real-time analysis quite advanced
Frequently non-technical users
10
Beamline end-station: GUI, not script / CLI
11
Data Handling
Detector systems (a per-frame sketch follows below):
Distortion correction
Background subtraction
Compression
Add metadata
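A minimal per-frame sketch of this kind of detector-side handling, assuming precomputed per-pixel gain/distortion and background maps; the array names and metadata fields are illustrative assumptions, and compression is sketched separately further below.

```python
import numpy as np

def prepare_frame(raw, gain_map, background, metadata):
    """Correct one raw detector frame and attach metadata.

    raw        : (ny, nx) raw counts from the detector
    gain_map   : (ny, nx) assumed per-pixel gain / distortion correction
    background : (ny, nx) assumed background estimate to subtract
    metadata   : dict of beam / pulse parameters to carry with the frame
    """
    corrected = raw.astype(float) * gain_map - background
    np.clip(corrected, 0, None, out=corrected)   # no negative counts
    return {"data": corrected, "metadata": dict(metadata)}

# Example on synthetic data (all values illustrative).
rng = np.random.default_rng(3)
raw = rng.poisson(30, size=(512, 512))
gain = np.ones((512, 512))
bkg = np.full((512, 512), 5.0)
frame = prepare_frame(raw, gain, bkg, {"pulse_id": 42, "photon_energy_keV": 12.0})
```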
12
Automation in MX
Measurements, not experiment; however:
Measure carefully
Sample lifetime unknown
Sample generally uncharacterised
13
Automated Data Collection
Experimental systems exist:
Characterise sample
Measure data
Perform initial data reduction
14
Automated Data Reduction
Expert systems exist to:
Thoroughly reduce data
Provide information for downstream analysis – spacegroup, resolution limits
15
MX: Summary
Easy cases – automatic sample to structure
Hard cases – still hard, but automation really helps
Hardware & software are mature – suitable for biologists
16
Assumptions for discussion
Area detector will be a tiled array
Computing budget will be limited
One objective is to image biological structures
17
Illustrative XFEL Experiments
Consider two kinds of “experiment”:
Single molecule imaging
Single-shot crystallography* / nano-MX
*One shot per crystal, many tiny crystals
18
Assumptions: Single Molecule
Flow of molecules past the beam – by “magic” (SEP)
Some pulses will hit a molecule, some will not
Will need ~10^9 images for useful reconstruction (Shneerson, 2008)
19
Outcome: Single Molecule
Emphasis will be on images – no time-dependence
Filtering will be critical
Reforming the images at the earliest possible time will be important
Analysis builds on existing reconstruction techniques
20
Analysis
Filter / correct tiles
Reconstruct images
Compute orientations
Accumulate
Assume perhaps 1 sample / few pulses (an illustrative collection-time estimate follows below)
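To give a feel for the scale, a rough collection-time estimate combines the ~10^9-image requirement from the earlier slide with the 5120 frames/second guess, assuming (purely for illustration) a sustained rate and roughly one useful image per four pulses.

```python
# Rough single-molecule-imaging collection-time estimate.
# Figures from the slides: ~1e9 useful images, 5120 frames/second guess.
# Assumption (not from the slides): ~1 useful image per 4 pulses, sustained rate.

IMAGES_NEEDED = 1e9
PULSE_RATE = 5120            # pulses (frames) per second
USEFUL_FRACTION = 0.25       # "1 sample / few pulses"

pulses_needed = IMAGES_NEEDED / USEFUL_FRACTION
seconds = pulses_needed / PULSE_RATE
print(f"~{seconds / 86400:.0f} days of continuous collection")  # ~ 9 days
```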
21
Assumptions: Nano-MX
1 sample / pulse train, so a 10 Hz rate, so we have a time series
Sample will survive > 1 pulse
~10^4 samples per data set
22
Outcome: nMX
Emphasis on pixel time series; extrapolate to dose = 0
Later reconstruct and analyse data
Will build on existing MX methods – including expert systems & CCP4
Not what XFEL was designed for, but…
23
Analysis
Filter / correct
Construct d = 0 (zero-dose) image with σ estimate (a per-pixel sketch follows below)
Index, cluster (different crystal forms)
Integrate, rescale, cluster again
Accumulate h, k, l, I, σ(I)
Estimate remaining measurement time
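A minimal per-pixel sketch of the zero-dose extrapolation step, assuming a simple linear decay of counts with accumulated dose and a least-squares fit per pixel; the linear model, array names and synthetic data are assumptions for illustration, not the actual nMX pipeline.

```python
import numpy as np

def zero_dose_image(frames, doses):
    """Extrapolate a per-pixel time series to dose = 0.

    frames : (n_pulses, ny, nx) array of pixel counts, one frame per pulse
    doses  : (n_pulses,) accumulated dose at each pulse

    Returns the extrapolated dose = 0 image and a per-pixel sigma estimate,
    assuming counts decay linearly with dose (illustrative model only).
    """
    n, ny, nx = frames.shape
    y = frames.reshape(n, -1).astype(float)               # (n, npix)

    # Least-squares fit y = a + b * dose for every pixel at once.
    X = np.column_stack([np.ones_like(doses), doses])     # (n, 2)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)        # (2, npix)
    intercept = coeffs[0]                                 # dose = 0 estimate

    # Sigma of the intercept from the residual variance of the fit.
    residuals = y - X @ coeffs
    dof = max(n - 2, 1)
    var = (residuals ** 2).sum(axis=0) / dof
    cov00 = np.linalg.inv(X.T @ X)[0, 0]
    sigma = np.sqrt(var * cov00)

    return intercept.reshape(ny, nx), sigma.reshape(ny, nx)

# Example with synthetic data: 8 pulses on a 4 x 4 pixel patch.
rng = np.random.default_rng(0)
doses = np.linspace(0.1, 1.0, 8)
truth = rng.uniform(50, 100, size=(4, 4))
frames = truth * (1 - 0.3 * doses[:, None, None]) + rng.normal(0, 1, (8, 4, 4))
img0, sig0 = zero_dose_image(frames, doses)
```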
24
Analysis Methods (from simpler to more valuable):
Filtering & correction
Compression
Analysis
Feedback
25
Filtering / correction
Identify “bad pixels”
Remove “blank” images (a sketch of these two steps follows below)
Deconvolve time structure of pulses / measurement effects
Include corrections for scattering angle
Incorporate metadata
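A minimal sketch of the first two steps, assuming a precomputed bad-pixel mask and a simple mean-count threshold for “blank” frames; the mask source and the threshold value are assumptions for illustration.

```python
import numpy as np

def filter_frames(frames, bad_pixel_mask, blank_threshold=5.0):
    """Mask known bad pixels and drop frames that look blank.

    frames         : (n, ny, nx) array of detector frames
    bad_pixel_mask : (ny, nx) boolean array, True where the pixel is bad
    blank_threshold: assumed mean-count cut below which a frame is "blank"
    """
    cleaned = frames.astype(float).copy()
    cleaned[:, bad_pixel_mask] = np.nan            # exclude bad pixels downstream

    mean_counts = np.nanmean(cleaned, axis=(1, 2)) # per-frame mean over good pixels
    keep = mean_counts > blank_threshold           # crude hit / blank decision
    return cleaned[keep], keep

# Example: 10 synthetic frames, 2 of them "blank", a handful of bad pixels.
rng = np.random.default_rng(1)
frames = rng.poisson(20, size=(10, 64, 64)).astype(float)
frames[3] = rng.poisson(1, size=(64, 64))          # blank frame
frames[7] = rng.poisson(1, size=(64, 64))          # blank frame
mask = np.zeros((64, 64), dtype=bool)
mask[10, 10] = mask[20, 30] = True                 # known bad pixels
good_frames, kept = filter_frames(frames, mask)    # keeps 8 of the 10 frames
```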
26
Compression
If SMI, much of the image will have zero / low counts
If nMX, make use of the fact that image j and j + 1 will be similar (see the sketch below)
MX detectors already do this – CBF
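A minimal sketch of the frame-to-frame idea, assuming simple differencing of consecutive frames followed by a general-purpose compressor; this only illustrates the principle, and the actual CBF byte-offset scheme is more involved.

```python
import numpy as np
import zlib

def compress_series(frames):
    """Delta-encode consecutive frames, then deflate the small differences."""
    deltas = np.diff(frames, axis=0, prepend=np.zeros_like(frames[:1]))
    return zlib.compress(deltas.astype(np.int32).tobytes())

def decompress_series(blob, shape):
    """Invert compress_series: inflate, then cumulative-sum the deltas."""
    deltas = np.frombuffer(zlib.decompress(blob), dtype=np.int32).reshape(shape)
    return np.cumsum(deltas, axis=0, dtype=np.int32)

# Example: a slowly varying synthetic series; the delta stream compresses well.
rng = np.random.default_rng(2)
base = rng.poisson(100, size=(128, 128))
frames = np.stack([base + rng.poisson(2, size=(128, 128)) for _ in range(20)])
frames = frames.astype(np.int32)
blob = compress_series(frames)
restored = decompress_series(blob, frames.shape)
assert np.array_equal(restored, frames)
print(f"raw {frames.nbytes} bytes -> compressed {len(blob)} bytes")
```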
27
Analysis
SMI: how close to the objective are we? How much can we do with the remaining time?
nMX: how much data has been accumulated? What is the quality like?
28
Feedback
SMI: change sample rate / pulse structure – improve data
nMX: improve data collection process / stop when experiment complete
29
Challenges
Synchronisation with beam parameters
Image reconstruction from tiles
Deconvolution of time structure
Getting dynamic feedback
Robustness
30
Computing Hardware
Hardware at least as important as software
Detector is tiled (parallel), so emphasise parallel computing as far as possible
31
Factors
Memory bandwidth – filtering, image transpose
Floating point horsepower – deconvolution, FFT
Thanks to games…
32
Benefits
Custom hardware (e.g. Cell, GPGPU) allows much more FP horsepower than a vanilla CPU
Memory can be managed properly
Libraries available for e.g. FFT
Analysis can be timed to fit
33
Costs
Memory must be managed properly
Novel architecture (need to recompile)
Hardware costs (complicated)
Interfacing (I have no idea)
34
Conclusions
XFEL DAQ with area detectors presents a challenge (evidently)
A user facility would require solid, well-thought-out computing infrastructure
Automation & real-time feedback can help to get maximum value from the XFEL
35
Conclusions
Science determines the computational architecture (e.g. time series vs. image)
MX as a technique is mature, so a good role model
36
Acknowledgements
EU FP7 for supporting pre-XFEL work
EU FP6 BioXHit and UK BBSRC for supporting xia2 and DNA development
Scientists & engineers at DL for coffee-time discussions