Data Analysis I19 Upgrade Workshop 11 Feb 2014

Overview
- Short history of automated processing for Diamond MX beamlines
- Effects of adding Pilatus detectors
- Current capabilities
- Downstream analysis
- Future developments
- Changes for chemical crystallography
- Benefits resulting from automated processing

Automated Processing at Diamond
- Automatic processing with xia2 since around 2007/8
- Took 10-20 minutes for a "standard" ADSC Q315 data set
- At first users wanted to switch off automatic processing
- Complaints: "xia2 is too slow!"
- Also: warned of the impending Pilatus 6M

Automated Processing at Diamond
- Wrote fast_dp, drawing on experience from coding xia2 and the fact that XDS will run in parallel on a cluster (sketched below)
- Typically took <= 2 minutes for an ADSC data set
- When the Pilatus arrived it took about the same time with 5-10x as many images
- Which gets us to here…
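
The speed comes from exploiting that parallelism. The snippet below is not fast_dp's actual implementation, only a minimal Python illustration of farming independent processing steps out concurrently, with placeholder commands standing in for the real XDS steps and local threads standing in for cluster nodes.

```python
# Illustrative only: not fast_dp's real code. It sketches the idea of running
# independent processing jobs in parallel; the commands are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_job(cmd, workdir):
    """Run one processing step (e.g. integration of one block of images)."""
    return subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)

jobs = [
    (["echo", "integrate block 1"], "."),   # placeholders for XDS-style steps
    (["echo", "integrate block 2"], "."),
    (["echo", "integrate block 3"], "."),
]

with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    results = list(pool.map(lambda job: run_job(*job), jobs))

for result in results:
    print(result.stdout.strip())
```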

Before the Pilatus Upgrade
- Phase 1 MX beamlines had ADSC Q315 detectors, with a typical readout time around 1 s
- With typical exposure times this gave 20-30 images (10-15 degrees) per minute
- A 180 degree data set gave you time for (a cup of) tea; long data sets gave time for a meal
- Manual data processing could keep up with collection

After the Pilatus Upgrade
- Steady data collection gives 180 degrees of data in 3 minutes; much faster is possible (less than 1 minute)
- Fast sample changer much more important: can now change samples in ~40 s
- Possible to record 12-15 data sets per hour
- Throughput potentially more than doubled
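
As a quick sanity check of the figures on the last two slides, the arithmetic is spelled out below. The 0.5 degrees per image for the ADSC case is an assumption derived from the 20-30 images / 10-15 degrees per minute quoted above.

```python
# Back-of-the-envelope throughput comparison using the figures from the slides.

# Before: ADSC Q315, ~1 s readout, 20-30 images (10-15 degrees) per minute,
# i.e. roughly 0.5 degrees per image (an assumption derived from those figures).
images_per_minute = 25                       # mid-range of 20-30
images_per_dataset = 180 / 0.5               # 180 degrees at 0.5 deg/image
print(f"ADSC: 180 degrees in ~{images_per_dataset / images_per_minute:.0f} min")

# After: Pilatus, 180 degrees in ~3 minutes, sample change in ~40 s.
dataset_min = 3
change_min = 40 / 60
print(f"Pilatus: at most ~{60 / (dataset_min + change_min):.0f} data sets/hour")
# The quoted 12-15 per hour allows for centring and other per-sample overheads.
```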

What does this mean?
- Keeping up with data collection manually is close to impossible
- Pressure is pushed onto downstream analysis, even with automated data processing
- fast_dp is now critical to get timely results
- Databases more important for tracking results
- No longer will you have time for a cup of tea

A sign of the times…

What else does this mean?
- Short shifts / remote access become useful (Dave will talk about this later)
- Speculative data collection now more sensible, if you are not sure whether to collect or not
- You need to bring a bigger hard drive (a rough estimate follows below)
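
A very rough, illustrative estimate of what "a bigger hard drive" means. The per-image size and oscillation width below are assumptions, not figures from the talk; fine-sliced Pilatus images typically compress to a few MB each.

```python
# Rough, illustrative estimate of raw data volume after the Pilatus upgrade.
mb_per_image = 6            # assumed compressed image size (MB)
deg_per_image = 0.1         # assumed fine-sliced oscillation width
datasets_per_hour = 12      # lower end of the figure quoted above
images_per_dataset = 180 / deg_per_image

gb_per_hour = datasets_per_hour * images_per_dataset * mb_per_image / 1024
print(f"~{gb_per_hour:.0f} GB of images per hour of steady collection")
print(f"~{gb_per_hour * 8 / 1024:.1f} TB over an 8-hour shift")
```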

Current Capabilities (on MX)
- Per-image analysis
- fast_dp: get feedback on your data collection within 2 minutes of the experiment
- xia2: more comprehensive data processing
- fast_ep: based on fast_dp output, try experimental phasing
- dimple: based on fast_dp output, try searching for ligands
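
The dependency between these pipelines can be sketched as below. This is not the real beamline automation: the functions, file names and path are placeholders, illustrating only that fast_ep and dimple both start from the fast_dp result while xia2 runs independently.

```python
# Illustration of the dependency structure described on the slide; the real
# programs are replaced by placeholder functions and the paths are made up.

def fast_dp(images):
    """Quick processing: returns scaled, merged reflections."""
    print(f"fast_dp: processing {images}")
    return "fast_dp.mtz"          # placeholder for the real output file

def fast_ep(mtz):
    """Attempt experimental phasing from the fast_dp output."""
    print(f"fast_ep: phasing from {mtz}")

def dimple(mtz, model):
    """Difference map against a known model to look for bound ligands."""
    print(f"dimple: refining {model} against {mtz}")

def xia2(images):
    """Slower, more comprehensive processing, run independently."""
    print(f"xia2: processing {images}")

images = "/dls/i19/data/visit/collection_0001"   # hypothetical path
mtz = fast_dp(images)
fast_ep(mtz)                       # phasing and ligand search both start
dimple(mtz, "apo_model.pdb")       # from the same fast_dp result
xia2(images)
```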

Per image analysis

[Slide: example processing summary table; the numeric values were lost in transcription. Rows cover low and high resolution limits, Rmerge, I/sigma, completeness, multiplicity, anomalous completeness, anomalous multiplicity, anomalous correlation, Nrefl, Nunique, mid-slope, dF/F and dI/sig(dI), plus the merging point group and unit cell. Processing took 00h 01m 26s (86 s).]
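
For reference (not part of the original slide), the headline statistic in that table, Rmerge, is conventionally defined as:

```latex
R_{\mathrm{merge}} =
  \frac{\sum_{hkl} \sum_{i} \left| I_i(hkl) - \overline{I(hkl)} \right|}
       {\sum_{hkl} \sum_{i} I_i(hkl)}
```

where I_i(hkl) are the individual observations of reflection hkl and the bar denotes their mean over all symmetry-related measurements.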

But not just data processing
- Collection of screening images will result in strategy calculations
- Fluorescence scans are analyzed automatically (sketched below)
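
As an illustration of what automatic analysis of a fluorescence scan involves, the sketch below picks out the peak (maximum signal) and inflection (steepest rise) energies from a scan. The scan data are synthetic; real beamline analysis is done with dedicated programs.

```python
# Sketch only: pick peak and inflection energies from a fluorescence scan.
energies = [12650 + 2 * i for i in range(26)]            # eV, synthetic scan
counts   = [10, 11, 12, 14, 18, 25, 40, 70, 120, 200,    # synthetic edge shape
            320, 430, 480, 470, 440, 420, 410, 405, 400,
            398, 396, 395, 394, 393, 392, 391]

# Peak: energy with the maximum fluorescence signal.
peak = energies[counts.index(max(counts))]

# Inflection: energy of the steepest point on the rising edge.
slopes = [counts[i + 1] - counts[i] for i in range(len(counts) - 1)]
inflection = energies[slopes.index(max(slopes))]

print(f"peak ~{peak} eV, inflection ~{inflection} eV")
```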

In summary
- Merging statistics from quick (but reasonable) processing in ~2 minutes
- Maps possible within ~3-4 minutes
- Automatic strategies can guide data collection
- Everything tracked in the database

Future developments
- Better handling of weak data / tiny crystals
- Room temperature / in situ data collection
- Pushing algorithm development
- …

Differences for Chemical Crystallography
- All data sets consist of multiple sweeps (see the sketch below)
- Indexing harder as data are sparse, both for strategy / screening and for processing
- Scaling more important due to absorption effects (normally)
- Many more space groups to consider
- Strategies more complex
- Downstream analysis perhaps more tractable
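
To make the multi-sweep point concrete, here is a minimal sketch of the bookkeeping involved in grouping images into sweeps before processing and joint scaling. The filename convention is hypothetical; in practice the processing software handles this itself.

```python
# Illustration of multi-sweep bookkeeping: group image files by sweep so each
# sweep can be indexed/integrated separately and then all scaled together
# (so that, for example, absorption differences between sweeps can be modelled).
import re
from collections import defaultdict

filenames = [
    "xtal1_sweep1_0001.cbf", "xtal1_sweep1_0002.cbf",
    "xtal1_sweep2_0001.cbf", "xtal1_sweep2_0002.cbf",
    "xtal1_sweep3_0001.cbf",
]

sweeps = defaultdict(list)
for name in filenames:
    match = re.match(r"(.+_sweep\d+)_\d+\.cbf$", name)   # hypothetical pattern
    if match:
        sweeps[match.group(1)].append(name)

for template, images in sweeps.items():
    print(f"{template}: {len(images)} images")
```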

Benefits from automated analysis
- Close to real-time feedback on data collection
- Allows you to focus on experimental results, not driving GUIs (for processing)

Trouble for the users
- Data collection less cosy: no time for tea! no time to inspect every image! not enough time to process data by hand…
- Need to bring more samples (this surprised MX users for a while)
- You need to be more organized
- You need a bigger hard drive

Upshot… you will get