Managing Diffraction Beam Data with an Offline Event Builder

Ian Laurent van Roque1, Dr. Monarin Uervirojnangkoorn2+, Dr. Paul Christopher O'Grady2++
1Intern. 2Linac Coherent Light Source, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, CA 94025, USA.
+Contact: monarin@slac.stanford.edu  ++Contact: cpo@slac.stanford.edu

Overview
We developed an Offline Event Builder to process diffraction data in support of major upgrades to LCLS. The LCLS-II project intends to increase the pulse frequency roughly 8,000-fold, so beam diffraction information will pour in at about 20 GB of data every second. We evaluate the effectiveness of several analysis scripts in order to determine the optimal performance of a data-reduction pipeline intended to filter for "interesting" beam events.
Keywords: Python, MPI, core threading, LCLS-II

Fig 1. Flowchart depicting LCLS data flow. The top half of the image illustrates the "online" portion of the process, in which data is handled as it is received. The bottom half portrays the "offline" portion of the analysis, where event data has already been saved to memory. Our scripts are to be implemented in the bottom half.

Methods
In our analysis, variations of three principal scripts are timed to test their effectiveness in auditing large quantities of data. The scripts must inspect h5 files across a multi-core system using MPI, with data distributed to cores in user-defined "batches." We expect each script to become more efficient as the batch size increases toward a maximum-efficiency batch size, after which the analysis time plateaus.

First Principal Script: The "mpiscript"
- Analyzes a single h5 file.
- Timestamps are distributed to cores in batches.
- Cores fetch the event data for analysis (a minimal sketch of this pattern follows the figure captions below).

Fig 2. Flowchart of the first principal script. Cores gather the corresponding event data from the timestamp indices. The output of this script is the time it takes for all events to be paired with their timestamps.

Fig 5. Plot for the "mpiscript." Analysis times of about a gigabyte per minute have been recorded. As expected, the algorithm is faster on more cores and approaches a minimum analysis time as the batch size increases.
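For concreteness, here is a minimal sketch of the batched, single-file pattern the "mpiscript" describes. It is illustrative only, not the authors' code: the file name ("events.h5"), dataset names ("timestamps", "event_data"), and batch size are all hypothetical.

```python
# Minimal sketch of the "mpiscript" pattern (illustrative only): timestamps
# from a single h5 file are split into user-defined batches, and each MPI
# rank fetches the event data for the batches it owns. File and dataset
# names ("events.h5", "timestamps", "event_data") are hypothetical.
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

BATCH_SIZE = 1024          # user-defined batch size; the tunable under test
FILENAME = "events.h5"     # hypothetical single input file

t0 = MPI.Wtime()
with h5py.File(FILENAME, "r") as f:
    n_events = f["timestamps"].shape[0]
    # Round-robin over batches: rank r takes batches r, r+size, r+2*size, ...
    for start in range(rank * BATCH_SIZE, n_events, size * BATCH_SIZE):
        stop = min(start + BATCH_SIZE, n_events)
        batch_ts = f["timestamps"][start:stop]   # this rank's timestamps
        events = f["event_data"][start:stop]     # matching event data
        # ... pair events with batch_ts and run per-event filtering here ...

comm.Barrier()
if rank == 0:
    print("all events paired in %.2f s" % (MPI.Wtime() - t0))
```

Run under an MPI launcher, e.g. `mpirun -n 64 python mpiscript_sketch.py`; the reported wall time corresponds to the "analysis time" plotted in Fig 5.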
Second Principal Script: The "superscript"
- Pairs timestamp data over multiple h5 files.
- Event data with matching timestamps are gathered.
- np.searchsorted pairs the variable-length timestamp arrays.

Fig 3. Illustration of the "superscript." Event data matched across timestamps are gathered together as in the "mpiscript." The output of this script is the time for all identical timestamps to be paired with their corresponding event data.

Fig 6. Data for the "superscript." The "superscript" filters through data faster than the original principal script. The same pattern as the previous script is seen, confirming our hypothesis that the analysis time depends on batch size and the number of cores.

Final Principal Script: The "superdealer"
- Utilizes a master-client relationship between cores.
- Indices with matching timestamps are gathered.
- Client cores request batches of data from the master.
- Clients pair event data with their timestamps (a sketch of this master-client pattern closes this transcript).

Fig 4. Flowchart of the "superdealer" principal script. The master core matches timestamps for all of the events, which are then handed to the client cores. Client cores gather event data over the ten h5 files.

Fig 7. Plot for the "superdealer" script at NERSC. Analysis rates of 8.8 gigabytes per second have been recorded. As surmised, the analysis rate hovers around a maximum efficiency as the batch size increases. Analysis time actually increases for very large batches as client cores begin to be inundated with data.

Analysis
Each script is tested over multiple batch sizes. The data show that the primitive "mpiscript" can filter through 1 GB of data only on minute-length timescales, too slow for our ambitious analysis requirements. As expected, analysis time decreases with both batch size and number of cores. The "superscript" fares better against the volley of event data it encounters and can pair data at about a gigabyte every second. Further tests show the analysis rate approaching a maximum efficiency, agreeing with our assumptions.

Results
Unfortunately for our "superdealer," time tests display asymptotic behavior toward a maximum analysis rate of about nine gigabytes per second. At this point, increasing the batch size of the distributed data no longer increases efficiency; in fact, increasing the batch size much further is expected to reduce performance as cores are overwhelmed by prodigious volumes of data. As seen in the graphs, the inspection times for 32 cores are similar to those of the 64-core analysis. While adding more cores would shorten the survey, it is not expected to bolster the process by more than a few gigabytes per second at most (the best rate for 256 cores is 17.6 GB per second).

Future
While the ultimate goal of managing 20 gigabytes of data per second has not yet been reached, many improvements can still be made to increase the effectiveness of the algorithms. The script is intended to be optimized in C to maximize efficiency between the code and the client cores. Refactoring, along with more sophisticated use of while loops to run quickly through event data, may also increase speed. The script will also need to be updated to accommodate different or changing numbers of files, especially event arrays much heftier than 10 GB per file. Tests on better file systems, such as the Burst Buffer at NERSC, are being prepared and are expected to produce better results. With these improvements, the master-client method of analysis may become agile enough as it sifts through event data to allow a 20 GB per second data-reduction pipeline.

Acknowledgments
Use of the Linac Coherent Light Source (LCLS), SLAC National Accelerator Laboratory, is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515.

Date: 08/28/2017
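As a companion to the "superdealer" section above, here is a hedged sketch of the master-client pattern it describes; the matching step is the same np.searchsorted pairing the "superscript" uses. This is illustrative only, not the authors' code: file and dataset names are hypothetical, and each file's timestamp array is assumed sorted with unique values.

```python
# Illustrative sketch of the "superdealer" master-client pattern (not the
# authors' code). Rank 0 intersects the timestamp arrays of all files,
# locates each shared timestamp with np.searchsorted, and deals batches of
# matched indices to clients on request; clients gather the event data.
# File/dataset names are hypothetical; timestamps are assumed sorted.
import h5py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
FILES = ["events_%02d.h5" % i for i in range(10)]   # the ten h5 files
BATCH_SIZE = 1024
TAG_REQUEST = 1

if comm.Get_rank() == 0:
    # --- master: build a (n_files, n_common) matrix of matched row indices ---
    ts = []
    for fn in FILES:
        with h5py.File(fn, "r") as f:
            ts.append(f["timestamps"][:])            # hypothetical dataset
    common = ts[0]
    for t in ts[1:]:
        common = np.intersect1d(common, t)           # timestamps in every file
    idx = np.stack([np.searchsorted(t, common) for t in ts])

    workers, pos = comm.Get_size() - 1, 0
    status = MPI.Status()
    while workers > 0:
        comm.recv(source=MPI.ANY_SOURCE, tag=TAG_REQUEST, status=status)
        if pos < idx.shape[1]:
            comm.send(idx[:, pos:pos + BATCH_SIZE], dest=status.Get_source())
            pos += BATCH_SIZE
        else:
            comm.send(None, dest=status.Get_source())  # no work left
            workers -= 1
else:
    # --- client: request index batches until the master says stop ---
    files = [h5py.File(fn, "r") for fn in FILES]
    while True:
        comm.send(None, dest=0, tag=TAG_REQUEST)
        batch = comm.recv(source=0)
        if batch is None:
            break
        for f, rows in zip(files, batch):
            events = f["event_data"][rows]   # rows are already increasing
            # ... pair events with their timestamps / analyze here ...
    for f in files:
        f.close()
```

The pull-based design is what lets throughput keep rising with batch size until the plateau seen in Fig 7: clients ask for work only when ready, so the master never stalls on a slow core, while oversized batches eventually inundate the clients.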