Data Management at the Advanced Photon Source (APS)

RDA-PanSig Workshop on Interoperability
Nicholas Schwarz
Principal Computer Scientist; Group Leader, Scientific Software Engineering & Data Management
X-ray Science Division, Advanced Photon Source
3-4 April 2017, ALBA

Data Policy
Current APS Data Policy
- The APS is committed to providing our users with their data in a timely and convenient fashion.
- Users are responsible for meeting their data management obligations (usually dictated by their funding agencies).
- The APS does not guarantee long-term data archiving or management.
- Each beamline has its own data management plan: https://www1.aps.anl.gov/Users-Information/Help-Reference/Data-Management-Retrieval-Practices
DOE Statements
- Not all data needs to be shared or preserved; cost/benefit should be considered.
- PIs funded to collect data should comply with their respective funding agency's requirements for a data management plan, which should address how to validate results using preserved data, or how results may be reproduced without preserving data.
- http://science.energy.gov/funding-opportunities/digital-data-management

Data Storage Systems
Argonne Leadership Computing Facility (ALCF) prototypes
- Petrel (online now): IBM ESS (Elastic Storage Server) GL6; 2 x POWER8 servers; GPFS Native RAID; 6 JBODs, each with 58 x 6 TB drives and 2 x 400 GB SSDs (metadata); 2 PB raw / 1.5 PB usable storage
- Extrepid (provisioning): DataDirect Networks (DDN) S2A9900; 4 racks, each with 10 drawers of 60 drives; 48 x 1 TB and 12 x 3 TB SATA drives in each drawer; 3 PB raw / 1.5 PB usable storage
- Tape backup: ALCF tape backup systems are used (via GPFS) when needed
Contact: Mike Papka, William Allcock, Ian Foster, Rachana Ananthakrishnan, Roger Sersted, Dave Wallis, Ken Sidorowicz, et al.
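The raw capacities quoted for the two prototypes follow directly from the drive counts. The Python sketch below multiplies them out as a sanity check. It assumes decimal units (1 PB = 1000 TB) and counts only the data drives (Petrel's metadata SSDs are excluded); the `DrivePool` and `raw_capacity_tb` names are illustrative and not part of any ALCF tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DrivePool:
    count: int      # number of identical drives in one enclosure/drawer
    size_tb: float  # capacity of a single drive, decimal TB


def raw_capacity_tb(pools: Iterable[DrivePool], multiplier: int = 1) -> float:
    """Sum count * size over the pools of one unit, times the number of identical units."""
    return multiplier * sum(p.count * p.size_tb for p in pools)


# Petrel: 6 JBODs, each holding 58 x 6 TB data drives (metadata SSDs not counted).
petrel_tb = raw_capacity_tb([DrivePool(58, 6)], multiplier=6)

# Extrepid: 4 racks x 10 drawers; each drawer holds 48 x 1 TB + 12 x 3 TB SATA drives.
extrepid_tb = raw_capacity_tb([DrivePool(48, 1), DrivePool(12, 3)], multiplier=4 * 10)

for name, tb in [("Petrel", petrel_tb), ("Extrepid", extrepid_tb)]:
    print(f"{name}: {tb:,.0f} TB raw (~{tb / 1000:.1f} PB)")
```

The totals come out to 2,088 TB (about 2.1 PB) for Petrel and 3,360 TB (about 3.4 PB) for Extrepid, in line with the rounded 2 PB and 3 PB raw figures on the slide. The slide does not itemize the gap between raw capacity and the 1.5 PB usable; RAID parity and spare capacity typically account for much of such a difference.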

Data Management & Distribution: Globus Services
We are collaborating closely with the Globus Services team (www.globus.org) to leverage best-in-class tools for automating data transfer, file sharing, and maintaining data ownership and permissions. Integration with ORCID is being planned for both the APS and the ALCF.
Contact: Ian Foster, Rachana Ananthakrishnan, Mike Papka, William Allcock, Roger Sersted, Dave Wallis, Ken Sidorowicz, et al.

Data Management & Distribution: Storage Automation
Some beamlines have written their own tools for automating data transfer to storage systems using the Globus command-line tools: 2-BM and 32-ID-C.
Other beamlines are using a set of tools developed to aid in this process: 1-ID, 6-ID, 7-ID, 8-ID, 33-ID, and 34-ID.

    # Add a new experiment
    > dm-add-experiment --experiment=s1id-data01 --name=s1id-data01
    # Add users and roles
    > dm-add-user-experiment-role --experiment=s1id-data01 --username=d12345 --role=User
    # Start experiment
    > dm-start-experiment --experiment=s1id-data01
    # Monitor a directory for new files and transfer data to storage system
    > dm-start-daq --experiment=s1id-data01 --data-directory=/local/s1id-data01
    # Alternatively, data files may be uploaded after acquisition
    > dm-upload --experiment=s1id-data01 --data-directory=/local/s1id-data01
    # Other commands: dm-get-daq-info and dm-get-upload-info for checking status,
    # and dm-stop-daq and dm-stop-experiment for stopping monitoring.

Documentation: https://confluence.aps.anl.gov/display/DMGT/
~750 TB of data stored since October 2015
Contact: Rachana Ananthakrishnan, Francesco De Carlo, Ian Foster, Barbara Frosik, Sinisa Veseli, et al.
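The slide does not say how dm-start-daq detects new files in the data directory. As a rough illustration only, the Python sketch below shows one simple way a directory monitor can work: poll the directory, remember which files have already been seen, and hand each new file to a transfer callback exactly once. `find_new_files`, `watch`, and the `transfer` callback are hypothetical names, not part of the APS DM tools; a real service would also need to skip files that are still being written, retry failed transfers, and persist its record of transferred files.

```python
import os
import time
from typing import Callable, List, Optional, Set


def find_new_files(directory: str, seen: Set[str]) -> List[str]:
    """Return sorted paths of regular files in `directory` that are not in `seen`.

    Newly found paths are added to `seen`, so each file is reported only once.
    """
    new_paths = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.path not in seen:
                seen.add(entry.path)
                new_paths.append(entry.path)
    return sorted(new_paths)


def watch(directory: str,
          transfer: Callable[[str], None],
          interval_s: float = 1.0,
          max_polls: Optional[int] = None) -> None:
    """Poll `directory` and call `transfer(path)` once per newly seen file.

    Runs forever when `max_polls` is None; otherwise stops after that many polls.
    """
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(directory, seen):
            transfer(path)  # hypothetical hook, e.g. queue the file for a Globus transfer
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Polling keeps the sketch portable and dependency-free; file-system event APIs (such as inotify on Linux) avoid the polling delay but are platform specific.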

Next Steps
APS supports a variety of data formats: TIFFs, custom ASCII spec files, and HDF5 (NeXus, Data Exchange, others)
- Both a database and an archival format
Exploration
- BNL/NSLS-II's Bluesky: next-generation data acquisition system (long-term alternative to spec?)
- Metadata catalog: format agnostic
  - NoSQL database is very flexible
  - ICAT
  - Materials Data Facility, Citrine, Invenio, PDB
US collaborations: ExFaC; CAMERA; APS – NSLS-II analysis collaboration
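To illustrate why a schema-less (NoSQL-style) store suits a format-agnostic metadata catalog, here is a minimal in-memory sketch in plain Python: records harvested from TIFF, spec, and HDF5 files carry different fields, yet can be queried the same way. The `MetadataCatalog` class, the field names, and the record values are hypothetical and purely illustrative; a production catalog (ICAT, or a real document database) would add persistence, indexing, and access control.

```python
from typing import Any, Dict, List


class MetadataCatalog:
    """Toy document store: each record is a free-form dict with no fixed schema."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def insert(self, record: Dict[str, Any]) -> None:
        self._records.append(dict(record))  # store a shallow copy

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        """Return records whose fields equal every criterion; absent fields never match."""
        missing = object()
        return [r for r in self._records
                if all(r.get(k, missing) == v for k, v in criteria.items())]


catalog = MetadataCatalog()
# Illustrative records from three formats, each with its own set of fields.
catalog.insert({"file": "img_0001.tif", "format": "TIFF", "beamline": "2-BM",
                "exposure_s": 0.1})
catalog.insert({"file": "scan.spec", "format": "spec", "beamline": "33-ID",
                "scan_number": 12, "command": "ascan th 0 1 10 1"})
catalog.insert({"file": "tomo.h5", "format": "HDF5", "beamline": "2-BM",
                "schema": "Data Exchange", "energy_keV": 25.0})

for rec in catalog.find(beamline="2-BM"):
    print(rec["file"], rec["format"])
```

Querying `beamline="2-BM"` returns both the TIFF and the HDF5 record even though they share only a few fields, which is the kind of flexibility a format-agnostic catalog needs.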

Thank You

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.  The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. http://energy.gov/downloads/doe-public-access-plan.