Archive Engine for Large Data Sets Nikolay Malitsky EPICS Collaboration Meeting San Francisco, USA October 5, 2013.



Outline
• Objectives: new scope, new scale, new applications
• Incremental approach: big picture, classic and HDF5 versions
• Major techniques: type system, chunk model, HDF5 file format
• Next use case: time series of frames

Objectives
• New scale: 1 M samples/s from 30,000 heterogeneous physical devices (power supplies, diagnostics, RF, vacuum systems, etc.)
• New scope: transition from EPICS 3 to EPICS 4, bringing in the middle layer and support for user-defined data types
• New applications: beamline DAQ at 2 TB/day
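
To put the target figures in perspective, a back-of-the-envelope calculation follows. It assumes that the 1 M samples/s is spread evenly across the 30,000 devices and that 2 TB means 2 × 10^12 bytes; neither assumption comes from the slides.

```python
# Back-of-the-envelope check of the target rates.
# Assumptions (not from the slides): even spread across devices,
# decimal terabytes, and an 86,400-second day.
SAMPLES_PER_SECOND = 1_000_000
DEVICES = 30_000
DAQ_BYTES_PER_DAY = 2 * 10**12
SECONDS_PER_DAY = 24 * 60 * 60

avg_rate_per_device = SAMPLES_PER_SECOND / DEVICES          # samples/s per device
samples_per_day = SAMPLES_PER_SECOND * SECONDS_PER_DAY      # archived samples per day
daq_bytes_per_second = DAQ_BYTES_PER_DAY / SECONDS_PER_DAY  # sustained beamline DAQ rate

print(f"{avg_rate_per_device:.1f} samples/s per device")
print(f"{samples_per_day:.2e} samples/day")
print(f"{daq_bytes_per_second / 1e6:.1f} MB/s sustained")
```

Under these assumptions the averages come to roughly 33 samples/s per device, 8.64 × 10^10 samples per day, and about 23 MB/s of sustained beamline data.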

Integrated and Incremental Approach
Channel Archiver Framework:
• Data Access: EPICS V4 RPC
• Engine: generic type system, chunk model, lock-free approach
• Backend: HDF5

Classic Version
• Data Access: XML-RPC and EPICS V4 RPC
• Engine: lock-free circular buffer => 100 K scalar samples/s
• Backend: index + original data files => 300 K scalar samples/s
CSS plug-in: org.csstudio.archiver.reader.ea4
C++ RPC server: ea4:pvrpc
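
The circular buffer named above can be illustrated with a minimal single-producer, single-consumer sketch. This is not the engine's code: the class name `RingBuffer` and its methods are hypothetical, and Python cannot express the atomic memory ordering a real lock-free C++ implementation needs, so only the indexing scheme is shown.

```python
class RingBuffer:
    """Fixed-capacity circular buffer for one producer and one consumer.

    The producer only advances `_head`; the consumer only advances `_tail`.
    That ownership split is what allows a native implementation to avoid
    locks. This sketch shows the index arithmetic only.
    """

    def __init__(self, capacity):
        # One slot stays unused so that head == tail always means "empty".
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to write (producer-owned)
        self._tail = 0  # next slot to read (consumer-owned)

    def push(self, sample):
        nxt = (self._head + 1) % len(self._slots)
        if nxt == self._tail:
            return False  # full: the caller decides whether to drop or retry
        self._slots[self._head] = sample
        self._head = nxt
        return True

    def pop(self):
        if self._tail == self._head:
            return None  # empty
        sample = self._slots[self._tail]
        self._tail = (self._tail + 1) % len(self._slots)
        return sample


buf = RingBuffer(capacity=3)
accepted = [buf.push((t, 1.5 * t)) for t in range(4)]  # the fourth push finds the buffer full
drained = [buf.pop() for _ in range(4)]                # the fourth pop finds it empty
```

A full buffer rejects the sample rather than overwriting the oldest one; which policy the archive engine uses is not stated in the slides.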

Data Access: Get Channel Values
Heterogeneous array solution: an EPICS 4 PVData-based dynamic structure of self-described members

HDF5 Version
• Data Access: EPICS V4 RPC
• Engine:
  • Lock-free everywhere => 0.3 – 0.5 M scalar samples/s
  • Generic type catalog
  • Generic TS Chunk
• Backend: index + HDF5 files => 3 M scalar samples/s (HDD)
• Extract, Transform, Load (ETL):
  • Index
  • Hard drive: SSD; HDF5 chunk: one TS Chunk
  • Hard drive: HDD; HDF5 chunk: many TS Chunks

Open Type System
• DBR Types
• PV Data
• HDF5
• SciDB
• …
Structured Data Types
OMG DDS Extensible and Dynamic Topic Types, Version 1.0: November 2012

RDB-based Type Catalog
• xtype_kind: enumeration values of int16, float, string, sequence, structure, etc.
• xtype: collection of all type ids and versions
• xtype_dependence: auxiliary table with the type dependencies
• xtype_member: structure-member associations
• xtype_collection: sequence-element associations
with D. Dohan
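
One way the five tables could fit together is sketched below with an in-memory SQLite database. Only the table names come from the slide; every column name, the helper `add_type`, and the example types `ScalarSample` and `FloatWaveform` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table names follow the slide; all columns are hypothetical.
conn.executescript("""
CREATE TABLE xtype_kind (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE xtype (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    version INTEGER NOT NULL,
    kind_id INTEGER NOT NULL REFERENCES xtype_kind(id),
    UNIQUE (name, version));
CREATE TABLE xtype_dependence (
    type_id INTEGER NOT NULL REFERENCES xtype(id),
    depends_on_id INTEGER NOT NULL REFERENCES xtype(id));
CREATE TABLE xtype_member (
    struct_id INTEGER NOT NULL REFERENCES xtype(id),
    position INTEGER NOT NULL,
    name TEXT NOT NULL,
    member_type_id INTEGER NOT NULL REFERENCES xtype(id));
CREATE TABLE xtype_collection (
    sequence_id INTEGER NOT NULL REFERENCES xtype(id),
    element_type_id INTEGER NOT NULL REFERENCES xtype(id));
""")

kinds = ["int16", "float", "string", "sequence", "structure"]
conn.executemany("INSERT INTO xtype_kind (name) VALUES (?)", [(k,) for k in kinds])


def add_type(name, version, kind):
    """Register a type id and version under one of the enumerated kinds."""
    kind_id = conn.execute(
        "SELECT id FROM xtype_kind WHERE name = ?", (kind,)).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO xtype (name, version, kind_id) VALUES (?, ?, ?)",
        (name, version, kind_id))
    return cur.lastrowid


t_int16 = add_type("int16", 1, "int16")
t_float = add_type("float", 1, "float")

# A structure with two members, recorded member by member.
t_sample = add_type("ScalarSample", 1, "structure")
for pos, (member, type_id) in enumerate([("value", t_float), ("status", t_int16)]):
    conn.execute("INSERT INTO xtype_member VALUES (?, ?, ?, ?)",
                 (t_sample, pos, member, type_id))
    conn.execute("INSERT INTO xtype_dependence VALUES (?, ?)", (t_sample, type_id))

# A sequence type and its element type.
t_wave = add_type("FloatWaveform", 1, "sequence")
conn.execute("INSERT INTO xtype_collection VALUES (?, ?)", (t_wave, t_float))

# Resolve the members of the structure in declaration order.
members = conn.execute("""
    SELECT m.name, t.name
    FROM xtype_member AS m
    JOIN xtype AS t ON t.id = m.member_type_id
    WHERE m.struct_id = ?
    ORDER BY m.position
""", (t_sample,)).fetchall()
```

Keeping the version next to the type name in `xtype` lets two versions of the same structure coexist, which is one plausible reading of "all type ids and versions".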

Chunk-Based Model
• TS Chunk: Type, Memory Buffer (char*), Generic API
• TS Sample (TSSample): Type, Generic API
• A channel is a sequence of TS chunks
• Array – Chunk – Sample
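
The array, chunk, and sample layering can be sketched as follows. This is an illustration under assumptions, not the engine's API: the class `TSChunk`, its methods, the type name `ScalarDouble`, and the record layout (a 64-bit integer timestamp plus a 64-bit float value) are all hypothetical, and a `bytearray` stands in for the `char*` memory buffer.

```python
import struct


class TSChunk:
    """A typed, contiguous byte buffer holding fixed-size samples."""

    def __init__(self, type_name, fmt):
        self.type_name = type_name
        self._layout = struct.Struct(fmt)  # fixed-size record layout for this type
        self._buffer = bytearray()         # stands in for the char* memory buffer

    def append(self, *fields):
        """Pack one sample and add it to the end of the buffer."""
        self._buffer += self._layout.pack(*fields)

    def __len__(self):
        return len(self._buffer) // self._layout.size

    def sample(self, i):
        """Decode sample i through the type-generic layout."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self._layout.unpack_from(self._buffer, i * self._layout.size)


# A channel is modeled as a plain list of chunks.
channel = []
chunk = TSChunk("ScalarDouble", "<qd")  # hypothetical type: int64 timestamp, float64 value
for k in range(3):
    chunk.append(1_000 + k, 0.5 * k)
channel.append(chunk)

all_samples = [c.sample(i) for c in channel for i in range(len(c))]
```

Because the chunk only knows a layout and a byte buffer, the same class serves any fixed-size sample type; variable-length types would need a different encoding, which this sketch does not cover.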

HDF5 Backend
HDF5 Conceptual Data Model:
• group: a folder of named objects, such as groups, datasets, and data types
• dataset: a chunk-based multidimensional array of data elements with attributes, data space, and data types
• datatype: a description of a specific class of data element
A channel is an HDF5 group including the following datasets:
• Intervals [chunk-based collection]
  • Datatype: timestamps, index of the first sample, number of samples
  • DBR use case: intervals associated with the index file
  • Original Channel Archiver: file offsets and buffer sizes of Data Headers
• Info [one element]
  • Datatype: channel-specific
  • Attributes: Type Name
  • DBR use case: CtrlInfo
  • Original Channel Archiver: CtrlInfo's of Data Headers
• Data [chunk-based collection]
  • Datatype: channel-specific
  • Attributes: Type Name
  • DBR use case: one of the DBR scalar or waveform types
  • Original Channel Archiver: DBR type and count of Data Headers + Buffers
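
How an Intervals entry could be used to find the samples for a requested time is sketched below in plain Python, without the HDF5 library. It assumes each entry's timestamp marks the start of one chunk; the numeric values and the function `locate` are invented for illustration.

```python
from bisect import bisect_right

# Each row: (start timestamp, index of the first sample, number of samples).
# Values are illustrative only.
intervals = [
    (100, 0, 4),
    (200, 4, 3),
    (300, 7, 5),
]
starts = [row[0] for row in intervals]


def locate(timestamp):
    """Return (first_index, count) for the latest interval starting at or before timestamp."""
    pos = bisect_right(starts, timestamp) - 1
    if pos < 0:
        return None  # earlier than the first archived interval
    _, first, count = intervals[pos]
    return first, count


hit = locate(250)       # falls in the interval that starts at 200
boundary = locate(300)  # an exact start time selects that interval
before = locate(50)     # nothing archived yet
```

The returned pair identifies a contiguous slice of the Data dataset, so a reader can fetch one chunk instead of scanning the whole channel.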

Next Use Case: Time Series of Frames
[Diagram: TS Frames composed of timestamps, frames, and positions. Labeled elements: Timestamp, Bin Metadata, Frame, Coord Metadata, Position, pixel_meta, frame meta, position meta; multiplicities 1 to 1…*.]
The proposed generic structure of the sparse multi-dimensional array is modeled on the "natural" experiment-oriented representation, which combines two datasets: a time series of detector-specific frames and a time series of the frame positions in the scan-specific multi-dimensional space (angle, energy, pressure, etc.). The structure is:
• mapped into EPICS 4 PVData and HDF5 representations
• consistent with all (22) NeXus Application Definitions
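
The combination of the two time series can be sketched as a join on timestamp. This is a simplified illustration, not the proposed PVData or HDF5 structure: it assumes frames and positions carry identical timestamps, uses a two-dimensional scan space of (angle, energy), and shrinks each frame to a 2 × 2 list.

```python
# Time series of detector frames: timestamp -> frame (tiny stand-ins).
frames = {
    10: [[1, 2], [3, 4]],
    20: [[5, 6], [7, 8]],
    30: [[9, 9], [9, 9]],
}

# Time series of frame positions in the scan space: timestamp -> (angle, energy).
positions = {
    10: (0.0, 8.0),
    20: (0.5, 8.0),
    30: (0.5, 8.2),
}

# Sparse multi-dimensional array: only the visited scan points are stored.
sparse = {positions[t]: frames[t] for t in sorted(frames) if t in positions}

visited = sorted(sparse)
frame_at = sparse[(0.5, 8.2)]
```

Storing only visited points keeps the structure sparse: a scan that samples three points of a large angle-energy grid holds three frames, not the full grid.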