Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL

Discussion: Dakota Results Database
Brian M. Adams
March 12, 2013

Why a Dakota Results Database?
- Primary driver: Dakota executable users want more uniform, centralized access to output from Dakota iterative studies
- Library mode users want the same, via a C++ interface
- Initially focused on results from an Iterator (method)
- Run configuration (reproducibility) information
- Extensions possible to interface, approximation, and transformed evaluations; iteration history and details; metadata
- For memory-limited cases, push data out of core memory after computing, then pull it back in for results reporting (serialization may be more appropriate)
- Broader design notes at

Initial High-level Requirements
- Store results from most common studies; defer function evaluation data to restart database
- Include enough metadata for the user to directly locate/extract results
- In-core and file; options for when to sync between them
- Initial file format goals: both human-readable and machine-parseable (simple text, HDF5, YAML/XML, SQL)
- Avoid duplication of data
  - In-core database may replace class data
  - Don’t store labels many times
- Avoid re-computation and reimplementation when possible

Progress through Jan. 31, 2013
- Surveyed various data output by Dakota iterators (see Trac)
- Initial discussion October 2012; design reviews and discussion on December 5, 2012
- Initial implementation delivered in Dakota 5.3
  - In-core boost::any database, with option for array-based storage
  - Simple dump to pseudo-hierarchical annotated text file
  - Coverage of “most” results output: focused on most common
  - Option to add metadata with any archived result
  - Demonstrated archiving LHS moments at compute, loading at print
- Does not address concerns with duplication, out-of-core, re-computation, re-implementation. No YAML or HDF5.
- Show example of text results output for hybrid optimization, sampling, PCE, helper iterator (PCE, EGO)

Current Abstractions
- ResultsManager: manages in-core and file-based databases under the hood
  - Post data to ResultsManager through the API using concrete types
  - Under the hood, data gets stored in boost::any or passed to file
- ResultsEntry: used to retrieve a result from the database
  - If in-core is active, manages a reference to the stored data
  - If not, loads from file and manages a reference to a contained data object
  - Allows retrieval of a single entry in an array to support per-function restore of data

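The two abstractions above can be sketched roughly as follows. This is a minimal illustration, not Dakota's actual classes: it covers only the in-core path, substitutes std::any (C++17) for boost::any, keys entries by a plain string, and the member names insert, get, and data are hypothetical.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified manager: in-core storage only, string keys.
class ResultsManager {
public:
  // Post data using a concrete type; it is stored type-erased.
  template <typename T>
  void insert(const std::string& key, const T& value) {
    assert(!key.empty());  // assumption of this sketch: keys are non-empty
    store_[key] = value;
  }

  // Typed access to the stored object. Throws std::out_of_range for a
  // missing key and std::bad_any_cast for a type mismatch.
  template <typename T>
  const T& get(const std::string& key) const {
    return std::any_cast<const T&>(store_.at(key));
  }

private:
  std::map<std::string, std::any> store_;
};

// Hypothetical retrieval handle: holds a pointer to data owned by the
// manager. The handle is invalidated if the same key is overwritten.
template <typename T>
class ResultsEntry {
public:
  // Refer to a whole stored object of type T.
  ResultsEntry(const ResultsManager& mgr, const std::string& key)
      : data_(&mgr.get<T>(key)) {}

  // Refer to one element of a stored std::vector<T>, for example the
  // data belonging to a single response function.
  ResultsEntry(const ResultsManager& mgr, const std::string& key,
               std::size_t index)
      : data_(&mgr.get<std::vector<T>>(key).at(index)) {}

  const T& data() const { return *data_; }

private:
  const T* data_;
};
```

With this sketch, a caller would insert a std::vector<double> under some key at compute time, then construct a ResultsEntry<double> with an index at print time to read back a single function's value without copying the whole array.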
Storage Types: dakota_results_types.hpp
- Data key: method_name, method_id, execution number, data label
  typedef tuple<std::string, std::string, size_t, std::string> ResultsKeyType;
- Data value: boost::any, currently supporting RealMatrix, RealVector, and StringVector, plus arrays (typically per-function) of each
- Metadata: metadata label, vector of strings
  typedef map<std::string, std::vector<std::string> > MetaDataType;

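To make the key, value, and metadata layout concrete, here is a small self-contained sketch. It is an illustration under assumptions, not the contents of dakota_results_types.hpp: std::any and std::tuple stand in for the Boost types, RealVector and StringVector are plain std::vector aliases, the key fields are assumed to be strings plus an unsigned execution number, and ResultsValue, ResultsDatabase, insert_result, and get_result are hypothetical names.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Stand-ins for Dakota's numeric and string container types.
using RealVector = std::vector<double>;
using StringVector = std::vector<std::string>;

// Key: method name, method id, execution number, data label.
using ResultsKeyType =
    std::tuple<std::string, std::string, std::size_t, std::string>;

// Metadata: metadata label -> vector of strings.
using MetaDataType = std::map<std::string, StringVector>;

// One archived value together with its optional metadata.
struct ResultsValue {
  std::any data;
  MetaDataType metadata;
};

// std::tuple compares lexicographically, so it can key a std::map directly.
using ResultsDatabase = std::map<ResultsKeyType, ResultsValue>;

// Store a value (and metadata) under a key, replacing any existing entry.
template <typename T>
void insert_result(ResultsDatabase& db, const ResultsKeyType& key,
                   const T& value, const MetaDataType& md = {}) {
  assert(!std::get<3>(key).empty());  // sketch assumption: label is required
  db[key] = ResultsValue{value, md};
}

// Typed retrieval. Throws std::out_of_range for a missing key and
// std::bad_any_cast if T does not match the stored type.
template <typename T>
const T& get_result(const ResultsDatabase& db, const ResultsKeyType& key) {
  return std::any_cast<const T&>(db.at(key).data);
}
```

Because the execution number is part of the key, repeated runs of the same method instance get distinct entries, and an ordered map keeps all results for one method adjacent when iterating.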
Initial Design: Lessons / Challenges
- Unique identifiers for all methods/instances run, including helper iterators
- Structure/hierarchy vs. flexibility/extensibility
- Best storage of data likely differs from current class member and output organization
- When to use per-function vs. contiguous data sets
- How to handle highly ragged or conditional data (different moment types per function)
- PCE coefficients or Sobol indices may be stored in a matrix, but we want to be able to write/read them one function at a time
- Group a best point together with its functions and constraints, or store variables together in one array and functions together in another
- Dealing with Dakota::String and Boost multi-array of string

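One way to reconcile contiguous storage with per-function access, as raised in the list above, is to store one column per response function in a column-major buffer. The sketch below is only an illustration of that idea: FunctionMatrix and its member names are hypothetical, and it does not address ragged data where functions have different numbers of entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix with one column per response function.
// Each function's data is contiguous, and so is the whole data set.
class FunctionMatrix {
public:
  FunctionMatrix(std::size_t rows, std::size_t num_functions)
      : rows_(rows), cols_(num_functions),
        values_(rows * num_functions, 0.0) {}

  // Write the data for a single function (one column).
  void set_function(std::size_t fn, const std::vector<double>& column) {
    assert(fn < cols_ && column.size() == rows_);
    std::copy(column.begin(), column.end(), values_.begin() + fn * rows_);
  }

  // Read back a copy of a single function's data.
  std::vector<double> get_function(std::size_t fn) const {
    assert(fn < cols_);
    auto first = values_.begin() + fn * rows_;
    return std::vector<double>(first, first + rows_);
  }

  // Whole data set as one contiguous block, e.g. for a single file write.
  const std::vector<double>& contiguous() const { return values_; }

private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<double> values_;
};
```

The trade-off is that every function must have the same number of rows; conditional or ragged results would need a different layout, such as an array of per-function vectors.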
Discussion: Results DB Next Steps
- What do you want from this capability as a user? As a developer?
- What kinds of queries do you want on this data? Is it important to be able to slice multiple ways, or can that be done in other tools?
- How do other tools handle this kind of output?
- Should we focus first on just getting the output out, then on efficiency issues, class reorganization, etc., or attempt all at once?