Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation,


1 Discussion: Dakota Results Database
Brian M. Adams, March 12, 2013
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

2 Why a Dakota Results Database?
• Primary driver: Dakota executable users want more uniform, centralized access to output from Dakota iterative studies
• Library mode users want the same, via a C++ interface
• Initially focused on results from an Iterator (method)
• Run configuration (reproducibility) information
• Extensions possible to interface, approximation, transformed evals; iteration history and details; metadata
• For memory-limited cases, push data out of core memory after computing, pull back in for results reporting (serialization may be more appropriate)
• Broader design notes at https://software.sandia.gov/trac/dakota/wiki/DatabaseDesign

3 Initial High-level Requirements
• Store results from most common studies; defer function evaluation data to restart database
• Include enough metadata for user to directly locate/extract
• In-core and file; options for when to sync between them
• Initial file format goals both human-readable and machine-parseable: simple text, HDF5, YAML/XML, SQL
• Avoid duplication of data
  - In-core database may replace class data
  - Don’t store labels many times
• Avoid re-computation, reimplementation when possible

4 Progress through Jan. 31, 2013
• Surveyed the various data output by Dakota iterators (see Trac)
• Initial discussion October 2012; design reviews and discussion on December 5, 2012
• Initial implementation delivered in Dakota 5.3
  - In-core boost::any database, with option for array-based storage
  - Simple dump to pseudo-hierarchical annotated text file
  - Coverage of “most” results output: focused on most common
  - Option to add metadata with any archived result
  - Demonstrated archiving LHS moments at compute, loading at print
• Does not address concerns with duplication, out-of-core, re-computation, re-implementation. No YAML or HDF5.
• Show example of text results output for hybrid optimization, sampling, PCE, helper iterator (PCE, EGO)

5 Current Abstractions
• ResultsManager: manages in-core and file-based databases under the hood
  - Post data to ResultsManager through API using concrete types
  - Under the hood, data gets stored in boost::any or passed to file
• ResultsEntry: used to retrieve a result from the database
  - If in-core is active, manages a reference to the stored data
  - If not, loads from file and manages a reference to a contained data object
  - Allows retrieval of a single entry in an array to support per-function restore of data
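
A minimal sketch of the two abstractions above, covering the in-core path only. Everything in it is illustrative rather than Dakota's actual code: the class names ResultsManagerSketch and ResultsEntrySketch, the insert/get/data member functions, and the key layout are hypothetical, and std::any (C++17) stands in for boost::any so that the example needs no external library.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Stand-in for Dakota's RealVector (assumption: a plain vector of doubles).
typedef std::vector<double> RealVector;
// Hypothetical key: method name, method id, execution number, data label.
typedef std::tuple<std::string, std::string, std::size_t, std::string> ResultsKey;

// In-core store: callers post concrete types, storage is type-erased.
class ResultsManagerSketch {
public:
  template <typename T>
  void insert(const ResultsKey& key, const T& value) { db_[key] = value; }

  // Returns a reference to the stored object.
  // Throws std::bad_any_cast if T does not match the stored type.
  template <typename T>
  const T& get(const ResultsKey& key) const {
    std::map<ResultsKey, std::any>::const_iterator it = db_.find(key);
    assert(it != db_.end());
    return std::any_cast<const T&>(it->second);
  }

private:
  std::map<ResultsKey, std::any> db_;
};

// Retrieval handle: holds a reference to the in-core data instead of a copy.
template <typename T>
class ResultsEntrySketch {
public:
  ResultsEntrySketch(const ResultsManagerSketch& mgr, const ResultsKey& key)
    : data_(mgr.get<T>(key)) {}
  const T& data() const { return data_; }

private:
  const T& data_;
};
```

With an array-valued result such as std::vector<RealVector>, indexing entry.data()[i] gives the per-function view described in the last bullet. A file-backed variant would have to load the object and own it, which this sketch leaves out.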

6 Storage Types: dakota_results_types.hpp
• Data key (ResultsKeyType): a tuple of method_name, method_id, execution number, data label
• Data value: boost::any, currently supporting RealMatrix, RealVector, and StringVector, plus arrays of each (RealVector arrays are typically per-function)
• Metadata (MetaDataType): a map from metadata label to a vector of strings
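
The key and metadata types can be sketched as follows. The typedef names come from the slide, but the template arguments did not survive in this transcript, so the element types here (std::string, std::size_t) and the use of std::tuple are assumptions; make_key and metadata_values are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Assumed element types: method_name, method_id, execution number, data label.
typedef std::tuple<std::string, std::string, std::size_t, std::string>
    ResultsKeyType;
// Metadata label mapped to a vector of strings.
typedef std::map<std::string, std::vector<std::string> > MetaDataType;

// Hypothetical helper: assemble a key from its four parts.
inline ResultsKeyType make_key(const std::string& method_name,
                               const std::string& method_id,
                               std::size_t execution_num,
                               const std::string& data_label) {
  return ResultsKeyType(method_name, method_id, execution_num, data_label);
}

// Hypothetical helper: fetch the strings stored under one metadata label.
// The label must be present.
inline const std::vector<std::string>& metadata_values(
    const MetaDataType& md, const std::string& label) {
  MetaDataType::const_iterator it = md.find(label);
  assert(it != md.end());
  return it->second;
}
```

Because std::tuple compares lexicographically, such a key can be used directly as the key of a std::map, which groups all results for one method instance and execution next to each other.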

7 Initial Design: Lessons / Challenges
• Unique identifiers for all methods/instances run, including helper iterators
• Structure/hierarchy vs. flexibility/extensibility
• Best storage of data is likely different from current class member and output organization
• When to store per-function vs. as a contiguous data set
• How to handle highly ragged or conditional data (different moment types per function)
• PCE coefficients or Sobol indices may be stored in a matrix, but we want to be able to write/read them one function at a time
• Group a best point together with its functions and constraints, or store variables together in one array and functions together in another
• Dealing with Dakota::String and Boost multi-array of string
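
One way to picture the per-function vs. contiguous trade-off is a matrix stored contiguously from which a single function's values are pulled out on demand. The sketch below is illustrative only: CoeffMatrix and function_column are hypothetical names, and the column-major layout with one column per response function is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Contiguous storage: one column per response function, column-major layout.
struct CoeffMatrix {
  std::size_t num_rows;        // e.g. number of coefficients per function
  std::size_t num_cols;        // number of response functions
  std::vector<double> values;  // size num_rows * num_cols
};

// Copy out the values belonging to a single function (one column).
inline std::vector<double> function_column(const CoeffMatrix& m,
                                           std::size_t fn_index) {
  assert(fn_index < m.num_cols);
  assert(m.values.size() == m.num_rows * m.num_cols);
  std::vector<double>::const_iterator first =
      m.values.begin() + fn_index * m.num_rows;
  return std::vector<double>(first, first + m.num_rows);
}
```

A rectangular layout like this does not cover the ragged case mentioned above, where each function may carry a different number of values; that would call for an array of per-function vectors instead.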

8 Discussion: Results DB Next Steps
• What do you want from this capability as a user?
• As a developer?
• What kinds of queries do you want on this data? Is it important to be able to slice multiple ways, or can that be done in other tools?
• How do other tools handle this kind of output?
• Should we focus first on just getting the output out, then on efficiency issues, class reorganization, etc., or attempt all at once?

