Discussion: Dakota Results Database
Brian M. Adams
March 12, 2013

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL

Why a Dakota Results Database?
- Primary driver: users of the Dakota executable want more uniform, centralized access to output from Dakota iterative studies
- Library-mode users want the same, via a C++ interface
- Initially focused on results from an Iterator (method)
- Run configuration (reproducibility) information
- Extensions possible to interface, approximation, and transformed evaluations; iteration history and details; metadata
- For memory-limited cases, push data out of core memory after computing and pull it back in for results reporting (serialization may be more appropriate)
- Broader design notes at: (link missing in source)

Initial High-level Requirements
- Store results from the most common studies; defer function evaluation data to the restart database
- Include enough metadata for the user to locate and extract results directly
- In-core and file storage, with options for when to sync between them
- Initial file format goals: both human-readable and machine-parseable (simple text, HDF5, YAML/XML, SQL)
- Avoid duplication of data
  - In-core database may replace class data
  - Don’t store labels many times
- Avoid re-computation and re-implementation when possible

Progress through Jan. 31, 2013
- Surveyed the various data output by Dakota iterators (see Trac)
- Initial discussion in October 2012; design reviews and discussion on December 5, 2012
- Initial implementation delivered in Dakota 5.3
  - In-core boost::any database, with option for array-based storage
  - Simple dump to a pseudo-hierarchical annotated text file
  - Coverage of “most” results output, focused on the most common
  - Option to add metadata with any archived result
  - Demonstrated archiving LHS moments at compute time and loading them at print time
- Does not address concerns about duplication, out-of-core storage, re-computation, or re-implementation. No YAML or HDF5.
- Show example of text results output for hybrid optimization, sampling, PCE, helper iterator (PCE, EGO)

Current Abstractions
- ResultsManager: manages in-core and file-based databases under the hood
  - Post data to ResultsManager through its API using concrete types
  - Under the hood, data gets stored in a boost::any or passed to file
- ResultsEntry: used to retrieve a result from the database
  - If the in-core database is active, manages a reference to the stored data
  - If not, loads from file and manages a reference to a contained data object
  - Allows retrieval of a single entry in an array to support per-function restore of data
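The manager/entry split above can be illustrated with a minimal sketch. It is not Dakota's implementation: `SketchResultsManager`, `SketchResultsEntry`, and `lookup_element` are hypothetical names, `std::any` (C++17) stands in for `boost::any`, plain `std::vector` stands in for Dakota's vector types, and only the in-core path is shown.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Dakota's numeric types (assumption).
using RealVector = std::vector<double>;
using RealVectorArray = std::vector<RealVector>;

// Sketch of an in-core results store: callers post concrete types,
// which are held type-erased (std::any here, boost::any in the slides).
class SketchResultsManager {
public:
  template <typename T>
  void insert(const std::string& key, T value) {
    store_[key] = std::move(value);
  }

  // Returns the type-erased value; throws std::out_of_range if absent.
  const std::any& lookup(const std::string& key) const {
    return store_.at(key);
  }

private:
  std::map<std::string, std::any> store_;
};

// Sketch of a typed handle that holds a reference to the stored data
// rather than a copy. Throws std::bad_any_cast on a type mismatch.
template <typename T>
class SketchResultsEntry {
public:
  SketchResultsEntry(const SketchResultsManager& mgr, const std::string& key)
      : data_(std::any_cast<const T&>(mgr.lookup(key))) {}

  const T& data() const { return data_; }

private:
  const T& data_;
};

// Retrieve a single element of a stored array, for example the data
// belonging to one response function.
template <typename ArrayT>
const typename ArrayT::value_type& lookup_element(
    const SketchResultsManager& mgr, const std::string& key, std::size_t i) {
  const ArrayT& arr = std::any_cast<const ArrayT&>(mgr.lookup(key));
  assert(i < arr.size());
  return arr[i];
}
```

In this sketch an entry stays valid only while the manager keeps the same object under that key; re-inserting under the same key would leave the reference dangling, which is one reason a real design needs an explicit ownership policy.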

Storage Types: dakota_results_types.hpp
- Data key: method_name, method_id, execution number, data label
  typedef tuple<string, string, size_t, string> ResultsKeyType;
- Data value: boost::any, currently supporting RealMatrix, RealVector, and StringVector, plus arrays of each (typically per-function)
- Metadata: metadata label, vector of strings
  typedef map<string, vector<string> > MetaDataType;
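As a rough illustration of how a four-part key and string-vector metadata could fit together, here is a sketch using standard-library types. The names `ResultsKey`, `MetaData`, `ResultsValue`, `ResultsStore`, and `labels_for` are hypothetical, `std::tuple` and `std::any` stand in for the Boost types, and the integer type chosen for the execution number is an assumption.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using StringVector = std::vector<std::string>;

// Key: (method_name, method_id, execution number, data label).
using ResultsKey =
    std::tuple<std::string, std::string, std::size_t, std::string>;

// Metadata: metadata label -> vector of strings.
using MetaData = std::map<std::string, StringVector>;

// Each stored result carries its type-erased value plus its metadata.
using ResultsValue = std::pair<std::any, MetaData>;
using ResultsStore = std::map<ResultsKey, ResultsValue>;

// List the data labels archived for one execution of one method.
// Tuples compare lexicographically, so all entries sharing the first
// three key fields are adjacent in the map, ordered by data label.
inline StringVector labels_for(const ResultsStore& db,
                               const std::string& method_name,
                               const std::string& method_id,
                               std::size_t execution) {
  StringVector labels;
  // The empty label sorts before every other label for this prefix.
  auto it = db.lower_bound(
      ResultsKey(method_name, method_id, execution, std::string()));
  for (; it != db.end(); ++it) {
    const auto& [name, id, exec, label] = it->first;
    if (name != method_name || id != method_id || exec != execution) break;
    labels.push_back(label);
  }
  return labels;
}
```

An ordered map keyed on the tuple gives simple prefix queries such as "everything from execution 1 of this method" without a separate index, at the cost of repeating the method name and id in every key.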

Initial Design: Lessons / Challenges
- Unique identifiers are needed for all methods/instances run, including helper iterators
- Structure/hierarchy vs. flexibility/extensibility
- The best storage of data is likely different from the current class member and output organization
  - When to store per-function data vs. a contiguous data set
  - How to handle highly ragged or conditional data (different moment types per function)
  - PCE coefficients or Sobol indices may be stored in a matrix, but we want to be able to write/read them one function at a time
  - Group a best point together with its functions and constraints, or store variables together in one array and functions together in another
- Dealing with Dakota::String and Boost multi-array of string
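One way to reconcile contiguous storage with per-function access, sketched under assumptions: coefficients for all response functions live in a single column-major buffer, and each function's column can be written or read on its own. `CoeffMatrix` and its members are hypothetical names, not Dakota classes, and the sketch assumes every function has the same number of coefficients.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Contiguous column-major storage with one column per response function.
class CoeffMatrix {
public:
  CoeffMatrix(std::size_t num_coeffs, std::size_t num_functions)
      : rows_(num_coeffs),
        cols_(num_functions),
        data_(num_coeffs * num_functions, 0.0) {}

  // Write the coefficients of a single function into its column.
  void set_function(std::size_t fn, const std::vector<double>& coeffs) {
    assert(fn < cols_ && coeffs.size() == rows_);
    std::copy(coeffs.begin(), coeffs.end(), data_.begin() + offset(fn));
  }

  // Read back a copy of the coefficients of a single function.
  std::vector<double> get_function(std::size_t fn) const {
    assert(fn < cols_);
    auto first = data_.begin() + offset(fn);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(rows_));
  }

  std::size_t num_functions() const { return cols_; }

private:
  // Start of a function's column within the flat buffer.
  std::ptrdiff_t offset(std::size_t fn) const {
    return static_cast<std::ptrdiff_t>(fn * rows_);
  }

  std::size_t rows_;
  std::size_t cols_;
  std::vector<double> data_;
};
```

This layout does not cover the ragged case raised above; when functions carry different numbers of values, an array of per-function vectors is the simpler fit, at the cost of losing the single contiguous block.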

Discussion: Results DB Next Steps
- What do you want from this capability as a user?
- As a developer?
- What kinds of queries do you want on this data? Is it important to be able to slice multiple ways, or can that be done in other tools?
- How do other tools handle this kind of output?
- Should we focus first on just getting the output out, then on efficiency issues, class reorganization, etc., or attempt all at once?