Q5cost A common data format for program interchange in the Quantum Chemistry domain Elda Rossi CINECA – Bologna (Italy)

Slides:



Advertisements
Similar presentations
Pulan Yu School of Informatics Indiana University Bloomington Web service based Varuna.Net.
Advertisements

A Workflow Engine with Multi-Level Parallelism Supports Qifeng Huang and Yan Huang School of Computer Science Cardiff University
Looking for a (standard) Common Format for (Quantum)
DELOS Highlights COSTANTINO THANOS ITALIAN NATIONAL RESEARCH COUNCIL.
DOCUMENT TYPES. Digital Documents Converting documents to an electronic format will preserve those documents, but how would such a process be organized?
Database Planning, Design, and Administration
Database System Concepts and Architecture
Copyright © 2003 by Prentice Hall Computers: Tools for an Information Age Chapter 15 Programming and Languages: Telling the Computer What to Do.
A New Computing Paradigm. Overview of Web Services Over 66 percent of respondents to a 2001 InfoWorld magazine poll agreed that "Web services are likely.
CPSC 695 Future of GIS Marina L. Gavrilova. The future of GIS.
Computers: Tools for an Information Age
Mapping Physical Formats to Logical Models to Extract Data and Metadata Tara Talbott IPAW ‘06.
Mgt 240 Lecture Website Construction: Software and Language Alternatives March 29, 2005.
Chapter 4 Database Management Systems. Chapter 4Slide 2 What is a Database Management System (DBMS)?  Database An organized collection of related data.
Programming Concepts and Languages Chapter 12 – Computers: Understanding Technology, 3 rd edition 1November
Different approaches to digital preservation Hilde van Wijngaarden Digital Preservation Officer Koninklijke Bibliotheek/ National Library of the Netherlands.
Programming Languages: Telling the Computers What to Do Chapter 16.
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
Database System Development Lifecycle © Pearson Education Limited 1995, 2005.
Overview of the Database Development Process
COMPUTER SOFTWARE Section 2 “System Software: Computer System Management ” CHAPTER 4 Lecture-6/ T. Nouf Almujally 1.
Publish Your Work BIM Curriculum 04. Topics  External Collaboration  Sharing the BIM model  Sharing Documents  Sharing the 3D model  Reviewing 
EARTH SCIENCE MARKUP LANGUAGE “Define Once Use Anywhere” INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE.
Metadata Creation with the Earth System Modeling Framework Ryan O’Kuinghttons – NESII/CIRES/NOAA Kathy Saint – NESII/CSG July 22, 2014.
HDF-EOS Workshop VII, An XML Approach to HDF-EOS5 Files Jingli Yang 1, Bob Bane 1, Muhammad Rabi 1, Zhangshi Yin 1, Richard Ullman 1, Robert McGrath.
Input/OUTPUT [I/O Module structure].
San Diego Supercomputer CenterUniversity of California, San Diego Preservation Research Roadmap Reagan W. Moore San Diego Supercomputer Center
ITEC224 Database Programming
Exploring the Applicability of Scientific Data Management Tools and Techniques on the Records Management Requirements for the National Archives and Records.
The Metadata Object Description Schema (MODS) NISO Metadata Workshop May 20, 2004 Rebecca Guenther Network Development and MARC Standards Office Library.
Using the Open Metadata Registry (openMDR) to create Data Sharing Interfaces October 14 th, 2010 David Ervin & Rakesh Dhaval, Center for IT Innovations.
Development of ORBIT Data Generation and Exploration Routines G. Shelburne K. Indireshkumar E. Feibush.
1 Technologies for distributed systems Andrew Jones School of Computer Science Cardiff University.
© 2013, published by Flat World Knowledge 10-1 Information Systems: A Manager’s Guide to Harnessing Technology, version 2.0 John Gallaugher.
Introduction to Apache OODT Yang Li Mar 9, What is OODT Object Oriented Data Technology Science data management Archiving Systems that span scientific.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
EARTH SCIENCE MARKUP LANGUAGE Why do you need it? How can it help you? INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE.
EU Project proposal. Andrei S. Lopatenko 1 EU Project Proposal CERIF-SW Andrei S. Lopatenko Vienna University of Technology
The netCDF-4 data model and format Russ Rew, UCAR Unidata NetCDF Workshop 25 October 2012.
Markup in Atomic and Molecular Simulations: Implementation & Issues Jon Wakelin Dept. Earth Sciences University of Cambridge.
Chad Berkley NCEAS National Center for Ecological Analysis and Synthesis (NCEAS), University of California Santa Barbara Long Term Ecological Research.
The european ITM Task Force data structure F. Imbeaux.
NOVA Networked Object-based EnVironment for Analysis P. Nevski, A. Vaniachine, T. Wenaus NOVA is a project to develop distributed object oriented physics.
1 CS 502: Computing Methods for Digital Libraries Lecture 19 Interoperability Z39.50.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 4 Computer Software.
May 2003National Coastal Data Development Center Brief Introduction Two components Data Exchange Infrastructure (DEI) Spatial Data Model (SDM) Together,
XML and Its Applications Ben Y. Zhao, CS294-7 Spring 1999.
Digital Library The networked collections of digital text, documents, images, sounds, scientific data, and software that are the core of today’s Internet.
The HDF Group Introduction to netCDF-4 Elena Pourmal The HDF Group 110/17/2015.
© 2013, published by Flat World Knowledge Chapter 10 Understanding Software: A Primer for Managers 10-1.
Jay Lofstead Input/Output APIs and Data Organization for High Performance Scientific Computing November.
Taming the Big Data in Computational Chemistry #euroCRIS2015 Barcelona 9-11-XI-2015 Carles Bo ICIQ (BIST) -
Alexandria University Faculty of Science Computer Science Department Introduction to Programming C++
Rule Engine for executing and deploying the SAGE-based Guidelines Jeong Ah Kim', Sun Tae Kim 2 ' Computer Education Department, Kwandong University, KOREA.
INFSO-RI JRA2 Test Management Tools Eva Takacs (4D SOFT) ETICS 2 Final Review Brussels - 11 May 2010.
Lecture #1: Introduction to Algorithms and Problem Solving Dr. Hmood Al-Dossari King Saud University Department of Computer Science 6 February 2012.
Joel Håkansson Henry Larsson Björn Westling. Contact information  TPB – The Swedish Library of Talking Books and Braille, Stockholm  Joel Håkansson.
5/29/2001Y. D. Wu & M. Liu1 Content Management for Digital Library May 29, 2001.
PROGRAMMING (1) LECTURE # 1 Programming and Languages: Telling the Computer What to Do.
® Sponsored by Improving Access to Point Cloud Data 98th OGC Technical Committee Washington DC, USA 8 March 2016 Keith Ryden Esri Software Development.
Java Programming: From the Ground Up
CSCI-235 Micro-Computer Applications
Antonio Laganà University of Perugia (UNIPG), IT
Programming Concepts and Languages
Chapter 4 Computer Software.
COMPUTER SOFT WARE Software is a set of electronic instructions that tells the computer how to do certain tasks. A set of instructions is often called.
Principles of Programming Languages
Toward an Ontology-Driven Architectural Framework for B2B E. Kajan, L
Presentation transcript:

Q5cost A common data format for program interchange in the Quantum Chemistry domain Elda Rossi CINECA – Bologna (Italy)

- Bologna (Italy) The problem  The problem was driven by a common practice in the group: a scientific problem needs several programs to be solved:  specific in-house codes written by some of the partners (FCI, CASDI, NEVPT, PROP, TouChain).  open source or commercial programs used for producing the standard quantities (COLUMBUS, DALTON, MOLCAS, MOLPRO, GAMESS, …)  None of them shares the same data format  Tricky procedure  Get all the programs  Install them on the local computer  Translate the data DaltonTransform TouChain FCI

- Bologna (Italy) The solution: a computing/collaborative Grid  Each program remains “at home”  Each program maintains its own proprietary data format  Each program must understand a common data format for interchange  A set of “translation utilities” takes care of translating data from/to the proprietary formats and the common format  A sort of “grid environment” makes programs communicate and exchange data through a single interface

- Bologna (Italy) The final vision  Resources (mainly applications)  geographically distributed  on heterogeneous platforms (resource GRID !!!)  A central (web) interface for accessing them  A way to define the flowchart and to run it (workflow!!!)  Dynamic control over the execution (steering)  Data are collected in a (virtually) central repository

- Bologna (Italy) The first problem: Data Format  One of the main challenges is a lack of community standards for data representation.  Two main strategies:  To develop translators that map one format to another. This requires n**2 translators, additional data formats requires 2n more translators (theoretically be very unscalable)  Support the development a community standard for the field and develop translators towards the common formats.  Other – more advanced – alternatives exist  if all the data formats being used are represented as XML, third party techniques such as XSLT can be used to perform the translations.  If not, metadata languages such as Data Format Description Language (DFDL) can be used to describe the data.

- Bologna (Italy) The Common Data Format  Define a data format for interoperability  What do we need  Complete  Flexible  Good performance  Near to chemists  External  How to use it  Data repository  Input/output wrappers for data conversion  A format for interchange, not to be used as an internal format  We are interested in functionality (that has to be general and complete). Performance and efficiency, although important, are not the main focus.

- Bologna (Italy) The data model  We identified two different kinds of information  small data quantities, mainly ASCII coded, like atom labels, geometry, symmetry, basis sets and so on  large datasets, normally binary, like integrals and expansion coefficients.

- Bologna (Italy) Small/large data  Small data: several initiatives, all based on XML (human readability and machine comprehension).  We have adopted the same approach  QC-ML (Quantum Chemistry Markup Language)  For processing QC-ML files we need a specific FORTRAN library  f77xml  Large datasets: XML is not convenient, mainly due to its verbosity.  HDF5 was adopted  Q5cost  For processing Q5cost files we need a specific FORTRAN library  Q5cost

- Bologna (Italy) Small data Symmetry: the symmetry of the system in terms of group name and other symmetry data; Geometry: the atomic composition of the system and its cartesian coordinates; Basis: the basis set information, either given by name or fully defined. Base facts  All these data are rather "small" and can be effectively described using a mark-up language for enhancing readability and standardisation.  A hierarchical scheme was designed and described based on XML  we called it QC-ML (Quantum Chemistry Mark-up Language). Derived facts Energies Properties Integrals Coefficients ….,. Large !!

- Bologna (Italy) QC-ML : an XML format for QC Base facts Derived facts

- Bologna (Italy) Large bin data  We looked for a suitable technology that can merge  portability,  efficiency,  FORTRAN binding,  data compression, and  easy access to information.  Usability is also important but not critical HDF5 was considered the right technology

- Bologna (Italy) What is HDF5 ?  HDF5 is a Format and software for scientific data produced by NCSA/University of Illinois  Support any kind of data for digital storage regardless of origin and size  Stores data in a highly organized and hierarchical format  High efficient chunked I/O  Allows inclusion of metadata (attributes)  Platform independent file format  Widely used in scientific or visualization codes

- Bologna (Italy) prop The hierarchical structure All the data objects are related within a hierarchical structure and logical containment relationship System overlap oneint twoint index(:,2) value(:) oneint twoint Title * Num_sym * Ctime (s) Atime (s) Q5version (s) QCML-ref Nuclear_energy Geometry Basis_set Name Comment Rank Symmetry Real/Complex index(:,2) value(:) index(:,4) value(:) index(:,?) value(:,2) index(:,2) value(:) index(:,4) value(:) index(:,?) value(:,?) WF tag_... AO Num_orb_sym * num_orb_tot (s) Labels tag_... MO Num_orb_sym * num_orb_tot (s) labels AO/SAO_ref_tag * SAO trans.matrix Classification Occ_num Symmetry *(d) SCF_energy overlap oneint twoint index(:,2) value(:) index(:,2) value(:) index(:,4) value(:) index(:,?) value(:,2) tag_... SAO Num_orb_sym * num_orb_tot (s) Labels AO_ref_tag* AO/SAO trans.matrix * Needed (s)system * (d)Needed/default Q5Cost 0.9.6

- Bologna (Italy) Tools: library Q5Cost_MOOneInt_create Creates a new container for MO one electron (file_id, error, [order], [user_tag] ) Q5Cost_MOOneInt_append Appends MO one electron integral values to the file (file_id, index, value, howmany, error, [user_tag] ) Q5Cost_MOOneInt_read Reads MO one electron integral values to the file (file_id, offset, howmany, index, value, error, [user_tag] ) Q5Cost_MOOneInt_clear Appends MO one electron integral values to the file ( file_id, error, user_tag ) Chemists prefer Fortran:  F90 the general HDF5 library distributed by NCSA, we did the specific Q5cost interface  Other “bindings” produced authomatically by a python script: C, C++, F77

- Bologna (Italy) Tools:  Q5dump: allows you to see the content of a Q5 file  Wrappers: a program, specifically designed for each QC code in the chain, capable of retrieving information from, and writing information to, the file in accordance with the defined syntax. It ises the Q5cost library.  input wrapper reads data from the repository file and converts them into the QC code specific input,  output wrapper reads data from the QC code specific output and adds them to the repository file.

- Bologna (Italy) Wrappers  FullCI: embedded  TouChain: wrapper via MolCOST, embedded in progress  MRCC: in progress  Dalton: wrapper  GamessUS: embedded in progress  MolCAS: wrapper  Columbus: wrapper  OpenBabel: in progress  Molekel (via OpenBabel): in progress

- Bologna (Italy) Conclusion and future work  I discussed our solution for code integration (in QC context)  the data model consists in two parts  Small data  XML  Large data  HDF5  A hierarchical structure of both of them has been proposed  Some sw tools (Fortran libraries) are available designed on the data model  Enhancement of the data model  Collaboration towards a common data model  Put everything on the “grid” XMLHDF5

- Bologna (Italy) Acknowledgment COST D37 project: Grid Computing in Chemistry- GRIDCHEM  DeciQ Working Group: Code interoperability in Computational Quantum Chemistry – Elda Rossi, Andrew Emerson – CINECA – Gian Luigi Bendazzoli, Antonio Monari – Univeristà di Bologna – Renzo Cimiraglia, Celestino Angeli, Chiara Pastore - Università di Ferrara – Daniel Maynau, Stefano Evangelisti, Anthony Scemama - IRSAMC – Toulouse – Vallet Valerie, JeanPierre Flament (University of Lille) –José Sanchez-Marin - Universitat de Valencia – Peter Szalay, Attila Tajiti - Eötvös Loránd University, Budapest – Kállay Mihaly (Budapest University of Technology and Economics) –Baldridge Kim (University of Zürich) –Kenneth Ruud (University of Tromsø) –Stefano Borini (now at DTU – Denmark)