Markup in Atomic and Molecular Simulations: Implementation & Issues Jon Wakelin Dept. Earth Sciences University of Cambridge.


Overview
  Background – The problem
  A solution
  An Implementation
  Demo
  Summary

Background (1)
  The computational chemistry and physics communities have more data than ever before
    Advances in computer power
    Access to HPC facilities
    Algorithmic & scientific advances
    Better exploitation of existing facilities (Grid)
    High-throughput computing (Condor)
  The same factors have led to qualitative changes in data
    Can now attempt new kinds of calculations

Background (2)
  The majority of this data is reused
    Starts in a database
    Passed into a program
    Post-processed
    Visualized, etc.
  Most notably…
    Structures/Coordinates
    Forcefields/Interatomic potentials
    Basis sets
    Pseudopotentials

Background (3)
  So the nature of our data has changed but the way we deal with it has not
    Still rely on bespoke text and binary formats
    Issues such as interoperability, data management and data reuse are tackled in an informal or ad hoc manner
    Binary markup languages (NetCDF, HDF)

A solution
  XML
    Allows the user to describe data of arbitrary structure
    Or… allows the user to structure his/her data arbitrarily
    Provides us with a known format (i.e. it is easy to parse)
    Many free tools and standards
    ~7 years old, so fairly well road tested
  CML (Chemical Markup Language)
    Extensions to CML core for simulations - CMLComp
    CML is not tied to a particular chemistry or physics program
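
To make the "easy to parse" point concrete, here is a minimal sketch in Python. It assumes a simplified CML-style water molecule: the `molecule`, `atomArray` and `atom` elements and the `elementType`, `x3`, `y3`, `z3` attributes follow common CML usage, but the coordinates are illustrative and the namespace that real CML files normally declare is left out for brevity. The function name `read_atoms` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified CML-style fragment (assumption: no namespace, illustrative coordinates).
CML_TEXT = """
<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
</molecule>
"""


def read_atoms(cml_text):
    """Return a list of (element, (x, y, z)) tuples from a CML-style string."""
    root = ET.fromstring(cml_text.strip())
    atoms = []
    # iter() visits every <atom> element regardless of how deeply it is nested.
    for atom in root.iter("atom"):
        xyz = tuple(float(atom.get(axis)) for axis in ("x3", "y3", "z3"))
        atoms.append((atom.get("elementType"), xyz))
    return atoms


if __name__ == "__main__":
    for element, xyz in read_atoms(CML_TEXT):
        print(element, xyz)
```

The same few lines would work on output from any program that writes this structure, which is the sense in which the format is not tied to one code.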

What will markup do for us?
  Facilitate data exchange
    Between chemistry and physics software, but also…
    Easier to extract data to databases
    Facilitates other tasks such as data-mining
  Make data producers more accountable
    Schemas and related technologies
    Dictionaries
  Reduce software development (eventually)
    No need to support multiple formats
    No need to write ‘converters’
    Standard libraries for processing CML

Data Exchange (examples)
  Equilibrate MD in DLPOLY then continue in SIESTA
  Visualize output from Gaussian in Jmol
  Compare timings between VASP and CASTEP
  Take structure from ICSD and relax in SIESTA
  Develop forcefield in GULP, use in DLPOLY
  Calculate property X in Dalton and property Y in GAMESS
  And so on… in fact, while these examples should be familiar to us all, they are essentially trivial. However…
  Grid/Condor facilitate high-throughput computing
    Often want to create complex workflow schemes
    E.g. using Condor’s DAG Manager
    But there is no prescription for how to handle the data as it ‘flows’

[Figure: three designs for adding XML support to a simulation code]
  Design 1: In.xml -> Parser -> In.txt -> CODE -> Out.txt -> Parser -> Out.xml
  Design 2: In.xml -> Parser -> In.txt -> CODE -> Out.xml
  Design 3: In.xml -> CODE -> Out.xml

Design 1
  Only option when you don’t have access to source code
  Input: XSL or program using SAX, DOM
  Output: JumboMarker
  Programs using this design: MOPAC, Gaussian
  Pros & Cons
    + Generality – it will work for any code!
    + Don't need access to the source code
    - Requires more user intervention
    - Parsing text to create XML!
    - Need to know all combinations that the code can throw at you
    - Is at the mercy of changes to the output by the code developers
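
The output half of Design 1 can be sketched as a wrapper that scrapes a program's text output into XML. This is only an illustration in Python, not JumboMarker: the text layout in `TEXT_OUTPUT` is invented, the function `text_to_xml` is hypothetical, and the `propertyList`/`property`/`scalar` element names are loosely modelled on CML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text output from a simulation code (layout invented here).
TEXT_OUTPUT = """\
Total energy =   -466.123400 eV
Pressure     =      0.052000 kBar
"""

# The wrapper only understands lines of the form "name = value units".
# Anything the pattern does not anticipate is skipped, which is exactly the
# fragility listed in the cons above.
LINE_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z ]+?)\s*=\s*(?P<value>[-+0-9.eE]+)\s+(?P<units>\S+)$"
)


def text_to_xml(text):
    """Convert recognised 'name = value units' lines into an XML tree."""
    root = ET.Element("propertyList")
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # unrecognised line: silently lost
        prop = ET.SubElement(root, "property", title=match.group("name"))
        scalar = ET.SubElement(prop, "scalar", units=match.group("units"))
        scalar.text = match.group("value")
    return root


if __name__ == "__main__":
    print(ET.tostring(text_to_xml(TEXT_OUTPUT), encoding="unicode"))
```

If the code developers change the column layout or add a new quantity, the pattern has to be updated by hand, whereas in Designs 2 and 3 the program emits the XML itself.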

Design 2
  When you have access to the source code
  When you are using Fortran
  Input: XSL, program with SAX, DOM
  Output: Jumbo90, WXML
  Examples: SIESTA, GULP, DLPOLY
  Pros and Cons
    + Avoid tricky text => XML conversion
    + Only have to maintain a single program
    + Simpler from point of view of end user
    - End user still has to convert CML => text

Design 3
  When you have access to the source code
  Input/Output: DOM
  Examples: Jmol, JChemEdit, openBabel
  Pros and Cons
    + Simplest for end user
    - Most Chem/Phys programs still written in Fortran
        Limited XML support for Fortran
    - CML is not the file format of your program
        A CML file is not guaranteed to contain all the info you need
        Alternatively it may contain too much
  “Towards a common data and command …”

Implementation - Output
  An F90 library for creating well-formed XML
    WXML (A. Garcia)
  An F90 library for formatting CML
    Jumbo90
    Provides convenience routines for creating CML elements
    Has been used in SIESTA, GULP, DLPOLY
  We should look to auto-generate these libraries
  But output is the easy part...
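
The idea behind a library for creating well-formed XML can be sketched in a few lines. The Python class below is a hypothetical illustration and does not reproduce the WXML or Jumbo90 interfaces: it keeps a stack of open elements so that tags can only be closed in the right order, and it escapes text and attribute values. It checks nesting only, not that there is a single root element.

```python
from xml.sax.saxutils import escape, quoteattr


class XmlWriter:
    """Tiny streaming writer that keeps element nesting balanced (sketch only)."""

    def __init__(self):
        self._parts = []
        self._open = []  # stack of currently open element names

    def start(self, name, **attrs):
        # quoteattr() escapes the value and wraps it in quotes.
        attr_text = "".join(
            f" {key}={quoteattr(str(value))}" for key, value in attrs.items()
        )
        self._parts.append(f"<{name}{attr_text}>")
        self._open.append(name)

    def text(self, content):
        if not self._open:
            raise ValueError("text outside any element")
        self._parts.append(escape(str(content)))

    def end(self, name):
        # Refuse to close anything but the innermost open element.
        if not self._open or self._open[-1] != name:
            raise ValueError(f"cannot close <{name}> here")
        self._open.pop()
        self._parts.append(f"</{name}>")

    def result(self):
        if self._open:
            raise ValueError(f"unclosed elements: {self._open}")
        return "".join(self._parts)


if __name__ == "__main__":
    writer = XmlWriter()
    writer.start("scalar", title="energy", units="eV")
    writer.text(-466.1234)
    writer.end("scalar")
    print(writer.result())
```

A CML-formatting layer in the spirit of Jumbo90 would then be a set of convenience routines built on calls like `start`, `text` and `end`.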

Implementation - Input
  Could link to libxml2 (C library)
  Could implement SAX or DOM in Fortran
    Several groups have tried this
    A. Garcia has an F90 SAX parser
    We have built an F95 DOM parser on top of this
    Currently supports DOM 1.0
  Could we go one step further?
    Could we implement a CML-DOM in Fortran?
    Generic W3C DOM vs. language-specific DOM
    E.g. MathML-DOM, SVG-DOM, CML-DOM
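
For readers unfamiliar with the two parsing styles, the contrast can be shown with Python's standard library rather than the Fortran parsers discussed here. In this sketch the document string and the names `ElementCollector`, `sax_element_names` and `dom_element_names` are made up for illustration. SAX pushes events to a handler as the document streams past, while DOM builds the whole tree first and lets the caller query it afterwards.

```python
import xml.sax
from xml.dom import minidom

# Small illustrative document (assumption: not taken from any real program).
DOC = "<person><name><fst>Jon</fst><sec>Smith</sec></name></person>"


class ElementCollector(xml.sax.ContentHandler):
    """SAX handler: records each element name as its start tag is seen."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def sax_element_names(doc):
    """Event-driven parse: nothing is kept unless the handler stores it."""
    handler = ElementCollector()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.names


def dom_element_names(doc):
    """Tree-based parse: the whole document is in memory and can be queried."""
    tree = minidom.parseString(doc)
    return [node.tagName for node in tree.getElementsByTagName("*")]


if __name__ == "__main__":
    print(sax_element_names(DOC))
    print(dom_element_names(DOC))
```

Building a DOM on top of a SAX parser, as described above, amounts to writing a handler whose callbacks construct tree nodes instead of a flat list.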

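XML as a tree
[Figure: an XML document drawn as a tree. The root Person element has two children: Name, containing Fst (Jon) and Sec (Smith), and Stats, containing Height (20) and Weight (60).]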

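Generic DOM Tree
[Figure: the same document as a generic DOM tree of Element and Text nodes, e.g. Element = person, Element = stats, Element = weight, Text = 60.]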
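
The generic view can be reproduced with any W3C-style DOM. The sketch below uses Python's `xml.dom.minidom` on a reconstruction of the person document from the "XML as a tree" slide. The lowercase tag names and the height value of 20 are assumptions read off the figure, and `describe` is a hypothetical helper. Every node is either an Element or a Text node, and the data values live in Text nodes beneath their elements.

```python
from xml.dom import Node, minidom

# Reconstruction of the slide's example (tag names and height value assumed).
DOC = (
    "<person><name><fst>Jon</fst><sec>Smith</sec></name>"
    "<stats><height>20</height><weight>60</weight></stats></person>"
)


def describe(node, depth=0):
    """Return indented lines naming each Element and Text node in the tree."""
    lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append("  " * depth + f"Element = {node.tagName}")
    elif node.nodeType == Node.TEXT_NODE:
        lines.append("  " * depth + f"Text = {node.data}")
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


tree = minidom.parseString(DOC)
print("\n".join(describe(tree.documentElement)))
```

A language-specific DOM such as a CML-DOM would replace these generic node kinds with typed objects (an atom, a molecule) that know what their children mean.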

Generic DOM Implementation in F95
  Inheritance vs. flattened view
    Similarities with C’s libxml implementation
  Using linked lists/pointers
    Functions return pointers to data structures
    Remember to use pointer syntax!
  Things to do
    No validation
    No XPath
    No 16-bit strings
  Benefits
    Portable
    Live nodes
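
One way to picture the flattened, pointer-based layout is a single node record with a type tag and sibling/child links, instead of a class hierarchy with one subclass per node kind. The Python sketch below is an illustration only and is not the F95 code. The names `FlatNode`, `append_child` and `children` are hypothetical, while the numeric node-type codes are the standard DOM values.

```python
from dataclasses import dataclass
from typing import Optional

# Standard DOM node-type codes.
ELEMENT_NODE = 1
TEXT_NODE = 3


@dataclass(eq=False)
class FlatNode:
    """One record type for every node kind; links play the role of pointers."""
    node_type: int
    name: str = ""
    value: str = ""
    parent: Optional["FlatNode"] = None
    first_child: Optional["FlatNode"] = None
    last_child: Optional["FlatNode"] = None
    next_sibling: Optional["FlatNode"] = None


def append_child(parent, child):
    """Link child at the end of parent's child list and return the child."""
    child.parent = parent
    if parent.last_child is None:
        parent.first_child = child
    else:
        parent.last_child.next_sibling = child
    parent.last_child = child
    return child


def children(node):
    """Walk the linked list of children from first_child via next_sibling."""
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling


if __name__ == "__main__":
    stats = FlatNode(ELEMENT_NODE, name="stats")
    weight = append_child(stats, FlatNode(ELEMENT_NODE, name="weight"))
    append_child(weight, FlatNode(TEXT_NODE, value="60"))
    print([c.name for c in children(stats)])
```

Because callers hold references to the nodes themselves rather than copies, a change made through one reference is visible through all of them, which is one reading of the "live nodes" benefit listed above.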

Demo
  siesta.xml – H2O
  siesta.html – H2O
  siesta.html – Pyrophyllite
  gulp.html – Al/Cu cluster

Summary
  Began with three observations:
    Quantitative and qualitative changes in our data
    Data exchange is essential (even in the simplest calculation)
    Bespoke data formats and ad hoc solutions for data exchange
  Changing the way we deal with data will:
    Facilitate data exchange and interoperability
    Make data and data producers more accountable
    Reduce code development (but not yet)
  Implementation
    Design depends on: access to source, programming language
    Output – Jumbo90/WXML
    Input – F90 implementations of SAX/DOM/CML-DOM

Acknowledgments
  P. Murray-Rust & A. Garcia
  NERC