New Tools for Storing and Accessing Spectroscopic Data: The Development of an XML Schema for the HITRAN Database

Dr Christian Hill, Department of Physics and Astronomy, UCL

HITRAN format since 2004
[Sample records from a HITRAN .par file; the numeric fields are not preserved in this transcript.]
– ASCII text format: one line of 160 bytes per transition;
– fixed-width formats for data fields: Fortran-friendly;
– total database size (without supplementary data): 440 MB.
Fields highlighted on the sample records:
– molecule ID and isotopologue ID;
– transition frequency, cm⁻¹;
– transition strength S, cm⁻¹/(molecule·cm⁻²);
– "global" quanta (vibrational / electronic) and "local" quanta (rotational, symmetry);
– uncertainty codes, reference codes and the line-mixing flag.
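Because every field sits at a fixed column position, a record can be read with plain string slicing. The following is a minimal Python sketch. It assumes the field widths of the published 160-character HITRAN2004 record and a purely numeric isotopologue ID. The field names (`nu`, `sw`, `elower`, …) are illustrative and are not taken from this talk.

```python
# Minimal reader for one 160-character HITRAN .par record (HITRAN2004 layout).
# Widths follow the published format; the field names are illustrative.
FIELDS = [
    # (name, width in characters, converter)
    ("molec_id", 2, int),                   # molecule ID
    ("local_iso_id", 1, int),               # isotopologue ID (assumed numeric)
    ("nu", 12, float),                      # transition wavenumber, cm-1
    ("sw", 10, float),                      # intensity S, cm-1/(molecule cm-2)
    ("a", 10, float),                       # Einstein A coefficient, s-1
    ("gamma_air", 5, float),                # air-broadened half-width
    ("gamma_self", 5, float),               # self-broadened half-width
    ("elower", 10, float),                  # lower-state energy, cm-1
    ("n_air", 4, float),                    # temperature exponent of gamma_air
    ("delta_air", 8, float),                # air pressure shift
    ("global_upper_quanta", 15, str.strip), # vibrational / electronic
    ("global_lower_quanta", 15, str.strip),
    ("local_upper_quanta", 15, str.strip),  # rotational, symmetry
    ("local_lower_quanta", 15, str.strip),
    ("ierr", 6, str),                       # uncertainty codes
    ("iref", 12, str),                      # reference codes
    ("line_mixing_flag", 1, str),
    ("gp", 7, float),                       # upper-state statistical weight
    ("gpp", 7, float),                      # lower-state statistical weight
]

RECORD_LENGTH = sum(width for _, width, _ in FIELDS)  # 160 characters


def parse_par_line(line: str) -> dict:
    """Split one .par record into named, typed fields."""
    line = line.rstrip("\r\n")
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(line)}")
    record, pos = {}, 0
    for name, width, convert in FIELDS:
        record[name] = convert(line[pos:pos + width])
        pos += width
    return record
```

Nothing in the record says what a field means. The reader has to know that columns 4 to 15 hold a wavenumber, which is the limitation discussed next.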

HITRAN format since 2004: limitations
– Hard to extend to include, e.g.:
  – quantum numbers for complex states,
  – line-mixing data,
  – new line-broadening species (e.g. H2),
  – parameters for line shapes other than Voigt;
– many states are duplicated, because a state is repeated in every transition in which it participates;
– arbitrary default entries indicate unavailable data (e.g. -1. for the lower-state energy);
– errors and inconsistencies are hard to identify, because the format contains no semantic information.
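A default such as -1. looks like an ordinary number, so a program reading the file must know to treat it as "missing". A relational store can hold an explicit NULL instead. This small Python sketch maps the sentinel to `None`. Only the lower-state energy default comes from the slide. The `SENTINELS` table and `clean_field` helper are hypothetical.

```python
import math
from typing import Optional

# Field values that mean "no data available" rather than a physical value.
# Only the lower-state energy default (-1.) is taken from the slide;
# entries for other fields would be added the same way (hypothetical).
SENTINELS = {"elower": -1.0}


def clean_field(name: str, value: float) -> Optional[float]:
    """Return None when value is the 'unavailable' default for this field."""
    sentinel = SENTINELS.get(name)
    if sentinel is not None and math.isclose(value, sentinel):
        return None
    return value
```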

VAMDC: Virtual Atomic and Molecular Data Centre
– EU project funded under Framework Programme 7: Research Infrastructure;
– aims to build "an interoperable e-infrastructure for the exchange of atomic and molecular data";
– develops tools for storing, searching and manipulating atomic and molecular (AM) data from many different sources.

Relational Database Model
[Numeric values are not preserved in this transcript.]
States table:
StateID  | Energy | Uncertainty | J | Ka | Kc | v1 | v2 | v3 | …
S1-H2O   | …
S2-H2O   | …
S3-H2O   | …
S4-H2O   | …
…
Transitions table:
TransID  | UpperStateID | LowerStateID | S     | …
T1-H2O-1 | S2-H2O-1     | S1-H2O       | …E-25 | …
T2-H2O-1 | S3-H2O-1     | S7-H2O       | …E-24 | …
T3-H2O-1 | S4-H2O-1     | S12-H2O      | …E-25 | …
T4-H2O-1 | S9-H2O-1     | S29-H2O      | …E-25 | …
…
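The two-table design can be tried out with Python's built-in `sqlite3`. It stands in here for the MySQL database described in the talk, so the example runs without a server. The column names follow the slide headers. The state energies and IDs are placeholders in the slide's style, not HITRAN data.

The sketch shows two things. A state shared by two transitions is stored once. A transition wavenumber can be derived by joining each transition to its two states.

```python
import sqlite3

SCHEMA = """
CREATE TABLE states (
    state_id    TEXT PRIMARY KEY,        -- e.g. 'S1-H2O-1'
    energy      REAL,                    -- state energy, cm-1
    uncertainty REAL,                    -- uncertainty in energy, cm-1
    J INTEGER, Ka INTEGER, Kc INTEGER,   -- rotational quanta
    v1 INTEGER, v2 INTEGER, v3 INTEGER   -- vibrational quanta
);
CREATE TABLE transitions (
    trans_id       TEXT PRIMARY KEY,     -- e.g. 'T1-H2O-1'
    upper_state_id TEXT NOT NULL REFERENCES states(state_id),
    lower_state_id TEXT NOT NULL REFERENCES states(state_id),
    S              REAL                  -- intensity, cm-1/(molecule cm-2)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # reject unknown state IDs
    conn.executescript(SCHEMA)
    # Illustrative placeholder values, not HITRAN data.
    conn.executemany(
        "INSERT INTO states VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [("S1-H2O-1", 0.0, 1e-4, 0, 0, 0, 0, 0, 0),
         ("S3-H2O-1", 37.1371, 1e-4, 1, 1, 1, 0, 0, 0),
         ("S4-H2O-1", 70.0908, 1e-4, 2, 0, 2, 0, 0, 0)])
    # S3-H2O-1 is stored once but used by both transitions.
    conn.executemany(
        "INSERT INTO transitions (trans_id, upper_state_id, lower_state_id)"
        " VALUES (?, ?, ?)",
        [("T1-H2O-1", "S3-H2O-1", "S1-H2O-1"),
         ("T2-H2O-1", "S4-H2O-1", "S3-H2O-1")])
    conn.commit()
    return conn


def wavenumbers(conn: sqlite3.Connection):
    """Derive each transition wavenumber from its upper and lower state energies."""
    return conn.execute("""
        SELECT t.trans_id, u.energy - l.energy
        FROM transitions AS t
        JOIN states AS u ON u.state_id = t.upper_state_id
        JOIN states AS l ON l.state_id = t.lower_state_id
        ORDER BY t.trans_id""").fetchall()
```

With foreign keys enabled, the database refuses a transition that points to a state which does not exist. The .par format cannot enforce that kind of check.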

Relational Database Model
– Based on MySQL (free, open-source);
– queried using SQL (Structured Query Language);
– web interface, with output formats:
  – original HITRAN format (.par),
  – ASCII-text table of tab-delimited columns (.txt),
  – XSAMS (.xml) …
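For the tab-delimited .txt output, any query result can be written out with Python's standard `csv` module. This is a sketch only. The helper name and the column names in the usage example are hypothetical. Missing values, such as SQL NULLs returned as `None`, become empty cells.

```python
import csv
import io


def to_tab_delimited(header, rows) -> str:
    """Render query results as tab-delimited text, one row per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `to_tab_delimited(["trans_id", "nu"], [("T1-H2O-1", 37.1371)])` gives a header line followed by one data line.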

XSAMS (XML Schema for Atoms, Molecules and Solids)
– Under development by the IAEA;
– an XML format for distributing atomic and molecular spectroscopic data;
– enforces good practice:
  – data sources (e.g. literature references),
  – uncertainties,
  – compulsory units.

XSAMS – Example: a molecular state of H2O
[The XML listing is not preserved in this transcript. Recoverable fragments include the description "A state of H2(16O)", the electronic-state label X, and the state IDs S145-H2O-1 and S148-H2O.]
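The XML listing itself is lost, so here is a hedged illustration of the idea behind it. In XSAMS, each value travels with its unit, its uncertainty and a reference to its source. The sketch uses Python's `xml.etree.ElementTree`. The element and attribute names (`MolecularState`, `stateID`, `StateEnergy`, `Value`, `units`, `Accuracy`, `SourceRef`) were chosen to resemble XSAMS but should be checked against the actual schema. The energy, uncertainty and source label are placeholders.

```python
import xml.etree.ElementTree as ET


def state_element(state_id: str, energy: float, uncertainty: float,
                  units: str, source_ref: str) -> ET.Element:
    """Build an XSAMS-like state element (illustrative names, not the schema)."""
    if not units:
        raise ValueError("a unit is compulsory for every value")
    state = ET.Element("MolecularState", stateID=state_id)
    energy_el = ET.SubElement(state, "StateEnergy")
    ET.SubElement(energy_el, "SourceRef").text = source_ref  # data provenance
    ET.SubElement(energy_el, "Value", units=units).text = repr(energy)
    ET.SubElement(energy_el, "Accuracy").text = repr(uncertainty)
    return state


xml_text = ET.tostring(
    state_element("S145-H2O-1", 1234.5678, 0.001, "1/cm", "B1"),
    encoding="unicode")
```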

Advantages of the relational DB / XSAMS
– Easily extensible, for example to:
  – more complex molecular states,
  – parameters for multiple line shapes (Voigt, Galatry, …),
  – line-mixing effects;
– data provenance:
  – each item of data can be given a source,
  – each data set requested from the online database can be given a timestamp and reproduced at a later time;
– easy to validate the data …
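One common way to make a timestamped request reproducible is to keep every version of a value, each with a validity interval and a source. A later query can then ask what was current at a given time. The talk does not say how HITRAN implements this, so the Python sketch below is an assumed design. The class and field names and the example sources are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class VersionedValue:
    trans_id: str
    sw: float                      # intensity value
    source: str                    # provenance: where the value came from
    valid_from: datetime           # when this value entered the database
    valid_to: Optional[datetime]   # when it was superseded (None = current)


def as_of(rows: List[VersionedValue], when: datetime) -> List[VersionedValue]:
    """Return the rows that were current at time `when`."""
    return [r for r in rows
            if r.valid_from <= when and (r.valid_to is None or when < r.valid_to)]
```

In use, a value from source "A" that was later superseded by source "B" is returned for a timestamp before the update and hidden afterwards. A request can therefore be replayed exactly.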

Disadvantages of XSAMS
– Extremely verbose: files are typically 50× larger;
– more computational power is required to write and parse XML than "fixed" formats;
– doesn't play nicely with Fortran (yet).
But: XSAMS compresses well (typically by 50×!) and can be transformed into other formats.
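XML compresses well because its tags repeat from record to record. The Python sketch below measures this with the standard `gzip` module on synthetic, highly repetitive XML. The element names and values are placeholders. The ratio you get depends on the content and is not meant to reproduce the 50× figure quoted above.

```python
import gzip


def compression_ratio(text: str) -> float:
    """Ratio of uncompressed to gzip-compressed size for a UTF-8 string."""
    raw = text.encode("utf-8")
    return len(raw) / len(gzip.compress(raw))


# Synthetic XML with repeated tags (placeholder values, not HITRAN data).
xml = "".join(
    f'<MolecularState stateID="S{i}-H2O-1"><StateEnergy>'
    f'<Value units="1/cm">{i * 1.5:.4f}</Value>'
    f"<Accuracy>0.001</Accuracy></StateEnergy></MolecularState>\n"
    for i in range(2000)
)
print(f"{len(xml)} bytes, compression ratio {compression_ratio(xml):.0f}x")
```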

HITRAN data validation
Introducing a data model gives meaning to each item of data:
– the quantum numbers assigned to each state can be validated (e.g. ensure J ≥ K);
– transitions can be checked against selection rules (e.g. on parity: + ↔ − for electric dipole transitions).
States are stored separately from transitions:
– so it can be verified that the same state is always given the same energy.
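The three kinds of check listed above can be sketched in Python. The following are assumptions made for illustration: the function names, the default tolerance of 1e-4 cm⁻¹, and the idea of keying each state by a hashable label such as a tuple of quantum numbers.

```python
from collections import defaultdict


def valid_symmetric_top(J: int, K: int) -> bool:
    """A symmetric-top state needs |K| <= J."""
    return J >= 0 and abs(K) <= J


def parity_allowed(upper: str, lower: str) -> bool:
    """Electric-dipole selection rule on parity: + <-> - only."""
    if {upper, lower} - {"+", "-"}:
        raise ValueError("parity labels must be '+' or '-'")
    return upper != lower


def inconsistent_energies(records, tol=1e-4):
    """records: (state_key, energy) pairs, one per appearance of a state.
    Return the keys whose energies disagree by more than tol (cm-1)."""
    energies = defaultdict(list)
    for key, energy in records:
        energies[key].append(energy)
    return sorted(k for k, es in energies.items() if max(es) - min(es) > tol)
```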

HITRAN data validation example: H2S
[The HITRAN .par record and the corresponding lower state in XSAMS format are not preserved in this transcript.]

HITRAN data validation example: NH3
Two transitions in HITRAN .par format: [records not preserved in this transcript; only fragments, including several "s" symmetry labels, survive.]

Inconsistencies identified
– Many examples of the same state being given different energies (this affects the temperature dependence of the transition intensity);
– NH3: 6 states have K > J, and 933 lines have inconsistent inversion-symmetry labels (a and s);
– OH: 1096 states show an incorrect correlation of Hund's case (a) and case (b) quantum numbers (also for NO);
– H2S: 53 states with Ka > J;
– HOCl: 2104 states do not have Ka + Kc = J or J+1.
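The H2S and HOCl counts above come from rules for asymmetric-top labels J(Ka,Kc): Ka and Kc cannot exceed J, and Ka + Kc must equal J or J+1. A Python sketch of such a check and a tally follows. The function names and the (J, Ka, Kc) tuple input are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def valid_asymmetric_top(J: int, Ka: int, Kc: int) -> bool:
    """Labels J_{Ka,Kc}: 0 <= Ka, Kc <= J and Ka + Kc is J or J + 1."""
    return 0 <= Ka <= J and 0 <= Kc <= J and Ka + Kc in (J, J + 1)


def count_invalid(states: Iterable[Tuple[int, int, int]]) -> int:
    """Count (J, Ka, Kc) states that fail the asymmetric-top check."""
    return sum(not valid_asymmetric_top(J, Ka, Kc) for J, Ka, Kc in states)
```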

Acknowledgements
– VAMDC consortium;
– Prof Jonathan Tennyson, UCL.