Standard Archive Format for Europe Gian Maria PinnaESA WGISS-21 Budapest 8 May 2006.

Slides:



Advertisements
Similar presentations
METS Awareness Training An Introduction to METS Digital libraries – where are we now? Digitisation technology now well established and well-understood.
Advertisements

Long-Term Preservation. Technical Approaches to Long-Term Preservation the challenge is to interpret formats a similar development: sound carriers From.
Database Planning, Design, and Administration
Case Tools Trisha Cummings. Our Definition of CASE  CASE is the use of computer-based support in the software development process.  A CASE tool is a.
DOSTAG Meeting #51, 3 rd May 2007 Access to ESA Earth Observation data (past and current missions)
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
TC3 Meeting in Montreal (Montreal/Secretariat)6 page 1 of 10 Structure and purpose of IEC ISO - IEC Specifications for Document Management.
Database Planning, Design, and Administration Transparencies
ESA Web Mapping Activities CEOS-WGISS ESRIN 7-10-May-2002 Christophe Caspar – Giuseppe Tandurella.
Automatic Evaluation of Migration Quality in Distributed Networks of Converters Miguel Ferreira Supervisors Ana Alice Baptista.
Chapter 9 Database Design
Lecture Nine Database Planning, Design, and Administration
Ontology-based Access Ontology-based Access to Digital Libraries Sonia Bergamaschi University of Modena and Reggio Emilia Modena Italy Fausto Rabitti.
Chapter 1 Introduction to Databases
November 2011 At A Glance GREAT is a flexible & highly portable set of mission operations analysis tools that increases the operational value of ground.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
ESA Report to DAI/IPR WG Gian Maria Pinna ESA/ESRIN DAI/IPR meeting Colorado Springs January 2007.
An Overview of Selected ISO Standards Applicable to Digital Archives Science Archives in the 21st Century 25 April 2007 Donald Sawyer - NASA/GSFC/NSSDC.
ISO/TC211 Geographic Information/Geomatics Implementing ISO Metadata David Danko Work Item 15—Project Leader
By N.Gopinath AP/CSE. Why a Data Warehouse Application – Business Perspectives  There are several reasons why organizations consider Data Warehousing.
MEDIN Partners Meeting Sept 2010 DASSH – The Archive for Marine Species and Habitats Dan Lear DASSH Project Co-ordinator Marine.
Database System Development Lifecycle © Pearson Education Limited 1995, 2005.
Overview of the Database Development Process
FP OntoGrid: Paving the way for Knowledgeable Grid Services and Systems WP8: Use case 1: Quality Analysis for Satellite Missions.
EARTH SCIENCE MARKUP LANGUAGE “Define Once Use Anywhere” INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE.
EXPECTATIONS OF TURKISH ENVIRONMENTAL SECTOR FROM INSPIRE Ministry of Environment and Forestry June, 2010 Özlem ESENGİN Ahmet ÇİVİ Tuncay DEMİR.
MASSACHUSETTS INSTITUTE OF TECHNOLOGY NASA GODDARD SPACE FLIGHT CENTER ORBITAL SCIENCES CORPORATION NASA AMES RESEARCH CENTER SPACE TELESCOPE SCIENCE INSTITUTE.
1 Minggu 9, Pertemuan 17 Database Planning, Design, and Administration Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
1 The NERC DataGrid DataGrid The NERC DataGrid DataGrid AHM 2003 – 2 Sept, 2003 e-Science Centre Metadata of the NERC DataGrid Kevin O’Neill CCLRC e-Science.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
IODE Ocean Data Portal - technological framework of new IODE system Dr. Sergey Belov, et al. Partnership Centre for the IODE Ocean Data Portal MINCyT,
EARTH SCIENCE MARKUP LANGUAGE Why do you need it? How can it help you? INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE.
Database Planning, Design, and Administration Transparencies
ASI-Eumetsat Meeting Matera, 4-5 Feb CNM Context Matera, February 4-5, 20092ASI-Eumetsat Meeting.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Archival Information Packages for NASA HDF-EOS Data R. Duerr, Kent Yang, Azhar Sikander.
Creating Archive Information Packages for Data Sets: Early Experiments with Digital Library Standards Ruth Duerr, NSIDC MiQun Yang, THG Azhar Sikander,
ESA UNCLASSIFIED – For Official Use ESA AVHRR Data Holdings and Ongoing Curation Activities Mirko Albani, Sergio Folco.
Information Systems Engineering. Lecture Outline Information Systems Architecture Information System Architecture components Information Engineering Phases.
Lecture # 3 & 4 Chapter # 2 Database System Concepts and Architecture Muhammad Emran Database Systems 1.
5 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
Data SG – Archive Task Team - 9th May 2002 CEOS ICF Baseband Data Archive InterChange Format Baseband Data Archive InterChange Format Commitee on Earth.
METS: Implementing a metadata standard in the digital library Richard Gartner Oxford University Library Services
Draft GEO Framework, Chapter 6 “Architecture” Architecture Subgroup / Group on Earth Observations Presented by Ivan DeLoatch (US) Subgroup Co-Chair Earth.
WGISS /09/2015 DATA PRESERVATION – CNES APPROACH B. Chausserie-Laprée.
CCSDS MOIMS Falls Meeting 2007 – Colorado Springs - June 2006 SAFE Status Progress status & f Stéphane Mbaye
CCSDS MOIMS Springs Meeting 2006 – Rome - June 2006 XFDU & SAFE - ESA return from experience ESA return from experience & f Stéphane Mbaye
Breakout # 1 – Data Collecting and Making It Available Data definition “ Any information that [environmental] researchers need to accomplish their tasks”
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Firmware - 1 CMS Upgrade Workshop October SLHC CMS Firmware SLHC CMS Firmware Organization, Validation, and Commissioning M. Schulte, University.
ESA UNCLASSIFIED – For Official Use Data Stewardship Interest Group WGISS-40 Meeting Preservation of SW & Documents at CEOS Agencies Approaches and Lessons.
ESA Report to DAI/IPR WG Gian Maria Pinna DAI/IPR meeting Toulouse 2-5 November 2004.
Metadata By N.Gopinath AP/CSE Metadata and it’s role in the lifecycle. The collection, maintenance, and deployment of metadata Metadata and tool integration.
EO Dataset Preservation Workflow Data Stewardship Interest Group WGISS-37 Meeting Cocoa Beach (Florida-US) - April 14-18, 2014.
Open Grid Services for Earth Observation Pedro Gonçalves.
Cloud-based e-science drivers for ESAs Sentinel Collaborative Ground Segment Kostas Koumandaros Greek Research & Technology Network Open Science retreat.
Data Management and Digital Preservation Carly Dearborn, MSIS Digital Preservation & Electronic Records Archivist
ESA UNCLASSIFIED – For Official Use Data Stewardship Interest Group EO Data Associated Knowledge Preservation WGISS-41 Meeting - Canberra, (AUS) 14–18.
National Aeronautics and Space Administration 1 CCSDS Information Architecture Working Group Daniel J. Crichton NASA/JPL 24 March 2005.
Chapter 9 Database Planning, Design, and Administration Transparencies © Pearson Education Limited 1995, 2005.
IODE Ocean Data Portal - technological framework of new IODE system Dr. Sergey Belov, et al. Partnership Centre for the IODE Ocean Data Portal.
Jordi Farres HMA-WG Meeting ESRIN, 23 Jan 2013
Paul Eglitis [IEEE] and Siri Jodha S. Khalsa [IEEE]
Gian Maria Pinna, Vincenzo Beruti ESA
Data Stewardship Interest Group WGISS-42 Meeting
ESA Report to DAI/IPR WG
The VITO Earth Observation LTDA Facility
Metadata The metadata contains
Nancy Y. McGovern Digital Preservation Officer, ICPSR IASSIST 2007
Presentation transcript:

Standard Archive Format for Europe Gian Maria PinnaESA WGISS-21 Budapest 8 May 2006

LTA Strategy and Rationale  The mandate for ESA’s EO missions is to maintain the archive for at least 10 years after the end of the mission  The management of its vast amount of heterogeneous datasets poses problems for their long ‑ term preservation  The ESA’s EO Ground Segment Department has recognized since many years the need to establish a clear strategy for the management of the long ‑ term archives.

LTA Strategy and Rationale  This strategy has as its main goals:  archive maintenance in order to ensure data integrity  archive and data management rationalization  data conversion to new technologies in order to reduce cost of operations  enhancement of data access  standardization of the format in which the datasets are preserved  Achievements aim at simplifying the exchange and interoperability of data between ESA and external operators

Long-Term Preservation Challenge The data to be preserved for the long term require a special attention, which reflects in costly operations for their exploitation and maintenance:  datasets have to be regularly converted into new media technology, to prevent the problems created by their obsolescence  being the long term archive normally based on datasets with a very low processing level, higher level products have to be generated by processing systems  in the case of distribution of the data holdings directly from the archive, it is normal practice that they are converted or reformatted into a format more oriented to the end-user utilization  it is common to have to extract from the archive a portion of the dataset (subsetting) to create a “child” product  more and more the data are distributed to the end-users, and exchanged among data holders, in electronic format over network infrastructures (private Intranets, public Internet, academic networks, etc.)

Cost of LTA Operations One of the reasons that contribute to the high operations and maintenance costs of the long term archives is the excessive proliferation of diverse and heterogeneous data formats, caused by mainly three reasons:  the lack of an agreed standard in the Earth Observation community, reason for which the formats tended to be specific for the sensor(s) each missions carried on board  legacies from old ground segments architectures, which tended not to reuse previously developed elements  the non-mature status until recently of the information technologies and standards used to describe and package the data, preventing the creation of a format able to satisfy at the same time the requirements for the long-term data preservation and their handling in the processing centres

HARM (Historical Archive Rationalization and Management) HARM aims at achieving the following main goals:  analysis and definition of a new ESA standard long term archive data format  SAFE  conversion of archived data from the original format into the new format and transcription onto the ESA’s automated long term archive, based on the ESA’s MMFI (Multi Mission Facility Infrastructure) architecture  rationalization of the archives by centralization of historical data (non-active mission) and archive of on-going missions into specialized centres  optimization of the archives, reducing operator intervention for system maintenance and improvement of archive capacity through removal of redundant data (whenever applicable and due to orbit overlap, duplicated transcriptions etc.)  generation of the related metadata and inventory information, to be ingested in the centralized ESA catalogue

ESA’s goal is to make SAFE:  Compliant with ISO 14721:2003 OAIS (Open Archival Information System) reference model  Compliant with emerging CCSDS/ISO XFDU (XML Formatted Data Units) packaging format  Designed to act as a standard format for archiving and conveying data within ESA Earth Observation archiving facilities Although the primarily goal of SAFE is to handle EO data with processing levels close to the usually called “level 0”, no a-priori limitations exist for the packaging of higher level products

SAFE Information Model

SAFE Product Information  Acquisition period: the acquisition period metadata that provide the time extents of all the data contained in the SAFE product  Platform/Sensor identification: the platform identifies the system (satellite/aircraft) that acquired the EO data wrapped by the SAFE product  Product History: a processing log collecting the historical information dedicated to the maintenance and the traceability of the product

SAFE Product Data/Metadata  Orbital information: the reference to the trajectory of the platform that acquired the data. This information may locate one or several orbit paths, the corresponding cycle, track, etc.  Geolocation information: the information locating the product footprint on the Earth’s surface, either as a series of tie points or by reference to a world reference system of the acquiring platform. The Geolocation information, may also attach additional information to each localized element, including cloud coverage vote, meteorological information, etc.  Quality/Fixity information: information about the quality of an EO dataset. SAFE makes use of techniques (i.e. XPath) that allows the precise location of the corrupted or missing elements up to the bit level  Representation information: any data contained in a SAFE product shall be accompanied with its representation information formally and numerically exploitable

SAFE Logical Model  A SAFE product is a logical tree of “Content Units” forming the so-called “Information Package Map”  The root Content Unit has predefined associations to the information applicable to the overall product, i.e. at least the “Acquisition Period”, the “Platform/Sensor Identification” and the “Product History”  The structure of the children Content Units is less constrained and depends mainly of the logical view of the wrapped data. In most cases, one Content Unit matches one EO dataset and its accompanying metadata. Several Content Units may, however, share the same metadata.

SAFE Logical Model

SAFE Physical Model

SAFE Physical Components  Manifest file: an XML document conforming the XFDU Manifest file. It contains the definition of the Information Package Map, the wrapped Metadata Objects, the wrapped Data Objects and references to the external files containing the Metadata and Data Objects  Binary or XML files: the data or metadata object contents. Currently, only two types of files have been identified, i.e. binary matching MIME octetstream definition and XML documents. Each of these files shall be accompanied with one or more XML Schema document controlling its content  XML Schema files: the representation information of the data held by a SAFE product. In order to represent the binary information, SAFE also defines specific markups that annotate the XML Schema documents to provide information on the physical structure (SDF markups). Thanks to these specific annotations, the contents of the binary files are described up to the bit level with a common technique as for XML documents

Preservation of Binary Objects  Annotated XML Schemas  Structured Data File (SDF) markups <xsd:element name="record" type="xs:int" minOccurs="0" maxOccurs="unbounded"> 12 <xsd:element name="record" type="xs:int" minOccurs="0" maxOccurs="unbounded"> 12

SDF Markups (1)

SDF Markups (2)

SAFE Volume Set IdentificationVolumeTitle PGSI-GSEG-EOPG-FS Volume 1Core Specifications PGSI-GSEG-EOPG-FS Volume 2Recommendation for specializations PGSI-GSEG-EOPG-FS Volume 3 Book 1ENVISAT ASAR Products PGSI-GSEG-EOPG-FS Volume 3 Book 2ENVISAT MERIS Products PGSI-GSEG-EOPG-FS Volume 3 Book 3ENVISAT AATSR Products PGSI-GSEG-EOPG-FS Volume 3 Book 4ENVISAT DORIS Products PGSI-GSEG-EOPG-FS Volume 3 Book 5ENVISAT GOMOS Products PGSI-GSEG-EOPG-FS Volume 3 Book 6ENVISAT MIPAS Products PGSI-GSEG-EOPG-FS Volume 3 Book 7ENVISAT RA2 Products PGSI-GSEG-EOPG-FS Volume 3 Book 8ENVISAT MWR Products PGSI-GSEG-EOPG-FS Volume 3 Book 9ENVISAT SCIAMACHY Products PGSI-GSEG-EOPG-FS Volume 3 Book 10ENVISAT Auxiliary Data Products

SAFE Volume Set IdentificationVolumeTitle PGSI-GSEG-EOPG-FS Volume 4ERS Products PGSI-GSEG-EOPG-FS Volume 5Landsat Products PGSI-GSEG-EOPG-FS Volume 6Terra/Aqua MODIS Products PGSI-GSEG-EOPG-FS Volume 7NOAA AVHRR Products PGSI-GSEG-EOPG-FS Volume 8SeaStar SeaWiFS Products PGSI-GSEG-EOPG-FS Volume 9Nimbus CZCS Products PGSI-GSEG-EOPG-FS Volume 10SPOT HRV(IR) Products PGSI-GSEG-EOPG-FS Volume 11JERS SAR/OPS Products PGSI-GSEG-EOPG-FS Volume 12MOS MESSR/VTIR Products PGSI-GSEG-EOPG-FS Volume 13IRS-P3 MOS Products

SAFE I/O API In addition to the Core SAFE specifications and the Specializations, the HARM project is implementing a set of software APIs dedicated to SAFE product users. The main use cases of these software components are:  Create a SAFE product  Open a SAFE product  Remove a SAFE product  Move a SAFE product considering or not the referenced objects  Validate a SAFE product including or not the external object contents  Add/Remove/Browse Content Units of the Information Package Map  Add/Remove/Retrieve the Metadata Objects  Add/Remove/Retrieve the Data Objects  Browse/Select/Extract information from either Binary or XML objects  Identify/Sort Quality Information to a specific part of the objects In the course of 2006, ESA will develop and make available a SAFE Toolbox, containing tools to handle SAFE products.

SAFE I/O API  XFDU Java API: this API provides the general features common to all XFDU packages. This API has been developed jointly with the SAFE Java API and is presently used in the framework of the activity of the CCSDS Information Packaging and Registries (IPR) Working Group in support to the validation of the XFDU recommendation developed by the group  XFDU C++ Wrapper: a C++ wrapper on top of the XFDU Java API. Most of the functionality is preserved from the Java API, apart the capability of browsing the binary contents  Data Request Broker (DRB): from which SAFE inherits the capability to access data independently from their formats. DRB supports in particular the SDF markup language used for annotating the XML Schema documents acting as representation information of the SAFE product objects.

SAFE Benefits  SAFE is mainly devoted to the long-term preservation of data, as its full compliance with the CCSDS/ISO OAIS RM and XFDU standards reveal  SAFE permits an easier and more effective migration of data to other future standards  SAFE permits an easier reformatting to other formats, including end-user formats for products distribution  SAFE will easy the maintenance of the SW that use the data, even historical, thus decreasing the chances of obsolescence and improving their usability

SAFE Benefits  SAFE being self describing, it greatly facilitates the format maintenance  SAFE format documents are built automatically from the XML Schemas, so the maintenance of the SAFE format is more robust with respect to “old-fashion” formats  SW that access SAFE-formatted datasets will benefit from the usage of the same SAFE XML Schemas for reading and writing the datasets. This greatly improves the overall cycle of creation/ingestion/quality control/ transformation/distribution of the datasets.

(under construction)