16-17 Oct 2003, IVOA Data Access Layer, Strasbourg
IVOA Data Access Layer (DAL) Working Group
Doug Tody
National Radio Astronomy Observatory
International Virtual Observatory Alliance
DAL Working Group Priorities
–Update simple image access (SIA) to V1.1
–Introduce simple spectral access (SSA) V1.0
–Introduce web service versions of DAL services
–Drive VO technology development as required for DAL (e.g., dataset identifiers, data models, VOTable)
Simple Spectral Access (SSA)
Goals
–provide uniform access to both 1D spectra and SEDs
–simplify the interface for both data providers and client applications
–provide a powerful "multiwavelength" spectral analysis capability
Spectral survey
–use cases to drive interface design
–identify early data providers and application developers
Current issues
–spectral data model
–interface design issues
–spectral dataset representation
Cambridge SIA V1.1 Priorities
Essential
–Registry integration
–Pixflags support for lossy compressed data (e.g., HCOMPRESS)
Image Characterization
–Image provenance and identification (collection ID, dataset ID, virtual data provenance, replica support)
–Spectral bandpass (already present; may need tweaking for consistency)
–Time of observation
–Spatial resolution
–Limiting flux (harder; may not make V1.1)
Other
–VO technology integration (normalize UCDs, data models, etc.)
–Use of image attributes to refine the query (e.g., band)
–Default for the case where there are multiple versions of the same dataset
–Spatial bandpass - 3
–Image type (future, V2)
–Logical hierarchies to describe complex metadata (as in IDHA; V2)
DAL Interface Issues
The next version of SIA requires progress in the following areas:
–dataset identifiers
–component data models, dataset characterization
–data model, dataset representation
These are actually required for all DAL services, not just SIA.
Global Dataset Identifiers
Required to identify data returned by DAL services
–Images: data collection ID, dataset ID
–Catalogs: catalog ID, record ID
Will enable
–replica management and selection
–virtual data management and characterization
Discussion is being led by the Registries group.
Data Model Representation
All DAL data access is data-model based.
We must be able to represent data models unambiguously in VOTable.
VOTable UTYPE proposal to provide a "pointer into the data model".
Discussion will take place in the VOTable group.
Agenda for DAL Working Group
Strasbourg, October 2003
DAL Recap
–Service class hierarchy
–Concept of different views of the same data
SIA V1.1 / DAL Interface Issues
–image identifiers, virtual data
–component data models (UTYPE, UCD normalization)
–getImage acref templating (Francois)
SSA Straw Man
–SSA overview / interface (Doug)
–SED introduction (Markus)
–1D spectral data model (Jonathan)
–discussion of SSA issues
Review process for development of SSA specification
Update DAL priorities and schedule

After a brief review of the services architecture, most of the discussion in this WG meeting focused on enhancements to SIA and the general DAL infrastructure, and on the scope and design of the simple spectral access (SSA) interface.
DAL Scope: Types of Data (Cambridge)
[Diagram: Dataset; Time Series; Catalog; Source Catalog; Event List; Visibility Data; Image; NDImage; 1D Spectrum; SED; legend: Primary DAL Services]

The concept of the DAL service architecture from Cambridge was reviewed and reaffirmed without objection.
SIA V1.1 / DAL Infrastructure
Some key issues
–Registry integration, service metadata
–Image identifiers (data collection ID, dataset ID)
–Data characterization (coverage, bandpass, resolution, etc.)
–Data provenance, virtual data characterization
–Data model representation, UCD normalization
–Templating the URL access reference

For the most part these issues affect all DAL/VO data access and are not specific to SIA. Several of these hot topics, which affect SIA and all other DAL services, were discussed.
Dataset Identifiers
Globally unique dataset (resource) identifier
–ivoa://<authority-ID>/<resource-ID>#<dataset-ID>
   authority-ID: naming authority (namespace)
   resource-ID: data collection
   dataset-ID: dataset or record
–Images: data collection, dataset
–Catalogs: table, record ID
Key points
–data are tagged by a unique global identifier
–global identifiers may exist independently of any specific registry
–identifiers of published data are persistent
–authority IDs are globally unique and globally allocated
–each authority controls name allocation within its namespace
–caveat: this only works in a simple way for physical datasets

Dataset identifiers are required for many aspects of data access: publication, data provenance, replication, and virtual data.
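The identifier layout above can be made concrete with a small parser and formatter. This is a minimal sketch assuming the `ivoa://<authority-ID>/<resource-ID>#<dataset-ID>` form shown on the slide; the `DatasetId` class, `parse_dataset_id` function, and the example identifiers are hypothetical, not part of any IVOA library.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

SCHEME = "ivoa"  # scheme as written on the slide; an assumption for this sketch


@dataclass(frozen=True)
class DatasetId:
    authority: str  # naming authority (namespace), globally allocated
    resource: str   # data collection (images) or table (catalogs)
    dataset: str    # dataset or record within the collection

    def __str__(self) -> str:
        return f"{SCHEME}://{self.authority}/{self.resource}#{self.dataset}"


def parse_dataset_id(text: str) -> DatasetId:
    """Split ivoa://<authority-ID>/<resource-ID>#<dataset-ID> into its three parts."""
    parts = urlsplit(text)
    if parts.scheme != SCHEME:
        raise ValueError(f"not an {SCHEME}:// identifier: {text!r}")
    resource = parts.path.lstrip("/")
    if not parts.netloc or not resource or not parts.fragment:
        raise ValueError(f"incomplete identifier: {text!r}")
    return DatasetId(parts.netloc, resource, parts.fragment)


# Hypothetical identifier, for illustration only.
ident = parse_dataset_id("ivoa://nrao.edu/vla-survey#img-0042")
print(ident.authority, ident.resource, ident.dataset)
```

Because the authority owns everything after its name, a resource ID may itself contain slashes (for example a catalog table path); the sketch keeps the whole path as the resource part.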
Replica Management
Data replication
–data replication is required for efficient access and data backup
–replica management and selection of datasets are enabled by dataset IDs
How it works:
–a replica manager service can harvest individual registries and build a replica catalog
–query the replica manager service to discover replicas
–query an individual service to confirm existence, get metadata, and get data
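The first two steps can be sketched as an inverted index from dataset ID to the services holding a copy. This assumes each registry harvest yields a mapping of service URL to the dataset IDs it serves; the harvesting itself and the final confirmation query to an individual service are left out, and all URLs and IDs are hypothetical.

```python
from collections import defaultdict


def build_replica_catalog(harvested):
    """harvested: mapping of service URL -> iterable of dataset IDs it serves."""
    catalog = defaultdict(set)
    for service_url, dataset_ids in harvested.items():
        for dataset_id in dataset_ids:
            catalog[dataset_id].add(service_url)  # one more known replica
    return catalog


def find_replicas(catalog, dataset_id):
    """Return the services believed to hold dataset_id (to be confirmed by querying each)."""
    return sorted(catalog.get(dataset_id, ()))


# Hypothetical harvest results from two registries.
harvested = {
    "http://a.example.org/sia": ["ivoa://x/c#1", "ivoa://x/c#2"],
    "http://b.example.org/sia": ["ivoa://x/c#1"],
}
catalog = build_replica_catalog(harvested)
print(find_replicas(catalog, "ivoa://x/c#1"))
```

The lookup works only because both services tag the same data with the same global dataset ID; without it the two copies would be indistinguishable from two different datasets.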
Virtual Data
Data access layer
–most data is virtual data derived from external data sources
–data access services subset, transform, or otherwise generate data
Dataset IDs
–will allow data provenance to be specified
–e.g., dataset A derived from datasets B, C by operation P
This is an essential step toward describing virtual data, but how do we do so? [TBD]
Current "acref" URL is a kind of virtual data reference
–e.g., "...11.2&SIZE=0.1&..."
–acref implicitly specifies data provenance
–may also be unstable, and may contain irrelevant access-specific details
Use of a getData method instead of an explicit acref URL might allow virtual data generation to be standardized for a given access protocol.

Dataset identifiers will provide the basis for describing virtual data, but how we do so is still TBD. Most likely this will involve defining the generation operation and its inputs.
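Since the slide leaves the representation TBD, the following is only one possible shape: a provenance record of the form "dataset A derived from datasets B, C by operation P", plus a walk that recovers the full lineage of a virtual dataset. The `Derivation` record, its fields, and the operation names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """Provenance record: `output` was produced by `operation` applied to `inputs`."""
    output: str        # dataset ID of the derived (virtual) dataset, e.g. "A"
    operation: str     # hypothetical name of the generating operation, e.g. "P"
    inputs: tuple      # dataset IDs of the input datasets, e.g. ("B", "C")
    params: tuple = ()  # optional (name, value) pairs passed to the operation


def lineage(dataset_id, derivations):
    """Return all dataset IDs that dataset_id was derived from, directly or indirectly."""
    by_output = {d.output: d for d in derivations}
    found, stack = [], [dataset_id]
    while stack:
        record = by_output.get(stack.pop())
        if record is None:  # a physical dataset: no further provenance
            continue
        for source in record.inputs:
            if source not in found:
                found.append(source)
                stack.append(source)
    return found


records = [
    Derivation("A", "P", ("B", "C")),
    Derivation("B", "Q", ("D",), params=(("SIZE", "0.1"),)),
]
print(lineage("A", records))
```

Recording the operation and its parameters explicitly is what an acref URL does only implicitly; a record like this stays meaningful even if the access URL changes.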
Component Data Models
Required in DAL to
–characterize complex objects
–represent data for transport and analysis
Modeling complex objects
–standard and custom component data models are aggregated to model more complex objects, e.g., datasets

Providing a means to determine the "quality" of data will be essential to enable automated data analysis via the VO. Dataset characterization via component data models will provide this capability.
Sample Component Data Models
Observation metadata
–observatory, instrument, project, observer, etc.
Standard "coverage" metadata
–sky, time, bandpass, etc.
Dataset characterization
–time of observation (lo, high, refvalue)
–spectral bandpass (lo, high, refvalue, ID)
–spatial bandpass (lo, high; resolution?)
–sensitivity or limiting flux (flux "bandpass"?)
–observable
World coordinate systems
Storage models

The Data Models WG is actively working to define these component data models.
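Aggregating component models into a dataset description can be sketched with small composable records. This is an illustration only: the class and field names are hypothetical, the units (MJD for time, metres for wavelength, arcsec for resolution) are assumptions, and only a few of the listed components are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interval:
    """A lo/high range with an optional reference value, as used by several components."""
    lo: float
    hi: float
    ref: Optional[float] = None

    def overlaps(self, other: "Interval") -> bool:
        return self.lo <= other.hi and other.lo <= self.hi


@dataclass
class Characterization:
    time: Interval                              # time of observation (assumed MJD)
    spectral: Interval                          # spectral bandpass (assumed metres)
    spatial_resolution: Optional[float] = None  # assumed arcsec; still an open question
    limiting_flux: Optional[float] = None


@dataclass
class Dataset:
    """A dataset modeled by aggregating component models."""
    dataset_id: str
    observatory: str                            # observation metadata
    instrument: str
    characterization: Characterization


def in_band(ds: Dataset, band: Interval) -> bool:
    """Automated selection using the characterization component only."""
    return ds.characterization.spectral.overlaps(band)
```

The reusable `Interval` component is shared by time and spectral coverage, which is the point of component models: a client that understands the component can query any dataset that aggregates it.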
UTYPE, UCD Normalization
Proposed VOTable FIELD, PARAM, GROUP attributes:
–name="application-name": a name freely defined by an application
–id="ID-name": an XML identifier unique within a document
–ref="ID-ref": a reference to an ID elsewhere in the document
–ucd="ucd-name": the Unified Content Descriptor ("fuzzy")
–utype="ns:datamodel-name": the uniform attribute type related to a data model; "ns" represents an optional namespace
A possible alternative would be to use a namespace within the UCD, but this would overload UCD and interfere with its current usage.

UTYPE or something like it is required to represent data models rigorously in VOTable for data analysis in the VO.
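To show how the proposed attributes sit side by side, here is a minimal sketch that emits a FIELD carrying both a UCD and a UTYPE with the standard library's ElementTree. The column name, the `sb:` namespace prefix, and the UTYPE value are hypothetical; the UCD is taken from the spectral bandpass sample model.

```python
import xml.etree.ElementTree as ET


def make_field(name, ucd, utype, datatype="double", unit=None):
    """Build a VOTable FIELD with a free name, a fuzzy UCD, and an exact UTYPE."""
    field = ET.Element("FIELD", name=name, datatype=datatype, ucd=ucd, utype=utype)
    if unit is not None:
        field.set("unit", unit)
    return field


# The name is application-defined; the UCD gives approximate meaning, and the
# (hypothetical) UTYPE points at one specific element of a data model.
f = make_field("wl_min", "INST_FILTER_MIN", "sb:LoLimit", unit="m")
print(ET.tostring(f, encoding="unicode"))
```

Keeping UTYPE separate from UCD means existing UCD-based clients are unaffected, which is the argument the slide gives against overloading UCD with a namespace.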
Sample Data Model: Spectral Bandpass
UTYPE      UCD                Name          ID
ID         INST_FILTER_CODE   user-defined  none
Unit       UNITS              user-defined  none
RefValue   INST_FILTER_REF    user-defined  none
HiLimit    INST_FILTER_MAX    user-defined  none
LoLimit    INST_FILTER_MIN    user-defined  none
Response   DATA_LINK          user-defined  none
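Because the Name column is user-defined, a client cannot locate bandpass columns by name; the UTYPE is what it can rely on. A minimal sketch, assuming a hypothetical VOTable fragment whose FIELDs carry UTYPEs from the table above under a hypothetical `sb:` prefix:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: column names are arbitrary, UTYPEs follow the data model.
FRAGMENT = """
<TABLE>
  <FIELD name="fmin" ucd="INST_FILTER_MIN" utype="sb:LoLimit"/>
  <FIELD name="fmax" ucd="INST_FILTER_MAX" utype="sb:HiLimit"/>
  <FIELD name="fref" ucd="INST_FILTER_REF" utype="sb:RefValue"/>
</TABLE>
""".strip()


def column_index_by_utype(table, utype):
    """Return the position of the FIELD whose utype matches, regardless of its name."""
    for i, field in enumerate(table.iter("FIELD")):
        if field.get("utype") == utype:
            return i
    raise KeyError(utype)


table = ET.fromstring(FRAGMENT)
print(column_index_by_utype(table, "sb:HiLimit"))
```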
Access Reference Templating
Motivation
–SIA query response table can get very large if there is a matrix of options for each possible output image.
Some possible solutions
–ACREF template
–getData method
Thoughts
–templating the acref is a form of getData method
–should we just add a getData method instead?
–but what is settable may be dataset dependent
–metadata can flag attributes which can be set in the template
–acref would be a template string
–hence can collapse what could be P1*P2*...*PN redundant entries
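The collapse from P1*P2*...*PN rows to one template can be sketched with Python's `string.Template`. The service URL, the settable attributes, and their allowed values are hypothetical, and the `${...}` placeholder syntax is just one choice; the slides do not fix a template syntax.

```python
from itertools import product
from string import Template

# Hypothetical settable attributes flagged in the response metadata.
OPTIONS = {"format": ["fits", "jpeg"], "compress": ["none", "hcompress"], "scale": ["1", "2", "4"]}

# Hypothetical acref template: one row in the query response instead of many.
ACREF = Template("http://example.org/sia/getImage?ID=img-0042"
                 "&FORMAT=${format}&COMPRESS=${compress}&SCALE=${scale}")


def expand_all(template, options):
    """What a non-templated response would list: one acref per combination."""
    names = list(options)
    combos = product(*(options[n] for n in names))
    return [template.substitute(dict(zip(names, combo))) for combo in combos]


def resolve(template, options, **choice):
    """Client side: fill the template with one allowed value per settable attribute."""
    for name, value in choice.items():
        if value not in options.get(name, ()):
            raise ValueError(f"{name}={value!r} is not offered")
    return template.substitute(choice)


print(len(expand_all(ACREF, OPTIONS)))  # 2 * 2 * 3 redundant rows avoided
print(resolve(ACREF, OPTIONS, format="fits", compress="none", scale="2"))
```

The sketch also shows the cost raised in the discussion: the client now has to know which attributes are settable and fill them in, rather than simply following a URL.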
Access Reference Templating
Motivation
–SIA query response table can get large if there is a matrix of options for each possible output image.
–It should be easier to recognize simple variations on the same image.
–A simple one-step "getImage" method could be useful.
Proposals
–Parameter substitution on an acref template (F. Bonnarel), e.g., image format, compression, image generation parameters
–Formal getData method

No clear consensus at this point; further discussion is needed. Some form of templating could be good so long as it does not complicate the interface for the client. Some felt that the current approach is OK. XPath or similar technology should be investigated for implementation.
Simple Spectral Access (SSA)
SSA Overview (Doug)
–Goals, Interface, Data Formats
SED Introduction (Markus)
Spectral Data Model (Jonathan)
SSA Issues (all)
General agreements on SSA
–Provide a uniform interface for both 1D spectra and SEDs
–Develop a uniform data model for both 1D spectra and SEDs
–Service interface will be similar to SIA and Cone Search (CS): query/response, getData
–Data output formats will include at least text, VOTable, FITS, graphics

SSA will provide an opportunity to learn how to 1) map VO data models into multiple external representations, and 2) package actual datasets in XML/VOTable, including representing data models in XML.
SSA Interface Issues
–Registry integration
–Query
–Query response
–Dataset retrieval
–Data model
–Data representation
SSA Interface Issues
Registry integration
–Service metadata query
–SSA service metadata
SSA service verifier
–Verify that the service is correct
–Read service metadata, enter into a registry

Agreed without objection. Service verification and registration of service metadata should be provided for all DAL services.
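One piece of a service verifier is checking that a query response is structurally valid. The sketch below assumes SSA adopts the SIA V1.0 conventions (a single RESOURCE with type="results" containing an INFO named QUERY_STATUS, plus a TABLE when the status is OK); reading service metadata and entering it into a registry are not shown, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop any XML namespace, e.g. '{ns}TABLE' -> 'TABLE'."""
    return tag.rsplit("}", 1)[-1]


def find_all(elem, name):
    return [e for e in elem.iter() if local(e.tag) == name]


def verify_response(xml_text):
    """Return a list of problems with a query response (empty list means it passed)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if local(root.tag) != "VOTABLE":
        return ["root element is not VOTABLE"]
    results = [r for r in find_all(root, "RESOURCE") if r.get("type") == "results"]
    if len(results) != 1:
        return ["expected exactly one RESOURCE with type='results'"]
    status = [i for i in find_all(results[0], "INFO") if i.get("name") == "QUERY_STATUS"]
    if not status:
        return ["no QUERY_STATUS INFO element"]
    if status[0].get("value") == "OK" and not find_all(results[0], "TABLE"):
        return ["QUERY_STATUS is OK but there is no TABLE"]
    return []
```

Returning a list of problems rather than raising on the first one lets a verifier report everything wrong with a service in a single pass.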
SSA Interface Issues
Query
–Query attributes: pos, size, spectral resolution, bandpass, time, velocity, redshift, spectral class, object name, etc.; spatial resolution; other?
–Query interface: simple keyword queries (now); query language (ADQL) queries (later)

There was general agreement that the query is an important aspect of SSA. Spectra are generally more highly processed than, e.g., images, and may have attributes such as velocity, redshift, etc., which one would like to query on. Implementing a general query mechanism for SSA may require something like ADQL.
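A simple keyword query of the "now" kind can be sketched as URL construction. POS and SIZE follow the SIA V1.0 style; BAND, TIME, REDSHIFT and the lo/hi syntax are hypothetical parameter names, since the SSA interface had not been specified at this point.

```python
from urllib.parse import urlencode


def ssa_query_url(base_url, ra, dec, size, band=None, time=None, **extra):
    """Build a simple keyword query; POS/SIZE follow SIA, the rest are hypothetical."""
    params = {"POS": f"{ra},{dec}", "SIZE": size}
    if band is not None:        # spectral bandpass as lo/hi
        params["BAND"] = f"{band[0]}/{band[1]}"
    if time is not None:        # time of observation as lo/hi
        params["TIME"] = f"{time[0]}/{time[1]}"
    params.update(extra)        # e.g. REDSHIFT, SPECCLASS (hypothetical)
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode(params, safe=",/")


print(ssa_query_url("http://example.org/ssa", 180.0, -30.0, 0.1, REDSHIFT="0.1/0.5"))
```

Attributes such as redshift fit this keyword style only as simple ranges; anything richer (combined constraints, joins) is where a query language like ADQL would come in.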
SSA Interface Issues
Query response
–form: VOTable, as in SIA
–this is a flat summary table for simplicity
–the alternative would be a sequence of structured objects

Not discussed due to lack of time. Unless a reason is found to deviate, the expectation is that the query response will be a flat VOTable as with SIA.
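A flat summary table is easy for clients to consume: each row maps directly to a record. A minimal sketch, assuming an inline TABLEDATA serialization; the field names and acref URLs in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    return tag.rsplit("}", 1)[-1]  # ignore any XML namespace


def votable_rows(xml_text):
    """Flatten the first TABLE of a VOTable into a list of {field name: cell text} dicts."""
    root = ET.fromstring(xml_text)
    table = next(e for e in root.iter() if local(e.tag) == "TABLE")
    names = [f.get("name") for f in table if local(f.tag) == "FIELD"]
    rows = []
    for tr in (e for e in table.iter() if local(e.tag) == "TR"):
        cells = [(td.text or "").strip() for td in tr if local(td.tag) == "TD"]
        rows.append(dict(zip(names, cells)))
    return rows


SAMPLE = """<VOTABLE><RESOURCE type="results"><TABLE>
  <FIELD name="title" datatype="char" arraysize="*"/>
  <FIELD name="acref" datatype="char" arraysize="*"/>
  <DATA><TABLEDATA>
    <TR><TD>spectrum 1</TD><TD>http://example.org/ssa/get?ID=1</TD></TR>
    <TR><TD>spectrum 2</TD><TD>http://example.org/ssa/get?ID=2</TD></TR>
  </TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""

for row in votable_rows(SAMPLE):
    print(row["title"], row["acref"])
```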
SSA Interface Issues
Dataset retrieval
–one getData method per spectrum/SED
–data format options: text, XML, VOTable, FITS, graphics, HTML, ...

Agreed that spectral output formats should include at least text, VOTable, FITS, and graphics. How data are represented in each format is a separate issue.
SSA Interface Issues
Data model
–uniform model for 1D spectra and SEDs
–as simple as it can be while solving this problem
–range of observables
–…

The general spectral data model presented by Jonathan was well received by the WG and will serve as the basis for further development of SSA via a subgroup with members from both DAL and DM.
SSA Interface Issues
Dataset representation
–text (# keyword = value, records)
–VOTable (how do we represent a dataset in XML?)
–FITS (table or image?)

Most of the discussion here concerned which FITS format to use. It was agreed that a FITS table was the most general, but it would be harder for existing applications to use and would duplicate what VOTable will already provide. Use of a simple linearized 1D spectrum represented as a FITS image will be investigated.
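The text representation ("# keyword = value" header lines followed by data records) can be sketched as a writer and reader pair. The keyword names and the two-column (wavelength, flux) record layout are assumptions for illustration; the slide does not define them.

```python
def write_text_spectrum(meta, samples):
    """Header lines as '# keyword = value', then one whitespace-separated record per line."""
    lines = [f"# {key} = {value}" for key, value in meta.items()]
    lines += [" ".join(repr(v) for v in record) for record in samples]  # repr round-trips floats
    return "\n".join(lines) + "\n"


def read_text_spectrum(text):
    """Inverse of write_text_spectrum; keyword values come back as strings."""
    meta, samples = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep:  # ignore plain comment lines without '='
                meta[key.strip()] = value.strip()
        else:
            samples.append(tuple(float(v) for v in line.split()))
    return meta, samples


# Hypothetical keywords and values.
meta = {"OBJECT": "NGC 1068", "WAVE_UNIT": "Angstrom", "FLUX_UNIT": "erg/s/cm2/A"}
samples = [(5000.0, 1.2e-15), (5001.0, 1.3e-15)]
print(write_text_spectrum(meta, samples), end="")
```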
Process
Process for development of SSA spec
–Spectral survey
–Discuss SSA design issues (this meeting)
–Initial draft specification
–Discuss, revise draft specification
–Initial implementations
Priorities and Schedule
SSA V1.0
–Initial specification
–Initial implementations
DAL Technology
–Component data models
–Data model representation
SIA V1.1
Web service implementations