The Virtual Solar-Terrestrial Observatory The Virtual Observatory Peter Fox HAO/ESSL/NCAR November 28, 2005.

Presentation transcript:

The Virtual Solar-Terrestrial Observatory The Virtual Observatory Peter Fox HAO/ESSL/NCAR November 28, 2005

Outline
- Virtual Observatories: history and definition(s), I*Ys
- Some examples, within disciplines
- When is a VO not a VO?
- Beyond disciplines: the emerging need
- What is missing, i.e. the enabling technology
- Challenges and interoperability
- Examples: VSTO and SESDI
- What’s ahead

Encyclopedia - we’ve made it!
Virtual observatory is a collection of integrated astronomical data archives and software tools that utilize computer networks to create an environment in which research can be conducted. Several countries have initiated national virtual observatory programs that will combine existing databases from ground-based and orbiting observatories and make them easily accessible to researchers. As a result, data from all the world's major observatories will be available to all users and to the public. This is significant not only because of the immense volume of astronomical data but also because the data on stars and galaxies has been compiled from observations in a variety of wavelengths: optical, radio, infrared, gamma ray, X-ray and more. Each wavelength can provide different information about a celestial event or object, but also requires a special expertise to interpret. In a virtual observatory environment, all of this data is integrated so that it can be synthesized and used in a given study.

Yet more definitions
- AVO: A virtual observatory (VO) is a collection of interoperating data archives and software tools which utilize the internet to form a scientific research environment in which astronomical research programs can be conducted. In much the same way as a real observatory consists of telescopes, each with a collection of unique astronomical instruments, the VO consists of a collection of data centres each with unique collections of astronomical data, software systems and processing capabilities.
- From the Grid: virtual observatory - astronomical / solar / solar terrestrial data repositories made accessible through grid and web services.
- Workshop: A Virtual Observatory (VO) is a suite of software applications on a set of computers that allows users to uniformly find, access, and use resources (data, software, document, and image products and services using these) from a collection of distributed product repositories and service providers. A VO is a service that unites services and/or multiple repositories.
- VxOs

Virtual Observatories
- Conceptual examples:
- In-situ: Virtual measurements
- Related measurements
- Remote sensing: Virtual, integrative measurements
- Data integration
- Systems or frameworks?
- Brokers, or data providers, or service providers?
  - VOTables, VOQueries, etc.: a syntax for exchange
- Holding metadata? Who imposes the catalog, or vocabulary?

Virtual Observatory Concepts
- Take the user ‘there’ and make it possible for data providers to get their data to a user in need
- May hold data, but mostly not
- Once there, enable them to browse, sample, collect, produce, partly analyze, etc. a variety of resources: data, ancillary data, documentation, plots, etc.
- Enable them to come away with something a) useful, b) that they could not, or may not have been able to, find by other, i.e. direct, means
- If you are going to be in the middle then …
- Need to encapsulate things you normally may not, e.g. directory organization of data files in a selected dataset, formats, names, …
- Provide some acceptable, understandable levels of service(s)
- Build in a set of supporting services that the user and provider are not exposed to
- Be prepared to learn from what they do while using the VO and adapt to new and changing needs
- There’s more, of course…

VOs and data providers
Not a VO:
- When you hand off a user to another site
- Only one dataset
- When you do not deliver, or do not arrange for delivery of the data
- When your curation role is not evident
DP:
- Acquire data and produce data products (static or dynamic).
- Preserve data in useable forms.
- Distribute data, and provide easy machine (API) and Internet browser access.
- Support a communication mechanism: should support a standards-based messaging system (e.g., ftp, http, SOAP, XML)
- Produce, document, and make easily available metadata for product finding and detailed data granule content description. Ideally, maintain a catalogue of detailed data availability information.
- Assure the validity and quality of the data.
- Document the validation process.
- Provide quality information (flags).
- Maintain careful versioning including the processing history of a product.
- Maintain an awareness of standards (such as community accepted data models), and adhere to them as needed.
- Provide software required to read and interpret the data; ideally the routines used by the PI science team should be available to all.

What should a VO do?
- Make “standard” scientific research much more efficient.
- Even the PI teams should want to use them.
- Must improve on existing services (Mission and PI sites, etc.). VOs will not replace these, but will use them in new ways.
- Enable new, global problems to be solved.
- Rapidly gain integrated views from the solar origin to the terrestrial effects of an event.
- Find data related to any particular observation.
- (Ultimately) answer “higher-order” queries such as “Show me the data from cases where a large CME observed by SOHO was also observed in situ.”
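A "higher-order" query of this kind can be read as a join between two event lists. The sketch below is only an illustration under stated assumptions: the `Event` record, the speed threshold for a "large" CME, the one-to-five-day delay window, and all event values are made up for the example and do not come from any real catalog or VO interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    source: str         # hypothetical catalog label, e.g. "SOHO" or "in-situ"
    time: datetime      # time of the remote observation or in-situ detection
    speed_km_s: float   # illustrative attribute used to decide what is "large"


def cmes_also_seen_in_situ(remote, in_situ, min_speed=1000.0,
                           min_delay=timedelta(days=1),
                           max_delay=timedelta(days=5)):
    """Pair each large remote event with in-situ events in the delay window.

    The thresholds are assumptions chosen for illustration only.
    """
    pairs = []
    for cme in remote:
        if cme.speed_km_s < min_speed:
            continue  # not "large" under the assumed threshold
        for hit in in_situ:
            if min_delay <= hit.time - cme.time <= max_delay:
                pairs.append((cme, hit))
    return pairs


# Made-up events, not real observations.
remote = [
    Event("SOHO", datetime(2005, 1, 1), 1500.0),
    Event("SOHO", datetime(2005, 2, 1), 300.0),
]
in_situ = [
    Event("in-situ", datetime(2005, 1, 3, 12), 700.0),
    Event("in-situ", datetime(2005, 2, 4), 350.0),
]

matches = cmes_also_seen_in_situ(remote, in_situ)
for cme, hit in matches:
    print(cme.time.isoformat(), "->", hit.time.isoformat())
```

In a real VO the two lists would come from different providers, which is why a shared vocabulary for "event", "time" and "large" matters more than the join itself.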

What the NASA community wants
- Provide coordinated discovery and access to data and service resources for a specific scientific discipline
- Identify relevant data sources and appropriate repositories.
- Allow queries that yield data granules or pointers to them.
- Provide a user interface to access resources both through an API (or equivalent machine access) and a web browser application.
- Handle a wide range of provider types, as needed.
- Understand the data needs of its focus area:
- Recruit potential new providers.
- Provide support and "cookbooks" for easy incorporation of providers.
- Help to assure high data quality and completeness of the product set.
- Resolve issues of multiple versions of datasets.

More from NASA
- Provide documentation for metadata:
- Set standards for metadata and query items
- Assist providers, and review metadata.
- Maintain a global knowledge of data availability.
- Possibly maintain collection catalog metadata.
- Provide an API or other means for the VxO to appear to others as a single provider.
- Potentially provide value-added services (can be done by providers or elsewhere):
- Data Subsetting:
- Averaging of data
- Filtering
- Data Merging
- Format Conversion
- Provide access to event lists and ancillary data.
- Collect statistical information and community comments to assess success.

VSO and the ‘small box’


CEDAR


[Figure: The Earth System Grid architecture. Sites: NCAR, LBNL, LLNL, ISI, ANL, ORNL. Service groups: security, metadata, framework, transport, analysis and visualization, monitoring, data storage. Components shown: GSI, CAS server and client, MyProxy server and client, Tomcat, Axis, GRAM, Auth metadata, RLS (MySQL), THREDDS catalogs, Xindice, MySQL, OGSA-DAI, MCS, gridFTP server/client, HRM, openDAPg server and client, NCL, LAS server, CDAT, SLAMON daemon, NERSC HPSS, NCAR MSS, ORNL HPSS, disk.]


Emerging needs
- Interdisciplinary science and engineering (not just between adjacent fields)
- Interdisciplinary data assimilation, integration
- Web service workflow orchestration (beyond syntax)
- Vortals as well as portals (specific to general)
- Agency (NASA) and community efforts (eGY, IHY, IPY, IYPE)

ACOS at the MLSO
Near real-time data from Hawaii from a variety of solar instruments, as a valuable source for space weather, solar variability and basic solar physics

CISM
Goal: To create a physics-based numerical simulation model that describes the space environment from the Sun to the Earth.
The uses of space weather modeling:
- A scientific tool for increased understanding of the complex space environment.
- A specification and forecast tool for space weather prediction.
- An educational tool for teaching about the space environment.

CEDARWEB
Community data archive, documents, and support.

User requirements
CEDAR
- Search must return data (i.e. no null searches)
- Search across instruments, models
- Know about special time periods, campaigns, etc.
- Allow selections based on (appropriate) geophysical conditions, e.g. Kp index
- Usual format returned and in correct units
- Must be able to easily re-create the search, access
- Visual browsing
MLSO
- Same as CEDAR !!
- Plus sampling interval choice, e.g. minutely, daily, average, best of the day, synoptic
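One way to read the "no null searches" and Kp-selection requirements together is that the interface should only offer choices for which data actually exist. The following is a minimal sketch under that assumption; the `HOLDINGS` list, its field names and its values are invented for illustration and are not the CEDAR catalog.

```python
# Hypothetical holdings list; a real catalog would be queried instead.
HOLDINGS = [
    {"instrument": "FPI", "parameter": "neutral temperature",
     "date": "2000-01-05", "kp": 1.3},
    {"instrument": "FPI", "parameter": "neutral wind",
     "date": "2000-01-06", "kp": 4.7},
    {"instrument": "ISR", "parameter": "electron density",
     "date": "2000-01-06", "kp": 4.7},
]


def matching(holdings, kp_range=None, **chosen):
    """Records that satisfy the choices made so far and the Kp range."""
    lo, hi = kp_range if kp_range else (float("-inf"), float("inf"))
    return [
        r for r in holdings
        if lo <= r["kp"] <= hi and all(r[k] == v for k, v in chosen.items())
    ]


def available(holdings, field, kp_range=None, **chosen):
    """Values of `field` that still lead to data, so no search comes back empty."""
    return sorted({r[field] for r in matching(holdings, kp_range, **chosen)})


print(available(HOLDINGS, "instrument"))
print(available(HOLDINGS, "parameter", instrument="FPI"))
print(available(HOLDINGS, "parameter", kp_range=(4.0, 9.0), instrument="FPI"))
```

Because every menu is computed from the records that survive the earlier choices, a user cannot compose a query that returns nothing.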

Challenges and interoperability
- Semantic misunderstanding
- E.g. sunspot number and variations in solar radiation: over 90% of researchers outside the sub-field of solar radiation think that sunspot number is a measure of solar radiation
- In reality: a sunspot number is a measure of the number of sunspots appearing on the visible solar surface, a sunspot is an indicator of the location of strong solar magnetic fields, strong magnetic fields are collectively known as solar activity, sunspots are observed to produce a localized decrease in the solar radiation output, etc.
- How to ‘explain’ this to a computer?
- Interfaces are built by computer scientists with syntax that often works within a discipline but rarely across them
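One common way to "explain" such statements to a computer is to write them as subject-predicate-object triples that a program can query. The sketch below encodes only the statements made on this slide; the term and predicate names are hypothetical and are not taken from SWEET or the VSTO ontology.

```python
# Statements from the slide, written as (subject, predicate, object) triples.
# All identifiers are illustrative, not terms from a published ontology.
TRIPLES = {
    ("SunspotNumber", "is_measure_of", "NumberOfVisibleSunspots"),
    ("Sunspot", "indicates_location_of", "StrongSolarMagneticField"),
    ("StrongSolarMagneticField", "is_known_collectively_as", "SolarActivity"),
    ("Sunspot", "locally_decreases", "SolarRadiationOutput"),
}


def objects(subject, predicate):
    """All objects asserted for a given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def is_measure_of(quantity, target):
    """True only if the knowledge base explicitly says so."""
    return target in objects(quantity, "is_measure_of")


# The common misreading is not supported by the encoded statements.
print(is_measure_of("SunspotNumber", "SolarRadiationOutput"))
print(is_measure_of("SunspotNumber", "NumberOfVisibleSunspots"))
```

A production system would use an ontology language and a reasoner rather than a Python set, but the point is the same: the relationship has to be stated explicitly before software can avoid the misunderstanding.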

Concept and user needs
Goal - find the right balance of data/model holdings, portals and client software that a researcher can use without effort or interference, as if all the materials were available on his/her local computer.
The Virtual Solar-Terrestrial Observatory (VSTO) is a distributed, scalable education and research environment for searching, integrating, and analyzing observational, experimental and model databases in the fields of solar, solar-terrestrial and space physics.
VSTO comprises a system-like framework which provides virtual access to specific data, model, tool and material archives containing items from a variety of space- and ground-based instruments and experiments, as well as individual and community modeling and software efforts bridging research and educational use.

User needs
In discussions with data providers and users, the needs are clear: “Fast access to ‘portable’ data, in a way that works with the tools we have; information must be easy to access, retrieve and work with.”
- Few clicks, get what I want, whose tools? MY tools
Too often users (and data providers) have to deal with the organizational structure of the data sets, which varies significantly: data may be stored at one site in a small number of large files while similar data may be stored at another site in a large number of relatively smaller files. There is an equally large problem with the range of metadata descriptions for the data. Users often only want subsets of the data and struggle with getting them efficiently. One user expresses it as: “(Please) solve the interface problem.”
- Encapsulate more

What’s new in the VSTO?
- Datasets alone are not sufficient to build a virtual observatory: VSTO integrates tools, models, and data
- VSTO addresses the interface problem, effectively and scalably
- VSTO addresses the interdisciplinary metadata and ontology problem, bridging terminology and use of data across disciplines
- VSTO leverages the development of schema that adequately describe the syntax (name of a variable, its type, dimensions, etc. or the procedure name and argument list, etc.), semantics (what the variable physically is, its units, etc.) and pragmatics (or what the procedure does and returns, etc.) of the datasets and tools.
- VSTO provides a basis for a framework for building and distributing advanced data assimilation tools
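The split between syntax, semantics and pragmatics can be made concrete with two small record types. This is a sketch only: the class names, field names and example variables are assumptions made for illustration and are not the VSTO schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableDescription:
    # Syntax: how the variable appears in a file.
    name: str
    dtype: str
    dimensions: tuple
    # Semantics: what the variable physically is, and its units.
    quantity: str
    units: str


@dataclass(frozen=True)
class ProcedureDescription:
    # Syntax: procedure name and argument list.
    name: str
    arguments: tuple
    # Pragmatics: what the procedure does and what it returns.
    does: str
    returns: str


def same_quantity(a, b):
    """Two variables are comparable if their semantics agree, whatever their syntax."""
    return a.quantity == b.quantity and a.units == b.units


# Hypothetical variables from two providers that name the same quantity differently.
tn_a = VariableDescription("TN", "float32", ("time",), "neutral temperature", "K")
tn_b = VariableDescription("t_neutral", "float64", ("time", "altitude"),
                           "neutral temperature", "K")
ti = VariableDescription("TI", "float32", ("time",), "ion temperature", "K")

plot = ProcedureDescription("plot_time_series", ("times", "values"),
                            "draws values against time", "an image")

print(same_quantity(tn_a, tn_b), same_quantity(tn_a, ti))
```

Matching on the semantic fields rather than on names is what lets a tool treat `TN` and `t_neutral` as the same thing.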

Virtual Observatory: Need better glue
- Basic problem: schema are categorized rather than developed from an object model/class hierarchy, which significantly limits non-human use. However, they all form the basis to organize catalog interfaces for all types of data, images, etc.
- This limits data systems utilizing frameworks and prevents frameworks from truly interoperating (SOAP, WSDL only a start)
- Directories, e.g. NASA GCMD, CEDAR catalog, FITS (flat) keyword/value pairs, are being turned into ontologies (SWEET, VSTO)
- Markup languages, e.g. ESML, SPDML, ESG/ncML are excellent bases

Methodologies
- Use-cases
- User requirements
- Semantics - ‘what does this mean’
- Data integration
- Ontologies
- Rapid prototyping

HAO and SCD from NCAR, McGuinness Associates: Peter Fox, Don Middleton, Stan Solomon, Deborah McGuinness, Jose Garcia, Patrick West, Luca Cinquini, James Benedict, Tony Darnell and soon
- Application domains - CEDAR, CISM, ACOS
- Realms (ontologies):
- Covers middle atmosphere to the Sun + SPDML
- Mesh with Earth Realm (SWEET)
- Mesh with GEON
- Use-cases and user requirements
[Diagram labels: CISM, ACOS, CEDAR, SWEET, VSTO + SPDML]

VSTO Use-case 1
UC1: Plot the observed/measured Neutral Temperature (Parameter) looking in the vertical direction for Millstone Hill Fabry-Perot interferometer (Instrument) from January 2000 to August 2000 (Temporal Domain) as a time series.
Precondition: portal application is authorized to access the backend data extraction and plotting service
1. User accesses the portal application
2. User goes through a series of views to select (in order) the desired observatory, instrument, record-type (kind of data), parameter, start and stop dates, and the plot type is inferred. At each step, the user selection determines the range of available options in the subsequent steps. NB, an alternate path is selection of start and stop dates, then instrument, etc.
3. The application validates the user request: verifying the logical correctness of the request, i.e. that Millstone Hill is an observatory that operates a type of instrument that measures neutral temperature (i.e. check that Millstone Hill is an observatory and check that the range of the measures property on the Millstone Hill Fabry-Perot Interferometer subsumes neutral temperature). Also, the application must verify that no necessary information is missing from the request.
4. The application processes the user request to locate the physical storage of the data, returning for example a URL-like expression: find Millstone Hill FPI data of the correct type (operating mode; defined by CEDAR KINDAT since the instrument has two operating modes) in the given time range (Millstone Hill FPI 1701 TemporalDomain [January 2000, August 2000])
5. The application plots the data in the specified plot type (a time series). This step involves extracting the data from records of one or more files, creating an aggregate array of data with independent variable time (of day or day+time depending on time range selected) and passing this to a procedure to create the resulting image.
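The validation in step 3 can be sketched as a completeness check followed by two lookups against an ontology. The tiny ontology below is a toy stand-in written for this example: the class names, the `OPERATES` and `MEASURES` tables, and the request fields are all assumptions, not the VSTO ontology or its portal API.

```python
# Toy ontology fragments; every name here is hypothetical.
SUBCLASS_OF = {
    "NeutralTemperature": "Temperature",
    "IonTemperature": "Temperature",
    "Temperature": "Parameter",
}
OBSERVATORIES = {"MillstoneHill"}
OPERATES = {"MillstoneHill": {"MillstoneHillFPI"}}
MEASURES = {"MillstoneHillFPI": {"NeutralTemperature", "NeutralWind"}}

REQUIRED = ("observatory", "instrument", "parameter", "start", "stop")


def subsumes(general, specific):
    """True if `specific` equals `general` or is one of its subclasses."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def validate(request):
    """Return a list of problems; an empty list means the request is valid."""
    problems = [f"missing: {f}" for f in REQUIRED if not request.get(f)]
    if problems:
        return problems
    obs, inst, param = (request[k] for k in ("observatory", "instrument", "parameter"))
    if obs not in OBSERVATORIES:
        problems.append(f"{obs} is not an observatory")
    if inst not in OPERATES.get(obs, set()):
        problems.append(f"{obs} does not operate {inst}")
    if not any(subsumes(cls, param) for cls in MEASURES.get(inst, set())):
        problems.append(f"{inst} does not measure {param}")
    return problems


request = {
    "observatory": "MillstoneHill",
    "instrument": "MillstoneHillFPI",
    "parameter": "NeutralTemperature",
    "start": "2000-01",
    "stop": "2000-08",
}
print(validate(request))
print(validate({**request, "parameter": "IonTemperature"}))
```

Returning a list of problems rather than a single boolean lets the portal tell the user which part of the selection was inconsistent.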


Demo

Have you heard these questions?
- What do you mean by that?
- What did you mean by that?
- What does this mean?
- How did you get this, please explain?
- Does this also mean … ?
- Doesn’t this contradict … ?
Leads to:
- Inference
- Reasoning
- Explanation

Paradigm shift for NASA?
- From: Instrument based
- To: Measurement based
- Requires: ‘bridging the discipline data divide’
- Overall vision for SESDI: To integrate information technology in support of advancing measurement-based processing systems for NASA by integrating existing diverse science discipline and mission-specific data sources.
[Diagram labels: SWEET, Volcano, Climate, SESDI]

Semantic connectors
- The SESDI re-useable component interfaces. The stub on each end of the connector is based on the GEON Ontology-Data registration technology and contains articulated axioms derived from the knowledge gained in the unit-level data registration. Includes integrity checks, domain and range, etc.
[Diagram labels: SWEET; process-oriented semantic content represented in SWSL; articulation axioms]

What’s ahead?
- Virtual Observatories provide both framework and data system elements; users are already confusing VOs and data providers
- Many VOs are noting the need for better glue, scalability, expandability, etc.
- Success (to date) in utilizing formal methods for interface specification and development using ontologies
- Success in breaking all of the free tools! Commercial tools are under consideration
- Challenges exist for reasoning and interfacing with scientific datatypes, e.g. complex spatial and temporal concepts
- For VSTO (and SESDI): more use-cases, populate the interfaces and test for scalability and interoperation in production settings