Published by Arnold Cunningham. Modified over 9 years ago.
1
Basic Propositions of the RVO Information Infrastructure Project
On behalf of the RVOII project report co-authors
Leonid Kalinichenko, Institute of Informatics Problems of RAS
leonidk@synth.ipi.ac.ru
2
RVO Information Infrastructure Project Report
In May 2005 the RVO Information Infrastructure (RVOII) project report was published in Russia. It is the result of a joint effort by the Special Astrophysical Observatory of RAS (SAO RAS), the Institute of Astronomy of RAS (INASAN) and the Institute of Informatics Problems of RAS (IPI RAS), supported by a one-year grant from the Russian Foundation for Basic Research (RFBR). RVOII aims at an integrated representation of information in various problem domains of astronomy and at supporting the solution of scientific problems. The project report contains:
o an analysis of the kinds of astronomical information resources accumulated around the world, and specifically in Russia;
o an analysis of the technological and architectural recommendations of the IVOA;
o an analysis of the classes of scientific problems that need VO facilities;
o an analysis of how well the IVOA standards correspond to, and suffice for, the identified RVO activities;
o an analysis of existing components and services that can be reused for the RVOII implementation.
Based on this analysis, a structural design of the RVO information infrastructure has been developed. Strategically, the RVOII development program is oriented towards close coordination with the development of the International VO.
3
Talk outline
o Objectives of the RVO project
o Representation of problem domains in natural sciences in information systems
o Information Resources in Astronomy; Russian astronomical resources
o Analysis of Projects of Virtual Observatories
o Information Infrastructure Forming Standards
o Classes of astrophysical problems for VO
o Virtual observatory architecture according to IVOA
o Subject mediation infrastructure planned for problem domains representation in RVO
o Information infrastructure of RVO
o AstroGrid as the core of the RVOII infrastructure
o First trial of AstroGrid community centre
o Analysis of possibility of extending AstroGrid with subject mediation facilities
4
Objectives of the RVO project (1)
Main objectives of the RVO project:
o to provide the Russian astronomical community with facilities for integrating Russian astronomical resources into the VO;
o to provide the Russian astronomical community with integrated access to the data accumulated in international astronomical data resources;
o to provide the Russian astronomical community with facilities for defining problem domains for solving various classes of astronomical problems, with computational facilities, with facilities for information analysis and data mining, and with facilities for automating scientific research in astronomy;
o to support a set of standards, agreed with the international community, that provide for the interoperability of heterogeneous data and of problem-solving facilities;
5
Objectives of the RVO project (2)
o to develop strategically important classes of astronomical problems based on VO technology, together with the processes (workflows) and mediators that support the respective research;
o to develop organizational measures, agreed with the international community, for developing and using VO technology in Russia, for coordinating the Astronomical Data Centers in Russia and abroad, and for coordinating research based on VO technology;
o to develop a set of measures for establishing RVO as an important educational resource for Russian universities;
o to form in Russia a sustainable community of astronomers who actively use the VO in their scientific research;
o to contribute to a high level of research based on VO technology in Russia in the strategically important areas of astronomy.
6
Subject Domain in Natural Science
[Diagram. Recoverable labels: material system; definition in natural language; domain terminology and concepts (abstract, methodological, concrete); theory (model) T1 with its signature and its concretizations A and B (attributes, types, classes, processes) [simulators]; semantics of the constituents of T1…Tn; observable/measurable characteristics; methods and instruments for observation, experimentation, measurement, data analysis and discovery; measurable characteristics of T1 (attributes, types, classes, procedures) and of T2, …, Tn; observations, simulations and measurements for T1; explaining and forecasting; semantic interpretations; theories (models) T2, …, Tn; problems, methods of solution, algorithms, programs, workflows; simulation.]
7
From the Report to the President of the USA “Computational Science: Ensuring America’s Competitiveness”
In May 2005 the President’s Information Technology Advisory Committee (PITAC) completed a report which states: “No single researcher has the skills required to master all the computational and application domain knowledge needed to gather data from databases or experimental devices, create geometric and mathematical models, create new algorithms, implement the algorithms efficiently on modern computers, and visualize and analyze the results. To model such complex systems faithfully requires a multidisciplinary team of specialists, each with complementary expertise and an appreciation of the interdisciplinary aspects of the system, and each supported by a software infrastructure that can leverage specific expertise from multiple domains and integrate the results into a complete application software system. Computational researchers need enabling, scalable, interoperable application software to conduct examinations of their ideas and data”.
8
Information Resources in Astronomy
o World-wide resources overview
o Optical surveys and catalogs
o Infrared and radio range surveys
o Archives of observations
o Data centers
o Surveys
o Robotic Telescopes
o Russian astronomical resources
9
From Tera to Petabytes
Large Synoptic Survey Telescope (LSST): coverage ranging from Earth's vicinity to the edge of the optical universe.
o It will reach 24th magnitude in 10 seconds and will survey up to 14,000 square degrees three times per month. Over a period of years, 30,000 square degrees will be surveyed in multiple bands, and the co-added images will reach 27th magnitude.
o It relies on high technology in microelectronics, large optics fabrication and metrology, and software.
o Comparing the LSST (8.4 m) telescope with the SDSS, and allowing for its increased pixel sampling and resolution, the advantage in figure of merit is a factor of close to 200.
o Data products will consist of photometric catalogs that are continuously updated during the survey, a moving object database, images in at least 5 bands (updated on a regular schedule), and a huge time-tagged database of processed images. In total the data volume will climb to around 15 petabytes.
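To put the two depth figures on this slide side by side, the standard magnitude relation (a difference of 5 magnitudes corresponds to a flux ratio of 100) can be evaluated directly. The sketch below is illustrative arithmetic only; the function name is hypothetical and nothing here comes from LSST software.

```python
def flux_ratio(m_bright, m_faint):
    """Flux ratio between two magnitudes (Pogson relation: 5 mag = factor 100)."""
    return 10 ** (0.4 * (m_faint - m_bright))


# Single-visit depth (24th mag) versus co-added depth (27th mag), as quoted on the slide.
ratio = flux_ratio(24.0, 27.0)
print(f"Co-added images reach sources about {ratio:.1f} times fainter")
```

Three magnitudes deeper therefore means reaching sources roughly 16 times fainter than a single 10-second exposure.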
10
Russian astronomical resources
Main providers of astronomical data in Russia:
o Special Astrophysical Observatory of RAS (SAO RAS)
o Sternberg Astronomical Institute of the Moscow State University (SAI MSU)
o Main (Pulkovo) Astronomical Observatory of RAS (MAO)
o Institute of Applied Astronomy (IAA RAS)
o Institute of Terrestrial Magnetism, Ionosphere and Radiowave Propagation of the RAS (IZMIRAN)
o Institute of Solar-Terrestrial Physics of the Siberian Branch of the Russian Academy of Sciences (ISTP SB RAS)
o Space Research Institute (IKI) of the RAS
o Astronomical Institute of Saint Petersburg State University (AI SPbSU)
o Ural State University (USU)
o Pushchino Radioastronomical Observatory of the Astro Space Center of the LPI RAS (PrAO ASC LPI RAS)
o Russian Robotic Telescopes
11
Russian astronomical resources
Russian and fSU astronomical data resources classified by subject

Subject                   Number of resources   Number of institutions
Stellar systems            7                     3
Stars                     22                     9
Solar system              21                     8
Sun                       23                     8
Radioastronomy             7                     4
Cosmic rays                4                     3
Multi subject archives     7                     5
TOTAL                     91                    19 (Russia and fSU)
12
Analysis of Projects of Virtual Observatories
o NVO
o AstroGrid
o EURO-VO
13
EURO-VO Participants
o French VO, as represented by the Centre de Données astronomiques de Strasbourg (CDS), Strasbourg, France
o European Southern Observatory, Garching, Germany
o European Space Agency, Paris, France
o UK AstroGrid Consortium, as represented by the University of Edinburgh, Edinburgh, UK
o German Astrophysical Virtual Observatory (GAVO), as represented by the Max Planck Institute for Extraterrestrial Physics (MPE), Garching, Germany
o Istituto Nazionale di Astrofisica, Rome, Italy
o Nederlandse Onderzoekschool voor Astronomie, Leiden, The Netherlands
o Laboratorio de Astrofísica Espacial y Física Fundamental, Madrid, Spain
14
Information Infrastructure Forming Standards
o The OAI Protocol for Metadata Harvesting (OAI-PMH) defines a mechanism for harvesting records containing metadata from repositories.
o A Web service is defined as a standardized way of integrating Web-based applications using the XML, SOAP, WSDL and UDDI open standards over an Internet protocol backbone.
o Grid technology: Compute/File Grid, Information Grid, Hybrid Grid, Semantic Grids.
o The Web Services Resource Framework (WSRF) makes grid resources accessible within a web services architecture.
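As a rough illustration of the harvesting mechanism mentioned for OAI-PMH, the sketch below builds a `ListRecords` request URL using the protocol's standard query arguments (`verb`, `metadataPrefix`, `from`, `set`). The repository address and the helper name are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (HTTP GET form)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date   # selective harvesting by datestamp
    if set_spec is not None:
        params["set"] = set_spec     # selective harvesting by set
    return base_url + "?" + urlencode(params)


url = build_list_records_url(BASE_URL, from_date="2005-05-01")
print(url)

# Decode the query string again to show which arguments the harvester sends.
query = parse_qs(urlsplit(url).query)
print(query)
```

A harvester would issue this request and, if the response carries a resumption token, repeat the request with that token until the list is complete.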
15
Classes of astrophysical problems for VO
o Class of problems solvable by applying database search techniques
o Classes of general problems for VO (cosmology, formation and development of galaxies, formation and evolution of stars, sun and planets, etc.)
o Theoretical research and VO (VirtU, the Virtual Universe project, as an example)
o Co-existence of theoretical and observational archives and services in VO
16
The relationship between the TVO, TOI and AstroGrid
17
From the Report to the President of the USA “Computational Science: Ensuring America’s Competitiveness”
Astrophysical scientific problems mentioned in the PITAC Report:
o Discovering Brown Dwarves via Data Mining. Scientists creating the NVO confirmed the existence of a new brown dwarf in 2003. The discovery was quite unexpected, because it came from data that had been publicly available for at least 18 months. NVO researchers emphasized that the discovery of a single new brown dwarf is not as scientifically significant as the rapidity of the discovery and the tantalizing hint it offers of the potential of NVO.
o Dark Matter, Dark Energy, and the Structure of the Universe. A team at the University of Illinois has conducted large-scale cosmological computational simulations that show the distribution of cold dark matter in a model of cosmic structure formation incorporating the effects of a cosmological constant (Lambda) on the expansion of the universe. The simulation contained 17 million dark matter particles in a cubic model universe that is 300 million light-years on a side.
o Supernova Modeling. The TeraScale Supernova Initiative (TSI) is a national, multi-institution, multidisciplinary collaboration of astrophysicists, nuclear physicists, applied mathematicians, and computer scientists. TSI’s principal goals are to understand the mechanism(s) responsible for the explosions of core collapse supernovae and all the phenomena associated with these stellar explosions.
18
Requirements for scientific results publishing
To publish means to make data and service products in repositories available through services that are accessible via VO-supplied sites. Publishing is needed:
o to allow independent checks of conclusions based on theoretical results, by reproducing certain results;
o to allow comparisons with similar results and methodologies, or with the corresponding data, by observers and theoreticians;
o to make theoretical results more easily accessible and understandable for observers;
o because journals may require links to the actual data products and/or software used in published work;
o to allow publications and real and simulated data products to be queried in a uniform manner (joint queries on structured content items and on metadata, on observations and publications);
o to check observable classes as interpretations of theories (models), and to analyze inconsistencies between observations and theoretical models.
19
Data Mining as a part of PSE
Two basic classes of models: predictive and descriptive.
o Predictive: one of the observational features is chosen as the target. The model provides a way of calculating the target as a function of the remaining features: Y = F(X1, …, Xn). There are two approaches: classification (predicts the class to which an object may belong, with a certain probability) and regression (predicts a value of the target). Examples: Naïve Bayes, Adaptive Bayes, Support Vector Machines (SVM), regression, search for essential attributes, etc.
o Descriptive: a) clustering by certain similarity criteria (in contrast to classification, the features and classes of the partitioning are not known in advance); b) associative models (search for stable associations).
For each model many algorithms exist (classification and regression decision trees, genetic algorithms, neural networks, discriminant analysis, enhanced k-means, O-Cluster, association search, etc.).
Data mining technology: 1) problem statement, 2) data preparation, 3) model development and choice of algorithm, 4) evaluation and interpretation.
Not all models allow interpretation (e.g., neural networks), but rule-based models do provide a way to interpret the results.
Problem statements are required!
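The predictive case Y = F(X1, …, Xn) can be made concrete with a very small classifier. The sketch below is a minimal Gaussian Naive Bayes written from scratch; the two features and the class labels are invented toy values chosen only to make the example runnable, and the function names are hypothetical rather than taken from any data mining tool named on the slide.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature mean/variance for every class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one object."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Toy training set: two made-up features per object, target = object class.
X = [(0.20, 0.10), (0.30, 0.20), (0.25, 0.15),
     (1.10, 0.90), (1.00, 1.00), (1.20, 0.95)]
y = ["star", "star", "star", "quasar", "quasar", "quasar"]

model = fit_gaussian_nb(X, y)
print(predict(model, (0.28, 0.12)))
print(predict(model, (1.05, 0.98)))
```

Here the target Y is the class label and F is the rule "pick the class with the largest posterior"; a regression model would instead return a numeric value of the target.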
20
VO architecture according to IVOA
o VO architecture overview
o Data Modeling
o A unified domain model for astronomy, for use in VO
o Data model for quantity
o IVOA Observation data model
o Simple Spectral Data Model
o Simulation Data Model
o Unified Content Descriptors
o Metadata Registries for VO
o VOTable Format Definition
o Data Access Layer
o DAL Architecture
o Simple Image Access Protocol Specification
o Simple Spectral Access Specification
o IVOA Query Language
o IVOA SkyNode Interface
21
International Virtual Observatory Alliance Partners
o AstroGrid (UK) (http://www.astrogrid.org)
o Australian Virtual Observatory (http://avo.atnf.csiro.au)
o Astrophysical Virtual Observatory (EU) (http://www.euro-vo.org)
o Virtual Observatory of China (http://www.china-vo.org)
o Canadian Virtual Observatory (http://services.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/cvo/)
o German Astrophysical Virtual Observatory (http://www.g-vo.org/)
o Hungarian Virtual Observatory (http://hvo.elte.hu/en/)
o Italian Data Grid for Astronomical Research (http://wwwas.oat.ts.astro.it/idgar/IDGAR-home.htm)
o Japanese Virtual Observatory (http://jvo.nao.ac.jp/)
o Korean Virtual Observatory (http://kvo.kao.re.kr/)
o National Virtual Observatory (USA) (http://us-vo.org/)
o Russian Virtual Observatory (http://www.inasan.rssi.ru/eng/rvo/)
o Spanish Virtual Observatory (http://laeff.esa.es/svo/)
o Virtual Observatory of India (http://vo.iucaa.ernet.in/~voi/)
22
IVOA Infrastructure Controversies (just one example)
1. Euro-VO and NVO objectives: how to consolidate them and support them with a complete system of standards
2. Controversies in understanding what a Data Centre is (e.g., CDS vs AstroGrid definitions)
3. Absence of a Data Centre concept in the IVOA standards
4. Controversy between the SkyQuery idea and Data Centres
23
Subject mediation infrastructure for problem domains representation in RVO
o Information sources integration approaches
o Principles of subject mediation
o Subject mediation tools
24
Information sources integration approaches
o Virtual integration:
  - Global as View (GAV): the global schema is formed by integrating a pre-selected set of source schemas
  - Local as View (LAV): the global schema is defined independently of the existing sources, as a subject domain schema
o Materialized integration (data warehouses)
o Combined methods (GLAV, applying partial materialization)
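The Global as View case can be sketched in a few lines: the global relation is literally defined as a view over the chosen sources, and a query against it is answered by unfolding that view. The two source catalogs, their field names and the `global_objects` function below are hypothetical toy data, not taken from any VO system.

```python
# Two hypothetical source catalogs with different local schemas.
source_a = [{"name": "obj1", "ra_deg": 10.5, "dec_deg": -3.2}]
source_b = [{"id": "obj2", "alpha": 200.1, "delta": 45.0}]


def global_objects():
    """GAV: the global relation is a view (here a union) over the sources."""
    for r in source_a:
        yield {"object": r["name"], "ra": r["ra_deg"], "dec": r["dec_deg"]}
    for r in source_b:
        yield {"object": r["id"], "ra": r["alpha"], "dec": r["delta"]}


# A query on the global schema is answered by unfolding the view definition.
northern = [o["object"] for o in global_objects() if o["dec"] > 0]
print(northern)
```

Note the cost of this design: adding a third source means editing the view definition itself. Under Local as View the direction is reversed, and each source is described in terms of a global schema that does not change.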
25
Subject Mediator Concept
There are two fundamentally different approaches to giving a researcher an integrated representation of multiple information resources for solving scientific problems:
1) moving from the resources to a problem: an integrated representation of multiple resources is created independently of the problem;
2) moving from a problem to the resources: a description of the subject domain of a problem class is created (in terms of concepts, data structures, functions and problem-solving processes), and the resources relevant to the problem are mapped into it.
The first approach (used in SkyQuery) does not scale with the number of resources: the global schema becomes too large for a researcher to take in, and the completeness of the information is doubtful.
The second approach requires a mediation technology. The mediator supports the interaction between a researcher and the resources by applying a description of the subject domain of the problem class (the description of the mediator). The subject mediator approach (a new technology) is considered a part of RVOII.
26
Mediator Definition as a Subject Metainformation Consolidation
To make the mediator scalable, two separate phases of its functioning are distinguished: consolidation and operation. In the consolidation phase, the efforts of the scientific community are focused on defining the mediator's subject by declaring its metainformation. The metainformation created in this phase constitutes the definition of the mediator's subject domain. In the operational phase, arbitrary information collections, expressed in terms of the mediator, can be registered at the mediator. Registration is autonomous: collection providers can register independently of each other. Users of the mediator know only the metainformation defining the mediator’s subject and formulate their queries in terms of that subject.
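The two phases can be illustrated with a toy model. In the sketch below the subject definition is reduced to a list of attribute names, and a registered collection is a list of records plus a mapping from mediator attributes to local field names. The class `SubjectMediator`, its methods and the sample data are all hypothetical simplifications made for illustration; they do not describe the actual RVOII mediator tools.

```python
class SubjectMediator:
    def __init__(self, attributes):
        # Consolidation phase: the community fixes the subject definition.
        self.attributes = tuple(attributes)
        self.collections = []

    def register(self, records, mapping):
        # Operational phase: a provider describes its own collection in
        # mediator terms, independently of every other provider.
        missing = set(self.attributes) - set(mapping)
        if missing:
            raise ValueError(f"unmapped attributes: {sorted(missing)}")
        self.collections.append((records, mapping))

    def query(self, predicate):
        # Users formulate queries only in terms of the mediator's subject.
        results = []
        for records, mapping in self.collections:
            for rec in records:
                view = {attr: rec[mapping[attr]] for attr in self.attributes}
                if predicate(view):
                    results.append(view)
        return results


mediator = SubjectMediator(["object", "ra", "dec"])
mediator.register([{"name": "obj1", "ra_deg": 10.5, "dec_deg": -3.2}],
                  {"object": "name", "ra": "ra_deg", "dec": "dec_deg"})
mediator.register([{"id": "obj2", "alpha": 200.1, "delta": 45.0}],
                  {"object": "id", "ra": "alpha", "dec": "delta"})

southern = mediator.query(lambda o: o["dec"] < 0)
print(southern)
```

Registering the second collection required no change to the mediator definition or to the first collection, which is the scalability property the slide describes.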
27
Advantages of subject domain mediation
1. Semantic integration of heterogeneous information collections is achieved.
2. Users need to know only the subject definitions consolidated by a community.
3. Information providers can offer their information for integration independently of each other and at any time.
4. Autonomous information collections are fully independent of the mediators and of their consolidated metainformation definitions.
5. Users have integrated access to all information registered up to the moment of a query.
6. Mediators form a recursive structure: multiple subjects can be semantically integrated by defining mediators of a higher level.
28
Subject mediation tools (operational phase)
29
Information infrastructure of the RVO
o Basic principles for the RVO infrastructure
o The RVO layered infrastructure
o Components of RVO
30
Basic principles for the RVO infrastructure
o The basic RVO infrastructural principle is to represent the architecture as a network of interoperating web services (Grid services, as soon as a suitable OGSA-DAI or WSRF standard matures).
o A multilevel hierarchy of services is the basis of the RVO architecture. Handling of remote and virtual data sources should be provided.
o The core will be a set of simple, low-level services that are easy to implement even for small projects, so the threshold for joining the VO will be low. Large data providers may implement more complex, high-speed services as well.
o Services can be combined into compositions that talk to several services and produce more complex results.
o Moving processing to the data is another principle, motivated by the large data volumes and the data-intensive character of VO applications.
o A modular architecture that encourages code reuse and composition is another guiding principle.
o The conventional practice of applying the Global as View approach to data integration in VO projects (e.g., SkyQuery) does not appear to scale. Emphasizing subject mediators to support the representation of, and access to, various subject domains in astronomy is a basic RVO principle.
31
The RVO layered infrastructure
32
Searchable metadata registries at Data Center and Virtual Observatory layers
33
SAO Data Center Infrastructure
34
INASAN Data Center Infrastructure
35
RVO Infrastructure
36
The RVO layered infrastructure
37
AstroGrid as the core of the RVOII infrastructure
38
AstroGrid as the architectural core for implementation of RVOII
Analysis shows that using AstroGrid as the RVOII core supports the implementation of the RVOII principles (modularity of the architecture, grid interoperability of services, reuse and composition of services, development of a multilayered architecture). The following AstroGrid components were found to be directly applicable as the core of the RVOII architecture:
o Registry: metadata-based resource registration and search;
o MySpace: management of data spaces shared by researchers and tools;
o Workbench: the VO user interface during problem solving;
o Community: administration and management of VO users;
o JES: the workflow engine;
o CEA: construction of interoperable applications (services);
o DSA: inclusion of data storage functionality into AstroGrid at the required level of system (task) implementation.
39
AstroGrid existing and planned components
40
Community centre in Moscow (IPI RAS) for support of scientific astronomical problem solving over distributed repositories of astronomical information
One of the first steps in implementing RVOII is the installation of a Community centre in Moscow (at IPI RAS) to support the solving of scientific astronomical problems over distributed repositories of astronomical information (containing observational data, problem-solving results, and services for data and knowledge analysis). The Centre is positioned at the top layer of RVOII, so that astronomers can use it immediately for problem solving. The Centre was created in October 2005 as an installation of AstroGrid (1.1), developed recently in the UK and generously provided by its authors for use in RVO.
41
First trial: application of AstroGrid for data analysis for the distant galaxy discovery problem
44
Superposition of radio images contours and optical images in Aladin
45
RVO facilities as a part of the International VO
46
Analysis of possibility of extending AstroGrid with subject mediation facilities
Basic preliminary decisions:
o Mediators are registered in the Registry as CEA applications.
o The mediator interface is planned to offer methods for submitting ADQL queries and mediator programs written in a subset of the SYNTHESIS language.
o CEA applications can be used as functions in the mediator programs.
o The results of the mediator programs are represented in the form of VOClass, of which VOTable is a strict subset; the results are stored in MySpace.
o The mediator programs can be used as tasks in AstroGrid workflows.
o Adapters are embedded into AstroGrid either through the built-in application server for Java applications or through the DSA application server.
o At the initial stage, the Portal and the Workbench can be used as mediator clients; at later stages, a dedicated mediator client based on the ACR capabilities can be developed.
o Facilities for calling external applications are planned (e.g., for the data mining facilities of Weka and/or Oracle).
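Since VOTable is named here as the tabular core of the result format, a small reading example may help. The sketch below parses an invented, minimal VOTable document with the Python standard library. It assumes a single table whose cells are all numeric and handles only the TABLEDATA serialization; the sample content and the function names are hypothetical, and VOClass itself is not modelled.

```python
import xml.etree.ElementTree as ET

# Invented sample document: one table, two numeric fields, two rows.
SAMPLE = """<VOTABLE version="1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
  <RESOURCE>
    <TABLE name="results">
      <FIELD name="RA" datatype="double" unit="deg"/>
      <FIELD name="DEC" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.68</TD><TD>41.27</TD></TR>
          <TR><TD>83.82</TD><TD>-5.39</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}FIELD'."""
    return tag.rsplit("}", 1)[-1]


def read_votable_rows(xml_text):
    """Return the rows of a single-table, all-numeric VOTable as dicts."""
    root = ET.fromstring(xml_text)
    names, rows = [], []
    for elem in root.iter():          # document order: FIELDs precede TRs
        tag = local_name(elem.tag)
        if tag == "FIELD":
            names.append(elem.get("name"))
        elif tag == "TR":
            cells = [float(td.text) for td in elem if local_name(td.tag) == "TD"]
            rows.append(dict(zip(names, cells)))
    return rows


rows = read_votable_rows(SAMPLE)
print(rows)
```

A production client would more likely rely on an existing VOTable library, which also covers the binary serializations, null values and per-field datatypes that this sketch ignores.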
47
Composed architecture
48
Links
RVOII Report: Briukhov D.O., Kalinichenko L.A., Zakharov V.N., Panchuk V.E., Vitkovsky V.V., Zhelenkova O.P., Dluzhnevskaya O.B., Malkov O.Yu., Kovaleva D.A. Information Infrastructure of the Russian Virtual Observatory (RVO). Second Edition. IPI RAN, May 2005. http://synthesis.ipi.ac.ru/synthesis/publications/rvoii/rvoii.pdf
Announcement of the RVO AstroGrid as a shared-use centre, with registration instructions: http://synthesis.ipi.ac.ru/synthesis/projects//astromedia/astroannounce
49
BASIC INFORMATION TECHNOLOGY FOR VO IS COMING. SCIENTIFIC PROBLEM STATEMENTS AND MULTIDISCIPLINARY WORK ON SOLVING THEM WITH THE VO ARE REQUIRED
50
IVOA Architecture Diagram