Published by Raymond Stephens. Modified over 8 years ago.
1
AstroGrid Datacenters
AstroGrid Consortium Review, Dec 2004
Martin Hill (AstroGrid@ROE)
2
Outline
- Challenge
- Approach
- Developed:
  - Storepoints
  - Describing data
  - Query Language
  - Status
  - Versioning
- Software: Publisher's AstroGrid Library
3
Problem: Challenge Outline
- Large datasets (to Petabytes)
- So?
- Distributed; Science comes from combining
- Bandwidth rising slower than
- No/few established suitable standards
  - FITS images/'tables'. Ambiguous headers. Ambiguous subformat, eg spectra.
  - VOTable introduced. Ambiguous subformat eg spectra vs catalogue. Verbose.
- No/few established common terms
- Involves Scientists…
4
Approach: 'Publisher's AstroGrid Library'
- General solution to:
  - Discover problems faced, accumulate solutions in software
  - Experimentally publish sets and types (not host).
  - Many smaller datasets owned by people without web skills (eg solar) so:
    - Need 'easy'/'unskilled' installation
    - Able to proxy; 3rd parties can publish data without requiring more work from owner (eg VizieR, Trace)
    - 'Free' website, range of standard interfaces
- Danger: too general (any query against any dataset producing any results).
5
Existing Solutions
- Common task: publish RDBMSs to web
- Accumulated tools & skill-sets
- No combined solution offering:
  - Standard interface (eg query language)
  - Scientific values (errors, units)
  - Spatial querying (common)
  - VO Metadata for query and results
6
Developing Standards
- Resource metadata
- Query language (ADQL/s, ADQL/x)
- Web interfaces
- Working beyond standards
- Feeding research to IVOA
- Parallel development
  - In the VO: eg Starlink, NVO, VizieR
  - External: SRB, Taverna, GridPP monitor
  - Convergence
7
Protocols & Interfaces
- Human – web pages
- SOAP
  - Toolkit Incompatibilities
  - Streaming awkward (via Toolkits)
  - Longer term benefits?
- 'Raw Http post' (eg servlets, CGI)
  - Simpler
  - More existing skills amongst Astronomers
- Mixed (eg SIAP, SkyNode)
- Don't Choose – Implement
- Mix & Match, Plug & Play:
10
Releasing
- Deploy early – if temporarily
- Independent & Integrated Access
- Versioning:
  - Servers & clients, ie new clients can still use old servers, and new servers work with old clients.
  - Add and 'deprecate', don't change
  - Delete intelligently
    - (Remove quickly unused i/fs, eg CEA if CEA upgrades, JSPs)
- Need hosts…
  - Hosts need hardware
  - Publishers need to know their data
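The 'add and deprecate, don't change' rule can be illustrated with a minimal Python sketch. The function names `cone` and `search_cone` and their parameters are hypothetical, chosen only for illustration; they are not taken from the AstroGrid code. The old entry point stays callable, warns the caller, and delegates to the newer one, so old clients keep working against a new server.

```python
import warnings


def search_cone(ra_deg, dec_deg, radius_deg, max_rows=None):
    """Newer interface (hypothetical): explicit units and an optional row limit."""
    # A real service would run the query; here we just echo the request.
    return {"ra": ra_deg, "dec": dec_deg, "radius": radius_deg, "max_rows": max_rows}


def cone(ra, dec, sr):
    """Older interface (hypothetical), kept so existing clients do not break."""
    warnings.warn(
        "cone() is deprecated; use search_cone()",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    # Delegate rather than duplicate, so both interfaces behave identically.
    return search_cone(ra, dec, sr)
```

Once monitoring shows that nothing calls the old interface any more, it can be removed, which is one reading of 'delete intelligently'.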
11
Describing Data
- Registry 'Resource' documents
- IVO Tabular Sky Service
  - Units, UCDs
- Solar vs Sky vs…
- Images vs Catalogues
- Concept extended for 'RdmsMetadata'
  - UCD1+ -> Dictionaries & Ontologies
  - Relationships (simple: errors)
- Queryable
- Mirrors vs Copies
12
Query Language
- SQL -> ADQL/xml
- Defined common functions – CIRCLE & XMATCH (sky not solar)
- Working on:
  - XQL
  - Units
  - Investigating: UCDs instead of columns
  - Cross-dataset querying
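As an illustration of what a CIRCLE-style function has to compute, here is a minimal Python sketch of a cone test on sky coordinates. It assumes positions are given as RA/Dec in degrees and that rows are dictionaries with `ra` and `dec` keys; the function names are hypothetical and this is not the ADQL definition itself.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form: well behaved for the small angles typical of cone searches,
    # and it handles the RA wrap at 0/360 degrees without special cases.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_circle(ra, dec, centre_ra, centre_dec, radius_deg):
    """True if (ra, dec) lies within radius_deg of the circle centre."""
    return angular_separation_deg(ra, dec, centre_ra, centre_dec) <= radius_deg


def cone_filter(rows, centre_ra, centre_dec, radius_deg):
    """Keep only the rows (dicts with 'ra' and 'dec' keys) inside the circle."""
    return [r for r in rows
            if in_circle(r["ra"], r["dec"], centre_ra, centre_dec, radius_deg)]
```

A database back end would normally push this test into the query itself; filtering in the service layer as above is only practical for small tables.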
13
Results
- Query+Metadata+RawResults = VoResults
- FITS vs VOTable vs HDF vs CSV vs HTML vs…
- All of them
- Results -> queryable data -> inputs
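The idea of bundling query, metadata and raw results, then rendering 'all of them' on request, might look like the following Python sketch. The `VoResults` class shape, its field names, and the choice of CSV and HTML as the two example formats are assumptions made for illustration; FITS, VOTable and HDF writers would slot in as further methods.

```python
import csv
import html
import io
from dataclasses import dataclass


@dataclass
class VoResults:
    """Hypothetical bundle: the query, column metadata, and the raw rows."""
    query: str
    metadata: list  # one dict per column, e.g. {"name": "ra", "unit": "deg"}
    rows: list      # one sequence of values per result row

    def to_csv(self):
        """Render as CSV with a header row of column names."""
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow([col["name"] for col in self.metadata])
        writer.writerows(self.rows)
        return out.getvalue()

    def to_html(self):
        """Render as an HTML table, showing units from the metadata in the header."""
        head = "".join(
            f"<th>{html.escape(c['name'])} [{html.escape(c['unit'])}]</th>"
            for c in self.metadata
        )
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"
```

Keeping the metadata next to the rows is what lets a later step treat the results as queryable data rather than as an opaque file.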
14
Data Analysis
- Faster feasible
  - < 10^6s OK. 10^8 not…
- Joins
  - Polar coordinate matches (+ HTM, HealPix).
  - Cross-match algorithms
- Distributed queries
  - Breaking down query
  - Moving the right data
  - Combining the results
(Clive Page)
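One common way to avoid comparing every source with every other source in a cross-match is to bucket one catalogue into declination zones. The Python sketch below shows that zone approach under stated assumptions: sources are `(id, ra, dec)` tuples in degrees, and the zone height equals the match radius. It is a stand-in for illustration, not the HTM or HealPix indexing named on the slide and not the algorithm used in AstroGrid.

```python
import math
from collections import defaultdict


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def zone_crossmatch(cat_a, cat_b, radius_deg):
    """Return (id_a, id_b) pairs separated by at most radius_deg."""
    def zone_of(dec):
        # Zones are horizontal strips of height radius_deg, counted from the south pole.
        return math.floor((dec + 90.0) / radius_deg)

    zones = defaultdict(list)
    for source in cat_b:
        zones[zone_of(source[2])].append(source)

    matches = []
    for id_a, ra_a, dec_a in cat_a:
        z = zone_of(dec_a)
        # A match cannot differ in declination by more than the radius,
        # so only the source's own zone and its two neighbours need checking.
        for neighbour in (z - 1, z, z + 1):
            for id_b, ra_b, dec_b in zones.get(neighbour, ()):
                if separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                    matches.append((id_a, id_b))
    return matches
```

This sketch restricts candidates in declination only; a production version would also restrict in right ascension within each zone.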
15
Status
- Readily available
- Debugging; developer
- Debugging; astronomer
- Inform User
16
Storepoints
- No data persistence at PALs
  - Web server machines not data storage ones
  - Large result sets
  - No workspace, memory models, etc
- Streaming outputs
- SRB, GridFTP not ready.
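Streaming results straight to a storepoint, rather than holding them on the web server, can be sketched in Python as follows. The function name `stream_rows`, the CSV output format and the `fake_cursor` generator are assumptions for illustration; `target` stands for any writable object, such as an open file or a network stream.

```python
import csv
import io


def stream_rows(rows, target, columns):
    """Write rows to `target` one at a time and return the number of rows written.

    `rows` may be a generator or database cursor, so the full result set
    is never held in memory on the serving machine.
    """
    writer = csv.writer(target, lineterminator="\n")
    writer.writerow(columns)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def fake_cursor(n):
    """Stand-in for a database cursor that yields rows lazily."""
    for i in range(n):
        yield (i, i * 0.5)
```

For example, `stream_rows(fake_cursor(3), io.StringIO(), ["id", "mag"])` writes a header and three rows while only ever holding one row at a time.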
17
Identifying Storepoints
- Concepts
- FTP, File, MySpace + extend.
- 3rd iteration; 2nd in use
[Diagram labels: MySpace FTP SRB HTTP GridFTP Community HomeSpace VoSpace (Registered); SRB GridFTP MySpace SRB]
19
Data Service Architecture
[Diagram labels: Datacenter Implementation, Slinger, Axis, Cone, SIAP, Plugin Manager, /XML/CSV, zip/plain, email/file/ftp/myspace, AstroGrid, CEA, SkyNode, JSP]
20
Publishers' AstroGrid Library
- 'Easy to publish to the VO'
- Web Application, includes:
  - SOAP (AstroGrid, CEA, prepped for SkyNode)
  - CGI (SIAP, NVO-cone search, SSA)
  - HTML pages (cone search, query builder, status monitor)
- Features
  - Asynchronous ('stateful') & Synchronous Queries
  - Queues
  - Comprehensive Status (incl historical)
  - Variety results
  - Fully 'Streamed' – no curation issues
- Server 'Plugins', including:
  - RDBMS (JDBC)
  - FITS file collection
  - eXist (XML)
- Helper Tools
  - Metadata Generators
  - Ready-made website access
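The server 'plugin' idea, where the web front ends stay the same and only the back end changes, can be sketched in Python as below. The class names `ServerPlugin`, `SqlitePlugin` and `DataService` are hypothetical, and `sqlite3` is used purely as a stand-in for an RDBMS reached through JDBC; none of this is the PAL's actual Java interface.

```python
import sqlite3
from abc import ABC, abstractmethod


class ServerPlugin(ABC):
    """Hypothetical back-end interface: turn a query into an iterable of rows."""

    @abstractmethod
    def run_query(self, sql):
        ...


class SqlitePlugin(ServerPlugin):
    """Stand-in for the RDBMS plugin; sqlite3 replaces JDBC for illustration only."""

    def __init__(self, connection):
        self.connection = connection

    def run_query(self, sql):
        # Return the cursor itself so the caller fetches rows lazily.
        return self.connection.execute(sql)


class DataService:
    """Front end that does not know which plugin is installed behind it."""

    def __init__(self, plugin):
        self.plugin = plugin

    def query(self, sql):
        return [tuple(row) for row in self.plugin.run_query(sql)]
```

A FITS-collection or XML-database plugin would implement the same `run_query` method, so the SOAP, CGI and HTML interfaces would not need to change.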
21
Situation Now
- Installed:
  - SuperCOSMOS Science Archive (RDBMS)
    - astrogrid.roe.ac.uk:8080/pal-ssa/
    - astrogrid.roe.ac.uk:8080/pal-twomass/
    - astrogrid.roe.ac.uk:8080/pal-usnob/
  - 6dF – Spectra
    - grendel12.roe.ac.uk:8080/pal-6df/
  - Wide Field Survey
  - TRACE (FITS files, Solar, under test)
- Proxy (bespoke special plugins)
  - All NVO-cone-compatible DBs (test)
  - VizieR
- Evaluated/ing at:
  - ESO
  - RAL (solar)
  - JBO (Merlin)
- Reviewing Query Language, metadata documents, etc
22
Future
- Quality…
- Metadata 'wizards'
- Sell to hosts; deploy to Leicester, JBO, ESO, RAL, The World....
- Explicit and Investigative Queries
- Distributed queries & combining results (NVO Exec plans)
- Full SIA, SSA interface
- More user & admin web pages
- Local authorisation