A PPARC-funded project. The Grid Data Warehouse: description of prototype work in progress by AstroGrid. Access-Grid lecture to the Universities of Leeds and Sheffield by Guy Rixon on
GDW description: access-grid lecture. AstroGrid: the UK Virtual Observatory. Seven UK astronomy departments collaborating to build a Virtual Observatory (VO) for the use of the entire astronomical community.
IVOA: the community of VO projects.
Purpose of the virtual observatory: to combine data from all sources into a data grid. [Diagram labels: data grid, private files, archives, live feeds, bibliographies] Data sets can be images (mainly in files) or tabular (mainly in RDBMS).
Example of VO use: “Find brown dwarf candidates: combine optical (e.g. APM catalogue) and IR (e.g. 2MASS) data to select by colour. Combine multi-epoch data to determine proper motions; select high-PM fraction of colour-selected sample. Then use that sample to…” [Diagram labels: optical archive, IR archive, 2nd epoch, colour sample, refined sample, 3rd epoch]
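The selection described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`v_mag`, `k_mag`, `ra1`, ...), the thresholds, and the sample values are all hypothetical, and the objects are assumed to be already cross-matched between the optical and IR catalogues and between two epochs.

```python
import math

# Hypothetical, already cross-matched records (positions in degrees,
# "years" is the time between the two epochs). Not real catalogue data.
objects = [
    {"id": "A", "v_mag": 20.0, "k_mag": 14.0,
     "ra1": 10.0, "dec1": 20.0, "ra2": 10.0002, "dec2": 20.0001, "years": 10.0},
    {"id": "B", "v_mag": 19.0, "k_mag": 14.0,
     "ra1": 11.0, "dec1": 21.0, "ra2": 11.0, "dec2": 21.0, "years": 10.0},
    {"id": "C", "v_mag": 15.0, "k_mag": 14.0,
     "ra1": 12.0, "dec1": 22.0, "ra2": 12.0003, "dec2": 22.0, "years": 10.0},
]


def colour(obj):
    """Optical-minus-IR colour index (V - K); larger means redder."""
    return obj["v_mag"] - obj["k_mag"]


def proper_motion(obj):
    """Proper motion in arcsec/year, using a small-angle approximation."""
    dec_mid = math.radians((obj["dec1"] + obj["dec2"]) / 2.0)
    # RA offsets shrink by cos(dec) when converted to an angle on the sky.
    d_ra = (obj["ra2"] - obj["ra1"]) * math.cos(dec_mid)
    d_dec = obj["dec2"] - obj["dec1"]
    return math.hypot(d_ra, d_dec) * 3600.0 / obj["years"]


def select_candidates(objs, min_colour, min_pm):
    """Colour-select first, then keep the high-proper-motion fraction."""
    colour_sample = [o for o in objs if colour(o) > min_colour]
    return [o["id"] for o in colour_sample if proper_motion(o) > min_pm]
```

With the hypothetical thresholds `select_candidates(objects, 4.5, 0.05)`, object B is red but stationary and object C moves but is not red, so only A survives both cuts. In the VO the hard part is not this arithmetic but getting the optical, IR, and multi-epoch tables into one place where it can be run.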
VO as collection of web sites: no good. Each site has different query protocol. Results only go to browser, not to RDBMS or reprocessing. Results in HTML etc., not machine-readable. Basic web sites are not sufficient for the VO.
Grid metaphor: electricity supply. Loadsa complex equipment. Simple delivery to consumer. Get your power from any supplier: commodity.
Commodities in astronomy data grid. Common s/w on desktop. Algorithms. Archives. Writeable storage. Registry of resources (processors). Bulk data transport; machine-readable results; combined inside grid. Metadata transport.
AstroGrid topology. [Diagram components: portal, registry, algorithms, writeable storage, archives, workflow]
Difficult RDBMS operations: “Select objects with V-K > 4.5…” (i.e. find ‘red’ objects). Optical archive service: U, B, V, R. IR archive service: J, H, K. No std. way of combining DBs. No std. way of storing results in RDBMS.
Need for data warehouse. Join across internet RDBMS. Join inside warehouse DB: 1000x speed gains.
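The idea of joining inside a warehouse can be illustrated with a minimal Python sketch. It assumes an in-memory SQLite database standing in for the warehouse RDBMS; the table names, column names, and sample rows are hypothetical, and the two catalogues are assumed to share an object identifier (real catalogues would need a positional cross-match). It does not use or imitate the OGSA-DAI interface.

```python
import sqlite3


def build_warehouse(optical_rows, ir_rows):
    """Copy both catalogues into one local database (the 'warehouse')."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE optical (obj_id TEXT PRIMARY KEY, v_mag REAL)")
    db.execute("CREATE TABLE infrared (obj_id TEXT PRIMARY KEY, k_mag REAL)")
    db.executemany("INSERT INTO optical VALUES (?, ?)", optical_rows)
    db.executemany("INSERT INTO infrared VALUES (?, ?)", ir_rows)
    return db


def red_objects(db, min_v_minus_k):
    """Join the two tables locally and select objects with V-K above a cut."""
    query = """
        SELECT o.obj_id, o.v_mag - i.k_mag AS v_k
        FROM optical AS o
        JOIN infrared AS i ON o.obj_id = i.obj_id
        WHERE o.v_mag - i.k_mag > ?
        ORDER BY o.obj_id
    """
    return db.execute(query, (min_v_minus_k,)).fetchall()


# Hypothetical sample rows: (object id, magnitude).
optical = [("A", 20.0), ("B", 15.0), ("C", 19.5)]
infrared = [("A", 14.0), ("B", 14.0), ("D", 12.0)]
warehouse = build_warehouse(optical, infrared)
```

Calling `red_objects(warehouse, 4.5)` runs the join in a single database, so no rows cross the network during the query itself. Objects present in only one catalogue (C and D here) drop out of the inner join.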
GDW topology extends AstroGrid. [Diagram components: portal, file storage, archive, workflow, registry, Grid-DB (OGSA-DAI), warehouse controller, Grid-DB (OGSA-DAI)]
GDW people: Kona Andrews (Cambridge); Elizabeth Auden (MSSL); Martin Hill (Edinburgh); Tony Linde (Leicester); Clive Page (Leicester); Guy Rixon (Cambridge); Noel Winstanley (Jodrell Bank).
Current system. [Diagram components: portal, file storage, archive, workflow, registry, Grid-DB (OGSA-DAI), warehouse controller, Grid-DB (OGSA-DAI)] Link not implemented yet. DB tables preloaded; read-only DB. Link temporarily redirected.
Next system (3Q2004). [Diagram components: portal, file storage, archive, workflow, registry, Grid-DB (OGSA-DAI), warehouse controller, Grid-DB (OGSA-DAI)] Limited choice. Links implemented properly (GridFTP). Two dedicated installations inside AstroGrid; multi-user.
Ultimate system (2005+). [Diagram components: portal, file storage, archive, workflow, registry, warehouse controller, Grid-DB (OGSA-DAI); AstroGrid; UK e-Science grid / EGEE] One node per user; any storage node.
Assessment. Basic idea is sound. Coding of GDW was quite simple. Very difficult to get it all integrated. Problems with OGSA-DAI: performance; data-size limits; can’t get higher functions to work yet. Proceed? Yes; need to experiment further. Still expect to get science out of it.
Can one use it? Beta testers invited. Wait for release of “Iteration 4.1” system (soon!). Wait for release of “Iteration 5” system (3Q2004) to see GDW useful for science. AstroGrid final release is at the end of
That’s all folks!