WFCAM Science Archive Critical Design Review, April 2003


The SuperCOSMOS Science Archive (SSA)
- WFCAM Science Archive prototype
- Existing ad hoc flat file archive (inflexible, restricted access) re-implemented in an RDBMS
- Catalogue data only (no image pixel data)
- 1.26 Tbytes of catalogue data
- Implement a working service for users & developers to exercise prior to arrival of Tbytes of WFCAM data

SSA has several similarities to WSA:
- spatial indexing is required over the celestial sphere
- many source attributes in common, e.g. position, brightness, colour, shape, …
- multi-colour, multi-epoch merged source information results from multiple measurements of the same source

Entity-Relationship Models (ERMs):
- generalised, DBMS-independent
- simple, pictorial summary of relational design
- ERMs map directly to table design

SSA relational model:
- very simple relational model
- total of 5 entities
- catalogues have ~256 byte records with mainly 4-byte attributes, i.e. 50 to 60 per record
- so 2 tables dominate the DB:
  - SurveyCat: 0.82 Tbyte
  - MergedCat: 0.44 Tbyte
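
The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal terabytes (10^12 bytes), which the slide does not state; the variable names are illustrative only. It shows that 4-byte attributes in a 256-byte record give an upper bound of 64 attributes (consistent with the 50 to 60 quoted if a few attributes are wider), that the two dominant tables add up to the 1.26 Tbytes quoted for the SSA catalogue data, and roughly how many rows each table would hold.

```python
# Back-of-envelope check of the SSA table sizes quoted on the slide.
RECORD_BYTES = 256       # approximate record length (from the slide)
ATTRIBUTE_BYTES = 4      # most attributes are 4 bytes (from the slide)
TBYTE = 10**12           # ASSUMPTION: decimal terabyte

table_tbytes = {"SurveyCat": 0.82, "MergedCat": 0.44}

# Upper bound on attributes per record if every attribute were 4 bytes wide.
max_attributes = RECORD_BYTES // ATTRIBUTE_BYTES

# Combined size of the two dominant tables.
total_tbytes = sum(table_tbytes.values())

# Approximate row counts implied by the table sizes and record length.
approx_rows = {name: size * TBYTE / RECORD_BYTES
               for name, size in table_tbytes.items()}

print("max attributes per record:", max_attributes)
print("total Tbytes:", round(total_tbytes, 2))
for name, rows in approx_rows.items():
    print(f"{name}: ~{rows / 1e9:.1f} billion rows")
```

Under these assumptions both tables come out at a few billion rows each, which is the scale the prototype has to serve.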

SSA has been implemented and 1% of data ingested:
- as prototype for V1.0 WSA
- Windows/SQL Server => “SkyServer”
- real-world queries used to exercise SSA
- 100% ingested and online by end Q
- test-bed for user access tools and archive scientist curation

Development method: “20 queries approach”
- a set of real-world astronomical queries, expressed in SQL
- includes joint queries between the SSA and SDSS
- queries have so far been exercised in the EDR region:
  - SSA: 13 million records; ~3 Gbyte
  - SDSS: 14 ; ~22
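
To give a flavour of what a joint SSA/SDSS query might look like, here is a minimal sketch. It is not one of the actual 20 queries: the table names, column names, and sample values are all hypothetical, and it uses an in-memory SQLite database so that it runs anywhere, whereas the SSA itself is hosted on SQL Server. The query joins the two catalogues through an assumed pre-computed match table and selects sources whose magnitudes differ by more than 1 mag between the two surveys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# HYPOTHETICAL schema and data, for illustration only.
conn.executescript("""
CREATE TABLE ssa_source  (ssa_id  INTEGER PRIMARY KEY, b_mag REAL);
CREATE TABLE sdss_source (sdss_id INTEGER PRIMARY KEY, g_mag REAL);
CREATE TABLE ssa_sdss_match (ssa_id INTEGER, sdss_id INTEGER);

INSERT INTO ssa_source  VALUES (1, 18.2), (2, 20.9);
INSERT INTO sdss_source VALUES (101, 17.9), (102, 19.1);
INSERT INTO ssa_sdss_match VALUES (1, 101), (2, 102);
""")

# Joint query: matched pairs whose magnitudes differ by more than 1 mag.
JOINT_QUERY = """
SELECT s.ssa_id, d.sdss_id, s.b_mag - d.g_mag AS delta_mag
FROM ssa_source AS s
JOIN ssa_sdss_match AS m ON m.ssa_id = s.ssa_id
JOIN sdss_source AS d ON d.sdss_id = m.sdss_id
WHERE ABS(s.b_mag - d.g_mag) > 1.0
ORDER BY s.ssa_id
"""

rows = conn.execute(JOINT_QUERY).fetchall()
for ssa_id, sdss_id, delta_mag in rows:
    print(ssa_id, sdss_id, round(delta_mag, 2))
```

Writing the science cases down as SQL like this is what lets the same set of queries double as a regression and performance test for the schema.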

WSA has significant differences, however:
- catalogue and pixel data;
- calibration and other extensive metadata;
- science-driven, nested survey programmes (as opposed to SSA “atlas” maps of whole sky) result in complex data structure;
- curation & update within DBMS (whereas SSA is a finished data product ingested once into the DBMS).

WSA key requirements:
- flexibility:
  - ingested data are rich in structure
  - ingest occurs daily, curation daily/weekly/monthly …
  - many varied usage modes
  - protect proprietorial rights for many data
- scalability:
  - ~2 Tbyte of new catalogue & ancillary data per year
- rapid response:
  - need to maintain rapid response despite increasing data volumes

Schematic picture of the WSA:
- Pixels:
  - one flat-file image store; access layer restricts public access
  - filenames and all metadata are tracked in DBMS tables with unrestricted access
- Catalogues:
  - WFAU incremental (no public access)
  - public, released DBs
  - external survey datasets also held

WFCAM pixel data
- pixel data consist of multiframes and combiframes in WSA parlance; stored as flat files (not BLOBs in the DBMS)
- metadata are stored in the DBMS
- library calibration frames are held
- default image products are held

WFCAM multiframe - any pipeline product that:
- retains instrumental “paw print” as distinct images (WSA calls these “detector frames”)
- is not made up from other ingested frames (e.g. microstep interleave is a multiframe)
WSA includes difference images as multiframes

WFCAM combiframe - any pipeline or archive product that:
- is the result of a combination process on stored multiframes
  - e.g. pipeline dither/stack/mosaic product
  - e.g. archive default stack/mosaic product
(NB: combiframe may still reflect the “paw print” so can have multiframe characteristics)

Multiframe ERM:
- Programme & Field => vital
- library calibration multiframes stored & related
- primary/extension HDU keys logically stored & related
- this will work for VISTA

Combiframe ERM:
- every combiframe has provenance linking to multiframes
- individual calibration frames are not required, but individual confidence frames are
- combiframe may consist of multiframe-like detector combiframes

Astrometric and photometric calibration data:
- calibration information must be stored (SRAD)
- recalibration is required, especially photometric (SRAD)
- old calibration coefficients must be stored (SRAD)
- time-dependence (versioning) complicates the relational model
Calibration data are related to images; source detections are related to images and hence to their relevant calibration data

Multiframe calibration data:
- “set-ups” define nightly detector & filter combinations:
  - extinctions have nightly values
  - zero points (zps) have detector & nightly values
- coefficients split into current & previous entities
- versioning & timing recorded
- highly non-linear systematics are allowed for via 2D maps
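
One way to picture the split into current and previous coefficient entities is sketched below. This is an illustration under stated assumptions, not the WSA schema: the table names `current_zp` and `previous_zp`, their columns, and the `recalibrate` helper are all hypothetical, and SQLite stands in for the real DBMS. On recalibration, the existing zero point is copied to the "previous" table with its version number and a timestamp before the "current" row is overwritten, so old coefficients are never lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# HYPOTHETICAL tables: one row per detector per night in current_zp,
# any number of superseded rows in previous_zp.
conn.executescript("""
CREATE TABLE current_zp  (detector INTEGER, night TEXT, zp REAL, version INTEGER,
                          PRIMARY KEY (detector, night));
CREATE TABLE previous_zp (detector INTEGER, night TEXT, zp REAL, version INTEGER,
                          superseded_at TEXT);
""")

def recalibrate(conn, detector, night, new_zp, timestamp):
    """Store a new zero point, archiving the one it replaces (if any).

    Returns the version number of the new current coefficient.
    """
    with conn:  # run the whole update as a single transaction
        row = conn.execute(
            "SELECT zp, version FROM current_zp WHERE detector = ? AND night = ?",
            (detector, night),
        ).fetchone()
        if row is None:
            # First calibration for this detector/night.
            conn.execute("INSERT INTO current_zp VALUES (?, ?, ?, 1)",
                         (detector, night, new_zp))
            return 1
        old_zp, old_version = row
        # Keep the old coefficient, with its version and the time it was replaced.
        conn.execute("INSERT INTO previous_zp VALUES (?, ?, ?, ?, ?)",
                     (detector, night, old_zp, old_version, timestamp))
        conn.execute(
            "UPDATE current_zp SET zp = ?, version = ? WHERE detector = ? AND night = ?",
            (new_zp, old_version + 1, detector, night),
        )
        return old_version + 1

# Illustrative values only.
v1 = recalibrate(conn, 1, "2003-04-01", 22.10, "2003-04-02T00:00:00")
v2 = recalibrate(conn, 1, "2003-04-01", 22.14, "2003-05-01T00:00:00")
```

Keeping the current values in their own table means that routine queries never have to filter on version, while the history remains available for anyone who needs to reproduce an older calibration.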

Combiframe calibration data:
- no “set-ups”: each image separately calibrated; detector combiframes are catered for
- “luptitude” parameters stored for each image separately
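
"Luptitudes" are the asinh magnitudes of Lupton, Gunn & Szalay (1999), which behave like ordinary magnitudes for bright sources but stay finite for zero or negative fluxes. The sketch below implements the standard formula; it assumes that the per-image parameter stored in the archive is the softening parameter `b`, which the slide does not spell out, and the function names are illustrative.

```python
import math

def luptitude(flux_ratio, b):
    """Asinh magnitude for flux_ratio = f/f0, with softening parameter b > 0.

    m = -(2.5 / ln 10) * [asinh((f/f0) / (2b)) + ln b]
    """
    return -2.5 / math.log(10) * (math.asinh(flux_ratio / (2.0 * b)) + math.log(b))

def pogson_magnitude(flux_ratio):
    """Conventional magnitude, defined only for flux_ratio > 0."""
    return -2.5 * math.log10(flux_ratio)

# Bright source: the two definitions agree closely.
print(luptitude(1e-3, 1e-10), pogson_magnitude(1e-3))

# Zero and negative fluxes: the asinh magnitude is still finite.
print(luptitude(0.0, 1e-10))
print(luptitude(-1e-10, 1e-10))
```

Because the softening parameter depends on the noise level of each image, it has to be stored per image for the magnitudes to be converted back to fluxes later.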

Catalogue data: general model
- related back through progenitor image to calibration data
- detection list for each programme (or set of sub-surveys)
- merged source entity is maintained
- merge events recorded
- list re-measurements derived

Example: UKIDSS LAS, GPS & GCS
- LAS, GPS & GCS share one detection & one list re-measurement entity
- individual merged source and source re-measurement entities
- note curation information:
  - merge log (one per prog.)
  - current/repeat detections
  - primary/secondary (e.g. overlaps)
  - new/old merge image flag (to trigger list re-measurement)

Non-WFCAM data: general model
- each non-WFCAM survey has a stored catalogue
- cross-neighbour table:
  - records nearby sources between any two surveys
  - yields associated (“nearest”) source
- non-WFCAM list measurements where image data are available (NB: V2.0 requirement)
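
The idea of a cross-neighbour table can be sketched in a few lines. The code below is a brute-force illustration, not the archive's implementation (which would rely on spatial indexing rather than comparing every pair): the function names, the dictionary layout, and the matching radius are all assumptions. It records every pair of sources from two surveys that lie within a given radius, and then picks the nearest neighbour for each source in the first survey.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def build_cross_neighbours(survey_a, survey_b, radius_arcsec):
    """Return (id_a, id_b, separation_arcsec) for every pair within the radius.

    survey_a and survey_b map source id -> (ra_deg, dec_deg).
    Brute force, O(N * M): fine for a sketch, not for billions of rows.
    """
    rows = []
    for id_a, (ra_a, dec_a) in survey_a.items():
        for id_b, (ra_b, dec_b) in survey_b.items():
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if sep <= radius_arcsec:
                rows.append((id_a, id_b, sep))
    return rows

def nearest_neighbours(rows):
    """Pick the closest survey_b source for each survey_a source."""
    nearest = {}
    for id_a, id_b, sep in rows:
        if id_a not in nearest or sep < nearest[id_a][1]:
            nearest[id_a] = (id_b, sep)
    return nearest

# Illustrative positions only.
wfcam = {1: (150.0, 2.0), 2: (150.1, 2.0)}
sdss = {10: (150.0, 2.0 + 1 / 3600), 11: (150.0, 2.0 + 3 / 3600), 12: (151.0, 2.0)}

neighbour_rows = build_cross_neighbours(wfcam, sdss, radius_arcsec=5.0)
nearest = nearest_neighbours(neighbour_rows)
```

Storing all neighbours within the radius, rather than only the nearest, leaves the choice of association criterion to the user at query time.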

Example: UKIDSS LAS & relationship to SDSS
- UKIDSS LAS overlaps with SDSS
- list measurements:
  - at positions defined by IR source, but in optical image data;
  - do not currently envisage implementing this the other way (i.e. optical source positions placed in IR image data)

WFCAM Science Archive curation - set of entities to track in-DBMS processing:
- archived programmes have:
  - required filter set
  - required join(s)
  - required list-driven measurement product(s)
  - release date(s)
  - final curation task
  - one or more curation timestamps
- a set of curation procedures is defined for the archive

The V2.0 R&D programme for the WSA: scalability issues
- speed: wish to maintain query response performance as catalogue data accumulate to many Tbytes
  - goal is ~100 sec for Tbyte trawls (i.e. non-indexed)
- data volume: wish to cope ultimately with single tables of size 10s of Tbytes or more …
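
The speed goal translates directly into a required aggregate read rate. The arithmetic below assumes a decimal terabyte; the per-disk sustained rate of 50 Mbyte/s is a purely illustrative figure, not a number from the presentation, and the helper name is hypothetical.

```python
import math

TBYTE = 10**12            # ASSUMPTION: decimal terabyte
TRAWL_BYTES = 1 * TBYTE   # a non-indexed scan of 1 Tbyte
GOAL_SECONDS = 100        # response-time goal from the slide

# Aggregate sequential read rate needed to meet the goal, in bytes per second.
required_rate = TRAWL_BYTES / GOAL_SECONDS

def disks_needed(per_disk_rate_bytes_per_s):
    """Disks that must be read in parallel, given a sustained per-disk rate."""
    return math.ceil(required_rate / per_disk_rate_bytes_per_s)

print(f"required aggregate rate: {required_rate / 1e9:.0f} Gbyte/s")
# ILLUSTRATIVE per-disk rate of 50 Mbyte/s (not from the source).
print("disks in parallel at 50 Mbyte/s each:", disks_needed(50e6))
```

Whatever per-disk rate is assumed, the goal can only be met by spreading a table trawl across many disks working in parallel.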

But: we’re poor academics!
- limited financial resources
  - can’t afford Rolls-Royce SAN-type solution, for example
- staff resources limited
  - need low maintenance systems