Data Management Challenges in Gaia

Presentation transcript:

Data Management Challenges in Gaia. Jose Hernandez, Alexander Hutton. Gaia Science Operations Centre, ESA/ESAC/SRE-OOO.

Outline: Gaia Observing strategy; Data flow and Pipelines; Data Challenges; Data Tracking Tools; Examples.

Gaia Observing strategy: Survey mission at L2. Scan the sky along great circles. Accumulate the data on board. Download to Earth every night. Full sky observed every 6 months. Repeat for at least 5 years => 10 full-sky maps.

Gaia Observing strategy

Gaia: Some numbers after 5 years: 100 TB of raw data. We expect to observe 10^9 sources (could end up being 2x10^9). Spectra for 2x10^8 sources. 80 observations per source on average: 10^11 astro/photo observations, 2x10^10 spectra.
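The totals on this slide follow from the per-source figures. A quick back-of-the-envelope check in Python, using only the numbers quoted on the slides (the variable names are ours), shows that the quoted totals are rounded orders of magnitude:

```python
# Back-of-the-envelope check of the five-year totals quoted on the slides.
sources = 1e9            # expected sources (slide: 10^9, possibly 2x10^9)
spectra_sources = 2e8    # sources with spectra (slide: 2x10^8)
obs_per_source = 80      # average observations per source (slide)

astro_photo_obs = sources * obs_per_source   # 8e10, rounded on the slide to 10^11
spectra = spectra_sources * obs_per_source   # 1.6e10, rounded on the slide to 2x10^10

# One full-sky pass every 6 months, repeated for 5 years.
full_sky_maps = 5 * 12 // 6

print(f"{astro_photo_obs:.1e} astro/photo observations")
print(f"{spectra:.1e} spectra")
print(full_sky_maps, "full-sky maps")
```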

Operations (diagram): Launcher; Satellite; ground stations New Norcia and Cebreros; Mission Operation Centre (MOC), ESOC; Science Operation Centre (SOC), ESAC; Data Processing & Analysis Consortium (DPAC).

Data flow (diagram): Launcher; Satellite; ground stations New Norcia, Cebreros and Malargüe; Mission Operation Centre (MOC), ESOC; Science Operation Centre (SOC), ESAC; Data Processing & Analysis Consortium (DPAC).

Data flow (figure courtesy A. Brown, DPAC).

Data Processing Cycles (diagram labels): Daily Pipelines; MOC; SOC; DPCs; MDB-00, MDB-01, MDB-02; <=8.5 Mbit/s.

Some Challenges: Sheer number of observations; Ensuring no data loss; Managing the daily data flow; Data tracking; DPCs autonomous and geographically distributed.

Tools: Data Modeling. Single Data Model/ICD with DPCs. MDB Dictionary Tool on-line: keeps track of versions, changes,…; immediate visibility; automatic generation of DM classes, DB schema, Data Consumers…; DM evolution controlled by CCB.
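To illustrate what "automatic generation" from a single data model can look like, here is a minimal sketch. The dictionary entry format, the type mappings and the function names are all hypothetical; the slides do not describe how the MDB Dictionary Tool actually works.

```python
# Hypothetical dictionary entry: a table name plus (field, type) pairs.
ENTRY = {
    "name": "AstroObservation",
    "fields": [("solutionId", "long"), ("obmt", "long"), ("flux", "double")],
}

# Assumed mappings from dictionary types to SQL and Python types.
SQL_TYPES = {"long": "BIGINT", "double": "DOUBLE PRECISION"}
PY_TYPES = {"long": "int", "double": "float"}


def to_sql(entry):
    """Render a CREATE TABLE statement from one dictionary entry."""
    cols = ",\n".join(f"    {name} {SQL_TYPES[kind]}" for name, kind in entry["fields"])
    return f"CREATE TABLE {entry['name']} (\n{cols}\n);"


def to_dataclass(entry):
    """Render Python dataclass source code from the same entry."""
    lines = ["@dataclass", f"class {entry['name']}:"]
    lines += [f"    {name}: {PY_TYPES[kind]}" for name, kind in entry["fields"]]
    return "\n".join(lines)


print(to_sql(ENTRY))
print(to_dataclass(ENTRY))
```

Because both outputs come from the same entry, a change to the data model reaches the class definition and the database schema together, which is the benefit the slide attributes to a single data model.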

Data Management and Tracking: All data is tagged with a barcode, named the “Solution Identifier”. It is just a long (64-bit) number. Each solutionId has some metadata.

Data Tracking: solutionId. Used to identify data: who generated the data, when and where; with what SW version, environment and run number, and at what time. We also use it to manage the daily data flow: related data gets the same solutionId, which is a form of “data binning”.
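The idea of a 64-bit tag with attached metadata can be sketched as a small registry. This is an illustration only: the class names, the metadata fields and the sequential id allocation are assumptions, not the Gaia implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count


@dataclass(frozen=True)
class SolutionMetadata:
    """Hypothetical metadata kept for each solutionId."""
    producer: str        # who/where generated the data
    sw_version: str
    environment: str
    run_number: int
    created_at: datetime


class SolutionRegistry:
    """Hands out solutionIds and remembers the metadata of each one."""

    def __init__(self):
        self._next_id = count(1)   # assumed allocation scheme: a simple counter
        self._meta = {}

    def new_solution(self, producer, sw_version, environment, run_number):
        solution_id = next(self._next_id)
        assert 0 < solution_id < 2**63   # must fit in a signed 64-bit long
        self._meta[solution_id] = SolutionMetadata(
            producer, sw_version, environment, run_number,
            datetime.now(timezone.utc),
        )
        return solution_id

    def describe(self, solution_id):
        return self._meta[solution_id]


registry = SolutionRegistry()
sid = registry.new_solution("SOC", "1.0.0", "operations", 42)

# Related records produced in the same run share one solutionId.
records = [{"solutionId": sid, "payload": p} for p in ("a", "b", "c")]
print(registry.describe(sid))
```

Each record carries only the number, so the per-record cost is one long, while the metadata is stored once per solutionId.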

Data Tracking: solutionId. Track data provenance. Verify that the correct calibrations get used. Find what was affected by incorrect data. Remove incorrect data from the pipelines.
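"Find what was affected by incorrect data" amounts to walking provenance links downstream. A minimal sketch, assuming each solutionId records the solutionIds of its inputs (the `INPUTS` table and its values are invented for illustration):

```python
from collections import deque

# Hypothetical provenance links: solutionId -> solutionIds of the inputs it used.
INPUTS = {
    10: [],        # raw data
    11: [],        # a calibration
    20: [10, 11],  # product built from raw data and the calibration
    21: [10],      # product built from raw data only
    30: [20],      # product built from product 20
}


def affected_by(bad_id, inputs):
    """Return every solutionId that directly or indirectly used bad_id."""
    # Invert the links so we can walk from an input to its consumers.
    consumers = {}
    for output, used in inputs.items():
        for source in used:
            consumers.setdefault(source, []).append(output)

    seen, queue = set(), deque([bad_id])
    while queue:
        current = queue.popleft()
        for nxt in consumers.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# If calibration 11 turns out to be wrong, these products need removal or reprocessing.
print(sorted(affected_by(11, INPUTS)))
```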

Data Integrity and Completeness. Current numbers: 10.4x10^9 astro/photo observations; 1.3x10^9 spectra; 6.3 TB of raw science data received; 144 GB of housekeeping data; 21 TB generated in the processing. Typically the daily pipelines write thousands of objects per second.

Data Integrity and Completeness. Challenges: ensuring there are no data leakages; data consistency and completeness, within the pipelines and with respect to the MDB. (Diagram: MOC, SOC, MDB, DPCB, DPCC, DPCG, DPCT, DPCI.)

Time Data Binning: All Gaia data can be related to On Board Time. Examples: at time x the source image crosses a CCD; at time y charge injections occur; spacecraft attitude. Use OBMT to collapse records of the same time together and count the number of objects per bin.
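Collapsing records by on-board time and counting objects per bin can be sketched in a few lines. The bin width, the nanosecond unit and the sample time tags below are assumptions chosen for illustration:

```python
from collections import Counter

BIN_WIDTH_NS = 1_000_000_000  # assumed bin width: 1 s of OBMT, in nanoseconds


def bin_counts(obmt_values, bin_width=BIN_WIDTH_NS):
    """Count how many objects fall into each OBMT bin."""
    return Counter(t // bin_width for t in obmt_values)


# Invented OBMT tags for five objects.
obmt = [100, 999_999_999, 1_000_000_000, 1_500_000_000, 3_200_000_000]
counts = bin_counts(obmt)

for bin_index in sorted(counts):
    print(bin_index, counts[bin_index])
```

The result is a compact timeline (one count per bin) that stands in for the full set of records when checking completeness.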

Time Data Binning

Time Data Binning (plot annotations): Galactic Centre Crossings; FOV-P; FOV-F; Galaxy Tail.

Time Data Binning: Data binning gets done on the fly as the pipeline stores the data, with no overhead. We can then compare the timeline data at different points. We can also check data consistency. All the checks can be automated and alarms raised if problems are found.
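Comparing timelines taken at two points of the data flow reduces to comparing per-bin counts. A minimal sketch, with invented counts and a hypothetical `compare_timelines` helper; a real system would route the alarms to an operator rather than print them:

```python
from collections import Counter


def compare_timelines(upstream, downstream):
    """Return {bin: (upstream_count, downstream_count)} for every bin that differs."""
    bins = set(upstream) | set(downstream)
    return {
        b: (upstream.get(b, 0), downstream.get(b, 0))
        for b in sorted(bins)
        if upstream.get(b, 0) != downstream.get(b, 0)
    }


# Invented per-bin object counts at two points of the pipeline.
received = Counter({0: 120, 1: 95, 2: 300})
stored = Counter({0: 120, 1: 90, 2: 300})

for b, (expected, found) in compare_timelines(received, stored).items():
    print(f"ALARM: bin {b}: expected {expected} objects, found {found}")
```

An empty result means the two points agree bin by bin, so the check can run unattended after every pipeline run.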

Examples: Omega Centauri

Time Data Binning (plot annotations): Galactic Plane; Omega Centauri Crossing (FOV-P).

Omega Centauri observation (plot annotations): 100,000 Observations; 50 sec.

Omega Centauri observation

Questions? (Image: NGC 1818 in LMC.)