Published by Cornelius Rice. Modified over 9 years ago.
1
Ray Plante for the DES Collaboration
BIRP Meeting, August 12, 2004, Tucson
Fermilab, U Illinois, U Chicago, LBNL, CTIO/NOAO

DES Data Management
Ray Plante & Joe Mohr
NCSA, UI Astronomy
2
Data Management
Operations Sites: CTIO, La Serena, NCSA, Fermilab
Common access to data by pipelines & users through archive
Grid-based processing approach for automation and flexibility
Community-friendly data release strategy

[Data-flow diagram. Labels: Data Acquisition; Short-term Archive; Long-term Archive (NCSA); (Semi-)Replicated Archive; Quick Quality Assessment & Image Correction; Scheduling; Operator; Mountaintop; Public Access; Proprietary Access; transfer; ingest; Single Frame Calibration Pipeline; Co-add Pipeline; Science Analysis Pipelines; Source Extraction Pipeline; Grid-based Environment; dependency; transfer; Basic Calibration & Quality Assessment; La Serena; SNe Analysis; Short-term Archive; Fermilab; NOAO]
3
Overview of Operations
CTIO (Mountaintop)
  – Quick quality assessment
La Serena
  – 10% of each night: repeat observations of SNe fields
  – Automated reduction/analysis to find SNe and follow light curves
NCSA/UofI
  – Main processing facility, leveraging existing NCSA hardware
Fermilab
  – Survey planning and simulation
  – Augment hardware environment as needed (reprocessing)
      Expect data to be transferred over network
4
Data Release Strategy
Single pointing images are automatically released ~1 year after acquisition
    Level 0: raw, uncalibrated data
    Level 1: calibrated, single-frame images
Science products released twice:
  – Halfway point of the survey
  – One year after the end of the survey
    Level 2: Co-added/mosaiced images
    Level 3: “Pre-science” catalogs (object catalog)
Science results released upon publication by science team
    Level 4: photo-z catalog, cluster catalog, etc.
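
The release rules on this slide can be written down as a small function. This is only a sketch under stated assumptions: "~1 year" is approximated as 365 days, a Level 2-3 product is assumed to go out with the next of the two scheduled releases, and the function name, arguments, and dates are all hypothetical rather than part of any DES software.

```python
from datetime import date, timedelta
from typing import Optional

ONE_YEAR = timedelta(days=365)  # assumption: "~1 year" approximated as 365 days


def earliest_release(level: int, produced: date, survey_midpoint: date,
                     survey_end: date,
                     published: Optional[date] = None) -> Optional[date]:
    """Return the earliest public release date for a product (hypothetical helper)."""
    if level in (0, 1):
        # Single-pointing images: released about a year after acquisition.
        return produced + ONE_YEAR
    if level in (2, 3):
        # Science products: released at the survey midpoint and again
        # one year after the survey ends. Assumed: a product joins the
        # next scheduled release after it was produced.
        if produced <= survey_midpoint:
            return survey_midpoint
        return survey_end + ONE_YEAR
    if level == 4:
        # Science results: released on publication, unknown until then.
        return published
    raise ValueError(f"unknown data level: {level}")


# Made-up survey dates, for illustration only.
midpoint, end = date(2012, 1, 1), date(2014, 1, 1)
print(earliest_release(0, date(2010, 1, 1), midpoint, end))
print(earliest_release(3, date(2013, 1, 1), midpoint, end))
```

Keeping the policy in one function of the data level means the archive can apply it uniformly at query time instead of tagging each file by hand.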
5
The DES Archive
Automated ingest of data products
Common infrastructure for proprietary and public access
  – Difference is a matter of authorization
Management via the Data Model
  – Tracking survey status, available products
  – Monitor data: included among level 0 products
  – Exposing sufficient metadata for external use
Interactive Access
  – Primarily for public
  – Search and retrieval tools
      Leverage existing NCSA archives (BIMA/CARMA, ADIL, Quest, …)
Programmatic Access
  – Access by DES pipelines
  – External Access by Virtual Observatory apps via standard interfaces
Partial archive replication at partner sites
6
Grid-based Pipeline Framework
Purpose:
  – Support fully automated processing
  – Provide platform-independent execution environment
Example topology #1:
    La Serena: Calibration, SNe analysis
    NCSA: Level 1-3 processing (including full Calibration), Photo-z catalog
    Fermilab: Cluster finding
7
Grid-based Pipeline Framework
Example topology #2
    La Serena: Level 1 (full calibration), SNe analysis
    NCSA: Level 2-3 processing
    Fermilab: Photo-z catalog, Cluster finding
Example topology #3 (Reprocessing)
    NCSA: Level 1-4 processing
Common execution environment allows processing to be easily moved between sites
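
One way to picture "processing easily moved between sites" is to treat a topology as plain configuration: a mapping from pipeline stage to site. The sketch below encodes topologies #2 and #3 from this slide. The stage names, the dictionary layout, and the two helper functions are illustrative assumptions, not a description of the actual framework.

```python
from typing import Dict, List

# Ordered chain of stages (names are hypothetical labels for the slide's items).
CHAIN: List[str] = ["level1", "level2", "level3", "photo_z", "cluster_finding"]

TOPOLOGIES: Dict[str, Dict[str, str]] = {
    "topology_2": {
        "level1": "La Serena",
        "level2": "NCSA",
        "level3": "NCSA",
        "photo_z": "Fermilab",
        "cluster_finding": "Fermilab",
    },
    # Reprocessing: Level 1-4 all at NCSA (photo-z and cluster catalogs are Level 4).
    "reprocessing": {stage: "NCSA" for stage in CHAIN},
}


def site_for(topology: str, stage: str) -> str:
    """Look up which site runs a stage under the chosen topology."""
    return TOPOLOGIES[topology][stage]


def handoffs(topology: str) -> int:
    """Count how often data must move between sites along the chain."""
    sites = [site_for(topology, stage) for stage in CHAIN]
    return sum(1 for a, b in zip(sites, sites[1:]) if a != b)


print(site_for("topology_2", "level2"))
print(handoffs("topology_2"), handoffs("reprocessing"))
```

Under this assumption, switching from routine operations to reprocessing changes only the mapping, not the pipeline code.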
8
Enabling Automation
Critical for handling large data rate
  – Run all processing as data becomes available
  – Automated release of Level 0-1 data
Automation based on events that trigger processing
  – First event: data lands in La Serena cache
  – Engages a pipeline on a set of data products
Techniques for recovery from failure
Biggest Challenge: automated quality assessment
  – Quantify measures of quality
  – Flag obvious problems
  – Filter down cases requiring human inspection
  – Attach quality measures to metadata
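
The event-triggered model can be sketched as a tiny in-process dispatcher: an event names a set of data products, each registered pipeline runs on them, and a pipeline may emit follow-on events. Everything here (the `EventBus` class, the event names, the retry count, the file names) is a hypothetical illustration. The slide does not specify a mechanism, and a real system would use a grid workflow manager rather than an in-memory queue.

```python
from collections import defaultdict, deque


class EventBus:
    """Minimal event dispatcher with a simple retry as failure recovery."""

    def __init__(self, max_attempts=2):
        self.handlers = defaultdict(list)   # event type -> list of pipelines
        self.queue = deque()                # pending (event type, products)
        self.log = []                       # (event type, pipeline, outcome)
        self.max_attempts = max_attempts

    def on(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def emit(self, event_type, products):
        self.queue.append((event_type, tuple(products)))

    def run(self):
        # Process events in arrival order until none are pending.
        while self.queue:
            event_type, products = self.queue.popleft()
            for handler in self.handlers[event_type]:
                for _ in range(self.max_attempts):
                    try:
                        handler(self, products)
                        self.log.append((event_type, handler.__name__, "ok"))
                        break
                    except Exception as exc:
                        # Record the failure; retry until attempts run out.
                        self.log.append(
                            (event_type, handler.__name__, f"failed: {exc}"))


def single_frame_calibration(bus, products):
    bus.emit("frames_calibrated", [p + ".cal" for p in products])


def coadd(bus, products):
    bus.emit("coadd_ready", ["coadd_of_%d_frames" % len(products)])


bus = EventBus()
bus.on("data_landed", single_frame_calibration)   # first event: data lands in cache
bus.on("frames_calibrated", coadd)
bus.emit("data_landed", ["exp0001.fits", "exp0002.fits"])
bus.run()
print(bus.log)
```

Entries that still read "failed" after the last attempt are the cases this sketch would leave for human inspection.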
9
Platform-independent execution environment
“Application” level should not worry about what machine it is running on
  – Common way of initiating a pipeline application and passing in its inputs
  – Transparent access to data
  – Common logging
  – Transparent parallelism
  – Common exit strategy
      Status
      Declaration of output products
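
A common way to start an application, common logging, and a common exit (a status plus declared outputs) map naturally onto a small base class. The sketch below is one possible shape, with invented names (`PipelineApp`, `ExitReport`, `FlatField`). It does not cover transparent data access or parallelism, and it is not the interface the project adopted.

```python
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExitReport:
    """Common exit: a status plus the declared output products."""
    status: str
    outputs: List[str] = field(default_factory=list)


class PipelineApp(ABC):
    """Hypothetical base class every pipeline application would extend."""
    name = "app"

    def __init__(self) -> None:
        # Common logging: every application logs under the same hierarchy.
        self.log = logging.getLogger(f"pipeline.{self.name}")

    @abstractmethod
    def process(self, inputs: Dict[str, str]) -> List[str]:
        """Do the science work; return identifiers of products created."""

    def execute(self, inputs: Dict[str, str]) -> ExitReport:
        # Common initiation: the framework always calls execute(inputs).
        self.log.info("starting with inputs %s", sorted(inputs))
        try:
            outputs = self.process(inputs)
        except Exception:
            self.log.exception("processing failed")
            return ExitReport("failed")
        self.log.info("finished with %d output(s)", len(outputs))
        return ExitReport("ok", outputs)


class FlatField(PipelineApp):
    """Toy application: pretends to flat-field a single image."""
    name = "flatfield"

    def process(self, inputs: Dict[str, str]) -> List[str]:
        return [inputs["image"] + ".flat"]


report = FlatField().execute({"image": "exp0001"})
print(report)
```

The application only implements `process`; how it was launched and where its log goes stay the framework's concern.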
10
Grid-based Capabilities Needed
Data access through logical identifiers
Automated archiving of processed products
Workflow management
Process monitoring, error detection, error recovery
Transparent support for local authentication/authorization mechanisms
Grid-based job execution
  – Hides local batch complication
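
"Data access through logical identifiers" means a pipeline asks for a product by name and lets a catalog decide which physical copy to use. A minimal sketch follows, assuming an in-memory catalog. The identifier scheme, site names, and paths are invented for illustration, and a production system would rely on a grid replica service instead.

```python
from typing import Dict, Tuple


class ReplicaCatalog:
    """Maps a logical identifier to its physical copies, keyed by site."""

    def __init__(self) -> None:
        self._replicas: Dict[str, Dict[str, str]] = {}

    def register(self, logical_id: str, site: str, location: str) -> None:
        self._replicas.setdefault(logical_id, {})[site] = location

    def resolve(self, logical_id: str, preferred_site: str) -> Tuple[str, str]:
        """Return (site, location), preferring a copy at the caller's site."""
        replicas = self._replicas.get(logical_id)
        if not replicas:
            raise KeyError(f"no replica registered for {logical_id}")
        if preferred_site in replicas:
            return preferred_site, replicas[preferred_site]
        # Otherwise fall back to a deterministic choice among remote copies.
        site = sorted(replicas)[0]
        return site, replicas[site]


catalog = ReplicaCatalog()
# Hypothetical identifier and paths.
catalog.register("des:raw/exp0001", "NCSA", "/archive/raw/exp0001.fits")
catalog.register("des:raw/exp0001", "Fermilab", "/cache/des/exp0001.fits")

print(catalog.resolve("des:raw/exp0001", "Fermilab"))
print(catalog.resolve("des:raw/exp0001", "La Serena"))
```

Because pipelines only ever see the logical identifier, the same code runs unchanged whichever site holds the data.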
11
Leverage existing technologies and experience
Astronomical pipelines
  – Community code: IRAF, SExtractor, …
  – NOAO Mosaic Pipeline, OPUS
  – BIMA Data Archive and Pipeline
      Real-time data ingest and automated release
Grid-based image processing on NCSA platforms
  – Quest2 pipeline
      Deployed existing pipeline on TeraGrid platforms in ~2 weeks
      Grid technology used for replicating data between Caltech and NCSA
NCSA and Fermilab programs for Grid Infrastructure
Emerging collaboration between NCSA and NOAO
  – NCSA to provide data management services to NOAO archives
NOAO, NCSA, Fermilab are partners in the National Virtual Observatory (NVO)
12
Software Design & Engineering
Data Management Steering Group
  – Cross-institution working group handles high-level design, policies & plan
Design Deliverables
  – Currently underway: detailed DM requirements, high-level design, Work Package definitions
Design & Development Process
  – Design Reviews
  – Coding Standards and software reviews
  – Testing framework, Data Challenge Definition
  – Reporting and effort tracking
      Must choose weight appropriate for project
Process for Pipeline Framework Design
  – Understand constraints from target platforms and existing software technologies
  – Define reference platform: the environment that software must run in
  – Design data access, archive, and processing framework
13
Schedule
(columns: Development; Testing/Data Challenges)

Year 1
    Development: Software Engineering; Archive-based Collection Access; Pipeline Systems: framework, simulation, single-frame calibration
    Testing/Data Challenges: Testing framework
Year 2
    Development: Archive-based Collection Access; Pipeline Systems: framework, simulation, single-frame calibration, co-add
    Testing/Data Challenges: Challenge I: hand-run test of existing software, calibration thru object extract.
Year 3
    Development: Pipeline Systems: co-add, object extraction; Operations software
    Testing/Data Challenges: Challenge II: Automated test of archive and pipeline framework, data ingest thru single-frame cal.
End of Year 3
    Testing/Data Challenges: Challenge III: Automated test of chained pipelines, one year’s worth of data, data ingest thru co-add
Year 4
    Development: Operations software; Photo-z catalog
    Testing/Data Challenges: Address data challenge issues, rerun as nec.
Year 5
    Operations
14
Summary
Data Management with strong support for community access
  – Common public/proprietary access infrastructure
  – Aggressive release schedule, with fast release of basic data products
  – Long-term archive plan
  – Building on NOAO-NCSA relationship
Grid-based pipeline architecture
  – Automation, flexibility
  – Key to supporting geographically distributed processing
Leverage existing software and technologies