DES Data Management
Ray Plante & Joe Mohr (NCSA / U. Illinois Astronomy)
For the DES Collaboration: Fermilab, U Illinois, U Chicago, LBNL, CTIO/NOAO
BIRP Meeting, August 12, 2004, Tucson

Data Management Operations
Operations sites: CTIO, La Serena, NCSA, Fermilab
Common access to data by pipelines & users through the archive
Grid-based processing approach for automation and flexibility
Community-friendly data release strategy
[Diagram: data flow from data acquisition on the mountaintop (quick quality assessment & image correction, scheduling, operator) through transfer/ingest into a short-term archive at La Serena (basic calibration & quality assessment, SNe analysis) and on to the long-term archive at NCSA, with a (semi-)replicated archive and a short-term archive at Fermilab; the single-frame calibration, source extraction, co-add, and science analysis pipelines run in the grid-based environment with declared dependencies and transfers; public and proprietary access are served from the archive (NOAO).]

Overview of Operations
CTIO (Mountaintop)
–Quick quality assessment
La Serena
–10% of each night: repeat observations of SNe fields
–Automated reduction/analysis to find SNe and follow light curves
NCSA/UofI
–Main processing facility, leveraging existing NCSA hardware
Fermilab
–Survey planning and simulation
–Augment hardware environment as needed (reprocessing)
Data are expected to be transferred between sites over the network.

Data Release Strategy
Single-pointing images are automatically released ~1 year after acquisition (the release rules are sketched in code below)
–Level 0: raw, uncalibrated data
–Level 1: calibrated, single-frame images
Science products are released twice:
–at the halfway point of the survey
–one year after the end of the survey
–Level 2: co-added/mosaicked images
–Level 3: "pre-science" catalogs (object catalog)
Science results are released upon publication by the science team
–Level 4: photo-z catalog, cluster catalog, etc.
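To make the tiered policy concrete, here is a minimal sketch of how the release rule might be expressed in code. The level numbers and one-year offsets follow the slide; the function names, field names, and the treatment of the survey midpoint are purely illustrative assumptions.

```python
from datetime import date, timedelta

# Illustrative only: encodes the release policy described on this slide.
# Level 0-1 products become public ~1 year after acquisition; Level 2-3
# products are released at the survey midpoint and one year after survey
# end; Level 4 products are released when the science team publishes.

ONE_YEAR = timedelta(days=365)

def release_date(level: int, acquired: date,
                 survey_midpoint: date, survey_end: date,
                 publication_date: date | None = None) -> date | None:
    """Return the date a product becomes public, or None if not yet known."""
    if level in (0, 1):            # raw and calibrated single-frame images
        return acquired + ONE_YEAR
    if level in (2, 3):            # co-adds and pre-science catalogs
        # First release at the survey midpoint, final release a year after the end.
        return survey_midpoint if acquired <= survey_midpoint else survey_end + ONE_YEAR
    if level == 4:                 # science catalogs (photo-z, clusters, ...)
        return publication_date    # tied to publication, not the calendar
    raise ValueError(f"unknown data level: {level}")

def is_public(level: int, acquired: date, today: date, **milestones) -> bool:
    released = release_date(level, acquired, **milestones)
    return released is not None and today >= released
```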

The DES Archive
Automated ingest of data products
Common infrastructure for proprietary and public access
–the difference is a matter of authorization
Management via the Data Model
–tracking survey status and available products
–monitor data: included among Level 0 products
–exposing sufficient metadata for external use
Interactive access
–primarily for the public
–search and retrieval tools
Leverage existing NCSA archives (BIMA/CARMA, ADIL, Quest, ...)
Programmatic access
–access by DES pipelines
–external access by Virtual Observatory applications via standard interfaces (sketched below)
Partial archive replication at partner sites
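As an illustration of standards-based programmatic access, the sketch below issues a Virtual Observatory Simple Cone Search request, which is an HTTP GET carrying RA, DEC, and SR parameters and returning a VOTable. The service URL is a hypothetical placeholder, not an actual DES endpoint.

```python
import urllib.parse
import urllib.request

# Hypothetical service endpoint; a real DES archive would publish its own URL.
DES_CONE_SEARCH_URL = "https://archive.example.org/des/scs"

def cone_search(ra_deg: float, dec_deg: float, radius_deg: float) -> bytes:
    """Query a VO Simple Cone Search service and return the VOTable response.

    The RA/DEC/SR query parameters and the VOTable response format are part
    of the IVOA Simple Cone Search standard, so any compliant VO client or
    pipeline can use the same interface.
    """
    params = urllib.parse.urlencode({
        "RA": ra_deg,       # right ascension, degrees
        "DEC": dec_deg,     # declination, degrees
        "SR": radius_deg,   # search radius, degrees
    })
    with urllib.request.urlopen(f"{DES_CONE_SEARCH_URL}?{params}") as resp:
        return resp.read()  # VOTable XML; parse with a VOTable library downstream

# Example: objects within 0.1 degrees of (RA, Dec) = (50.0, -30.0)
# votable_xml = cone_search(50.0, -30.0, 0.1)
```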

Grid-based Pipeline Framework
Purpose:
–support fully automated processing
–provide a platform-independent execution environment
Example topology #1:
–La Serena: calibration, SNe analysis
–NCSA: Level 1-3 processing (including full calibration), photo-z catalog
–Fermilab: cluster finding

Grid-based Pipeline Framework
Example topology #2:
–La Serena: Level 1 (full calibration), SNe analysis
–NCSA: Level 2-3 processing
–Fermilab: photo-z catalog, cluster finding
Example topology #3 (reprocessing):
–NCSA: Level 1-4 processing
A common execution environment allows processing to be easily moved between sites (a configuration sketch follows below).
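A minimal sketch of how such topologies might be expressed as configuration, so that re-mapping pipeline stages to sites is a configuration change rather than a code change. The stage and site names follow the slides; the dictionary layout and the dispatch function are hypothetical.

```python
# Hypothetical topology descriptions: each maps pipeline stages to the site
# that runs them.  Switching topologies reassigns work without changing the
# pipeline code, because every site offers the same execution environment.
TOPOLOGIES = {
    "topology_1": {
        "la_serena": ["calibration", "sne_analysis"],
        "ncsa":      ["single_frame_cal", "source_extraction", "coadd", "photoz_catalog"],
        "fermilab":  ["cluster_finding"],
    },
    "topology_2": {
        "la_serena": ["single_frame_cal", "sne_analysis"],
        "ncsa":      ["source_extraction", "coadd"],
        "fermilab":  ["photoz_catalog", "cluster_finding"],
    },
    "reprocessing": {
        "ncsa": ["single_frame_cal", "source_extraction", "coadd",
                 "photoz_catalog", "cluster_finding"],
    },
}

def site_for_stage(topology: str, stage: str) -> str:
    """Return which site runs a given pipeline stage under a topology."""
    for site, stages in TOPOLOGIES[topology].items():
        if stage in stages:
            return site
    raise KeyError(f"stage {stage!r} not assigned in {topology!r}")

# e.g. site_for_stage("topology_2", "photoz_catalog") -> "fermilab"
```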

Enabling Automation
Critical for handling the large data rate
–run all processing as data becomes available
–automated release of Level 0-1 data
Automation based on events that trigger processing (see the sketch below)
–first event: data lands in the La Serena cache
–an event engages a pipeline on a set of data products
Techniques for recovery from failure
Biggest challenge: automated quality assessment
–quantify measures of quality
–flag obvious problems
–filter down the cases requiring human inspection
–attach quality measures to metadata
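A minimal sketch of event-triggered processing under these assumptions: new files arriving in a cache directory (standing in for the La Serena cache) constitute an event, and each event engages a pipeline on the new products. The directory path, polling loop, and run_pipeline hook are all illustrative.

```python
import time
from pathlib import Path

# Illustrative stand-in for the La Serena cache directory.
CACHE_DIR = Path("/data/des/incoming")
POLL_SECONDS = 30

def run_pipeline(products: list[Path]) -> None:
    """Placeholder: hand the new data products to the calibration pipeline."""
    print(f"engaging pipeline on {len(products)} product(s)")

def watch_cache() -> None:
    """Poll the cache and trigger processing whenever new products land."""
    seen: set[Path] = set()
    while True:
        arrived = [p for p in CACHE_DIR.glob("*.fits") if p not in seen]
        if arrived:
            try:
                run_pipeline(arrived)          # the triggering event
                seen.update(arrived)
            except Exception as err:           # failure recovery: log and retry later
                print(f"pipeline failed, will retry: {err}")
        time.sleep(POLL_SECONDS)

# watch_cache()  # runs until interrupted
```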

Platform-independent Execution Environment
The "application" level should not have to worry about what machine it is running on (one possible interface is sketched below):
–a common way of initiating a pipeline application and passing in its inputs
–transparent access to data
–common logging
–transparent parallelism
–a common exit strategy: status and declaration of output products
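The sketch below illustrates one way such a contract could look as an abstract base class. The class and method names are hypothetical, meant only to show the shape of a uniform initiation, logging, and exit interface.

```python
import logging
from abc import ABC, abstractmethod

class PipelineApplication(ABC):
    """Hypothetical contract every pipeline stage implements, so the framework
    can launch it the same way on any site, regardless of which machine,
    batch system, or filesystem sits underneath."""

    def __init__(self, inputs: dict[str, str]):
        self.inputs = inputs                    # logical names -> data references
        self.outputs: dict[str, str] = {}       # declared products, filled by run()
        self.log = logging.getLogger(type(self).__name__)   # common logging

    @abstractmethod
    def run(self) -> int:
        """Do the work; return an exit status (0 = success)."""

    def declare_output(self, name: str, reference: str) -> None:
        """Record a produced data product for automated archiving."""
        self.outputs[name] = reference

class SingleFrameCalibration(PipelineApplication):
    def run(self) -> int:
        self.log.info("calibrating %s", self.inputs["raw_frame"])
        # ... transparent data access and (possibly parallel) processing here ...
        self.declare_output("calibrated_frame", "des:calib/frame-0001")
        return 0
```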

Grid-based Capabilities Needed
Data access through logical identifiers (see the sketch below)
Automated archiving of processed products
Workflow management
Process monitoring, error detection, error recovery
Transparent support for local authentication/authorization mechanisms
Grid-based job execution
–hides local batch-system complications
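To illustrate what "data access through logical identifiers" means in practice, here is a minimal sketch of a resolver that maps a site-independent logical name to whatever physical location a given site provides. The identifier scheme, hostnames, and mapping are invented for the example.

```python
# Hypothetical mapping from logical data identifiers to per-site physical
# locations.  Pipelines refer only to the logical name; the framework picks
# the replica appropriate for the site it is running at.
REPLICA_CATALOG = {
    "des:raw/2004-08-12/frame-0042": {
        "la_serena": "file:///cache/raw/frame-0042.fits",
        "ncsa":      "gsiftp://archive.ncsa.example.org/des/raw/frame-0042.fits",
        "fermilab":  "gsiftp://des.fnal.example.org/replica/raw/frame-0042.fits",
    },
}

def resolve(logical_id: str, site: str) -> str:
    """Return a physical URL for a logical identifier at the given site."""
    replicas = REPLICA_CATALOG.get(logical_id, {})
    if site in replicas:
        return replicas[site]
    if replicas:                      # fall back to any reachable replica
        return next(iter(replicas.values()))
    raise KeyError(f"no replica registered for {logical_id}")

# e.g. resolve("des:raw/2004-08-12/frame-0042", "ncsa")
```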

Leverage Existing Technologies and Experience
Astronomical pipelines
–community code: IRAF, SExtractor, ...
–NOAO Mosaic Pipeline, OPUS
–BIMA Data Archive and Pipeline: real-time data ingest and automated release
Grid-based image processing on NCSA platforms
–Quest2 pipeline: deployed the existing pipeline on TeraGrid platforms in ~2 weeks
–Grid technology used for replicating data between Caltech and NCSA
NCSA and Fermilab programs for Grid infrastructure
Emerging collaboration between NCSA and NOAO
–NCSA to provide data management services to NOAO archives
NOAO, NCSA, and Fermilab are partners in the National Virtual Observatory (NVO)

Software Design & Engineering
Data Management Steering Group
–cross-institution working group handles high-level design, policies & plan
Design deliverables
–currently underway: detailed DM requirements, high-level design, Work Package definitions
Design & development process
–design reviews
–coding standards and software reviews
–testing framework, Data Challenge definition
–reporting and effort tracking
–must choose a weight appropriate for the project
Process for pipeline framework design
–understand constraints from target platforms and existing software technologies
–define a reference platform: the environment the software must run in
–design the data access, archive, and processing framework

Schedule
Year 1
–Development: software engineering; archive-based collection access; pipeline systems (framework, simulation, single-frame calibration)
–Testing/Data Challenges: testing framework
Year 2
–Development: archive-based collection access; pipeline systems (framework, simulation, single-frame calibration, co-add)
–Testing/Data Challenges: Challenge I, a hand-run test of existing software, calibration through object extraction
Year 3
–Development: pipeline systems (co-add, object extraction); operations software
–Testing/Data Challenges: Challenge II, an automated test of the archive and pipeline framework, data ingest through single-frame calibration
End of Year 3
–Testing/Data Challenges: Challenge III, an automated test of chained pipelines on one year's worth of data, data ingest through co-add
Year 4
–Development: operations software; photo-z catalog
–Testing/Data Challenges: address data challenge issues, rerun as necessary
Year 5
–Operations

Summary
Data management with strong support for community access
–common public/proprietary access infrastructure
–aggressive release schedule, with fast release of basic data products
–long-term archive plan
–building on the NOAO-NCSA relationship
Grid-based pipeline architecture
–automation, flexibility
–key to supporting geographically distributed processing
Leverage existing software and technologies