December 11, 2006 DES DM - Mohr  The DES DM Team
Tanweer Alam (1), Dora Cai (1), Joe Mohr (1,2), Jim Annis (3), Greg Daues (1), Choong Ngeow (2), Wayne Barkhouse (2), Patrick Duda (1), Ray Plante (1), Cristina Beldica (1), Huan Lin (3), Douglas Tucker (3)
(1) NCSA  (2) UIUC Astronomy  (3) Fermilab
Team roles: astronomers; grid computing, middleware, and portals; database development, maintenance, and the archive web portal; NVO lead at NCSA; senior developer.
Oversight Group: Randy Butler, Mike Freemon, and Jay Alameda (NCSA)

December 11, 2006 DES DM - Mohr  Architecture Overview
Components: pipelines, archive, portals
Development effort: 30 FTE-years total
Current status: 13 FTE-years expended to date

December 11, 2006 DES DM - Mohr  Where are we today?
Iterative/spiral development:
Oct '04 - Sep '05: initial design and development -- basic image reduction, cataloguing, catalog and image archive, etc.
Oct '05 - Jan '06: DC1 = deployed DES DM system v1. Used TeraGrid to reduce 700 GB of simulated raw data [Fermilab] into 5 TB of images, weight maps, bad pixel maps, and catalogs; catalogued, ingested, and calibrated 50M objects.
Feb '06 - Sep '06: refine and develop -- full science processing through coaddition, greater automation, ingestion from HPC platforms, quality assurance, etc.
Oct '06 - Jan '07: DC2 = deploy DES DM system v2. Use NCSA and SDSC TeraGrid platforms to process 500 deg^2 in griz with 4 layers of imaging in each (equivalent to 20% of the SDSS imaging dataset, 350M objects). Use the DES DM system on a workstation to reduce Blanco Cosmology Survey data (http://cosmology.uiuc.edu/BCS) from the MOSAIC2 camera. Evaluate the ability to meet DES data quality requirements.
[Figures: DC1 astrometry and DC1 photometry scatter plots]

December 11, 2006 DES DM - Mohr  DES Archive
Components of the DES Archive:
Archive nodes: filesystems that can host DES data files. There can be a large number of them (no meaningful limit), and they are distributed (assumed to be non-local).
Database: tracks data using metadata describing the files and file locations.
Archive web portal: allows external (NVO) users to select and retrieve data from the DES archive. Try it at the archive portal (see the portal slides below).

December 11, 2006 DES DM - Mohr  Archive Filesystem Structure
host:/${root}/Archive
  raw/${nite}/          (des..., des..., etc)
    src/                original data from telescope
    raw/                split and cross-talk corrected data
    log/                logs from observing and processing
  red/${runid}/
    xml/                location of main OGRE workflows
    etc/                location of SExtractor config files, etc.
    bin/                all binaries required for the job
    data/${nite}/
      cal/              biases, flats, illumination correction, etc.
      raw/              simply a link to the appropriate raw data
      log/              processing logs
      ${band1}/         reduced images and catalogs for ${band1}
      ${band2}/         and so on for each band
      ...
  cal/                  calibration data (bad pixel masks, pupil ghosts)
  coadd/                holds co-added data within ${project}, ${tilename}, ${runid}
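Because the layout is fixed, a file's physical location follows directly from its archive tags (described on the DES Database slide below). A minimal sketch of that mapping in Python; the function names and example values are illustrative only, not the actual DES DM code:

    import os

    def reduced_image_path(root, runid, nite, band, imagename):
        # Illustrative: a reduced image lives under red/${runid}/data/${nite}/${band}/
        return os.path.join(root, "Archive", "red", runid, "data", nite, band, imagename)

    def raw_image_path(root, nite, imagename, stage="src"):
        # Illustrative: raw data live under raw/${nite}/{src,raw,log}/
        return os.path.join(root, "Archive", "raw", nite, stage, imagename)

    # Hypothetical example:
    # reduced_image_path("/data", "DESrun01", "des20061211", "g", "img_01.fits")
    # -> /data/Archive/red/DESrun01/data/des20061211/g/img_01.fits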

December 11, 2006 DES DM - Mohr  DES Database
Image metadata: many header parameters (including WCS params) plus all of the image tags that uniquely identify the DES archive location:
  ${archive_site} (fnal, mercury, gpfs-wan, bcs, etc)
  ${imageclass} (raw, red, coadd, cal)
  ${nite}, ${runid}, ${band}, ${imagename}
  ${ccd_number}, ${tilename}, ${imagetype}
As long as we adopt a fixed archive structure, we can very efficiently track extremely large datasets.
Simulation metadata: we could easily extend the DES archive to track simulation data. We would need to adopt some logical structure, and then we could be up and running very rapidly.

December 11, 2006 DES DM - Mohr  Data Access Framework
With DC2 we are fielding grid data movement tools that are integrated with the DES archive:
  ar_copy: copies a dataset from one archive node to another
  ar_verify: file-by-file comparison of datasets on two archive nodes
  ar_remove: deletes a dataset from an archive node
These tools update file locations within the DES database. Data are selected using file tags, for example:
  ar_copy -imclass=raw -nite=des... -imagetype=src mercury gpfs-wan
  ar_copy -imclass=red -runid=DES..._des..._01 mercury mss
The underlying grid-ftp tools can vary with the archive node. Most sites use Trebuchet, the data movement tools integrated with the Elf/OGRE middleware development project at NCSA; FNAL uses globus-url-copy because of an incompatibility with Trebuchet listing. Metadata in the DES db encode the grid-ftp technology as well as the combinations of buffer sizes, number of parallel streams, etc. used for moving "large" and "small" files. A recent test by Greg Daues achieved 100 MB/s for a single copy; typically we have combined 5 or 6 copies in parallel to achieve total data movement off Mercury of about 50 MB/s.
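As an illustration of how tag-based selection could drive such a tool, here is a minimal Python sketch. It is not the actual ar_copy implementation: the table name, column names, per-node configuration, and the "trebuchet-copy" command are all hypothetical stand-ins.

    import subprocess

    # Hypothetical per-node transfer configuration.
    NODE_TRANSFER_CMD = {
        "mercury":  ["trebuchet-copy"],              # placeholder for the Trebuchet-based tool
        "gpfs-wan": ["trebuchet-copy"],
        "fnal":     ["globus-url-copy", "-p", "4"],  # 4 parallel streams (illustrative)
    }

    def select_files(db, **tags):
        # Build a WHERE clause from archive tags (imageclass, nite, imagetype, ...)
        # against a hypothetical image_location table; db is any DB-API connection.
        where = " AND ".join(f"{k} = :{k}" for k in tags)
        cur = db.cursor()
        cur.execute(f"SELECT path FROM image_location WHERE {where}", tags)
        return [row[0] for row in cur.fetchall()]

    def ar_copy_sketch(db, src_node, dst_node, **tags):
        # Copy every file matching the tags from src_node to dst_node,
        # then record the new location so the database stays authoritative.
        for path in select_files(db, archive_site=src_node, **tags):
            cmd = NODE_TRANSFER_CMD[dst_node] + [f"gsiftp://{src_node}{path}",
                                                 f"gsiftp://{dst_node}{path}"]
            subprocess.run(cmd, check=True)
            db.cursor().execute(
                "INSERT INTO image_location (archive_site, path) VALUES (:site, :path)",
                {"site": dst_node, "path": path})
        db.commit()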

December 11, 2006 DES DM - Mohr Archive Portal: You will be redirected to NVO Login

December 11, 2006 DES DM - Mohr Archive Portal: Image Query

December 11, 2006 DES DM - Mohr  DC2 Overview
Transferred 10 nights of simulated data from FNAL Enstore: roughly 3000 DECam exposures (500 deg^2 in griz, 4 layers deep, plus 50 flats/biases each night).
Currently: processed 8 of the 10 nights.
Use the Convert_Ingest pipeline to split the data (crosstalk correction is done in this stage): typically 20 jobs, each running a couple of hours; the raw data are 600 GB for each night.
Submit 62 processing jobs for each of these nights. Each night produces 3.4 TB and ~35 million catalogued objects for ingestion. Each job takes around 11 hours, so reducing a night of data costs roughly 1 CPU-month.
Stages: zerocombine, flatcombine, imcorrect, astrometry, remapping, cataloguing, fitscombine, ingestion. Currently some jobs fail because of failures in astrometric refinement.
Ingest the objects into the db; move the data from the processing platforms to the storage cluster and mass storage; then determine the photometric solution for each band/night and update the zeropoints for all objects/images from that night.
Total data production: 4.8 TB raw, 27 TB reduced, ~240 million objects.
Still to do: complete processing, co-add all data, extract summary statistics.
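A quick back-of-the-envelope check of the per-night cost and the running totals quoted above (Python, using only the numbers on this slide; "month" taken as 30 days):

    jobs_per_night = 62
    hours_per_job = 11
    cpu_hours_per_night = jobs_per_night * hours_per_job     # 682 CPU-hours
    cpu_months_per_night = cpu_hours_per_night / (30 * 24)   # ~0.95, i.e. roughly 1 CPU-month

    nights_processed = 8                                     # 8 of the 10 nights so far
    raw_tb = nights_processed * 0.6                          # 4.8 TB raw
    reduced_tb = nights_processed * 3.4                      # ~27 TB reduced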

December 11, 2006 DES DM - Mohr  DC2 Challenges
The scale of the data is almost overwhelming: 330 GB arrive, and 3.4 TB are produced by the next day.
Ingesting 35 million objects is a challenge: it takes 10 hours at an ingest rate of 1000 objects/s. We are exploring sqlldr alternatives, but most come with a price.
Moving processed data off the compute nodes is a challenge: it takes about 10 hours at a transfer rate of 100 MB/s. New data movement tools are making this more reliable and automatic.
Astrometry problems persist. With BCS data we find that astrometry errors are bad enough to produce double sources in a few percent of the images; this translates to at least one failure per co-added image. Taking the advice of Emmanuel Bertin, we are running SCAMP on a per-exposure basis rather than a per-image basis; the new astrometric refinement framework is currently being tested.
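Both "10 hour" estimates follow directly from the rates quoted; a quick check with the slide's numbers:

    objects = 35e6
    ingest_rate = 1000                                   # objects per second
    ingest_hours = objects / ingest_rate / 3600          # ~9.7 hours

    night_output_mb = 3.4e6                              # 3.4 TB produced per night
    transfer_rate = 100                                  # MB/s off the compute nodes
    transfer_hours = night_output_mb / transfer_rate / 3600   # ~9.4 hours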

December 11, 2006 DES DM - Mohr  DC2 Photometry and Astrometry
Nightly spot checks only; no exhaustive testing so far.
The astrometry scatter plots look much like DC1. The photometry scatter plots do not look as good, but we think we have figured out why: diffraction spikes/halos were added to stars in ImSim2 in such a way as to augment the total stellar flux, and this leads to an offset in our photometry at the few-percent level.
Detailed statistics await further testing: What is the full distribution of astrometric and photometric errors? How do both depend on seeing, location on the chip, intrinsic galaxy parameters, etc.?
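For context (a hedged estimate, not taken from the slide): a fractional excess in total stellar flux maps to a magnitude offset of roughly

    \Delta m = -2.5\,\log_{10}\!\left(1 + \frac{\Delta F}{F}\right) \approx -1.086\,\frac{\Delta F}{F},

so a few-percent flux augmentation shifts the measured magnitudes by a few hundredths of a magnitude.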

December 11, 2006 DES DM - Mohr  Coaddition Framework
Three steps to coaddition:
(1) remapping the images to a standard reference frame,
(2) determining the relative flux scale for overlapping remapped images, and
(3) combining the remapped images (with filtering).
DES DM enables a simple automated coadd. The coadd tiling is stored as metadata in the db, with db tools to find all tiles associated with an image and all images associated with a tile.
Execution: reduced images are immediately remapped (SWarp) to each tile they overlap (and catalogued). Flux scales are determined through (1) db object matching in overlapping images, (2) photometric calibration, and (3) the relative throughput of the chips. The image combine (SWarp) happens en masse, using the archive to find the correct image combinations.
[Figure: co-add tiling and DECam imaging layers]
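As a sketch of step (2), the relative flux scale between overlapping remapped images can be derived either from photometric zeropoints or from matched-object flux ratios. The following is illustrative only (it assumes the zeropoint convention m = zp - 2.5 log10(counts); it is not the DES DM code):

    import statistics

    def scale_from_zeropoints(zp_image, zp_reference):
        # Factor that rescales image counts onto the reference photometric system,
        # assuming m = zp - 2.5*log10(counts) for both images.
        return 10 ** (0.4 * (zp_reference - zp_image))

    def scale_from_matched_objects(ref_fluxes, img_fluxes):
        # Alternative estimate from db object matching in the overlap region:
        # the median flux ratio of matched objects (hypothetical helper).
        ratios = [r / i for r, i in zip(ref_fluxes, img_fluxes) if i > 0]
        return statistics.median(ratios)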

December 11, 2006 DES DM - Mohr  BCS Coadd Tests
We test the framework by creating 46 coadd tiles that draw images from 10 different nights: griz, 36'x36' tiles with 0.26" pixels; each tile is a <1 hr job on a server with a 14-drive RAID5 disk array.
Issues: flux scaling is ignored; the combine algorithm is a simple sum; is the result science quality?; some astrometry failures (double sources).
[Figure: 4' coadd cutouts in z (3 layers deep), i (3 deep), r (2 deep), and g (2 deep)]

December 11, 2006 DES DM - Mohr  Weak Lensing Framework [Mike Jarvis, Bhuv Jain, Gary Bernstein, Erin Sheldon]
Science strategy: start from complete object lists and measure the shear for each object jointly, using all available reduced data.
Draft DES DM strategy:
  Measure the shapes of all objects on the reduced images as part of standard reduction and cataloguing.
  Use isolated stars to model PSF distortions across the survey.
  Catalog on the coadded images to create complete object lists.
  Use archive tools to select all reduced objects (and images) for joint shear measurements that include PSF corrections.
Implementation is just in its infancy:
  Shape measurements: one more module for the pipeline plus a db schema change.
  Modeling PSF distortions: a computational (not data) challenge.
  Complete object lists: coadd catalogs are already available in the db.
  Final shear measurements: a data challenge; we apply a data-parallel approach, grouping by sky coordinates (coadd tiling), as sketched below.
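A minimal sketch of that data-parallel grouping (hypothetical names; the real selection would go through the archive and database tools described earlier): objects are bucketed by the coadd tile that contains them, and each bucket becomes an independent joint shear-measurement job.

    from collections import defaultdict

    def group_objects_by_tile(objects, tile_of):
        # objects: iterable of (object_id, ra, dec) tuples
        # tile_of: callable mapping (ra, dec) -> tilename, e.g. a lookup against
        #          the coadd tiling metadata in the db (hypothetical helper)
        jobs = defaultdict(list)
        for obj_id, ra, dec in objects:
            jobs[tile_of(ra, dec)].append(obj_id)
        return jobs   # {tilename: [object ids to be measured jointly], ...}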