Data Acquisition & Distribution; and Archiving Plans Bill Boroski SDSS-II First Year Review National Science Foundation August 7, 2006.

Slides:



Advertisements
Similar presentations
An Introduction To Heritrix
Advertisements

Data mirrors: Techniques and Issues
VO/IVOA and The Astronomy Community Dave De Young NOAO.
SDSS-II Development Projects Bill Boroski SDSS-II First Year Review National Science Foundation August 7, 2006.
Cultural Heritage in REGional NETworks REGNET Project Meeting Content Group
By Rick Clements Software Testing 101 By Rick Clements
Enterprise Resource Planning It is not the end, it is just the beginning Mary Avery Finance Manager Nebraska Auditor of Public Accounts 2006 Joint NSAA/NASC.
Global Hands-On Universe meeting July 15, 2007 Authentic Data in the Classroom with the Sloan Digital Sky Survey Jordan Raddick (Johns Hopkins University)
Making the System Operational
Case Study: Photo.net March 20, What is photo.net? An online learning community for amateur and professional photographers 90,000 registered users.
Software change management
Information Systems Today: Managing in the Digital World
13 Copyright © 2005, Oracle. All rights reserved. Monitoring and Improving Performance.
Copyright © 2011 by the Commonwealth of Pennsylvania. All Rights Reserved. Load Test Report.
Student Lab Management IT Services March Our Purpose this evening … o Foster good communication o Improve Relationships and Collaboration o Prepare.
May 2009 D2L Upgrade to Version 8.4 Desire2Learn Changes in Version 8.4.
Eötvös University Budapest in the Network.  Seniors: István Csabai (node coordinator): »Photometric redshift estimation, virtual observatories, science.
IDMS Conversion Certification Presentation 10/28/09 Project Certification Committee.
Development of China-VO ZHAO Yongheng NAOC, Beijing Nov
1 School District of Escambia County Class Size Presentation August 2010.
Cooking with Sloan: Special DR6 Recipes Jordan Raddick, Ani Thakar Alex Szalay, Maria Nieto-Santisteban, Nolan Li, Wil O’Mullane, Adrian Pope, Tamas Budavari,
DES PreCam A Proposal for the PreCam Calibration Strategy J. Allyn Smith David L. Burke Melissa J. Butner 15 July 2009 DRAFT.
SDSS and UKIDSS Jon Loveday University of Sussex.
A brief introduction of SDSS II data Liu Chengze Shanghai Astronomical Observatory.
NOAO/Gemini Data workshop – Tucson,  Hosted by CADC in Victoria, Canada.  Released September 2004  Gemini North data from May 2000  Gemini.
MSIS 110: Introduction to Computers; Instructor: S. Mathiyalakan1 Systems Design, Implementation, Maintenance, and Review Chapter 13.
Deploying Visual Studio Team System 2008 Team Foundation Server at Microsoft Published: June 2008 Using Visual Studio 2008 to Improve Software Development.
Managing Multi-User Databases AIMS 3710 R. Nakatsu.
CXC Implementing 2007 NRC Portals of the Universe Report Chandra X-ray Center Recommended Best Practices Roger Brissenden and Belinda Wilkes 25 April 2012.
Software Distribution Overview Prepared By: Melvin Brewster Chaofeng Yan Sheng Shan Zhao Khanh Vu.
Data Management Subsystem: Data Processing, Calibration and Archive Systems for JWST with implications for HST Gretchen Greene & Perry Greenfield.
Hands-On Microsoft Windows Server 2003 Administration Chapter 2 Managing Windows Server 2003 Hardware and Software.
IXYZ Frank Marshall NASA/GSFC 25 April 2012 April 25, 20121Implementing Portals of the Universe.
15th Dec, 2005UKIDSS SciVer meeting, Edinburgh1 UKIDSS LAS Image Classification Analysis and Some Miscellaneous Issues Richard McMahon Bram Venemans Institute.
Usability Issues Documentation J. Apostolakis for Geant4 16 January 2009.
Principles of Information Systems, Sixth Edition Systems Design, Implementation, Maintenance, and Review Chapter 13.
SDSS-KSG 08 Workshop1 The SDSS DR7 and KIAS SDSS mirror Won-Kee Park ARCSEC, Sejong University 2008 SDSS-KSG Workshop.
National Center for Supercomputing Applications NCSA OPIE Presentation November 2000.
IT 456 Seminar 5 Dr Jeffrey A Robinson. Overview of Course Week 1 – Introduction Week 2 – Installation of SQL and management Tools Week 3 - Creating and.
Principles of Information Systems, Sixth Edition Systems Design, Implementation, Maintenance, and Review Chapter 13.
08/30/05GDM Project Presentation Lower Storage Summary of activity on 8/30/2005.
Discussion - Survey Design Survey product equation: #fields = fld/nt x useable x (%xnights/yr) x years = 4 x 0.5 x (0.75 x 13 x 18) x 3 = 4 x 0.5 x 175.
Swift HUG April Swift data archive Lorella Angelini HEASARC.
BNL Tier 1 Service Planning & Monitoring Bruce G. Gibbard GDB 5-6 August 2006.
G. Miknaitis SC2006, Tampa, FL Observational Cosmology at Fermilab: Sloan Digital Sky Survey Dark Energy Survey SNAP Gajus Miknaitis EAG, Fermilab.
GCRC Meeting 2004 BIRN Coordinating Center Software Development Vicky Rowley.
GIST 19: GGSPS status Status of GGSPS development and operations Andy Smith GGSPS software project manager.
06-1L ASTRO-E2 ASTRO-E2 User Group - 14 February, 2005 Astro-E2 Archive Lorella Angelini/HEASARC.
Principles of Information Systems, Sixth Edition 1 Systems Design, Implementation, Maintenance, and Review Chapter 13.
The 6dF Galaxy Redshift Survey: Status Report and Data Release 2 Heath Jones ANU/AAO Will Saunders AAO Mike Read Edinburgh Matthew Colless AAO 100,000.
STEREO Science Center Status Report William Thompson NASA Goddard Space Flight Center STEREO SWG November 2007 Pasadena, California.
1 SUZAKU HUG 12-13April, 2006 Suzaku archive Lorella Angelini/HEASARC.
Lecture 3 With every passing hour our solar system comes forty-three thousand miles closer to globular cluster 13 in the constellation Hercules, and still.
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
Office of Administration Enterprise Server Farm November 2004 Briefing.
GSPC -II Program GOAL: extend GSPC-I photometry to B = V ˜ 20 add R band to calibrate red second-epoch surveys HOW: take B,V,R CCD exposures centered at.
LSST Commissioning Overview and Data Plan Charles (Chuck) Claver Beth Willman LSST System Scientist LSST Deputy Director SAC Meeting.
Smart Versioning: Get Relevant, Save Money
Managing Multi-User Databases
Overview – SOE PatchTT November 2015.
RCM Turbo SQL Version.
Modernization of Navigation Statistics Publishing
Leigh Grundhoefer Indiana University
Databases, Web Pages and Archives
Cloud computing mechanisms
Department of Licensing HP 3000 Replatforming Project Closeout Report
Welcome Traceability Software Integrators
Google Sky.
PerformanceBridge Application Suite and Practice 2.0 IT Specifications
Presentation transcript:

Data Acquisition & Distribution; and Archiving Plans Bill Boroski SDSS-II First Year Review National Science Foundation August 7, 2006

Data Acquisition & Processing Operations Apache Point Observatory Fermilab Data transfer via the web Plug plate designs Plug plates U. of Washington Princeton Data transfer via the web

Data Acquisition / Observatory Operations Observing operations at APO –Performance metrics remain solid Ave. imaging efficiency = 87% (Aug-Jan); 72% (Mar-Jun) Ave. spectro efficiency = 67% Ave. system uptime = 96% –Data transfers via 15 Mbps microwave link Only two disruptions in the first 9 months of operation Recovery was relatively quick; no data lost –SN compute cluster for on-the-mountain reductions –Fire protection upgrades

Data Processing Operations Data Processing Operations at Fermilab –Production processing of all Legacy, SEGUE and Supernova Survey data; target selection and plate design –Operations continue to run smoothly and efficiently –Implemented automated method for pulling data from APO and verifying data integrity –Implemented scripts to crawl entire spinning data set to detect file corruption, missing files, etc. Data Processing Operations at Princeton –DP hardware cluster installed and commissioned –Imaging and spectro data are being reduced –Imaging reductions being used to support Photo pipeline and photometric calibration work –Spectro reductions being used in Spectro v5 pipeline development

Supernova Survey Data Release 1 First public release of SN data occurred in January 2006 –Contains data from 72 imaging runs obtained during the fall 2005 observing campaign (Sep 1 through Nov 30, 2005) –Contains an additional 16 imaging runs obtained during the fall 2004 season –Data taken on a 2.5 degree wide region along the celestial equator, from -60 < RA < 60 (SDSS Stripe 82) URL (linked off the SDSS home page): Web page provides access to corrected frames and uncalibrated catalogs via pared-down DAS script –Also provides list of available runs with information regarding data quality Future data releases will occur incrementally during each fall SN observing season –Releases will occur roughly 4-6 weeks after data collection

Data Release 5 Contains survey-quality Legacy and SEGUE data collected through June 30, Imaging Footprint Area8,000 sq. deg. Imaging Catalog215 million objects Data Volume Images9.0 TB Catalogs (DAS, fits format) (CAS, SQL DB) 1.8 TB 3.6 TB Spectroscopy Spectro Area5,740 sq. deg. Total # of spectra1,048,960 Galaxies Quasars (redshift < 2.3) Quasars (redshift > 2.3) Stars M stars and later Sky spectra Unknown 674,749 79,934 11, ,925 60,808 55,555 12,312 Also contains data from 361 extra and special plates –Repeat observations, observations of spectro targets selected by collaboration members for specialized science programs. The final release of the SDSS Survey. All data now in public domain.

Data Release 5 (contd) October 21, initial DR5 CAS and DAS released to collaboration May 4, 2006 – DAS and enhanced CAS released to collaboration Photometric redshifts of galaxies –computed by two different groups within the collaboration Detailed coverage masks –allow LSS researchers to easily calculate power spectrum and related quantities; –Allow computation of the area covered by statistical samples, such as the quasar sample in DR5. QSO catalog tables –Master list of everything that smells like a quasar, with vital signs gathered from different data sources (i.e., FIRST, ROSAT, Stetson, USNO, and USNO-B catalogs). runQA for target photometry (also bug fix) Regenerated JPEGs to fix clipping error in some images. June 28, 2006: Enhanced version made available to the public Usage varied between 80,000 and 100,000 SQL queries per day in the week immediately following the public release.

SkyServer Data Usage Statistics

DAS Data Usage Statistics

Data Usage by Release

Looking ahead to Data Release 6 Public release scheduled for July 1, 2007 –Legacy and SEGUE data collected through 30-Jun-2006 DR6 will incorporate modest data model changes, such as: –Add table containing SEGUE stellar parameters –Add primary and secondary target fields to photoObjAll and specObjAll tables to accommodate SEGUE and other plates –Extract photometry for targets into new targPhotoObjAll table in BESTDR6, which allows us to eliminate TARG DB from some servers –Develop sqlFits2CSV code to read in Pan-STARRS object tables and build CSV files for DB loading Will load and deploy using MS SQL Server 2005 Code modification and testing currently underway Data loading will commence in 4-6 weeks, with collab release occurring in late fall.

Runs Database CAS-style database that is being loaded with all imaging runs obtained over the course of the survey, regardless of data quality Updated version released to the collaboration on July 27, 2006 –Contains 370 imaging runs Incremental releases are being made every 6-8 weeks –Each increment has contained of order new runs Collaboration DAS access is also available for these runs –FITS format corrected frames –Uncalibrated (fpObjc*, asTrans*) and calibrated (tsObj*, tsField*) data. –Collab access via rsync and wget We intend to make the RunsDB available to the public at the time of the final SDSS-II data release, when all data will be in the public domain.

Interim Archive Stewardship A formal plan for the long-term stewardship of the SDSS data set is still being formulated The possibility of an interim arrangement has been discussed with senior Fermilab management –Lab management has expressed a willingness to serve as interim host until a more permanent steward is identified – timescale TBD; –Initial level of stewardship would include: Maintaining data integrity –Retention of multiple copies; automated scanning to detect data corruption. Maintaining existing systems (SkyServer, CAS, DAS, sdss.org, etc.) –Replace failed hardware, apply security patches, deploy OS upgrades, etc. –Helpdesk support?? (level to be determined) –Estimated resource requirements One FTE (HW/DBA support) $100K/yr for equipment, materials and supplies.

Long-term Stewardship Curation –Mandatory Format conversion (e.g., OS upgrades, etc.) Platform migration (Database, OS, etc.) –Value-added Errata Bug fixes Annotations –First tier – team –Second tier – archiving centers –Third tier – community Virtual Observatory (VO) compliance

Long-Term Stewardship Preliminary Thoughts on Resource Requirements Operations Support –System maintenance Hardware-level support Application-level DBA support System performance monitoring –Help desk –Logging Usage statistics Estimated Resource Requirements –Number of boxes: –Level of effort: 1 FTE for helpdesk support 1 FTE for hardware/software support 0.5 FTE for technical support (scientists, etc.) –Budget for ongoing maintenance and support costs Disk replacements, software upgrades, server upgrades, etc. ~$100K, dropping off in future years