1 HST Pipeline Project Review March 14, 2003

2 Review Objectives
- Re-familiarize Project (and others) with production data processing done by STScI
- Familiarize everyone with new processing hardware and how we plan to use it
- Describe the steps we will be taking to shift development, I&T, and production data processing from the old systems to the new systems

3 Introduction
- History – Long View
- History – Last Year
- Data processing requirements
- Goals of this project
- Overall plan

4 What do we mean by “data processing”?
- Receipt of science and engineering data
- Reformatting, quality checking, calibration, etc. needed to prepare data for the archive
- Archiving the data
- Retrieving the data
- Processing and calibration of retrieved data
- Sending data off to the user
- User access tools
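The steps listed above can be sketched as a simple sequence of pipeline stages. This is an illustrative sketch only: the stage names and data structure below are hypothetical and do not correspond to actual OPUS or DADS components.

```python
# Illustrative sketch of the data-processing flow described on this
# slide; stage names are hypothetical, not actual OPUS/DADS modules.
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def receive(obs: Dict) -> Dict:        # receipt of science/engineering data
    return {**obs, "received": True}

def calibrate(obs: Dict) -> Dict:      # reformat, quality-check, calibrate
    return {**obs, "calibrated": True}

def archive(obs: Dict) -> Dict:        # ingest into the archive
    return {**obs, "archived": True}

def run_pipeline(obs: Dict, stages: List[Stage]) -> Dict:
    for stage in stages:
        obs = stage(obs)
    return obs

result = run_pipeline({"dataset": "example"}, [receive, calibrate, archive])
```

Retrieval, recalibration, and distribution would form a second such chain on the archive side.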

5 History – Long View
Original plan (1981)
- TRW provides OSS and PODPS as two of three major pieces of SOGS
  - OSS to be used for real-time decision making
  - PODPS to process science data, including calibration, for users
- STScI provides SDAS (analysis tools)
- Established FITS format as basic science data format
- Data provided to users on 1600/9600 bpi tapes
- No archive

6 History – Long View
Pre-launch changes ( )
- Astrometry and Engineering Data to come to STScI
- PODPS to run STSDAS-based calibrations
- STScI to develop CDBS (calibration data base system)
- Archive activities started
  - STScI developed DMF, a prototype optical-disk-based archive, pressed into service ~L+1 year
  - DADS development started at Loral
  - StarView development started at STScI

7 History – Long View
Post-Launch Changes I ( )
- DADS delivered, data transferred from DMF to DADS
- StarView released
- OMS developed for engineering data and jitter files
- OPUS replaced OSS and PODPS
  - Consolidated software systems
  - Important technology upgrade to support future growth
- Pipeline development for STIS and NICMOS started

8 History – Long View
Post-Launch Changes II (1996–2001)
- Data volume doubled with STIS and NICMOS
- Archive utilization increased substantially
- UNIX version of OPUS developed for FUSE
- Archive upgraded
  - Magneto-Optical media replaced Optical Disks
  - NSA project opened DADS architecture to multiple storage media
  - Spinning disks considered, but judged too expensive
- CDBS re-implemented
- OTFR deployed
  - Reduced archive volume
  - Provided up-to-date calibrations to users

9 History – Long View

10 History – Long View
Additional improvements and consolidations have been in our plans over the last few years
- DADS evolution
  - Remove VMS dependencies
  - Make future technology migrations easier
  - Improve services based on community usage of HST archive
- Replace OMS
  - Remove VMS dependency
  - Simplify system
- ACS data and data processing
  - Increased volume
  - Drizzle algorithms for geometric correction and image co-addition

11 History – Last Year
Several parts of the system exhibited unacceptable performance
- Processing of data from HST to the Archive
- Response time to user requests for data from Archive
Several specific causes
- NFS mount problems
- Disk corruption in OPUS
- Jukebox problems
- Other specific hardware problems
Symptomatic of more general problems with the data processing systems

12 History – Last Year

13 History – Last Year

14 History – Last Year
Goal: <5% (1 day/week = 14%)

15 History – Last Year
Immediate steps were taken to upgrade available hardware
- Added 6 CPUs and memory to Tru64 systems
- Added CPUs and memory to Sun/Solaris systems
- Added and reconfigured disk space
Large ACS data sets moved to an ftp site to avoid load on archive system
- EROs and GOODS data sets
- Ftp site off-loaded ~10 GBytes/day from archive in last several months (~20% effect)

16 Current status
System keeping up with demands
- Running ~50% capacity on average
- Loading in various places is very spiky
Instability of system, and diversion of resources, has put delivery of data to ECF and CADC substantially behind schedule
Expect load to increase in spring as ACS data become non-proprietary

17

18 Bulk distribution backlog
In absolute numbers: ~40,000 POD files
- Archive Branch does not believe the current AutoBD can keep up with the current data volume, much less catch up.
- Implement ftp tool to augment transfer. Tool accesses data on MO directly.
- May be able to bypass DADS by using safestores and development JB or stand-alone reader
- Distribution re-design
  - CADC/ECF will be included as beta test sites in parallel operations starting ~April 1,
  - New engine allows operators to prioritize requests
  - New engine supports transfer of compressed data
  - Consolidation of operating systems should improve reliability
With all these solutions, preliminary estimate is that backlog could be eliminated in a few months
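A back-of-envelope check of the "few months" estimate: only the ~40,000-file backlog comes from the slide; both daily rates below are assumptions invented for illustration.

```python
# Rough drain-time estimate for the POD-file backlog.  The backlog
# size is from the slide; the two daily rates are assumed figures.
backlog_files = 40_000          # POD files awaiting bulk distribution
new_files_per_day = 300         # assumed: new POD files arriving daily
transfer_per_day = 900          # assumed: AutoBD plus the new ftp tool, combined
net_drain_per_day = transfer_per_day - new_files_per_day
days_to_clear = backlog_files / net_drain_per_day   # roughly two months
```

Any net drain rate in the several-hundred-files-per-day range gives a clearance time on the order of months, consistent with the preliminary estimate above.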

19 Data Processing Requirements
Performance requirements (Astronomy community expectations)
Data volume requirements
- Into system from HST
- Out of system to Astronomy community
Programmatic goals
- Fit within declining HST budget at STScI
- Expect archive to live beyond HST operational lifetime
- Expect archive will be used to support JWST

20 Performance Requirements-I
- Average time from observation execution to data receipt < 1 day
- Average time from observation execution to data availability in archive < 2 days
- 98% of data available in archive in < 3 days
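These requirements can be checked mechanically against measured latencies. The sample values below are invented for illustration; they are not actual HST statistics.

```python
# Check the timeliness requirements against a made-up sample of
# per-observation latencies, in days from observation execution.
receipt_days = [0.4, 0.6, 0.9, 1.1, 0.5]   # execution -> data receipt
archive_days = [1.2, 1.8, 2.5, 1.5, 3.4]   # execution -> available in archive

avg_receipt = sum(receipt_days) / len(receipt_days)
avg_archive = sum(archive_days) / len(archive_days)
frac_within_3 = sum(d < 3.0 for d in archive_days) / len(archive_days)

meets_avg_receipt = avg_receipt < 1.0      # requirement: < 1 day on average
meets_avg_archive = avg_archive < 2.0      # requirement: < 2 days on average
meets_98_pct = frac_within_3 >= 0.98       # requirement: 98% within 3 days
```

With this sample, the receipt requirement is met but the 98%-within-3-days requirement is not, showing how a single slow observation dominates the tail metric.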

21 Performance Requirements-II
Archive availability: 95%
Median retrieval times
- Defined as time from request to when data is ready for transmission. Does not include transmission time.
- Non-OTFR data (not recalibrated): 5 hours
- OTFR data (recalibrated): 10 hours
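The median retrieval metric can be computed directly from request logs. The durations below are made-up samples for illustration, not operational measurements.

```python
import statistics

# Hours from request receipt to data ready for transmission
# (transmission time excluded, per the definition above);
# sample values are invented.
non_otfr_hours = [2.0, 4.5, 5.0, 6.5, 9.0]   # no recalibration needed
otfr_hours = [6.0, 8.5, 10.0, 12.0, 20.0]    # recalibrated on the fly

median_non_otfr = statistics.median(non_otfr_hours)
median_otfr = statistics.median(otfr_hours)
```

Medians are used rather than means so that a few very large requests, or jukebox stalls, do not dominate the reported figure.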

22 Performance Requirements-III
User support
- Unlimited number of registered users
- Support increased level of requests
  - Currently ~2000/month
  - Expect to grow at 20% per year (guess)
- Reduce unsuccessful requests to <5%
Routinely handle highly variable demand
- Daily request volume varies by more than factor of 10
- Insulate pre-archive processing from OTFR load

23 Data Volume Requirements-I
Data volume from HST – now
- Currently receive ~120 GBits/week from HST
- Currently ingest ~100 GBytes/week into the archive
- Currently handle ~2000 observations/week
Data volume from HST – after SM4
- Expect ~200 GBits/week from HST
- Expect to ingest ~160 GBytes/week into archive
- Expect to handle ~2000 observations/week
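Note the mixed units above: downlink volume is quoted in Gbits, archive ingest in GBytes. Dividing by 8 shows the archive ingests several times the raw telemetry volume; presumably the difference is data products generated on the ground, though that interpretation is an assumption, not stated on the slide.

```python
# Unit conversion for the current figures on this slide.
downlink_gbits_per_week = 120       # from HST, in gigabits
ingest_gbytes_per_week = 100        # into the archive, in gigabytes

raw_gbytes_per_week = downlink_gbits_per_week / 8         # 15 GBytes of telemetry
expansion = ingest_gbytes_per_week / raw_gbytes_per_week  # ~6.7x growth on ground
```

The same arithmetic applied to the post-SM4 figures (200 Gbits in, 160 GBytes ingested) gives a similar expansion factor.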

24 Data Volume Requirements-II
Data distribution today
- More than 300 GBytes/week from archive
- More than 70 GBytes/week from ftp site
Data distribution projection
- Distribution volume determined by world-wide Astronomy community – very unpredictable
- Large increase expected as Cycle 11 data become non-proprietary
- Should expect GBytes/week in a few years

25 Programmatic Goals
Reduce total cost of data processing activities
Simplify hardware and network architecture
- Reduce Operating Systems from 3 to 1
  - Terminate use of VMS and Tru64
  - Eliminate passing of data through various OSs
- Consolidate many boxes into two highly reliable boxes
Flexible allocation of computing resources
- Support easy re-allocation of CPU and Disk resources among tasks
- Provide simple growth paths, if needed

26 Current Architecture
[diagram: systems spread across Tru64, Solaris, and VMS]

27 Programmatic Goals
Provide common development, test, and operational environments
- Current development and test systems cannot replicate load of operational systems
- Reduce complexity of development and test environments (drop VMS, Tru64)
Improve ability to capture performance data, metrics, etc.
- Current systems too diverse
- Difficult to transfer performance measurement on development/test systems to operations

28 Current Development and I&T Environment
[diagram: VMS and Tru64 systems]

29 New Architecture
Sun Fire 15K domain configuration: 7 dynamically re-configurable domains
[diagram: OPUS/Archive OPS (EMC), Databases OPS (EMC), Code Development, System Test, Database Test, and OS/Security Test domains]

30 Programmatic Goals
Continue planned pipeline evolution
- DADS Distribution redesign provides more flexibility to users and operators
  - Reflect advent of OTFR
  - Reflect community utilization of the archive
  - Provide operators more control over priority and loadings
- Storing copy of Raw Data on EMC will dramatically reduce load and reliance on Jukeboxes
- Ingest redesign provides opportunity to finally end the arbitrary boundary between OPUS and DADS

31 Programmatic Goals
Future growth paths for HST
- To first order, we expect HST to live within the capabilities of this architecture through SM4 to EOL
- Input data volume will increase some, but not a lot
- Plan to adjust distribution techniques and user expectations to live within the 15K/EMC resources
- However, we will encourage ever more and better use of HST science data
Beyond HST End-of-Life
- HST data distribution would need to be revisited based on utilization at the time (seven years from now) and progress of NVO initiatives
- Architecture is planned starting point for JWST; hardware is very likely to need major upgrades

32 Remainder of the Review
Architecture
New hardware (Sun Fire 15K, EMC)
- What it is, how it works
- Steps to make it operational
Moving development, I&T, databases
Moving operational processing
- OPUS processing
- Raw data off Jukeboxes onto EMC
- Archive software upgrades