1 Joint Technical Meeting 2/23/2016 L1 Systems Donald Petravick LSST JTM February 23, 2016

2 Joint Technical Meeting 2/23/2016 Design Articulation, Phasing (software only)
[Diagram: design articulation flows from ConOps to High-Level Functions, Second-Level Functions, Main Programs, Program High-Level Functions, Program Detail Functions, and Common Functionality Concerns. Phased delivery: Message Topology; Base-NCSA Coordination; Data-Passing; System Solidification; Comprehensive Test. Deliveries are validated against early or mock L1 code, the L1 DB, and alerts to "mocked" brokers. Other labels: Phased WBS, Nightly Setup, Production OCS, CDS.]

3 Joint Technical Meeting 2/23/2016 High-Level Functions of the L1 System
− We see four distinct functions of the L1 System:
– Processing (AP is one use case)
– Archiving
– EFD Replication
– Observatory Operations Server

4 Joint Technical Meeting 2/23/2016 Processing System

5 Joint Technical Meeting 2/23/2016 Timescales of the Processing System
Time Scale: Major Functions
− Daily Cadence: Acquire the system from Observing ops. Ingest any remaining L1 processing data into the permanent archive. Load/purge caches. Insert and test any changes to the system. Yield the system to Observing ops; become a functional OCS device.
− Modal: OCS selects "clusters of use cases": Science Observing, Nightly Calibrations, Narrow Band Calibration, Doughnuts, etc.
− Logical Visit: Begin: marry Workers to Forwarder/Distributors; start the apropos science codes. End: assemble telemetry from individual CCDs; send visit-level telemetry to OCS; free Workers for new assignment.
− Exposure: Forwarders acquire rafts and forward them to Distributors. Strings of Workers pull CCD data from Distributors, load the data into a Butler repository, and yield control to science codes. Exposure-level telemetry is assembled and passed to OCS. Alerts are sent to authors for AP use cases.
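A minimal sketch of the logical-visit lifecycle in the table above; all class and method names are hypothetical, not from the LSST codebase:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Worker:
    """Hypothetical stand-in for an L1 Worker node."""
    name: str
    distributor: Optional[str] = None  # the Distributor this Worker is married to

@dataclass
class LogicalVisit:
    """Begin: marry Workers to Distributors and start science codes.
    End: assemble visit-level telemetry and free the Workers."""
    visit_id: str
    workers: List[Worker] = field(default_factory=list)
    telemetry: List[dict] = field(default_factory=list)

    def begin(self, workers, distributors):
        for worker, dist in zip(workers, distributors):
            worker.distributor = dist  # "marry" the Worker to a Distributor
            self.workers.append(worker)

    def process_exposure(self, exposure_id, ccd_ids):
        # Per-CCD processing is mocked here; each Worker would pull its CCD
        # data from its Distributor, load it into a Butler repository, and
        # yield control to the science code.
        for worker, ccd in zip(self.workers, ccd_ids):
            self.telemetry.append({"exposure": exposure_id, "ccd": ccd})

    def end(self):
        # Assemble per-CCD telemetry into visit-level telemetry for OCS,
        # then free the Workers for a new assignment.
        summary = {"visit": self.visit_id, "n_measurements": len(self.telemetry)}
        for worker in self.workers:
            worker.distributor = None
        self.workers.clear()
        return summary
```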

6 Joint Technical Meeting 2/23/2016 Salient Behaviors of the Processing System
− An operational policy gives Observatory operations a degree of control over the system.
– If L1 falls behind, the system keeps an internal list of exposures and dispatches them to processing according to a policy (see the sketch below).
– The system may be run in "scan mode" for processing that is not immediate.
− Daytime processing in a batch system is a backup.
– Observing operations may exit L1 processing; unprocessed exposures are marked for daytime processing.
– Processing may occur in event mode (sensitive to nextVisit events) or scan mode (OCS events are not needed).
– Certain error cases currently may leave rafts unprocessed.
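A sketch of the dispatch behavior just described, with a hypothetical policy object deciding the order of backlogged exposures:

```python
class NewestFirstPolicy:
    """Hypothetical dispatch policy: newest exposures first, so alerts stay
    timely; older exposures fall through to daytime batch processing."""
    def order(self, backlog):
        return sorted(backlog, reverse=True)

class ExposureDispatcher:
    """If L1 falls behind, keep an internal list of exposures and dispatch
    them according to an operational policy, as described above."""
    def __init__(self, policy, process):
        self.policy = policy
        self.process = process      # callable that runs L1 on one exposure
        self.backlog = []

    def on_exposure(self, exposure_id, system_busy):
        if system_busy:
            self.backlog.append(exposure_id)   # fell behind: queue it
        else:
            self.process(exposure_id)

    def drain(self, mark_for_daytime):
        # End of night / operator exit: anything still queued is marked
        # for daytime batch processing rather than processed live.
        for exp in self.policy.order(self.backlog):
            mark_for_daytime(exp)
        self.backlog.clear()
```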

7 Joint Technical Meeting 2/23/2016 L1 Telemetry
− L1 computes telemetry data that is fed back into the OCS system.
– Image quality parameters provide feedback to the scheduler and are eventually processed; the parameters must be computed somewhere, and we are prepared to compute them offline in batch if needed. We understand that the parameters are a side effect of the L1 image processing code, not a separate module. Configuration control must provide controls should multiple codes emerge.
– L1 codes may run out of order, or in batch after the fact, so operational control of the flow of messages is needed. Concerns: non-real-time WCS feedback, batch access to the OCS bridges, etc.
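A sketch of how such telemetry might be packaged and published; the bridge API and parameter names are assumptions, and a sequence number plus timestamp let the receiver handle out-of-order or batch-produced messages:

```python
import time

def publish_telemetry(ocs_bridge, visit_id, measurements):
    """Package image-quality parameters produced as a side effect of L1
    image processing and send them over an OCS bridge (hypothetical API)."""
    message = {
        "visit": visit_id,
        "seq": measurements.get("seq"),        # lets the receiver reorder
        "utc": time.time(),
        "psf_fwhm": measurements.get("psf_fwhm"),  # assumed parameter names
        "wcs_rms": measurements.get("wcs_rms"),
    }
    ocs_bridge.send("l1.telemetry", message)   # hypothetical bridge call
```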

8 Joint Technical Meeting 2/23/2016 Details of Alert Distribution

9 Joint Technical Meeting 2/23/2016 Brokers
− The LSST broker has been described as "primitive" or "basic".
− The LSST brokers serve only LSST-authorized users.
− There are operational distinctions between feeding a community broker and running a broker that serves individual users.
− Since we have to deal with community brokers, we see the functional requirement as naturally supporting multiple brokers, and therefore multiple instances.
– We see the following instances:
  - a reliable feed for EPO
  - a reliable feed for broker awardees
  - a best-effort feed accessible by any data-rights holder
− We understand PU is delivering a broker.
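A sketch of how multiple feed instances with different delivery guarantees might be configured; the feed names and the `transport.send` API are assumptions:

```python
class Feed:
    """Hypothetical alert feed: 'reliable' feeds retry until acknowledged,
    while the best-effort feed tolerates drops under load."""
    def __init__(self, name, reliable, max_retries=5):
        self.name = name
        self.reliable = reliable
        self.max_retries = max_retries

    def deliver(self, alert, transport):
        attempts = self.max_retries if self.reliable else 1
        for _ in range(attempts):
            if transport.send(self.name, alert):  # hypothetical transport API
                return True
        return not self.reliable  # best-effort: a drop is not an error

# The three instances discussed above, as a configuration sketch:
FEEDS = [
    Feed("epo", reliable=True),
    Feed("broker-awardees", reliable=True),
    Feed("data-rights-holders", reliable=False),
]

def fan_out(alert, transport):
    # Every alert goes to every feed; per-feed policy decides retry behavior.
    return {feed.name: feed.deliver(alert, transport) for feed in FEEDS}
```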

10 Joint Technical Meeting 2/23/2016 Archiving System

11 Joint Technical Meeting 2/23/2016 Archiving System
− Archiving is now a distinct service offered to Observatory Operations.
– It is no longer technically coupled to the L1 processing system.
– It can be run independently as well.
− Functions:
– Acquires images from the camera buffer.
– Ingests them into the data backbone.
  - Primary ingest is the Chilean Base Center archive.
  - Backup ingest is the NCSA archive.
  - A synchronization policy in the archives assures replication and migration to tape.
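A minimal sketch of the dual ingest, assuming simple file-copy semantics and hypothetical archive paths; checksums recorded at ingest let a synchronization policy later verify replication and migration to tape:

```python
import hashlib
import shutil
from pathlib import Path

def ingest(image_path, base_archive, ncsa_archive):
    """Copy one image into the primary (Base) and backup (NCSA) archives
    and record a checksum alongside each copy. Paths and layout are
    assumptions for illustration."""
    src = Path(image_path)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    for root in (base_archive, ncsa_archive):
        dest = Path(root) / src.name
        shutil.copy2(src, dest)
        Path(str(dest) + ".sha256").write_text(digest)
    return digest
```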

12 Joint Technical Meeting 2/23/2016 Additional L1 System Elements
− The baseline provides for an engine to replicate the Engineering and Facility Database (EFD).
– We understand this to be made of ~20 independent relational databases and a file archive.
– We understand this to require a 25-node cluster.
− Observatory Operations Server
– Provides a special path into the archive and to just-born information in L1 caches.
– Different authorization and authentication mechanisms are required in the baseline.
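Since the EFD is described as ~20 independent databases plus a file archive, replication decomposes naturally into independent per-database jobs. A sketch, with `replicate_one` as a hypothetical callable (e.g., wrapping a dump/restore or a binlog stream):

```python
from concurrent.futures import ThreadPoolExecutor

def replicate_all(databases, replicate_one, max_parallel=4):
    """Run one replication job per database in parallel and fail loudly
    if any job reports failure. 'replicate_one' returns True on success."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = dict(zip(databases, pool.map(replicate_one, databases)))
    failed = [db for db, ok in results.items() if not ok]
    if failed:
        raise RuntimeError("EFD replication failed for: %s" % failed)
    return results
```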

13 Joint Technical Meeting 2/23/2016 Additional and Related Work
− A conops (concept of operations) narrative is
– a narrative that requires no training in a methodology to understand.
– accessible to anyone with basic concepts of the LSST construction project.
– useful support for operational planning.
− We have found writing a conops for the subsystem to be
– a helpful check that the context of the system is understood.
– a basis for use-case and functional breakdown.
– for us, old hat.

14 Joint Technical Meeting 2/23/2016 Conops Narratives Are Underway
− We are currently developing conops for
– the data backbone
– the security system, which includes the authentication and authorization work
− We plan to do this for the remaining infrastructure systems as well:
– Batch
– L3 hosting
– The rest

15 Joint Technical Meeting 2/23/2016 Summary
− We are looking for hallway or organized input on the L1 system briefly presented here.
– For DM, we are especially interested in
  - Brokers
  - Science code payloads
− We see the necessary elements of interface to the project to include
– the conops narrative
– the V methodology
– a WBS that will be effort-loaded

16 Joint Technical Meeting 2/23/2016 Phase 1: Basic Message Topology
− Initial implementation of behaviors within a "logical visit," but not reading out data: basic messaging and interactions, including the data dictionary and message patterns.
– Common main classes, startup scaffolding, message dictionary, and a prototype main program for all entities, excluding EFD replication and the Observatory Operations Server.
– Minimal framework for fault injection.
– Basic framework for health and status display, including a status event recorder.
– Inclusion of telemetry messaging, conditional upon the final definition of the requirements specification.
− Use single resource instances to check message flow.
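A minimal sketch of a message dictionary with validation, as this phase describes; the message types and fields are hypothetical:

```python
# Message dictionary: each message type and the fields it must carry.
MESSAGE_DICTIONARY = {
    "NEXT_VISIT":   {"visit_id", "n_exposures", "boresight"},
    "ACK":          {"msg_id", "component"},
    "HEALTH_CHECK": {"component"},
}

def validate(message):
    """Reject messages whose type or fields are not in the dictionary,
    so topology bugs surface during Phase 1 message-flow checks."""
    mtype = message.get("type")
    if mtype not in MESSAGE_DICTIONARY:
        raise ValueError("unknown message type: %r" % mtype)
    missing = MESSAGE_DICTIONARY[mtype] - message.keys()
    if missing:
        raise ValueError("%s missing fields: %s" % (mtype, sorted(missing)))
```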

17 Joint Technical Meeting 2/23/2016 Phase 2: Coordination of Base and NCSA
− Implementation of internal scoreboards and scoreboard snapshots (Base DMCS (L1 Processing), Base Foreman, Archiver DMCS, NCSA L1 Foreman, Cluster Manager).
– Message payload processing for the "logical visit."
– Implementation of a stage-in component to stage calibrations and templates from caches to Workers.
– Initial implementation of the Cluster Manager wrapper.
– Addition of component-reliability faults to the fault injection framework.
− Use all resources, with appropriate logic applied to messages for their use.
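A sketch of an internal scoreboard with snapshots; the structure is an assumption:

```python
import copy
import threading

class Scoreboard:
    """Components post state under a lock; a snapshot gives a consistent
    point-in-time view for coordination decisions between Base and NCSA."""
    def __init__(self):
        self._state = {}
        self._lock = threading.Lock()

    def post(self, component, **fields):
        with self._lock:
            self._state.setdefault(component, {}).update(fields)

    def snapshot(self):
        with self._lock:
            return copy.deepcopy(self._state)
```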

18 Joint Technical Meeting 2/23/2016 Phase 3: Data Passing between Camera, Base, and NCSA
- Detailed implementation of behaviors for exposure-level processing.
- Integration of DDS and CDS software deliverables into the DM development system.
- Includes marshalling of Workers; data acquisition from the Camera Data Buffer for both archiving and L1 processing; coordination of Archivers, Forwarders, and Distributors; and coordination of data passing between Distributors, Workers, and the NCSA Archive.
- Analytics framework for telemetry.
- FITS file generation for L1 processing and archiving.
- A functioning (albeit not final) system!
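A sketch of a Worker's pull-and-persist step, assuming a hypothetical Distributor API and using astropy for the FITS generation this phase calls for:

```python
import numpy as np
from astropy.io import fits

def pull_and_persist(distributor, ccd_id, out_path):
    """Pull one CCD's pixels from a Distributor (hypothetical call),
    write a FITS file, and return the path for downstream ingest."""
    pixels = distributor.get_ccd(ccd_id)             # hypothetical API
    hdu = fits.PrimaryHDU(np.asarray(pixels, dtype=np.int32))
    hdu.header["CCDID"] = ccd_id                     # assumed keyword
    hdu.writeto(out_path, overwrite=True)
    return out_path
```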

19 Joint Technical Meeting 2/23/2016 Phase 4: System Solidification
- Initial implementation of behaviors in "scan mode" for archiving and L1 processing.
- Final implementation of the Cluster Manager wrapper.
- State machines for commandable entities (L1 processing and archiving entities).
- Configuration manager to assign personalities to machines.
- Complete startup and tear-down implementation through the state machine in the DMCS, including drain for L1 processing and archiving.
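A sketch of such a state machine; the state names follow the OCS-style names used later in these slides, while the command set and transitions are an assumed subset:

```python
# (current state, command) -> next state
TRANSITIONS = {
    ("OfflineState", "enterControl"): "StandbyState",
    ("StandbyState", "start"):        "DisabledState",
    ("DisabledState", "enable"):      "EnabledState",
    ("EnabledState", "disable"):      "DisabledState",
    ("StandbyState", "exitControl"):  "OfflineState",
}

class CommandableEntity:
    """A commandable entity rejects commands that are invalid in its
    current state, so illegal sequences fail fast and visibly."""
    def __init__(self):
        self.state = "OfflineState"

    def command(self, cmd):
        key = (self.state, cmd)
        if key not in TRANSITIONS:
            raise RuntimeError("%r not valid in %s" % (cmd, self.state))
        self.state = TRANSITIONS[key]
        return self.state
```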

20 Joint Technical Meeting 2/23/2016 Phase 5: Self-Integration and Concentrated, Comprehensive Testing
- End-to-end test of L1 processing and archiving with OCS messages, including fault injection.
- Internal test of Alert Distribution.
- Final health and status display and after-action review (AAR) capabilities.
- Implementation of log recording for the Archiving and L1 Processing frameworks.
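The kind of end-to-end fault-injection test this phase calls for might look like the following pytest-style sketch; the `system` fixture and its `inject_fault` API are hypothetical:

```python
def test_forwarder_failure_is_recovered(system):
    """With one Forwarder faulted, a visit driven by OCS messages should
    still be archived and processed (spare resources take over)."""
    system.start()
    system.inject_fault("forwarder-01", kind="crash")         # hypothetical API
    system.send_ocs_event("nextVisit", visit_id="V1", n_exposures=2)
    result = system.wait_for_visit("V1", timeout_s=60)
    assert result.archived and result.processed
```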

21 Joint Technical Meeting 2/23/2016 Phase 6: End-to-End System Integration and Testing
− End-to-end test of L1 processing and Alert Distribution with science algorithms, including fault injection.
– Final log recording, including science-algorithm logs.

22 Joint Technical Meeting 2/23/2016 Concerns/Questions: Intra-DM
− Data coupling between the L1 codes generating telemetry and the OCS system.
– We want to talk to the Butler developers.
− Assembly of pixels into file-level packages, especially for archiving. We are thinking the granularity is the raft; this depends on metadata capabilities.
− How do we handle variation of codes as the observing program varies, e.g., essential telemetry as a side effect of science code?
− Catch-up processing should be possible in the production batch environment.
− Note:
– We need to model the Observatory Operations Server and the EFD.
– We haven't modeled offline L1 processing, e.g., DayMOPS.

23 Joint Technical Meeting 2/23/2016 Concerns/Questions: OCS
− Further understanding of the TBD trigger that shifts the Base DMCS from OfflineState to StandbyState.
− Protections for the OCS bridge, which penetrates the SCADA enclave.
− How is the number of exposures in a logical visit conveyed to the system? What is the relationship to the "nextVisit" message?
− Details about required application responses to DDS messages (when to issue acks, maximum timeouts, and similar).
− OCS support for the design concept of a "logical visit."
− How does metadata arrive from OCS for L1 processing and archiving, including in scan mode?
− The MEP (message exchange pattern) relating to the ending of a major mode.
− Possible concerns w.r.t. readout, pending discussions with CDS.
− Use cases where data persist in the camera.

24 Joint Technical Meeting 2/23/2016 Concerns/Questions: Camera-Related
− Claim management for the camera buffer; it is unknown where this function lives. Granularity ranges from amp to exposure.
− Understanding, within a "logical visit," which parameters are invariant within each image and which vary with the exposure, and the sources for each.
− What is the unit of recovery in the face of partial failure: e.g., CCD? Raft? Exposure?
− Review of the bandwidth needed, e.g., can the camera CDS handle unsynchronized access by archiving and processing?
− General review of the CDS API w.r.t. the existing L1 design.