NGAS – The Next Generation Archive System Jens Knudstrup NGAS The Next Generation Archive System
Motivation

Motivation for NGAS:
- Handle huge data streams in real time.
- Reduce operational costs (manpower).
- Decrease expenses in general.
- Provide online and offline processing capabilities.
- Ease integration of the archive facility with external clients/applications.
- Provide a common concept for the online archive and the long-term storage facilities (NGAS OLAS + ASTO + Jukebox SW + more). Note: there is no plan to replace OLAS for now.
- Simplify and unify the overall infrastructure of the archive system.
- Increase data security.

Main Objectives

Main Objectives of NGAS:
Provide an archive facility with services for handling all stages in the lifetime of data files:
- Archiving files (+ on-the-fly checking and processing).
- Retrieving & on-the-fly processing of files.
- Ensuring data consistency.
- Providing services for managing data.
- (Executing complex, parallel data processing - TBD)
In addition, provide a system:
- Which is adaptable to specific contexts.
- With high performance and scalability.

NGAS: History

History of NGAS:
- April 2001: Project started.
- Mid June 2001: First operational prototype.
- June 2001: Review + approval of design/concept.
- Beginning July 2001: Installation/commissioning at La Silla (2.2m/WFI).
- Mid July 2001: Entered operation at La Silla.
- August 2001: Started operation of Garching NGAS Cluster.
- February 2001: Upgrade from Suse to RedHat Linux.
- August 2003: Installation/commissioning at Paranal (VLTI).
- January 2004: Installation of second archive system for 3.6m/LS.
- March 2004: First integration of NGAS on new HW (SATA).
- September 2004: First tests using NGAS together with RAID5 Arrays.
- September 2004: Archiving of HARPS pipeline products.
- December 2004: Archiving of WFCAM frames from Cambridge/UK.

NGAS: Components

Main Components of the NGAS Project:
1. NGAS SW – NG/AMS (Next Generation Archive Management System).
2. NGAS WEB Interfaces.
3. HW – (low cost) PCs with removable ATA disks.
4. NGAS OS (Linux).
5. NGAS Utilities.
6. NGAS Installation and Configuration Tools.

NG/AMS: Basic Concepts

Basic Concepts of the NGAS SW (NG/AMS):
- NG/AMS is a platform/framework providing basic services.
- No information is hard-coded to support specific types of data – NG/AMS does not know what e.g. a FITS file is.
- No information is hard-coded to support specific HW configurations.
- The specific behavior and the specific knowledge have to be added to the NGAS system – customizable.
- Based on standard protocols and formats wherever possible – can be used as a building block.
- Simple: advanced features can be added in front-end applications, giving clients a different view of the data and providing specific services.

NG/AMS: Main Features/1

Main Features of NG/AMS (1):
- Multi-threaded server.
- Standard communication protocol (HTTP) + HTTP Authentication.
- Data file archiving via Push and Pull Techniques.
- Subscription Service including filter mechanism.
- DB synchronization (DB Snapshot Feature).
- Easy adaptation to different kinds of DBMS (ANSI SQL Engine/DB Driver).
- Flexible/adaptable due to usage of 10 different kinds of plug-ins.
- Many configurable parameters.
- XML information exchange.
- Notification Service.

NG/AMS: Main Features/2

Main Features of NG/AMS (2):
- Advanced logging service (Verbose, Local Log File, Syslog).
- Background Data Consistency Checking.
- Operation in Cluster Mode.
- Transparent data retrieval & on-the-fly processing.
- APIs in ANSI-C and Python + two client applications based on these.
- Archive Client for secure and simple remote data file archiving.
- Many commands to interact with and control the system.
- Portable.
- Unit/Functional Tests.

NG/AMS: Server

[Figure: the NG/AMS Server and its interfaces. Labels: Data Provider (Data Provider Host), Data Requestor (Data Requestor Host), Info Requestor (Info Requestor Host), Data Subscriber Client (Data Subscriber Host), NG/AMS Server (NGAS Subscriber Host), NGAS DB (DBMS Host), Operations, NG/AMS Configuration, Main Disks Array and Replication Disk Array (NGAS Host), logging to UNIX Sys Logs, Log and Stdout, Archive Push Request, Archive Pull Request, HTTP POST Request.]

NG/AMS: Storage Media Infrastructure

[Figure: Basic Infrastructure of Storage Media.]

NG/AMS: XML Information Exchange

Interprocess Data Exchange:
- Most information exchanged between NG/AMS Servers, and between the NG/AMS Server and clients, is based on XML.
- Example: NgasDiskInfo Document (NG/AMS Status XML Document):

<Status Date=" T08:40:23.350" HostId="acngast1" Message="Disk status file" Version="v2.0-Beta2/ T09:22:53"/>
<DiskStatus Archive="ESO-ARCHIVE" AvailableMb="32300" BytesStored=" Checksum="" Completed="0" CompletionDate=" DiskId="IC35L040AVER07-0-SXPTX093675" InstallationDate=" T09:48: LogicalName="FITS-M Manufacturer="IBM" NumberOfFiles="163 TotalDiskWriteTime=" " Type="MAGNETIC DISK/ATA"/>

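A client can read such a status document with any standard XML parser. The following is a minimal sketch in Python, not the NG/AMS API: the sample string keeps only the DiskStatus attributes whose values are fully legible in the example above, the function name parse_disk_status is hypothetical, and treating Completed="1" as "disk completed" is an assumption.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: only attributes whose values are legible in the slide.
SAMPLE = (
    '<DiskStatus Archive="ESO-ARCHIVE" AvailableMb="32300" Completed="0" '
    'DiskId="IC35L040AVER07-0-SXPTX093675" Manufacturer="IBM" '
    'Type="MAGNETIC DISK/ATA"/>'
)


def parse_disk_status(xml_text):
    """Return the DiskStatus attributes as a dict, converting the numeric fields."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "DiskStatus":
        raise ValueError("expected a DiskStatus element, got %r" % elem.tag)
    info = dict(elem.attrib)
    info["AvailableMb"] = int(info["AvailableMb"])
    # Assumption: "1" marks a completed (full) disk, anything else does not.
    info["Completed"] = info["Completed"] == "1"
    return info


disk = parse_disk_status(SAMPLE)
print(disk["DiskId"], disk["AvailableMb"], disk["Completed"])
```

Because the document is plain XML, the same parsing works for a client, an operator script or another NG/AMS Server.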
NG/AMS: HTTP Command Interface

telnet acngast
Trying
Connected to acngast1.
Escape character is '^]'.
GET STATUS HTTP/ OK
<Status Date=" T14:59:42.724" HostId="acngast1" Message="Successfully handled command STATUS" State="ONLINE" Status="SUCCESS" SubState="IDLE" Version="v2.0-Beta2/ T09:22:53"/>
Connection closed by foreign host.

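The same exchange can be scripted instead of typed into telnet. The sketch below is an illustration using only the Python standard library, not the NG/AMS Python API. The request path "/STATUS", the helper names and the port argument are assumptions; the sample reply reuses the attributes shown in the session above, leaving out the ones whose values are truncated.

```python
import http.client
import xml.etree.ElementTree as ET


def send_status(host, port, timeout=10.0):
    """Send a STATUS command as an HTTP GET and return the reply body as text.

    Assumption: the command name is used as the URL path.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/STATUS")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


def is_success(reply_xml):
    """Return True if the Status element of the reply reports Status="SUCCESS"."""
    root = ET.fromstring(reply_xml)
    status = root if root.tag == "Status" else root.find(".//Status")
    return status is not None and status.get("Status") == "SUCCESS"


SAMPLE_REPLY = (
    '<Status HostId="acngast1" Message="Successfully handled command STATUS" '
    'State="ONLINE" Status="SUCCESS" SubState="IDLE"/>'
)
print(is_success(SAMPLE_REPLY))
```

send_status is not called here because it needs a running server; is_success can be tried on any saved reply.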
NG/AMS: DB Synchronization

DB Synchronization:
- NGAS DBs are replicated from Paranal/La Silla to Garching (unidirectional).
- Synchronization between the DBs of the various NGAS sites is also carried out by NGAS.
- NG/AMS maintains a snapshot (DBM) on each disk with info about the files stored on it.
- The local DB is synchronized with this info when the disk reappears at a site.
- The DB Snapshot can be used as a table of contents for the disk.

[Figure: LS NGAS DB (La Silla), PAR NGAS DB (Paranal) and PAR NGAS DB (Garching). Labels: DB Snapshot, NG/AMS DB Synchronization, DBMS Synchronization (Sybase).]

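The snapshot idea can be illustrated with a small sketch. It assumes a snapshot that maps file IDs to JSON-encoded file info and a local DB modelled as a Python dict keyed by (disk ID, file ID); the real snapshot layout and DB schema are not described here, and all function and key names are hypothetical.

```python
import dbm.dumb
import json
import os
import tempfile


def write_snapshot(path, files):
    """Write a DBM snapshot mapping file ID -> JSON-encoded file info."""
    with dbm.dumb.open(path, "n") as snap:
        for file_id, info in files.items():
            snap[file_id] = json.dumps(info)


def sync_from_snapshot(path, local_db, disk_id):
    """Add snapshot entries missing from local_db; return the added file IDs."""
    added = []
    with dbm.dumb.open(path, "r") as snap:
        for key in snap.keys():
            file_id = key.decode("utf-8")
            if (disk_id, file_id) not in local_db:
                local_db[(disk_id, file_id)] = json.loads(snap[key])
                added.append(file_id)
    return sorted(added)


with tempfile.TemporaryDirectory() as tmp:
    snapshot_path = os.path.join(tmp, "db_snapshot")
    # The disk carries two files; the local DB only knows about one of them.
    write_snapshot(snapshot_path, {"TEST.1": {"version": 1}, "TEST.2": {"version": 1}})
    local_db = {("disk-A", "TEST.1"): {"version": 1}}
    added = sync_from_snapshot(snapshot_path, local_db, "disk-A")
print(added)
```

Since the snapshot travels with the disk, the same loop also works as a table-of-contents reader when no DB is available.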
NG/AMS: Plug-Ins

NG/AMS Plug-Ins:
- Ten different kinds of plug-ins provided. These make it possible to adapt the system to different kinds of hardware and different types of data – nothing is hard-coded:
  1. Online Plug-In.
  2. Offline Plug-In.
  3. Data Archiving Plug-In.
  4. Checksum Plug-In.
  5. Data Processing Plug-In.
  6. Registration Plug-In.
  7. Label Printer Plug-In.
  8. Filter Plug-In.
  9. Suspension Plug-In.
  10. Wake-Up Plug-In.
- Standard plug-ins delivered with the system. Possible to replace these or add new plug-ins when needed.
- The plug-ins delivered with a distribution of NGAS should be viewed as belonging to the core of the system when it comes to testing.
- A normal user does not need to know about the plug-ins used.

NG/AMS: Plug-Ins

Data Archiving Plug-In – Basic Functioning:

[Figure: an Archive Request handled by the NG/AMS Server and the DAPI. Labels: Data File, NG/AMS Server, DAPI, NGAS DB, Target Storage Set, Main Disk (Staging Area, Storage Area, Bad Files Area, NgasDiskInfo), Replication Disk (Storage Area).]

1. Archive Request
2. Reception in Staging Area
3. DAPI Invocation
4. Data Checking/Processing, Parameter Extraction
5. DAPI Return Status
6. Storage of Main File in Final Location
7. DB Update, Main File
8. Replication of File
9. DB Update, Replication File

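The nine steps can be walked through in a small sketch. This is an illustration of the flow only, not the NG/AMS implementation: the directory names, the example_dapi plug-in, the list standing in for the NGAS DB and the use of CRC-32 are all assumptions.

```python
import shutil
import tempfile
import zlib
from pathlib import Path


def example_dapi(staged):
    """Hypothetical plug-in: reject empty files, else return extracted parameters."""
    data = staged.read_bytes()
    if not data:
        raise ValueError("empty file")
    return {"file_id": staged.stem, "size": len(data), "crc32": zlib.crc32(data)}


def archive(src, root, db, dapi=example_dapi):
    """Handle one Archive Request (step 1); return the file info or None if bad."""
    areas = {name: root / name for name in ("staging", "main", "replication", "bad")}
    for area in areas.values():
        area.mkdir(parents=True, exist_ok=True)
    staged = areas["staging"] / src.name
    shutil.copy(src, staged)                        # 2. reception in Staging Area
    try:
        info = dapi(staged)                         # 3-5. invoke plug-in, get status
    except ValueError:
        shutil.move(str(staged), str(areas["bad"] / src.name))  # Bad Files Area
        return None
    main = areas["main"] / src.name
    shutil.move(str(staged), str(main))             # 6. main file to final location
    db.append(dict(info, copy="main", path=str(main)))          # 7. DB update
    repl = areas["replication"] / src.name
    shutil.copy(main, repl)                         # 8. replication of file
    db.append(dict(info, copy="replication", path=str(repl)))   # 9. DB update
    return info


PAYLOAD = b"SIMPLE  =  T"
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "frame001.fits"
    src.write_bytes(PAYLOAD)
    db = []
    result = archive(src, root, db)
    copies = sorted(row["copy"] for row in db)
print(result["file_id"], copies)
```

Keeping the file in a staging area until the plug-in has returned means a rejected file never reaches the storage area or the DB.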
NG/AMS: XML Configuration

NG/AMS Configuration (1):
- About 110 different configurable parameters.
- Configuration can be loaded from an XML document, from the DB, or from a combination of these.
- Possible to re-use DB based parameters to compose specific configurations (easier to handle many, slightly different installations).

Main groups of configurable parameters (1):
- Basic Parameters: Port number, simulation mode, proxy mode, root mount point, …
- Plug-Ins: The various plug-ins the system should use, e.g. to handle data of a specific type.
- DB Connection: The DB connection parameters.
- Permissions: Archive, Retrieve, Processing, Remove Requests allowed.
- Archive Handling Parameters: Parameters for handling Archive Requests.
- Accepted Data Types: Types of data (mime-types) the system can handle.

NG/AMS: XML Configuration

NG/AMS Configuration (2):

Main groups of configurable parameters (2):
- Storage Sets: The disk configuration.
- Streams: Defines how the different kinds of data should be streamed onto the Storage Sets.
- Available Processing Capabilities: Defines the types of data that can be processed and which Data Processing Plug-Ins to use.
- Data Check/Janitor Thread Configuration: Parameters to tune the Data Checking and Janitor Threads.
- Logging Parameters: E.g. name of log files + intensity to apply when logging.
- Notification Parameters: Recipients of the various types of Notification Messages.
- Host Suspension Parameters: Parameters for suspending a host + for waking up suspended hosts.
- Subscription Parameters: Parameters to define if a server should subscribe for data.
- Authorization Parameters: Defines the known users and their access code.

NG/AMS: Data Consistency Checking

Data Consistency Checking:
- It is necessary to monitor the condition of the data in the archive constantly.
- Data Consistency Checking: thread running in the background.
- Possible to tune the amount of resources occupied by the service.
- A check run can be scheduled to run periodically via the configuration.
- Checks performed: checksum check, file availability, unregistered files on storage media.
- A check sub-thread is started per disk (max. number configurable).
- Info about the files on the system is dumped once into a DBM and retrieved file by file during checking.
- Possible to resume a check from where the previous one was interrupted.
- Notification sent to subscribers in case problems are found, e.g.:

Subject: NGAS-arcus2-7778: DATA INCONSISTENCY(IES) FOUND
Date: Fri, 25 Jan :06: (MET)
From:
Error Message: DATA INCONSISTENY(IES) FOUND IN DATA HOLDING:
Date: T15:32:
NGAS Host: arcus2
Inconsistencies: 1
Problem Description File ID Version
ERROR: Inconsistent checksum found TEST T15:25:

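The three kinds of checks listed above (checksum, file availability, unregistered files) can be sketched for a single disk as follows. This is a simplified illustration: CRC-32 stands in for whatever the configured Checksum Plug-In computes, a dict stands in for the file info dumped from the DB, and the function name check_disk is hypothetical.

```python
import tempfile
import zlib
from pathlib import Path


def check_disk(mount_point, registered):
    """Compare the files under mount_point with registered ({relative path: CRC-32}).

    Returns a list of (problem, relative path) tuples.
    """
    problems = []
    on_disk = {
        str(p.relative_to(mount_point))
        for p in mount_point.rglob("*")
        if p.is_file()
    }
    for rel_path, expected in sorted(registered.items()):
        if rel_path not in on_disk:
            problems.append(("file not available", rel_path))
        elif zlib.crc32((mount_point / rel_path).read_bytes()) != expected:
            problems.append(("inconsistent checksum", rel_path))
    for rel_path in sorted(on_disk - set(registered)):
        problems.append(("unregistered file", rel_path))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    disk = Path(tmp)
    (disk / "a.fits").write_bytes(b"good")
    (disk / "b.fits").write_bytes(b"corrupted")
    (disk / "stray.tmp").write_bytes(b"?")
    registered = {
        "a.fits": zlib.crc32(b"good"),
        "b.fits": zlib.crc32(b"original"),
        "c.fits": zlib.crc32(b"missing"),
    }
    report = check_disk(disk, registered)
for problem, name in report:
    print(problem, name)
```

Running one such check per disk in its own thread, with a cap on the number of threads, would correspond to the sub-thread scheme described above.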
NG/AMS: Operation in Cluster Mode/1

Example:
[Figure: cluster layout. Labels: Retrieve Request, NGAS Super Node (Proxy Mode), Cluster Back-Bone Network, Network Switches, NGAS Main Node 1, NGAS Main Node 2, NGAS Main Node 3, NGAS Sub-Nodes (10.X.X.X), Private Network.]

NG/AMS: Operation in Cluster Mode/2

Example:
[Figure: cluster layout. Labels: Retrieve Request, NGAS Main Node, Network Switch, NGAS Nodes.]

Garching NGAS Cluster

[Figure: NGAS Cluster.]

NG/AMS: Data Processing

Data Processing at Retrieval:
- Simple processing supported when retrieving files.
- Possible to request the system to apply a Processing Plug-In to the data and to send back the result of the plug-in rather than the data itself.
- Processing is performed on the sub-node hosting the data.
- Possible for clients to use the NGAS Cluster as a number cruncher to carry out parallel data processing in a simple manner.
- Reduces the amount of data to be transferred to the client. E.g., a floating-point number may be returned rather than the entire data file.
- Can be extended by providing new Data Processing Plug-Ins for specific contexts.
- Could be used to integrate NGAS with the AVO or other archive services.

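The idea of returning a plug-in result instead of the file can be sketched in a few lines. Everything named here is hypothetical: the plug-in registry, the "mean" plug-in, the retrieve function and the dict standing in for the archive are illustration only, and the data is a short buffer of float32 values rather than a real data file.

```python
import struct


def mean_plugin(data):
    """Hypothetical Data Processing Plug-In: mean of little-endian float32 samples."""
    count = len(data) // 4
    samples = struct.unpack("<%df" % count, data[: count * 4])
    return repr(sum(samples) / count).encode("ascii")


# Hypothetical registry mapping a processing name to a plug-in function.
PLUGINS = {"mean": mean_plugin}


def retrieve(archive, file_id, processing=None):
    """Return the file itself, or the plug-in result if processing is requested."""
    data = archive[file_id]
    if processing is None:
        return data
    return PLUGINS[processing](data)


archive = {"frame001": struct.pack("<4f", 1.0, 2.0, 3.0, 6.0)}
raw = retrieve(archive, "frame001")
result = retrieve(archive, "frame001", processing="mean")
print(len(raw), result)
```

Here the client receives 3 bytes instead of 16; with real frames the saving in transferred data is correspondingly larger.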
NG/AMS: APIs

NG/AMS APIs + Clients:
- Two APIs provided, implemented in C (C library) and Python (class).
- Facilitate implementation of client applications communicating with NGAS, e.g. to retrieve data files.
- Two command line utilities are provided, based on the C and Python APIs, which can be used to interact with an NG/AMS Server.
- A standalone Archive Client is provided, based on the C-API:
  - Independent of any DBMS.
  - Can be used to archive files from any remote host which can access the NGAS Archive via HTTP.
  - Attempts to archive a file are retried until success is returned or the file is classified as bad by the remote NGAS system.
  - Files are not cleaned up before cross-checking that they are really in the remote NGAS Archive (CHECKFILE Command).
  - First applications: Archiving of HARPS pipeline products and WFCAM files from Cambridge/UK.

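The retry and cross-check behaviour of the Archive Client can be sketched as below. This is not the C-API client: the send and check callables, the reply values "SUCCESS", "BAD_FILE" and "FAILURE", and the cap on attempts are assumptions made so the sketch terminates (the client described above retries until success or rejection).

```python
import time


def archive_with_retry(path, send, check, max_attempts=5, delay=0.0):
    """Try to archive path; return "archived", "bad" or "pending".

    send(path) -> "SUCCESS" | "BAD_FILE" | "FAILURE" (hypothetical reply values).
    check(path) -> bool, standing in for the CHECKFILE cross-check.
    """
    for _ in range(max_attempts):
        reply = send(path)
        if reply == "BAD_FILE":
            return "bad"        # move to the Bad Files Area, do not retry
        if reply == "SUCCESS" and check(path):
            return "archived"   # only now is it safe to clean up the local copy
        time.sleep(delay)
    return "pending"            # leave the file in the Archive Queue


# Simulated server: two failures followed by a successful Archive Request.
replies = iter(["FAILURE", "FAILURE", "SUCCESS"])
outcome = archive_with_retry("frame001.fits", lambda p: next(replies), lambda p: True)
print(outcome)
```

The local file is treated as removable only after both the Archive Request and the cross-check succeed, which mirrors the clean-up rule described above.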
NG/AMS Client Applications

NG/AMS Archive Client

[Figure: the NG/AMS Archive Client on the Data Provider Host and the NG/AMS Server on the Remote NGAS System. Labels: Archive Queue, Archived Files Area, Bad Files Area (BAD), Log Files Area (Log Info, Log Rotation), Control, Archive Requests + Commands, NGAS DB.]

NG/AMS: Server Commands

NG/AMS Server Commands (HTTP Protocol):
- Commands issued as URLs: : / [? [& ]]
- Commands:
  ARCHIVE: Archive data with Archive Push or Archive Pull Technique.
  CHECKFILE: Execute an explicit file check of the given file.
  CLONE: Clone an entire disk or individual files.
  CONFIG: Configure an online system.
  DISCARD: Force removal of a file from disk and/or DB independent of the number of copies.
  EXIT: Make the NG/AMS Server exit.
  INIT: Re-initialize the NG/AMS Server.
  LABEL: Print out disk labels.
  OFFLINE: Bring server to Offline State.
  ONLINE: Bring server Online.
  REGISTER: Register a file or a set of files already stored on an NGAS Disk.
  REMDISK: Remove a disk from the archive (only allowed if at least 3 copies of each file are available).
  REMFILE: Remove a file from the archive.
  RETRIEVE: Retrieve a file, transparently, from the archive.
  STATUS: Query the status of the server or another component in the NGAS system/cluster.
  SUBSCRIBE: Subscribe to new data or a set of data.
  UNSUBSCRIBE: Unsubscribe a previously created subscription.

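Since commands are plain URLs, a client can compose them with standard URL tools. The sketch below assumes the form http://host:port/COMMAND?parameter=value&...; the port 7777, the parameter name file_id and the file ID are placeholders chosen for illustration, not documented values.

```python
from urllib.parse import urlencode

# Command names as listed on this slide.
COMMANDS = {
    "ARCHIVE", "CHECKFILE", "CLONE", "CONFIG", "DISCARD", "EXIT", "INIT",
    "LABEL", "OFFLINE", "ONLINE", "REGISTER", "REMDISK", "REMFILE",
    "RETRIEVE", "STATUS", "SUBSCRIBE", "UNSUBSCRIBE",
}


def command_url(host, port, command, **params):
    """Build the URL for a server command, rejecting unknown command names."""
    if command not in COMMANDS:
        raise ValueError("unknown command: %s" % command)
    url = "http://%s:%d/%s" % (host, port, command)
    if params:
        # urlencode escapes values and joins the pairs with "&".
        url += "?" + urlencode(params)
    return url


status_url = command_url("acngast1", 7777, "STATUS")
retrieve_url = command_url("acngast1", 7777, "RETRIEVE", file_id="EXAMPLE-FILE")
print(status_url)
print(retrieve_url)
```

Validating the command name on the client side catches typos before a request is sent.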
Unit/Functional Tests - Features

Unit/Functional Tests:
- Extensive set of automatic tests provided, consisting of:
  - 30 Test Suites.
  - ~130 Test Cases.
- Tests portable (platform/HW independent).
- Testing the business logic of the system and correct functioning (simulation mode).
- Need to add more Test Cases for testing correct and consistent behavior under abnormal conditions, and stress tests.
- Needs to be enhanced with ~200 Test Cases before next release.
- Possible to generate Test Plan from test code (next slide - overhaul ongoing).

Unit/Functional Tests - Test Plan

[Figure: Test Plan example.]

NGAS WEB Interfaces

NGAS WEB Interfaces:
- WEB Interfaces provided to assist operators in querying the status of the system and to search for various components (data files, disks, machines).
- Used at all sites by the operators (Garching, Paranal, La Silla).
- Based on Zope. WEB management system providing editing via WEB browser.
- Local Zope WEB Servers available at each site.
- Tools provided to list disks, find specific files, and get an overview of the nodes and their status.
- Also the so-called Operators Log Book is provided. The operators use this to log all actions carried out.
- Used by the operators at Paranal/La Silla to monitor the online archiving activities.
- Services for interacting with the system are missing. Only possible to control the disk label printing for now. An enhancement is planned in the near future.

NGAS System/OS

NGAS OS Distribution:
- Started on a Suse Linux distribution and migrated to RedHat Linux (ESO standardization).
- OS distribution prepared/managed by OTS-SOS.
- Support for single-processor and multi-processor configurations.
- Support for old HW (PATA) and new HW (SATA).
- Limited installation, many packages removed to reduce the size of the system.
- Special packages needed by NGAS: Python, Sybase interface, Zope, … - installed by the NGAS Installation Tool.
- Special driver SW needed for the 3ware controller.
- Zope WEB server running on some nodes (optional).
- 3ware disk controller WEB server running on every host.
- Possibility to back up/restore the complete system by means of the Mondo/Mindi tool kit (from a single CDROM) in 10 minutes.
- From July 2004 the NGAS OS platform is installed with a kickstart installation script.

NGAS HW

NGAS HW (1):
- Started with 8-slot parallel ATA systems.
- 8 x 80 GB storage capacity per node (640 GB/node, ~1.2 TB compressed).
- Since March 2004 a 24-slot serial ATA system in operation (up to 24 * 400 GB = 9.6 TB/node, 19.2 TB compressed).
- Reduces price per GB.
- More robust HW, among other things due to serial ATA (cleaner cabling).
- Disk handling easier, more robust disk frames.
- Overall HW stability (hopefully) better and less intervention needed (TBC).
- Amount of data/CPU should be balanced to be able to process the data in a limited time.
- TBD when to use new HW in operation at observatory sites.
- Investigating usage of RAID5 rather than JBOD disks.

NGAS HW

[Figure: NGAS HW (2).]

NGAS Utilities

NGAS Operators Utilities/Installation Utilities:
- Small module provided (NGAS Utilities) with utilities for the daily work of the operators:
  - Limited time invested in this so far, however essential tools for the operation provided (e.g. Clone Verification Tool, Check File List Tool, Clone File List Tool, …).
  - The function of many of these tools should be taken over by the NGAS WEB Interfaces when these have been enhanced.
- The module NGAS Installation Tools provides some utilities to install and check the system:
  - Tool provided to build NGAS layer on top of the basic NGAS Linux distribution.
  - Functionality still to be implemented.

NGAS Infrastructure

Present ESO NGAS Infrastructure:

[Figure: NGAS sites LS, PAR and GAR, each with an NGAS DB. Labels: Replication, Archive Disk Sets, Archive Unit, Buffering Unit, Archive Handling Unit, Cluster Unit, Ext. Archive Clients.]

NGAS: Future Plans

(Near) Future Plans for NGAS:
- Received detailed requirements from archive operations.
- Enhance NGAS WEB Management Interfaces.
- Enhancement of services for operation in cluster (extended proxy mode).
- Enhancement of installation utilities.
- Enhancement of unit tests (simulation of archive cluster operation).
- Implement load balancing/archive cluster operation for high availability/high data rates (VST/Cam: up to 300 GB/night, VISTA/VistaCAM: up to 1 TB/night - TBC).
- Support for advanced data processing, utilizing an NGAS Cluster as a parallel processing engine (specify complex recipes which execute parallel data processing) – will be analyzed in the near future.
- Support for the Astrophysical Virtual Observatory/GRID?

Status - December 2004

Status of NGAS Project December 2004:
- In operation since July Used heavily on a daily basis by archive operators in Garching.
- Data archived daily at La Silla, Paranal and at ESO HQ.
- Data archived directly into NGAS Archive in Garching from Paranal and Cambridge/WFCAM.
- Some statistics:
  - Total number of nodes: ~25.
  - Total number of disks in use: ~260.
  - Total number of files in NGAS Archive: ~1,500,000.
  - Amount of compressed data in NGAS Archive: ~27 TB.
  - Amount of uncompressed data in NGAS Archive: ~45 TB.
  - Maximum throughput per node (archiving): ~400 GB/24 hours (including compression).
- Major Issues to Address:
  - Need to invest more resources in implementing automatic tests, in particular for testing robustness and handling of abnormal conditions.
  - Need to invest resources in implementing an enhanced user interface - not very user-friendly at the moment.
  - Need to update the design document to reflect the present status of the system (not updated since it was written in spring 2001).
  - Should investigate improved ways of ensuring data consistency and means for recovering lost data.