Conductor at HiROC Bradford Castalia 19 September, 2007 PSI-2007 1 Designing and Implementing Processing Pipelines with Conductor: The HiROC Experience.

Presentation transcript:

Designing and Implementing Processing Pipelines with Conductor: The HiROC Experience
Bradford Castalia, Systems Analyst
Planetary Image Research Laboratory
HiRISE Operations Center
University of Arizona
Tucson, Arizona

Pipeline Processing

Conductor is a Java application for managing queues of source files to be processed by sequences of procedures.

Procedures
  - Defined in a database table by sequence number
  - Data processing procedure
  - Success criteria
      - A procedure must be successful for the next to run
  - On-failure (branch) procedure

Sources
  - Defined in a database table by source number
  - Source file pathname
  - Log file pathname
  - Procedure status values
  - Will be processed by one and only one Conductor
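The sequencing rule described above (run procedures in sequence order, require success before the next one runs, branch to an on-failure procedure otherwise) can be sketched in a few lines. This is only an illustration in Python, not Conductor's Java implementation: the `Procedure` class, the `process_source` function, and the assumption that the success criterion is an expected exit status are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Procedure:
    sequence: int                      # position in the pipeline
    command: Callable[[str], int]      # stand-in for the processing procedure; returns an exit status
    success_status: int = 0            # assumed success criterion: an expected exit status
    on_failure: Optional[Callable[[str], int]] = None  # optional branch procedure


def process_source(pathname: str, procedures: List[Procedure]) -> Dict[int, int]:
    """Run procedures in sequence order and stop at the first failure.

    Returns the status recorded for each procedure that ran, keyed by
    sequence number (a stand-in for the per-source procedure status values).
    """
    status: Dict[int, int] = {}
    for proc in sorted(procedures, key=lambda p: p.sequence):
        result = proc.command(pathname)
        status[proc.sequence] = result
        if result != proc.success_status:
            if proc.on_failure is not None:
                proc.on_failure(pathname)
            break  # later procedures do not run for this source
    return status


# Usage: the second procedure fails, so the third never runs.
notified: List[str] = []


def notify(pathname: str) -> int:
    notified.append(pathname)
    return 0


procedures = [
    Procedure(1, lambda path: 0),
    Procedure(2, lambda path: 3, on_failure=notify),
    Procedure(3, lambda path: 0),
]
statuses = process_source("/data/example.img", procedures)
```

Here `statuses` records an entry for procedures 1 and 2 only, and the on-failure branch has been called once with the source pathname.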

Pipeline Processing

Database
  - Procedures and Sources tables are paired
  - Multiple Conductor instances use the same database
  - Multiple Conductor instances for the same pipeline

Configuration
  - Based on ISO standard PVL
  - Configuration files may be shared
  - Configuration files may be included (e.g. site config) in other configuration files
  - Environment variables are included
  - Conductor maintained parameters

Reference Resolving
  - Configuration parameter references
  - Database field references
  - Nested references
  - Expression evaluation
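To make the idea of reference resolving concrete, here is a minimal Python sketch of resolving configuration parameter references, including nested ones. It assumes the `${Name}` reference form that appears elsewhere in these slides (`${Continue_Status}`); the nested syntax shown, the `resolve` function, and every parameter name other than `Continue_Status` are hypothetical. Database field references and expression evaluation are not covered.

```python
import re
from typing import Mapping

# Matches an innermost reference: "${Name}" where Name contains no further "${".
_INNERMOST = re.compile(r"\$\{([^${}]+)\}")


def resolve(text: str, parameters: Mapping[str, str], max_passes: int = 20) -> str:
    """Replace ${Name} references with parameter values, innermost first.

    Each pass resolves the innermost references, so a nested reference such as
    ${Log_Dir_${Site}} is resolved from the inside out. An unknown name raises
    KeyError; needing more than max_passes passes is treated as circular.
    """
    for _ in range(max_passes):
        if _INNERMOST.search(text) is None:
            return text
        text = _INNERMOST.sub(lambda m: str(parameters[m.group(1)]), text)
    raise ValueError("references are circular or nested too deeply")


# Usage with hypothetical parameters.
parameters = {
    "Continue_Status": "0",
    "Site": "HiROC",
    "Log_Dir_HiROC": "/logs",
}
command = resolve("perl -e 'exit ${Continue_Status};'", parameters)
log_file = resolve("${Log_Dir_${Site}}/run.log", parameters)
```

With these parameters, `command` becomes `perl -e 'exit 0;'` and `log_file` becomes `/logs/run.log`.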

[Figure: Data Flow diagram. Labels: Science Teams and Ops Staff; HiWeb; Public; HiCat; Pipelines; Host OS Environment; ISIS; HiSPICE; Eng; DOM; PDS Products; HiVali; HiArch; RDRgen; EDRgen; HiReport; HiEST; RSDS; HiDOG; Conductor; Downlink]

[Figure: diagram of the pipeline network. Labels: HiDog Pipeline; EDRgen Pipeline; EDR_Stats Pipeline; RSDS Raw Data Repository; WatchDog (Check data availability); HiStitch Pipeline; HiccdStitch Pipeline; RedGeom Pipeline; ColorGeom Pipeline; ColorMosaic Pipeline; RDRgen (JPEG2000) Pipeline; Internal Products (JPEG2000); EDR Table; HiCal Pipeline; RedMosaic Pipeline; Full-Res Color RDR; Full-Res Red RDR; RDR Table; EDR Geometry Table; HiCat Database; Standard Data Products; HiGeomInit Pipeline; NAIF Node SPICE Repository; HiSPICE; SPICE Validation; SPICE Pause; Validation & Release]

Initiate and Data Download

FEI_Watchdog
  - Poll the data delivery server (RSDS)
  - Register the download file
      - Pipeline_Source

Fetch and prep the data file
  - Download the file from the server
      - Notify operators on failure
      - Only continue if configured to do so
          perl -e 'exit ${Continue_Status};'
  - Move the file and update the Source_Pathname
  - Register the file in the next pipeline
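The `perl -e 'exit ${Continue_Status};'` step is a procedure that does no work: once the reference is resolved, its exit status alone tells the pipeline whether to go on. The Python sketch below imitates that gating pattern. It uses the Python interpreter as a stand-in for `perl`, assumes that an exit status of 0 is the success criterion, and `continue_gate` is a hypothetical name.

```python
import subprocess
import sys


def continue_gate(continue_status: int) -> bool:
    """Run a do-nothing procedure that exits with the configured status.

    Returns True when the pipeline should continue, which is assumed here
    to mean an exit status of 0.
    """
    # Equivalent in spirit to: perl -e 'exit <continue_status>;'
    completed = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({int(continue_status)})"]
    )
    return completed.returncode == 0


# Usage: a configured status of 0 lets the pipeline continue, 1 stops it.
proceed_when_zero = continue_gate(0)
proceed_when_one = continue_gate(1)
```

The appeal of this design is that the decision lives in configuration rather than in the procedure definitions, so the same pipeline can be made strict or lenient by changing one parameter.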

EDR Production and Metadata Collection
  - Check for multi-channel data file
      - Break out channel files and register new sources
  - Generate EDR product file
  - PVL_to_DB map of PDS label parameters to HiCat EDR_Products record field values
      - Replace existing record if configured to do so

RDR and Extras Production
  - Photometric processing
  - Geometric processing
  - Collect all channel files for the observation before registering them in the next pipeline
  - Use multiple systems in parallel for compute-intensive processing

Reprocessing
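Collecting all channel files for an observation before passing them on is a join step: files arrive one at a time, and nothing is registered downstream until the set is complete. A minimal Python sketch of that idea follows. The `ObservationCollector` class, the observation ID, and the channel names are hypothetical and are not taken from the HiROC pipelines.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ObservationCollector:
    """Hold channel files per observation until every expected channel has arrived."""

    def __init__(self, expected_channels: Set[str]):
        self.expected = set(expected_channels)
        self.arrived: Dict[str, Dict[str, str]] = defaultdict(dict)

    def add(self, observation_id: str, channel: str, pathname: str) -> List[str]:
        """Record one channel file.

        Returns the pathnames of all expected channels (ordered by channel
        name) once the observation is complete, otherwise an empty list.
        """
        files = self.arrived[observation_id]
        files[channel] = pathname
        if set(files) >= self.expected:
            del self.arrived[observation_id]  # the observation is handed on exactly once
            return [files[name] for name in sorted(self.expected)]
        return []


# Usage with hypothetical channel names.
collector = ObservationCollector({"RED4_0", "RED4_1"})
after_first = collector.add("OBS_0001", "RED4_0", "/work/OBS_0001_RED4_0.img")
after_second = collector.add("OBS_0001", "RED4_1", "/work/OBS_0001_RED4_1.img")
```

After the first call nothing is released; after the second, both pathnames are returned together and would be registered in the next pipeline.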

Management Issues

Incremental pipeline development
  - The ability to grow the network of pipelines without inherent ripple effects is very important.
  - Splitting and merging pipeline segments can be done at will.
  - Pipeline segments, or portions of a network, can be tested in sandbox environments (including individual developer or user contexts) that are separate from the production environment. A sandbox does not need a complete production configuration, yet it exactly mirrors the production configuration and operations.

Adaptable to the level of demand
  - Conductor instantiations can be added to or removed from pipeline processing at any time.

Error tolerant
  - Each Conductor acts independently.
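Independent instances can come and go safely only if each source is claimed by exactly one of them. One common way to get that property from a shared database table is a conditional update, sketched below in Python with SQLite. This is an assumption about how such a claim could work, not a description of Conductor's actual mechanism: the `Source_Number` and `Conductor_ID` columns and both function names are hypothetical (`Source_Pathname` is the only field name taken from the slides).

```python
import sqlite3
from typing import Optional


def create_sources(db: sqlite3.Connection) -> None:
    """Create a simplified, hypothetical Sources table."""
    db.execute(
        "CREATE TABLE Sources ("
        " Source_Number INTEGER PRIMARY KEY,"
        " Source_Pathname TEXT NOT NULL,"
        " Conductor_ID TEXT)"
    )


def claim_next_source(db: sqlite3.Connection, conductor_id: str) -> Optional[str]:
    """Claim the lowest-numbered unclaimed source, or return None if there is none."""
    while True:
        row = db.execute(
            "SELECT Source_Number, Source_Pathname FROM Sources"
            " WHERE Conductor_ID IS NULL ORDER BY Source_Number LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        number, pathname = row
        # The update only succeeds if the source is still unclaimed.
        cursor = db.execute(
            "UPDATE Sources SET Conductor_ID = ?"
            " WHERE Source_Number = ? AND Conductor_ID IS NULL",
            (conductor_id, number),
        )
        db.commit()
        if cursor.rowcount == 1:
            return pathname
        # Another instance claimed it first; look for the next one.


# Usage: two instances share one table and never receive the same source.
db = sqlite3.connect(":memory:")
create_sources(db)
db.executemany(
    "INSERT INTO Sources (Source_Pathname) VALUES (?)",
    [("/data/one.img",), ("/data/two.img",)],
)
first = claim_next_source(db, "conductor-A")
second = claim_next_source(db, "conductor-B")
third = claim_next_source(db, "conductor-A")
```

Because the claim is a single conditional update, adding or removing instances changes only throughput, not correctness: an instance that disappears simply stops claiming new sources.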

Hardware System Design Issues

Network bandwidth
  - Consider all possible sources
  - Network overload can cause hardware switches to fail
  - Foundation (generally not incremental)

CPUs
  - Services: database, web,
  - Compute engines: add as needed

Data storage
  - Fast, local space; especially /tmp
  - Bulk, shared space: add as needed
  - NFS latencies

Future Development

PostgreSQL
  - New Data_Port being integrated for distribution

Composer
  - Interactive Procedures table definition
  - Add, remove and reorder procedures
  - Edit procedure definition fields
  - Test reference resolving

Maestro
  - Manage multiple Conductors
      - Local or remote
      - Start, suspend/resume, stop
  - Monitor logging streams
  - Report throughput and backlogs of Sources
  - Accumulate resource utilization metrics