Existing Perl/Oracle Pipeline


Existing Perl/Oracle Pipeline Daniel Flath (SLAC) SAS J2EE Review Nov 23, 2004

Requirements
- Handle MC and data, and be configurable to run arbitrary linked tasks
- Envisaged as the heart of the ISOC (Instrument Science Operations Center), triggering all its automated work
- Will be in use for 10+ years
- Talks to central databases, batch system and file servers in SCS
- Must run different tasks (e.g. flight data, MC, re-Recon) in parallel and not choke with hundreds to thousands of queued/running jobs
- Portability would be nice, for potential use at other GLAST sites and as backup at the GSSC (Science Support Center at Goddard)

Required Functionality
- Automatically process Level 0 data through reconstruction (Level 1)
- Provide near real-time feedback to IOC
- Facilitate the verification and generation of new calibration constants
- Re-process existing data
- Produce bulk Monte Carlo simulations
- Back up all data that passes through

Major Components
- Relational database (management system)
- Database access layer
- User interface
- Scheduler
- Execution layer

Components (labels from the slide diagram)
- Perl Stored Procedure Wrapper Library
- Oracle Data (auto-generated)
- Oracle Data
- Pipeline Scheduler & Other Utilities
- Oracle Stored Procedure Code
- Perl Data Storage Classes
- Perl Data Management Code (logical operations)

Database Overview
- Currently Oracle, using stored procedures for all query access
- Perl module provided as the entry point to the stored procedures (API)
- Primarily two sets of tables:
  - Pipeline Management tables allow users to configure the processing flow of a pipeline
  - Processing History tables record the status of processing and data
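The "stored procedures for all query access" design can be illustrated with a minimal sketch. It is written in Python rather than the pipeline's Perl, and every name in it (`PipelineDB`, `create_task`, the procedure name, the recording stub) is hypothetical. The only assumption about the driver is that its cursor offers the optional DB-API `callproc` method; the stub stands in for a real Oracle connection so the sketch runs on its own.

```python
class PipelineDB:
    """Thin access layer: callers never write SQL, they only name a procedure."""

    def __init__(self, connection):
        self._conn = connection

    def call(self, procedure, *args):
        # Every query is routed through a named stored procedure.
        cursor = self._conn.cursor()
        try:
            return cursor.callproc(procedure, list(args))
        finally:
            cursor.close()

    def create_task(self, name):
        # Hypothetical procedure name, for illustration only.
        return self.call("create_task", name)


class RecordingCursor:
    """Stand-in cursor that records calls instead of talking to a database."""

    def __init__(self, log):
        self._log = log

    def callproc(self, name, parameters=()):
        self._log.append((name, list(parameters)))
        return list(parameters)

    def close(self):
        pass


class RecordingConnection:
    """Stand-in connection; a real driver connection would replace it."""

    def __init__(self):
        self.calls = []

    def cursor(self):
        return RecordingCursor(self.calls)


conn = RecordingConnection()
db = PipelineDB(conn)
db.create_task("demo")
print(conn.calls)
```

Keeping all SQL behind procedure calls makes the client code small, at the cost of tying the logic to one database's procedure language.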

Task Configuration Tables
- Task table: linked to by the TaskProcess and Dataset tables
- A task consists of 1 record in the Task table and 0 or more records in the TaskProcess and Dataset tables
- The latter are related by a linked list of Read and Write flags that determine processing flow
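How Read and Write flags can determine processing flow is sketched below, in Python rather than the pipeline's Perl. This is a simplification under stated assumptions: the linked list of flags is flattened into per-process `reads` and `writes` lists, and all class, field and example names are hypothetical. A process is ordered after whichever process writes a dataset it reads.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class TaskProcess:
    name: str
    reads: list = field(default_factory=list)   # dataset names flagged Read
    writes: list = field(default_factory=list)  # dataset names flagged Write


@dataclass
class Task:
    name: str
    processes: list = field(default_factory=list)


def processing_order(task):
    """Order processes so each runs after the producers of its inputs."""
    producer = {ds: p.name for p in task.processes for ds in p.writes}
    graph = {
        p.name: {producer[ds] for ds in p.reads if ds in producer}
        for p in task.processes
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical two-step task, deliberately declared out of order.
demo = Task("demo", processes=[
    TaskProcess("reconstruct", reads=["digi"], writes=["recon"]),
    TaskProcess("digitize", reads=["raw"], writes=["digi"]),
])
print(processing_order(demo))
```

Datasets that no process writes (here `raw`) are treated as external inputs and impose no ordering.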

TaskProcess and Dataset Tables
- A TaskProcess record contains all information necessary to run a job (script version, location, etc.)
- Its links to Dataset records allow the pipeline to determine where to find input datasets (files) and where to write output datasets for the job
- These filenames are provided to the users' wrapper scripts at job execution
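A minimal sketch of how filenames might be derived from these records and handed to a wrapper script follows, in Python rather than the pipeline's Perl. The record fields, the file-naming scheme and the `--in`/`--out` arguments are all assumptions made for illustration; the slides do not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    name: str
    directory: str
    file_type: str


@dataclass
class TaskProcessRecord:
    name: str
    script: str      # location of the user's wrapper script
    version: str
    inputs: list = field(default_factory=list)   # DatasetRecords to read
    outputs: list = field(default_factory=list)  # DatasetRecords to write


def dataset_path(ds, run_id):
    # Assumed naming scheme: <directory>/<run>_<dataset>.<type>
    return f"{ds.directory}/{run_id}_{ds.name}.{ds.file_type}"


def wrapper_command(tp, run_id):
    """Build the argument list passed to the wrapper script for one job."""
    in_files = [dataset_path(d, run_id) for d in tp.inputs]
    out_files = [dataset_path(d, run_id) for d in tp.outputs]
    return [tp.script, "--in", *in_files, "--out", *out_files]


# Hypothetical records and paths.
digi = DatasetRecord("digi", "/data/digi", "root")
recon = DatasetRecord("recon", "/data/recon", "root")
job = TaskProcessRecord("reconstruct", "/scripts/recon.pl", "v1",
                        inputs=[digi], outputs=[recon])
print(wrapper_command(job, "run001"))
```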

Processing History Tables
- Records in the Run table represent instances of processing for a Task
- TPInstance and DSInstance records are likewise instances of TaskProcess and Dataset records

TPInstance and DSInstance Tables
- TPInstance tracks the state of processing for a single job (execution status, CPU time, memory used, etc.)
- DSInstance records act as file descriptors for datasets used in the processing chain that is a run (file size, location on disk, file format, data type, archive location, etc.)
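The relationship between configuration and history records can be sketched as follows, in Python rather than the pipeline's Perl. Starting a run creates one TPInstance per TaskProcess and one DSInstance per Dataset, and the instance is updated as the job finishes. The field names and the status values (`WAITING`, `DONE`, `FAILED`) are assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class TPInstance:
    task_process: str
    status: str = "WAITING"
    cpu_seconds: float = 0.0
    memory_mb: float = 0.0


@dataclass
class DSInstance:
    dataset: str
    location: str = ""
    size_bytes: int = 0


@dataclass
class Run:
    task: str
    run_id: str
    tp_instances: dict = field(default_factory=dict)
    ds_instances: dict = field(default_factory=dict)


def start_run(task, run_id, process_names, dataset_names):
    """Create the history records for one instance of processing a task."""
    run = Run(task, run_id)
    for name in process_names:
        run.tp_instances[name] = TPInstance(name)
    for name in dataset_names:
        run.ds_instances[name] = DSInstance(name)
    return run


def finish_job(run, process_name, succeeded, cpu_seconds, memory_mb):
    """Record the outcome and resource usage of a single job."""
    tp = run.tp_instances[process_name]
    tp.status = "DONE" if succeeded else "FAILED"
    tp.cpu_seconds = cpu_seconds
    tp.memory_mb = memory_mb
    return tp


run = start_run("demo", "run001", ["digitize", "reconstruct"], ["raw", "digi"])
finish_job(run, "digitize", True, cpu_seconds=12.5, memory_mb=300.0)
print(run.tp_instances["digitize"].status)
```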

Current System, pros/cons
Pros:
- Development is very fast
- Interaction with O/S is effortless (strong support for process management, file-system access, string manipulation)
- Strong, active user community; free modules available to do everything
- Feature additions are often simply a matter of finding appropriate libraries and gluing them together
Cons:
- Modification often more difficult than rewriting
- Code often runs very slowly compared to a non-scripted equivalent
- Development tools (to my knowledge) lacking vs. those available for Java (i.e. debugging)
- Database stored procedures (thousands of LOC) not portable to other RDBMSes (excepting, perhaps, PostgreSQL)
te