DOD Center for Geosciences / Atmospheric Research Colorado State University Overview of the Data Processing and Error Analysis System (DPEAS) Andrew S.

Presentation transcript:

DOD Center for Geosciences / Atmospheric Research, Colorado State University
Overview of the Data Processing and Error Analysis System (DPEAS)
Andrew S. Jones
Colorado State University (CSU)
Cooperative Institute for Research in the Atmosphere (CIRA)
DOD Center for Geosciences / Atmospheric Research (CG/AR)
Fort Collins, CO

What is it?
- Data processing system for “large” data analysis tasks using common PCs
- Features:
  - 2nd-generation system (replaces an earlier system called PORTAL (Jones et al., 1995))
  - Parallel implementation
  - Web-based documentation and monitoring
  - Incorporates a Fortran interpreter for input tasks
  - Virtualized I/O subsystem (only memory-resident data structures are needed; data algorithms now function like a model)
  - Able to fail over to redundant hardware
  - Extensible User Module
  - Error Analysis code is still under development
  - Implemented on Windows NT/2000 OS

What Does it Do?
- Global merge capabilities for numerous data sets
  - Current system in operational use for 2+ years at CIRA
  - Current average operational throughput using 15 processors on 8 PCs is 17 TB/yr (47 GB/day)
  - Measured maximum throughput is 2.5 PB/yr (7.1 TB/day)
- Simplifies
  - Powerful abstraction layers allow anyone to write parallel code
  - Virtual I/O subsystem reduces end-user code complexity
  - Users interact using a language most already know
- Easily Scales
  - Limited process “cross-talk” improves scaling behavior
  - Tests have shown that a 2000-machine cluster is physically feasible
  - Basically… just add hardware.
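The yearly and daily throughput figures above can be cross-checked with a short unit conversion. This is only a sketch: the slide does not say whether binary or decimal prefixes were used, so the factor of 1024 and the 365-day year below are assumptions, and the function name is illustrative.

```python
# Rough unit check of the throughput figures quoted on the slide.
# Assumptions (not stated in the source): binary prefixes and a 365-day year.
DAYS_PER_YEAR = 365


def per_day(amount_per_year: float, factor: float = 1024.0) -> float:
    """Convert a yearly volume to a daily volume in the next-smaller unit."""
    return amount_per_year * factor / DAYS_PER_YEAR


average_gb_per_day = per_day(17.0)  # 17 TB/yr expressed in GB/day
peak_tb_per_day = per_day(2.5)      # 2.5 PB/yr expressed in TB/day
headroom = (2.5 * 1024.0) / 17.0    # measured peak relative to the average rate

print(f"average: {average_gb_per_day:.1f} GB/day")
print(f"peak:    {peak_tb_per_day:.1f} TB/day")
print(f"peak / average: {headroom:.0f}x")
```

Under these assumptions both daily values land within a few percent of the quoted 47 GB/day and 7.1 TB/day, and the measured peak is roughly 150 times the average operational rate.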

10 Data Types Are Currently Supported
- Reads and writes HDF-EOS natively
- GOES IMAGER (McIDAS)
- NOAA AVHRR GAC and LAC (McIDAS)
- NOAA AMSU-A and B (HDF-EOS)
- DMSP SSM/I (Byte Stream)
- DMSP SSM/T-2 (NGDC OIS)
- DMSP OLS (NGDC OIS)
- TRMM TMI and VIRS (HDF)
- User extensible… (your format here)

The Hardware

Failover Mode

Module Context GUIs This is DPEAS

An example of a DPEAS input script file

How DPEAS Starts
- Program Start
- DPEAS Initialization
- Interpreting DPEAS script declarations
- Interpreting DPEAS script executable statements

How DPEAS Ends
- Program End
- DPEAS Summary
- Interpreting DPEAS script executable statements

How Are Spawned Input Scripts and Jobs Created?
- All spawned DPEAS jobs run machine-generated DPEAS input scripts, which the data processing engine generates from the Master DPEAS input script. (The examples shown previously were DPEAS machine-generated code.)
- This is automated within DPEAS, and the user code goes along for the free ride since it is part of the DPEAS executable (it’s like meeting a friendly virus that helps to spread your code along with it).

What Does DPEAS Parallelism Look Like?
- Do-loop contents are sent to other resources in parallel
- The new jobs run the same “DPEAS.exe”, but execute only the subtask operations
- Completed jobs allow additional jobs to start
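The scheduling pattern described on this slide can be sketched with a bounded worker pool. This is not DPEAS code: threads stand in for separate “DPEAS.exe” jobs on other machines, and `run_subtask`, `run_parallel_loop`, and the granule naming are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Tuple


def run_subtask(iteration: int) -> Tuple[int, str]:
    """Stand-in for one spawned job that executes a single loop iteration."""
    return iteration, f"granule-{iteration:03d} processed"


def run_parallel_loop(iterations: range, max_jobs: int) -> Dict[int, str]:
    """Farm out loop iterations; at most max_jobs run at any one time."""
    results: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # Every iteration is queued up front. A queued job starts only when
        # a running job completes and frees its worker slot.
        futures = [pool.submit(run_subtask, i) for i in iterations]
        for future in as_completed(futures):
            index, message = future.result()
            results[index] = message
    return results


results = run_parallel_loop(range(6), max_jobs=2)
for index in sorted(results):
    print(index, results[index])
```

Because the iterations share no state, adding workers changes only `max_jobs`, which is the property the slides rely on when they describe limited process “cross-talk”.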

The 3 Programming Steps to Add a User Routine to DPEAS
1. Insert a program “hook”. The program hook makes the main DPEAS program aware of the existence of your wrapper routine.
2. Create a wrapper routine. The wrapper routine tells the DPEAS Fortran interpreter how to parse and interact with your application subroutine arguments.
3. Create an application routine. The application routine performs the “real” work. You can do anything you want within the application routine.
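The three steps follow a common plug-in pattern, which can be sketched as below. The real user module is Fortran and its code is not reproduced in this transcript, so every name here (`USER_ROUTINES`, `scale_channel`, the statement syntax, the channel names) is a hypothetical stand-in rather than the DPEAS interface.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory data set: channel name -> values.
Dataset = Dict[str, List[float]]


# Step 3: the application routine does the "real" work on typed arguments.
def scale_channel(data: Dataset, source: str, target: str, factor: float) -> None:
    data[target] = [value * factor for value in data[source]]


# Step 2: the wrapper routine converts the interpreter's string arguments
# into the types the application routine expects.
def scale_channel_wrapper(data: Dataset, args: List[str]) -> None:
    source, target, factor = args
    scale_channel(data, source, target, float(factor))


# Step 1: the program hook makes the main program aware of the wrapper.
USER_ROUTINES: Dict[str, Callable[[Dataset, List[str]], None]] = {
    "scale_channel": scale_channel_wrapper,
}


def call_user_routine(data: Dataset, statement: str) -> None:
    """Toy interpreter step: look up the routine by name and dispatch to it."""
    name, *args = statement.split()
    USER_ROUTINES[name](data, args)


data: Dataset = {"tb_150ghz": [250.0, 260.0]}
call_user_routine(data, "scale_channel tb_150ghz tb_scaled 0.5")
print(data["tb_scaled"])
```

Keeping argument parsing in the wrapper means the application routine never sees script text, which is one plausible reason for separating steps 2 and 3.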

How does the “User_Module.f90” relate to my DPEAS Input Scripts?

User Example: The user’s application routine
Using the virtual I/O data via pointers:
1. Find each MW channel
2. Allocate a new output array data structure
Your science code looks like this

User Example: The results: Complete integration
The new user routine is now fully integrated into DPEAS

User Example: The output HDF-EOS file

User Example: The output image representation
150 GHz Effective Emissivity, calculated from:
- GOES-08 IMAGER
- NOAA-15 AMSU-B

User Example: Summary
- Creates 2 new routines:
  - Wrapper routine
  - Application routine
- Requires 25 lines of executable code:
  - 2 – Program hook
  - 4 – Wrapper routine
  - 19 – Application routine
    - 2 – Variable assignments
    - 3 – Science algorithm
    - 14 – Virtual I/O library calls (using only 2 Virtual I/O library routines)
Small overhead for gaining massive parallelism capabilities!

User Example: How complex would the user routine be if written without the Virtual I/O library?
- Creates 2 new routines:
  - Wrapper routine
  - Application routine
- Requires 59 lines of executable code:
  - 2 – Program hook
  - 4 – Wrapper routine
  - 53 – Application routine
    - 2 – Variable assignments
    - 3 – Science algorithm
    - 48 – HDF-EOS library calls (using 26 HDF-EOS library routines)
Answer: Without the DPEAS Virtual I/O library there would be:
- 24 additional I/O routines called by the user (+1200%)
- 34 additional lines of user code (+136%, i.e. 236% of the Virtual I/O version)
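The comparison can be recomputed from the per-part counts on the two summary slides. The sketch below only redoes the arithmetic; the dictionary keys are illustrative labels, not DPEAS identifiers.

```python
# Line counts per part, copied from the two summary slides.
with_virtual_io = {"hook": 2, "wrapper": 4, "assignments": 2, "science": 3, "io_calls": 14}
without_virtual_io = {"hook": 2, "wrapper": 4, "assignments": 2, "science": 3, "io_calls": 48}

# Distinct I/O library routines called in each version.
routines_with, routines_without = 2, 26

total_with = sum(with_virtual_io.values())
total_without = sum(without_virtual_io.values())

extra_lines = total_without - total_with
extra_routines = routines_without - routines_with

# Increases relative to the Virtual I/O version, in percent.
line_increase_pct = 100 * extra_lines / total_with
routine_increase_pct = 100 * extra_routines / routines_with
# Size of the HDF-EOS version as a percentage of the Virtual I/O version.
line_ratio_pct = 100 * total_without / total_with

print(total_with, total_without, extra_lines, extra_routines)
print(line_increase_pct, routine_increase_pct, line_ratio_pct)
```

The totals reproduce the 25 and 59 lines quoted on the slides. The 34 extra lines are a 136% increase, and the 59-line version is 236% of the 25-line version, so the two percentages describe the same difference in different ways.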

User Example: Conclusions
- Implementation insights
  - Minimal amount of end-user code is required
  - The effort and resources involved are small (the DPEAS program recompiled in < 30 s on the user’s desktop)
- Virtual I/O insights
  - The DPEAS virtual I/O access method is less complex than traditional HDF-EOS file access methods
- End user’s perspective
  - End users are protected from technical data format issues
  - End users can develop higher quality code by leveraging shared robust common modules
  - Scalability is greatly enhanced with little end-user effort

Summary
- DPEAS can process large data sets efficiently while maintaining centralized management controls and error handling behaviors
- Parallelism of the code is automatic and runs on “cheap hardware”
- Failover capabilities make the system more robust
- User code is shielded from the complexities of the system by software abstraction layers
- Little training is needed since user interfaces are in a known scientific language
- User modules access data directly from memory, which makes traditional file access methods obsolete while maintaining the needed file compatibility

What did I learn about HDF-EOS in the process?
- HDF-EOS is an excellent “universal” data format. It works for all satellite sensor types I have encountered to date (10+)
- HDF-EOS requires serious software design before the implementation stage
- It is my experience that “Time” information as a geo/time field for sectorizing is overrated and is likely to cause future software design headaches with the more complex sensors if encouraged to be the “norm”

My 2 cents: How HDF-EOS could be made even better
(Hopefully someone has already thought of these things, and this short list will be a reaffirmation.)
- Given that GOES data, for example, and other multi-detector sensors can have multiple times for each channel for the same geolocation position, and that, in addition, they can and do interrupt their sensor scans at any time…
  - Treat “Time” as a data attribute
  - Currently I associate “Time” and other associated arrays with their principal data array by nomenclature
  - It would be better to use data array attribute “groups”. Then “Time”, “Calibration”, and other associated arrays could be grouped with the data array through the data format.

Why Data Attributes?
- Many data channels have “associated” information
  - For example, it might be very meaningful to associate the min. and max. of a grid location with its mean value
- It would be better if there were a standard way of showing that group association, so we don’t have to understand each other’s unique nomenclatures or “intent”, or resort to unusual “mixed” HDF/HDF-EOS data files
- Data attributes should not be arbitrarily limited in scope, but should have full data type ranges
- Units could also be incorporated through data attributes
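The difference between association by nomenclature and association through attribute groups can be illustrated with plain in-memory structures. This sketch does not use the HDF-EOS library and does not describe an existing HDF-EOS feature; `DataArray`, the `_Time` suffix, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataArray:
    """A data array that carries its associated arrays as named attributes."""
    name: str
    values: List[float]
    units: str = ""
    attributes: Dict[str, "DataArray"] = field(default_factory=dict)


# Association by nomenclature: the reader must know the naming scheme.
flat_file: Dict[str, List[float]] = {
    "Tb_150": [250.0, 251.5],
    "Tb_150_Time": [0.0, 0.5],
}


def find_by_name(arrays: Dict[str, List[float]], base: str, suffix: str) -> Optional[List[float]]:
    """Works only if writer and reader agree on the '<base>_<suffix>' pattern."""
    return arrays.get(f"{base}_{suffix}")


# Association by attribute group: the structure itself records the link,
# and each attribute is a full array with its own type and units.
tb_150 = DataArray("Tb_150", [250.0, 251.5], units="K")
tb_150.attributes["Time"] = DataArray("Time", [0.0, 0.5], units="s")
tb_150.attributes["Calibration"] = DataArray("Calibration", [1.0, 1.0])

print(find_by_name(flat_file, "Tb_150", "Time"))
print(sorted(tb_150.attributes))
```

In the grouped form a reader can enumerate everything attached to a channel without knowing the writer's naming conventions, and units travel with each array.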

The End

Appendix
The following series of slides shows how a user can easily modify DPEAS:
1. The user’s program hook
2. … wrapper routine
3. … application routine (using the virtual I/O data via pointers)
4. Usage of the new user routine in a DPEAS input script file
5. The results: complete integration

User Example: The user’s program hook
2 lines of code

User Example: The user’s wrapper routine
4 lines of executable code

User Example: The user’s application routine
Using the virtual I/O data via pointers:
1. Find each MW channel
2. Allocate a new output array data structure
Your science code looks like this

User Example: Usage of the new user routine in a DPEAS input script file

User Example: The results: Complete integration
The new user routine is now fully integrated into DPEAS

Where Do I Find DPEAS? DPEAS Home Page: Please direct questions to