WINTER 2016 – TERM PRESENTATION MICHAEL O’KEEFE

PAST RESEARCH - SUMMER 2015
- Continued Jason Woodring’s research on UWCA
- Main issue with UWCA is the slow file reading performance
- A number of possible reasons (UWCA code, MASS, NetCDF, etc.)
- NetCDF and text file performance testing
- Sequential read performance vs. distributed read performance
- Performed tests using different storage methods
- Results confirmed that both text files and NetCDF files are read faster on multiple computing nodes

PAST RESEARCH - SUMMER 2015 (CONT)

WHAT ARE NETCDF FILES?
- Stands for Network Common Data Form
- Created by Unidata, part of the University Corporation for Atmospheric Research (UCAR)
- Array-oriented scientific data
- Consists of three parts:
  - Dimensions (UWCA: longitude, latitude, time)
  - Variables (UWCA: tasmax)
  - Attributes (metadata)
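
The three parts listed above can be pictured with a small in-memory model. This is only a sketch in plain Java, not the netCDF-Java API: the class name, the dimension lengths, and the attribute value are all made up for illustration. Only the dimension and variable names come from the slide.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a NetCDF file's three parts.
// This is NOT the netCDF-Java API; every name here is illustrative.
class NetcdfModelSketch {
    // Dimension name -> length, in declaration order.
    final Map<String, Integer> dimensions = new LinkedHashMap<>();
    // Variable name -> names of the dimensions that give it its shape.
    final Map<String, List<String>> variables = new LinkedHashMap<>();
    // Attribute name -> value (metadata about the file).
    final Map<String, String> attributes = new LinkedHashMap<>();

    // Number of values a variable holds: the product of its dimension lengths.
    long elementCount(String variable) {
        long count = 1;
        for (String dim : variables.get(variable)) {
            count *= dimensions.get(dim);
        }
        return count;
    }

    // Builds a tiny UWCA-like layout; lengths and attribute value are invented.
    static NetcdfModelSketch example() {
        NetcdfModelSketch file = new NetcdfModelSketch();
        file.dimensions.put("time", 4);
        file.dimensions.put("latitude", 3);
        file.dimensions.put("longitude", 2);
        file.variables.put("tasmax", Arrays.asList("time", "latitude", "longitude"));
        file.attributes.put("title", "illustrative example");
        return file;
    }
}
```

With these invented lengths, tasmax would hold 4 x 3 x 2 values, which is what makes the format array-oriented: a variable is an array shaped by its dimensions.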

FALL PAST RESEARCH
- Spent the majority of my time looking through UWCA for causes of its slow read performance – a very slow process, since UWCA is such a large application
- Found some minor issues
- Was not making any significant progress
- Doctor Fukuda thought of a way to solve the performance problem and benefit the MASS Library
- Began designing how to implement parallel I/O in the MASS Library

MOTIVATION FOR PARALLEL I/O
- Since the MASS library does not have any parallel I/O, users must read and/or write files via the master node (i.e., the main program), which becomes a bottleneck between the disk and all slave nodes participating in the same computation
- Parallel I/O would solve UWCA’s performance issue and simultaneously benefit other MASS applications that could use parallel I/O capabilities to increase application performance

PARALLEL I/O - INITIAL DESIGN

PARALLEL I/O – INITIAL DESIGN (CONT)

PARALLEL I/O – CURRENT DESIGN
- Implemented in Place.java of the MASS Library
- Fields:

PARALLEL I/O - CURRENT DESIGN (CONT)
- Private class within Place.java – class FileAttributes:

PARALLEL I/O – OPEN CURRENT DESIGN
Open method:
- Must be synchronized
- Requires a path to the file to be opened and a specified I/O type (0 for read, 1 for write)
- Returns a unique file number (descriptor)
* Explain the process view *
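
The open method described above might look roughly like the following. This is a minimal sketch, not the actual Place.java code: the class name FileTableSketch, the map-based file table, the counter used to hand out descriptors, and the isOpen helper are all assumptions. Only the synchronized keyword, the two I/O type values, and the returned descriptor come from the slide.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the file-handling part of Place.java.
class FileTableSketch {
    static final int READ = 0;  // I/O type values as given on the slide
    static final int WRITE = 1;

    // Assumed file table: descriptor -> open channel.
    private final Map<Integer, FileChannel> fileTable = new HashMap<>();
    private int nextDescriptor = 0;

    // synchronized so two threads can never be handed the same descriptor
    // or update the file table at the same time.
    synchronized int open(String path, int ioType) throws IOException {
        FileChannel channel;
        if (ioType == READ) {
            channel = FileChannel.open(Paths.get(path), StandardOpenOption.READ);
        } else if (ioType == WRITE) {
            channel = FileChannel.open(Paths.get(path),
                    StandardOpenOption.WRITE, StandardOpenOption.CREATE);
        } else {
            throw new IllegalArgumentException("ioType must be 0 (read) or 1 (write)");
        }
        int descriptor = nextDescriptor++;
        fileTable.put(descriptor, channel);
        return descriptor;
    }

    // Illustrative helper: true if the descriptor maps to an open channel.
    synchronized boolean isOpen(int descriptor) {
        FileChannel channel = fileTable.get(descriptor);
        return channel != null && channel.isOpen();
    }
}
```

A monotonically increasing counter is the simplest way to keep descriptors unique; a real implementation might instead reuse freed numbers, as operating systems do.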

NETCDF FILE OPEN VS OPEN IN MEMORY
- Recently found the method openInMemory(fileName) for NetCDF files
- Ran performance tests to see if this method could be useful:
  - Created a 3D NetCDF file-writing program in which the size of the file can easily be adjusted
  - Created a NetCDF file-reading program that can open the written file either in memory or on disk, then reads the file and outputs the read time in milliseconds
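
The shape of such a timing test can be sketched with the standard library alone. The example below is an analog, not the test programs from the slides: it uses a plain binary file instead of a NetCDF file, Files.readAllBytes as a stand-in for opening in memory, and an arbitrary 8 MiB file size. All class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical timing harness: disk read vs. read from memory.
class ReadTimingSketch {
    // Reads the whole file from disk in 64 KiB chunks and returns the byte sum.
    static long sumFromDisk(Path file) throws IOException {
        long sum = 0;
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip();                       // switch from filling to draining
                while (buffer.hasRemaining()) sum += buffer.get();
                buffer.clear();                      // ready for the next chunk
            }
        }
        return sum;
    }

    // Sums bytes that are already resident in memory.
    static long sumFromMemory(byte[] data) {
        long sum = 0;
        for (byte b : data) sum += b;
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("timing", ".bin");
        Files.write(file, new byte[8 * 1024 * 1024]); // arbitrary 8 MiB test file

        long start = System.nanoTime();
        long diskSum = sumFromDisk(file);
        long diskMs = (System.nanoTime() - start) / 1_000_000;

        byte[] inMemory = Files.readAllBytes(file);   // the "open in memory" step
        start = System.nanoTime();
        long memorySum = sumFromMemory(inMemory);
        long memoryMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("disk read: " + diskMs + " ms, in-memory read: "
                + memoryMs + " ms, sums match: " + (diskSum == memorySum));
        Files.delete(file);
    }
}
```

Timing the load step separately from the read step matters here, because loading into memory is itself a full read of the file.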

RESULTS
- Reading was on average 40% faster using openInMemory() (tested with file sizes from 500 MB to 2 GB)
- Note that occasionally the first read after opening in memory is only 20% faster
- Note that a file larger than 2 GB cannot be opened in memory
- Note that opening in memory does take longer than a plain open (of course!)
- Ran tests on Juno
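
The slides do not say why files over 2 GB fail to open in memory. One plausible explanation, offered here as an assumption, is that the file contents end up in a single Java byte array, and Java arrays are indexed by int. A guard along these lines could decide when opening in memory is even an option (the class and method names are hypothetical):

```java
// Hypothetical guard for choosing between in-memory and on-disk opening.
class InMemoryGuardSketch {
    // Java arrays are indexed by int, so one byte[] can hold at most
    // Integer.MAX_VALUE (2^31 - 1) bytes, just under 2 GiB.
    // Some JVMs cap the usable length slightly lower than that.
    static boolean fitsInByteArray(long fileSizeBytes) {
        return fileSizeBytes >= 0 && fileSizeBytes <= Integer.MAX_VALUE;
    }
}
```

Under that assumption, a 500 MB file passes the check and a file of exactly 2 GiB does not, which is consistent with the limit reported above.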

RESULTS (CONT)
- One set of results:

PARALLEL I/O – CLOSE CURRENT DESIGN
Close method:
- Synchronized
- Returns true if the given file descriptor is closed properly; otherwise, false
- Takes a file number (descriptor) as a parameter
- If the file descriptor exists in the file table, the file descriptor is closed and the file is removed from the file table
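
The close behavior listed above can be sketched as follows. As before, this is not the Place.java code: the class name, the map-based file table, and the register helper (which stands in for open) are assumptions made to keep the example self-contained.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical file table that only demonstrates the close path.
class CloseSketch {
    private final Map<Integer, Closeable> fileTable = new HashMap<>();

    // Illustrative stand-in for open(): records an already opened resource.
    synchronized void register(int descriptor, Closeable file) {
        fileTable.put(descriptor, file);
    }

    // Returns true only if the descriptor was in the table and closed cleanly.
    synchronized boolean close(int descriptor) {
        Closeable file = fileTable.remove(descriptor); // null if unknown
        if (file == null) {
            return false;
        }
        try {
            file.close();
            return true;
        } catch (IOException e) {
            return false; // the underlying close failed
        }
    }
}
```

In this sketch the entry is removed before the close is attempted, so a failed close cannot leave a stale descriptor behind. Keeping the entry on failure is an equally defensible choice, and the slides do not say which one the real code makes.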

UNIT TESTING – THANKS MATT
- EasyMock & JUnit
- Allows developers to isolate pieces of code and test them using mock objects
- Can create any test cases you can think of
- Using random generators could help catch some edge cases
- Gives you peace of mind when changing your code (can check if tests pass)
- Isolating the code to be tested lets developers not worry about other components of the application that might have errors (you know exactly where the bugs are)
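
To show what isolating code with a mock object means, here is a hand-written mock in plain Java. It deliberately does not use EasyMock or JUnit, so it illustrates the idea rather than those libraries' APIs. The interface, the method under test, and the mock class are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of testing code in isolation with a hand-written mock.
class MockSketch {
    // Dependency that would normally touch the disk or the network.
    interface LineSource {
        String nextLine(); // returns null when there are no more lines
    }

    // Code under test: counts the non-empty lines of whatever source it is given.
    static int countNonEmpty(LineSource source) {
        int count = 0;
        for (String line = source.nextLine(); line != null; line = source.nextLine()) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Mock: replays canned lines and records how many times it was called.
    static class MockLineSource implements LineSource {
        private final List<String> canned;
        int calls = 0;

        MockLineSource(List<String> canned) {
            this.canned = new ArrayList<>(canned);
        }

        @Override
        public String nextLine() {
            calls++;
            return canned.isEmpty() ? null : canned.remove(0);
        }
    }
}
```

Because the mock never touches a real file, a failing test can only point at countNonEmpty itself, which is the "you know exactly where the bugs are" benefit named above.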

MY UNIT TESTING
- Created a class called PlaceTest.java
- Created Place test subjects
- Created multiple text and NetCDF files to perform I/O operations on
- Can run the tests and compare output with expected output
- Right now: open and close functionality works for both text and NetCDF files
- Read is working for text files

UNIT TEST EXAMPLE
- Note that no mock objects are needed
- Would you like to see the test run?

TO BE IMPLEMENTED
- NetCDF reading: need to figure out how to determine the structure of an unknown NetCDF file
- Text and NetCDF writing: need to determine how the user specifies what is to be written
- File partitioning and distribution tool: splits a file according to how many computing nodes are being used and distributes each file piece to its respective node, in the /tmp directory
- File collecting and merging tool: collects the file pieces from each node after I/O operations have been performed and merges the pieces back into one file
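
The partitioning and merging tools are still to be implemented, so the following is only a sketch of one way they could work on a single machine. It splits by raw byte count, writes the pieces into a given directory, and concatenates them again. The class name, the "piece-N" file names, and the byte-based split are all assumptions. Distribution to remote nodes is not shown.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-machine sketch of the split and merge tools.
class PartitionSketch {
    // Splits `source` into `nodes` contiguous pieces written under `dir`.
    // Reads the whole file at once, which is fine for a sketch only.
    static List<Path> split(Path source, Path dir, int nodes) throws IOException {
        byte[] data = Files.readAllBytes(source);
        int chunk = (data.length + nodes - 1) / nodes; // ceiling division
        List<Path> pieces = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            int from = Math.min(i * chunk, data.length);
            int to = Math.min(from + chunk, data.length);
            Path piece = dir.resolve("piece-" + i);    // illustrative name
            Files.write(piece, Arrays.copyOfRange(data, from, to));
            pieces.add(piece);
        }
        return pieces;
    }

    // Concatenates the pieces, in list order, into `target`.
    static void merge(List<Path> pieces, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path piece : pieces) {
                Files.copy(piece, out);
            }
        }
    }
}
```

A byte-based split can cut a text line or a NetCDF record in half, so a real tool would probably need to split on line or record boundaries instead.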

QUESTIONS – ME ASKING YOU
- To implement write: need to know what to write to a file
  - How do you want the user to give us that information?
- Will read be non-synchronized if it is called from a synchronized method, open?
- I found this earlier this week, which brings up some questions:
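
On the question of calling read from a synchronized open: in Java, a synchronized method holds the object's monitor for its whole body, including any non-synchronized methods it calls. The same non-synchronized method, called directly from another thread, acquires no lock at all. The sketch below demonstrates this with Thread.holdsLock. The class is hypothetical and its read and open methods are empty stand-ins, not the MASS methods.

```java
// Hypothetical demonstration of lock ownership in nested calls.
class LockSketch {
    // Deliberately NOT synchronized. Reports whether the calling thread
    // currently holds this object's monitor.
    boolean read() {
        return Thread.holdsLock(this);
    }

    // synchronized: the calling thread holds the monitor for the whole
    // body, so the nested read() runs under that lock.
    synchronized boolean open() {
        return read();
    }
}
```

So read is protected when it is reached through open, but a thread that calls read directly is not blocked by, and does not block, a concurrent open. If read must always be exclusive, it would need its own synchronization.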

WHAT I HAVE LEARNED
- File I/O in Java – synchronized, FileChannel, ByteBuffer, OpenOptions
- Linux – file descriptors, working in Terminal
- Unit testing
- SourceTree, Bitbucket
- Avoiding race conditions – testAndSet()
- NetCDF files – reading, writing, determining structure
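
The slides do not show how testAndSet() was written. As a sketch of the general technique, the example below builds a test-and-set spin lock on AtomicBoolean.getAndSet, which atomically stores a value and returns the previous one. The class, its counter, and the runDemo helper are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical spin lock built on an atomic test-and-set.
class TestAndSetSketch {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private int counter = 0; // shared state protected by the flag

    // Atomically sets the flag to true and returns what it was before.
    // A return value of false means this caller just acquired the lock.
    private boolean testAndSet() {
        return locked.getAndSet(true);
    }

    void increment() {
        while (testAndSet()) {
            Thread.yield(); // another thread holds the flag, so keep trying
        }
        try {
            counter++;
        } finally {
            locked.set(false); // release the lock
        }
    }

    // Runs several threads against one shared counter and returns the total.
    static int runDemo(int threads, int incrementsPerThread) throws InterruptedException {
        TestAndSetSketch shared = new TestAndSetSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    shared.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return shared.counter;
    }
}
```

Without the flag, concurrent counter++ operations could overwrite each other and lose updates. With it, the final count equals threads times incrementsPerThread.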

WHAT I WOULD DO DIFFERENTLY (GOOD LESSON)
- Design before implementation: I was eager to start coding, especially when Doctor Fukuda wrote out the basic design (whiteboard pictures)
- A lot of that original design was changed
- It would have been better if I had tried to break down the entire task before writing a single line of code
  - Draw flow charts
  - Write pseudocode
  - Ask questions

QUESTIONS? – YOU ASKING ME
Thanks for listening!