Using the Message Passing Interface (MPI) and Parallel File Systems for Processing Large Files in the LAS Format
Michael P. Finn, U.S. Department of the Interior, U.S. Geological Survey
Briefing to a pre-conference workshop of the 27th International Cartographic Conference: Spatial Data Infrastructures, Standards, Open Source and Open Data for Geospatial (SDI-Open 2015)
August 2015, Brazilian Institute of Geography and Statistics (IBGE), Rio de Janeiro, Brazil

Co-Authors
- Jeffrey Wendel (Lead Author) – U.S. Geological Survey (USGS), Center of Excellence for Geospatial Information Science (CEGIS)
- John Kosovich – USGS, Core Science Analytics, Synthesis, & Libraries (CSAS&L)
- Jeff Falgout – USGS, CSAS&L
- Yan Liu – CyberInfrastructure and Geospatial Information Laboratory (CIGI), National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign (UIUC)
- Frank E. Velasquez – USGS, CEGIS

Outline
- Objective
- Test Environment / XSEDE
- Study Areas
- Creating DEMs from Point Clouds
- Modifying Open Source Software
- p_las2las Implementation
- p_las2las Results
- p_points2grid Implementation
- p_points2grid Results
- Literature
- Questions

Objective
Problem: Create high-resolution DEMs from large lidar datasets.
Approach: Modify open-source software to run in parallel and test it on XSEDE supercomputers (clusters).

XSEDE – Extreme Science and Engineering Discovery Environment
- A five-year, $127 million (1.27 × 10⁸) project funded by NSF
- Supports 16 supercomputers and high-end visualization and data analysis resources
- One of those is "Stampede" at the Texas Advanced Computing Center (TACC)
- Accessed through University of Illinois allocations (Liu, Co-PI); current (yearly) allocation: 8 million (8.0 × 10⁶) computing hours
- Also accessed through USGS "Campus Champion" allocations

Stampede
- Peak performance (TFLOPS): 9,600
- Number of cores: 522,080
- Memory (TB): 270
- Storage (PB): 14

What is a "Supercomputer" Anyway?
- A collection of nodes (Xeon E5-based computers)
- Networked together (InfiniBand)
- Running a common OS (Linux)
- Sharing a parallel file system (Lustre)
- Running a scheduler (SLURM)
- With an inter-process communication (IPC) mechanism (MPI: MPI_Send, MPI_Recv, MPI_File_write_at, ...)
Stampede is a good example.
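For orientation, here is a minimal MPI sketch (not taken from the p_las2las or p_points2grid code) showing the three calls named above: rank 1 sends an integer to rank 0 with MPI_Send/MPI_Recv, and every rank then writes its rank number into a shared file at a rank-dependent offset with MPI_File_write_at. The file name ranks.bin is arbitrary.

```cpp
// Minimal MPI sketch (illustration only, not from the USGS code):
// rank 1 sends a value to rank 0, then every rank writes its own
// rank number into a shared file at a rank-dependent offset.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int value = rank * 100;
    if (rank == 1) {
        MPI_Send(&value, 1, MPI_INT, 0, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 0 && size > 1) {
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 0 received %d from rank 1\n", value);
    }

    // Every rank writes one int at offset rank*sizeof(int) in a shared file.
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "ranks.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(int), &rank, 1, MPI_INT,
                      MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
```

Compile with mpicxx and run with, for example, `mpirun -np 4 ./a.out`.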


Great Smoky Mtn. Study Area

Grand Canyon Study Area

Create High-Resolution DEMs from Large Lidar Datasets
- DEM resolution of 1 meter
- Lidar datasets with coverage over roughly a 15 × 15 minute footprint
- This yields a raster of approximately 27,000 rows × 27,000 columns, about 750 million cells
- Obtained two datasets in LAS 1.2 format:
  - A 16 GB file with about 570 million (5.7 × 10⁸) points in the Smoky Mountains, covering a 40,000 × 20,000 meter area
  - A 120 GB file with about 4.2 billion (4.2 × 10⁹) points over the Grand Canyon, covering a 25,000 × 30,000 meter area
- Both files are somewhat sparse: some "tiles" within the coverage are missing

Modify Open Source Software to Run in Parallel (and Test on XSEDE Clusters)
- las2las, from the LAStools suite, to filter out all but ground points
- points2grid, to make the DEM from the filtered LAS file
- Test on Stampede at TACC

p_las2las Implementation
- The las2las application and supporting LASlib library were extended with the MPI API to allow the application to run in parallel on a cluster
- Goal: an application that scales to arbitrarily large input
  - Limited only by the disk space needed to store the input and output files
  - No intermediate files are generated, and individual process memory requirements are not determined by the size of the input or output

p_las2las Implementation: Native las2las versus p_las2las

Native las2las algorithm:
- For all points:
  - Read the point
  - Apply a filter and/or transformation
  - Write the (possibly transformed) point if it passes the filter

p_las2las algorithm (an offset-bookkeeping sketch follows this list):
- Processes determine their point ranges and set input offsets
- For all points in a process's point range:
  - Read the point and apply the filter
  - Keep a count of points that pass the filter
- Processes gather filtered point counts from the other processes
- Each process can then set its write offset
- Processes set read offsets back to their beginning point and begin a second read; for all points:
  - Read the point and apply the filter and transformation
  - Write the (possibly transformed) point if it passes the filter
- Gather and reduce point counts and minimum/maximum coordinate values
- The rank 0 process updates the output header
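A minimal sketch of the write-offset bookkeeping in the p_las2las algorithm above, under the assumption that each rank has already counted its filtered points in pass 1; the kept counts here are made-up values, whereas the real tool derives them from LASlib point filtering. The gather-then-prefix-sum pattern is the key idea: a rank never needs the other ranks' points, only their counts.

```cpp
// Sketch of the write-offset bookkeeping described above (illustration only,
// not the actual p_las2las code). Each rank pretends it kept a different
// number of filtered points, gathers everyone's counts, and derives the
// point offset at which it would start writing in the shared output file.
#include <mpi.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Stand-in for "points in my range that passed the filter" (pass 1).
    int64_t kept = 1000 + 10 * rank;

    // Gather every rank's kept-count, as in the algorithm description.
    std::vector<int64_t> counts(nprocs);
    MPI_Allgather(&kept, 1, MPI_INT64_T, counts.data(), 1, MPI_INT64_T,
                  MPI_COMM_WORLD);

    // My write offset (in points) is the sum of the counts of lower ranks.
    int64_t points_before_me = 0;
    for (int r = 0; r < rank; ++r) points_before_me += counts[r];

    // In the real tool this becomes a byte offset:
    //   header_size + points_before_me * point_record_size
    // used during the second read/filter pass when writing to the output file.
    std::printf("rank %d: kept %lld points, write offset %lld points\n",
                rank, (long long)kept, (long long)points_before_me);

    MPI_Finalize();
    return 0;
}
```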

p_las2las Implementation
High-level view of the p_las2las application (diagram): the vertical flow describes the job flow, while the processes across the top run in parallel on that flow.

p_las2las Results – Smoky Mountains (16 GB)

| Number of Processes | Filter / Transformation | Output Size | Elapsed Time (seconds) |
| --- | --- | --- | --- |
| Native* | None | 16 GB | 138 |
| Native* | Keep Class 2 | 2 GB | 73 |
| Native* | Reproject | 16 GB | 502 |
| 64 | None | 16 GB | 20 |
| 64 | Keep Class 2 | 2 GB | 6 |
| 64 | Reproject | 16 GB | 26 |
| 256 | None | 16 GB | 8 |
| 256 | Keep Class 2 | 2 GB | 4 |
| 256 | Reproject | 16 GB | 9 |
| 1024 | None | 16 GB | 8 |
| 1024 | Keep Class 2 | 2 GB | 5 |
| 1024 | Reproject | 16 GB | 8 |

* Native: unmodified las2las source code from LAStools, compiled on Stampede with the Intel C++ compiler

p_las2las Results – Grand Canyon (120 GB)

| Number of Processes | Filter / Transformation | Output Size | Elapsed Time (seconds) |
| --- | --- | --- | --- |
| Native* | None | 120 GB | 1211 |
| Native* | Keep Class 2 | 25 GB | 623 |
| Native* | Reproject | 120 GB | — |
| 64 | None | 120 GB | 128 |
| 64 | Keep Class 2 | 25 GB | 59 |
| 64 | Reproject | 120 GB | — |
| 256 | None | 120 GB | 33 |
| 256 | Keep Class 2 | 25 GB | 18 |
| 256 | Reproject | 120 GB | — |
| 1024 | None | 120 GB | — |
| 1024 | Keep Class 2 | 25 GB | 9 |
| 1024 | Reproject | 120 GB | 24 |

* Native: unmodified las2las source code from LAStools, compiled on Stampede with the Intel C++ compiler

p_points2grid Implementation
- The points2grid application was extended with the MPI API to allow the application to run in parallel on a cluster
- Goal: an application that scales to arbitrarily large input
  - Limited only by the disk space needed to store the input and output files and by the number of processes available
  - No intermediate files are generated, and individual process memory needs are not determined by the size of the input or output

p_points2grid Implementation: Native points2grid versus p_points2grid

Native points2grid algorithm:
- For each point: update the output raster cells when the point falls within a circle defined by the cell corner and a given radius
- Optionally fill null cells with adjacent cell values
- Write the output raster cells

p_points2grid algorithm (a communication sketch follows this list):
- Processes are designated as readers or writers
- Reader processes are assigned a range of points; writer processes are assigned a range of rows
- Reader processes read LAS points from the input file and send them to the appropriate writer processes, based on whether the point falls within a circle defined by the cell corner and a given radius
- Writer processes receive LAS points from reader processes and update cell contents with elevation values
- When all points have been sent and received, writer processes apply an optional window-filling parameter to fill null values (this involves writer-to-writer communication when the window size overlaps two writers)
- Writer processes determine and set write offsets and write their range of rows to the output file; the first writer rank is responsible for writing the output file header and, in the case of TIFF output, the TIFF directory contents
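A minimal sketch of the reader/writer message passing described above, with made-up grid dimensions, fake (x, y, z) points, and a single reader (rank 0); the real p_points2grid uses many readers, buffers many points per MPI message, and maps points to cells through the circle-overlap test rather than the simple row binning used here.

```cpp
// Sketch of the reader/writer split (illustration only, not the actual
// p_points2grid code). Rank 0 acts as the reader and bins fake points into
// one buffer per writer; each writer owns a contiguous block of output rows
// and receives only the points that fall in its rows.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs < 2) { MPI_Finalize(); return 0; }

    const int total_rows = 900;                 // output grid rows (made up)
    const int n_writers  = nprocs - 1;          // rank 0 reads, the rest write
    const int rows_per_writer = (total_rows + n_writers - 1) / n_writers;

    if (rank == 0) {
        // "Reader": bin fake (x, y, z) points by the writer that owns each row.
        std::vector<std::vector<double>> buf(n_writers);
        for (int i = 0; i < 100; ++i) {
            double x = i, y = (i * 37) % total_rows, z = 0.5 * i;
            int writer = static_cast<int>(y) / rows_per_writer;  // owning writer
            buf[writer].push_back(x);
            buf[writer].push_back(y);
            buf[writer].push_back(z);
        }
        for (int w = 0; w < n_writers; ++w) {
            double dummy = 0.0;  // keep the pointer valid even for empty buffers
            const double* p = buf[w].empty() ? &dummy : buf[w].data();
            MPI_Send(p, (int)buf[w].size(), MPI_DOUBLE, w + 1, 0, MPI_COMM_WORLD);
        }
    } else {
        // "Writer": receive the buffer destined for this rank's rows.
        MPI_Status st;
        MPI_Probe(0, 0, MPI_COMM_WORLD, &st);
        int n = 0;
        MPI_Get_count(&st, MPI_DOUBLE, &n);
        std::vector<double> pts(n > 0 ? n : 1);
        MPI_Recv(pts.data(), n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        std::printf("writer %d received %d points\n", rank, n / 3);
        // The real tool would now update its block of raster cells here.
    }

    MPI_Finalize();
    return 0;
}
```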

p_points2grid Implementation
High-level view of the p_points2grid application (diagram): the job flow is described by the boxes on the right side, while the processes along the left are the internal processes of the flow functions.

p_points2grid Results – Smoky Mountains (16 GB)
Twelve 1-meter-resolution DEMs totaling 70 GB of output for the p_points2grid runs; twelve 6-meter-resolution DEMs totaling 2 GB of output for the native run.

| Number of Processes | Number of Readers | Number of Writers | Time: Reading, Communication | Time: Writing | Elapsed Time (seconds) |
| --- | --- | --- | --- | --- | --- |
| Native | 1 | 1 | NA | | |

p_points2grid Results – Grand Canyon (120 GB)
Twelve 1-meter-resolution DEMs totaling 71 GB of output for the p_points2grid runs; twelve 6-meter-resolution DEMs totaling 2 GB of output for the native run.

| Number of Processes | Number of Readers | Number of Writers | Time: Reading, Communication | Time: Writing | Elapsed Time (seconds) |
| --- | --- | --- | --- | --- | --- |
| Native | 1 | 1 | NA | | |

* Native: unmodified source code compiled on Stampede with the Intel C++ compiler

Conclusions
- Demonstrated a novel solution to the handling, exploitation, and processing of enormous lidar datasets, especially at a time when their availability is increasing rapidly in the natural sciences
- Expanded existing tools so that they now run in parallel processing modes
  - Improves upon previous attempts that used MapReduce to exploit lidar data in parallel processing environments
  - Created parallel processing algorithms based on the open-source las2las and points2grid code bases
- Greatly reduced run times when processing extremely large lidar point-cloud datasets (over 100 GB in file size), both in classifying the points and in generating DEMs
  - p_las2las and p_points2grid provide approximately two or more orders of magnitude reduction in processing time
  - Demonstrated scalability up to 4,096 processes

References
- The Apache Software Foundation (2014). Welcome to Apache™ Hadoop®! Internet at: Last accessed 24 November.
- Arrowsmith, J.R., N. Glenn, C.J. Crosby, and E. Cowgill (2008). Current Capabilities and Community Needs for Software Tools and Educational Resources for Use with LiDAR High Resolution Topography Data. Proceedings of the OpenTopography Meeting held in San Diego, California, on August 8. San Diego: San Diego Supercomputer Center.
- ASPRS (American Society for Photogrammetry and Remote Sensing) (2008). LAS Specification, Version 1.2. Internet at: Last accessed 24 November.
- ASPRS (American Society for Photogrammetry and Remote Sensing) (2011). LASer (LAS) File Format Exchange Activities. Internet at: Exchange-Activities.html. Last accessed 05 March.
- Behzad, B., Y. Liu, E. Shook, M.P. Finn, D.M. Mattli, and S. Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at Auto-Carto 2012, a Cartography and Geographic Information Society Research Symposium, Columbus, OH.
- CGR (Center for Geospatial Research), University of Georgia, Athens, GA. Internet at: Last accessed 05 March.
- Dean, J., and S. Ghemawat (2004). MapReduce: Simplified Data Processing on Large Clusters. Proceedings of OSDI '04: 6th Symposium on Operating System Design and Implementation, San Francisco, CA, December.
- Dewberry (2011). Final Report of the National Enhanced Elevation Assessment (revised 2012). Fairfax, VA: Dewberry, 84 p. plus appendixes. Internet at: enhanced-elevation-assessment. Last accessed 24 November.
- Factor, M., K. Meth, D. Naor, O. Rodeh, and J. Satra (2005). Object Storage: The Future Building Block for Storage Systems. In LGDI '05: Proceedings of the 2005 IEEE International Symposium on Mass Storage Systems and Technology, pages 119–123, Washington, DC, USA. IEEE Computer Society.
- Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2015). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag.
- Isenburg, Martin (2014). lasmerge: Merge Multiple LAS Files into a Single File. Internet at: Last accessed 03 March.
- Kosovich, John J. (2014). Vertical Forest Structure from Lidar Point-cloud Data for the Tennessee Portion of Great Smoky Mountains National Park. Abstract presented at the 2014 International Lidar Mapping Forum, Denver, CO.
- Krishnan, Sriram, Chaitanya Baru, and Christopher Crosby (2010). Evaluation of MapReduce for Gridding LIDAR Data. 2nd IEEE International Conference on Cloud Computing Technology and Science.
- Piernas, J., J. Nieplocha, and E. Felix (2007). Evaluation of Active Storage Strategies for the Lustre Parallel File System. Proceedings of the ACM/IEEE Conference on Supercomputing. ACM, New York.
- rapidlasso GmbH (2014). LAStools. Internet at: Last accessed 24 November.
- Rose, Eli T., John J. Kosovich, Alexa J. McKerrow, and Theodore R. Simons (2014). Characterizing Vegetation Structure in Recently Burned Forests of the Great Smoky Mountains National Park. Abstract presented at the ASPRS 2014 Annual Conference, Louisville, KY.
- Sakr, Sherif, Anna Liu, and Ayman G. Fayoumi (2014). MapReduce Family of Large-Scale Data-Processing Systems. Chapter 2 in Large Scale and Big Data, Sherif Sakr and Mohamed Medhat Gaber, editors. CRC Press.
- Towns, John, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew Grimshaw, Victor Hazlewood, Scott Lathrop, Dave Lifka, Gregory D. Peterson, Ralph Roskies, J. Ray Scott, and Nancy Wilkins-Diehr (2014). XSEDE: Accelerating Scientific Discovery. Computing in Science & Engineering, vol. 16, no. 5, Sept.–Oct., doi: /MCSE.
- US Army Corps of Engineers (2014). CRREL/points2grid. Internet at:
- Yoo, Andy B., Morris A. Jette, and Mark Grondona (2003). SLURM: Simple Linux Utility for Resource Management. In Job Scheduling Strategies for Parallel Processing. Springer Berlin Heidelberg.

Using the Message Passing Interface (MPI) and Parallel File Systems for Processing Large Files in the LAS Format
Michael P. Finn, U.S. Department of the Interior, U.S. Geological Survey
Briefing to a pre-conference workshop of the 27th International Cartographic Conference: Spatial Data Infrastructures, Standards, Open Source and Open Data for Geospatial (SDI-Open 2015)
August 2015, Brazilian Institute of Geography and Statistics (IBGE), Rio de Janeiro, Brazil
Questions?

Backup slides

p_las2las Implementation: Detailed Explanation
1. Each process opens the LAS input file and reads the file header to determine the number of points in the input file, the point size, and the size of the header.
2. Based on the point count, process rank, and process count, each process calculates the range of LAS points for which it will be responsible. Since the point size and header size are known, each process can calculate and set its file pointer to its beginning point.
3. Each process then reads each point in its range and applies any filter passed to the program, keeping a count of points that pass the filter.
4. After reading and filtering the last point, all processes gather from one another the number of points that passed the filter and thus will be written. Each process uses the results of this gather and its rank order to calculate and set its output file pointer.
5. Each process then sets its read pointer back to the beginning of its range of points. A second read and filtering of its point range begins, but this time the points that pass the filter are written to the output file. It is this second read pass that allows the program to scale to arbitrary input and output sizes without allocating extra memory or writing temporary files.
6. The process with rank 0 is charged with writing the output file header with data gathered from the input header, and with gathering and reducing process-dependent data such as minx, maxx, miny, maxy, minz, and maxz from the other processes.
7. To minimize the number of calls to MPI_File_write, each process allocates a buffer of configurable size and only calls MPI_File_write when its buffer is full, along with a final flush to disk after the last point is processed (a buffered-write sketch follows).
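A minimal sketch of the buffered-write step (item 7), using made-up record and buffer sizes and fixed per-rank offsets; the real p_las2las computes each rank's offset from the gathered filtered-point counts and writes LAS point records rather than dummy bytes. MPI_File_write_at with an explicit offset is used here in place of setting the individual file pointer and calling MPI_File_write; the effect is the same for this illustration.

```cpp
// Sketch of buffered collective-file output (illustration only, not the
// actual p_las2las code). Records accumulate in a local staging buffer and
// are written with one MPI_File_write_at call per fill, plus a final flush.
#include <mpi.h>
#include <cstring>
#include <vector>

// Flush the staging buffer to the shared file at this rank's current offset.
static void flush(MPI_File fh, MPI_Offset& offset, std::vector<char>& buf) {
    if (buf.empty()) return;
    MPI_File_write_at(fh, offset, buf.data(), (int)buf.size(), MPI_BYTE,
                      MPI_STATUS_IGNORE);
    offset += (MPI_Offset)buf.size();
    buf.clear();
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t kBufCap = 1 << 20;   // 1 MiB staging buffer (made up)
    const int    kRecLen = 64;        // fake fixed-size "point record"
    const int    kRecsPerRank = 1000;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Byte offset where this rank starts writing; in p_las2las this comes
    // from the gathered filtered-point counts plus the header size.
    MPI_Offset offset = (MPI_Offset)rank * kRecsPerRank * kRecLen;

    std::vector<char> buf;
    buf.reserve(kBufCap);
    char record[kRecLen];
    std::memset(record, 'A' + (rank % 26), sizeof(record));

    for (int i = 0; i < kRecsPerRank; ++i) {
        if (buf.size() + kRecLen > kBufCap) flush(fh, offset, buf); // write only when full
        buf.insert(buf.end(), record, record + kRecLen);
    }
    flush(fh, offset, buf);           // final flush after the last record

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```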

p_points2grid Implementation: Detailed Explanation

Initialization:
- Each reader process allocates a LAS point buffer for each writer process. These buffers are necessary to keep process-to-process communication at reasonable levels, since without them an MPI send/receive would occur for every LAS point. The size of these buffers depends on the writer count and is calculated and capped at run time so as not to exceed available memory.
- Each writer process allocates memory to hold the grid cell values for the rows for which it is responsible. The row count is determined by the number of rows in the grid divided by the writer process count. This introduces a memory dependency that our current implementation does not address; as a practical matter, the number of writer processes can be increased to work around this limitation.
- Each writer process also allocates write buffers of configurable size to limit the number of calls to MPI_File_write.
- When a window-filling parameter is specified, writer processes allocate and fill two-dimensional raster cell buffers of up to three rows before and after their range. This is necessary to keep process-to-process communication at reasonable levels.

Reading and communication:
- Each reader process calculates and sets its input file pointer to the beginning of its range of points. It then reads each point and determines which raster cells overlap a circle defined by the cell corner and a radius given either by default or as a program input parameter.
- For each overlap, the point is added to the appropriate LAS point buffer, that is, the buffer corresponding to the writer responsible for processing that cell. When a buffer fills, it is sent with MPI_Send to the appropriate writer. The writer receives the buffer with MPI_Recv and updates its raster cell values with the point buffer data.
- When a reader process completes, it flushes its point buffers to all writers one last time.

Writer processing:
- Once all points have been received from the readers, each writer iterates over its raster cells and calculates mean and standard deviation values.
- If a window-size parameter has been passed, each writer iterates over its cells and attempts to fill null cell values with weighted averages of values from adjacent cells up to three cells away. When cells fall near a writer's beginning or ending row, these values are retrieved from the rows of adjacent writer processes (a sketch of this boundary-row exchange follows).

Writing:
- Each writer first determines the total number of bytes it will write for each DEM output type and each raster cell type. Since the output types supported are ASCII text, each writer must iterate over its cell values and sum their output lengths.
- Once all writers have determined their output length counts, each gathers these counts from all other writer processes and uses them, along with rank order and header size, to set output file pointer positions.
- Each process then iterates over its values again, but this time writes the ASCII form of each value to a buffer. When the buffer fills, it is written to disk with MPI_File_write. The first writer process is responsible for writing the file header.
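A minimal sketch of the writer-to-writer exchange needed for window filling near block boundaries, assuming made-up grid dimensions and the three-row halo described above; it illustrates the communication pattern only and is not the actual p_points2grid code.

```cpp
// Sketch of a boundary-row (halo) exchange between writers (illustration
// only). Each writer owns a block of rows and swaps its top/bottom rows with
// the adjacent writers so that null cells near block edges can be filled
// using values from neighboring writers' rows.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int cols = 512;        // grid columns (made up)
    const int my_rows = 100;     // rows owned by this writer (made up)
    const int halo = 3;          // up to three rows before/after the range

    // Local block padded with halo rows above and below (NODATA = -9999).
    std::vector<float> grid((my_rows + 2 * halo) * cols, -9999.0f);
    float* halo_above  = grid.data();                        // rows 0..halo-1
    float* first_owned = grid.data() + halo * cols;          // top owned rows
    float* last_owned  = grid.data() + my_rows * cols;       // last 'halo' owned rows
    float* halo_below  = grid.data() + (halo + my_rows) * cols;

    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    // Send my top owned rows up and receive the lower neighbor's top rows
    // into halo_below; then the reverse. MPI_PROC_NULL makes edges no-ops.
    MPI_Sendrecv(first_owned, halo * cols, MPI_FLOAT, up,   0,
                 halo_below,  halo * cols, MPI_FLOAT, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(last_owned,  halo * cols, MPI_FLOAT, down, 1,
                 halo_above,  halo * cols, MPI_FLOAT, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // The window-fill pass can now average over adjacent cells, including
    // cells that live on neighboring writers' rows.
    std::printf("writer %d exchanged %d halo rows with each neighbor\n",
                rank, halo);
    MPI_Finalize();
    return 0;
}
```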