
1 U.S. Department of the Interior | U.S. Geological Survey. Michael P. Finn. Briefing to a pre-conference workshop of the 27th International Cartographic Conference: Spatial data infrastructures, standards, open source and open data for geospatial (SDI-Open 2015), 20-21 August 2015, Brazilian Institute of Geography and Statistics (IBGE), Rio de Janeiro, Brazil. Using the Message Passing Interface (MPI) and Parallel File Systems for Processing Large Files in the LAS Format

2 Co-Authors
- Jeffrey Wendel (Lead Author), U.S. Geological Survey (USGS), Center of Excellence for Geospatial Information Science (CEGIS)
- John Kosovich, USGS, Core Science Analytics, Synthesis, & Libraries (CSAS&L)
- Jeff Falgout, USGS, CSAS&L
- Yan Liu, CyberInfrastructure and Geospatial Information Laboratory (CIGI), National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign (UIUC)
- Frank E. Velasquez, USGS, CEGIS

3 Outline: Objective; Test Environment / XSEDE; Study Areas; Creating DEMs from Point Clouds; Modifying Open Source Software; p_las2las Implementation; p_las2las Results; p_points2grid Implementation; p_points2grid Results; Literature; Questions

4 Objective. Problem: create high-resolution DEMs from large lidar datasets. Approach: modify open source software to run in parallel and test on XSEDE supercomputers (clusters).

5 XSEDE – Extreme Science and Engineering Discovery Environment
- A 5-year, $127 million (1.27 × 10^8 dollar) project funded by NSF
- Supports 16 supercomputers and high-end visualization and data analysis resources
- One of those is "Stampede" at the Texas Advanced Computing Center (TACC)
- Accessed through University of Illinois (Liu, Co-PI) allocations; current (yearly) allocation: 8 million (8.0 × 10^6) computing hours
- Also accessed through USGS "Campus Champion" allocations

6 Stampede
- Peak performance: 9,600 TFLOPS
- Number of cores: 522,080
- Memory: 270 TB
- Storage: 14 PB
https://portal.xsede.org/tacc-stampede

7 What is a "Supercomputer" Anyway? (A minimal MPI example follows below.)
- A collection of nodes (Xeon X5-based computers)
- Networked together (InfiniBand)
- Running a common OS (Linux)
- Sharing a parallel file system (Lustre)
- Running a scheduler (SLURM)
- With an inter-process communication (IPC) mechanism (MPI: MPI_Send, MPI_Recv, MPI_File_write_at, ...)
- A good example is Stampede
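A minimal sketch (not part of the original slides) of the MPI primitives named above: MPI_Send, MPI_Recv, and MPI_File_write_at. Each rank writes its own fixed-size record into a shared file at a computed offset; the file name and message contents are illustrative only.

```cpp
#include <mpi.h>
#include <cstdio>
#include <cstring>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Point-to-point messaging: rank 1 sends an int to rank 0.
    if (rank == 1) {
        int token = 42;
        MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0 && size > 1) {
        int token = 0;
        MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 0 received %d\n", token);
    }

    // Parallel I/O: every rank writes its own 32-byte record at its own offset.
    char record[32];
    std::snprintf(record, sizeof(record), "hello from rank %d\n", rank);
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "hello.txt",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * sizeof(record);
    MPI_File_write_at(fh, offset, record, sizeof(record), MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
```

Built with an MPI compiler wrapper (e.g. mpicxx) and launched with the cluster's MPI launcher; exact commands vary by installation and scheduler.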

8

9 Great Smoky Mtn. Study Area

10 Grand Canyon Study Area

11 Create high-resolution DEMs from large lidar datasets
- DEM resolution of 1 meter
- Lidar datasets with coverage over about a 15 × 15 minute footprint
- This results in a raster size of approximately 27,000 rows × 27,000 columns, about 750 million cells (see the worked numbers below)
- Obtained two datasets in LAS 1.2 format:
  - A 16 GB file with 500 million (5.7 × 10^8) points in the Smoky Mountains over a 40,000 × 20,000 meter area
  - A 120 GB file with 4 billion (4.2 × 10^9) points over the Grand Canyon over a 25,000 × 30,000 meter area
- Both files are somewhat sparse in that some "tiles" within the coverage are missing
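A rough back-of-the-envelope check of the cell count (the conversion of roughly 1,852 m per arc-minute of latitude is our assumption, not stated on the slide):

```latex
15' \times 1852\,\mathrm{m/arcmin} \approx 27{,}800\,\mathrm{m}
\;\Rightarrow\;
\text{rows} \approx \text{columns} \approx 27{,}000 \text{ at 1 m posting},
\qquad
27{,}000 \times 27{,}000 \approx 7.3 \times 10^{8}\ \text{cells}
```

which is on the order of the "about 750 million cells" quoted on the slide.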

12 Modify open source software to run in parallel (and test on XSEDE clusters): las2las, from the LAStools suite, to filter all but ground points; points2grid, to make the DEM from the filtered LAS file. Test on Stampede at TACC.

13 p_las2las Implementation: The las2las application and the supporting LASlib library were extended with the MPI API so the application can run in parallel on a cluster. The goal was an application that scales to arbitrarily large input, limited only by the amount of disk space needed to store the input and output files. No intermediate files are generated, and individual process memory requirements are not determined by the size of the input or output.

14 p_las2las Implementation - Comparison of native las2las versus p_las2las (a sketch of the write-offset computation follows below).

Native las2las algorithm, for all points:
- Read the point
- Apply a filter and/or transformation
- Write the (possibly transformed) point if it passes the filter

p_las2las algorithm:
- Processes determine their point ranges and set input offsets
- First pass, for all points in a process's point range: read the point, apply the filter, and keep a count of the points that pass the filter
- Processes gather the filtered point counts from the other processes and can then set their write offsets
- Processes set their read offsets back to their beginning point and begin a second read; for all points: read the point, apply the filter and transformation, and write the (possibly transformed) point if it passes the filter
- Gather and reduce point counts, return counts, and min/max coordinate values
- The rank 0 process updates the output header
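The slide describes gathering filtered point counts so each process can set its write offset. A minimal sketch of one way to do this with an exclusive prefix sum is below; the function and variable names are ours, not from p_las2las, and the actual code may use a gather of all counts instead of MPI_Exscan.

```cpp
#include <mpi.h>
#include <cstdint>

// point_size: bytes per LAS point record; header_size: bytes of the LAS header.
MPI_Offset compute_write_offset(uint64_t my_kept_count,
                                uint64_t point_size,
                                uint64_t header_size,
                                MPI_Comm comm) {
    // Exclusive prefix sum: rank r receives the sum of kept counts of ranks 0..r-1.
    uint64_t points_before_me = 0;
    MPI_Exscan(&my_kept_count, &points_before_me, 1,
               MPI_UINT64_T, MPI_SUM, comm);
    int rank;
    MPI_Comm_rank(comm, &rank);
    if (rank == 0) points_before_me = 0;  // MPI_Exscan leaves rank 0's result undefined.

    // This rank's first output byte = header + all points written by lower ranks.
    return (MPI_Offset)(header_size + points_before_me * point_size);
}
```

Each process can then seek to the returned offset and write only the points that passed its filter, so no rank needs to know the full output layout in advance.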

15 p_las2las Implementation: The high-level view of the p_las2las application. The vertical flow describes the job flow, while the processes across the top run in parallel on that flow.

16 p_las2las Results - Smoky Mountains (16 GB)

Number of Processes | Filter / Transformation | Output Size | Elapsed Time (seconds)
Native* | None         | 16 GB | 138
Native* | Keep Class 2 | 2 GB  | 73
Native* | Reproject    | 16 GB | 502
64      | None         | 16 GB | 20
64      | Keep Class 2 | 2 GB  | 6
64      | Reproject    | 16 GB | 26
256     | None         | 16 GB | 8
256     | Keep Class 2 | 2 GB  | 4
256     | Reproject    | 16 GB | 9
1024    | None         | 16 GB | 8
1024    | Keep Class 2 | 2 GB  | 5
1024    | Reproject    | 16 GB | 8

* Native: unmodified las2las source code from LAStools, compiled on Stampede with the Intel C++ compiler

17 p_las2las Results - Grand Canyon (120 GB)

Number of Processes | Filter / Transformation | Output Size | Elapsed Time (seconds)
Native* | None         | 120 GB | 1211
Native* | Keep Class 2 | 25 GB  | 623
Native* | Reproject    | 120 GB | 6969
64      | None         | 120 GB | 128
64      | Keep Class 2 | 25 GB  | 59
64      | Reproject    | 120 GB | 150
256     | None         | 120 GB | 33
256     | Keep Class 2 | 25 GB  | 18
256     | Reproject    | 120 GB | 42
1024    | None         | 120 GB | 18
1024    | Keep Class 2 | 25 GB  | 9
1024    | Reproject    | 120 GB | 24

* Native: unmodified las2las source code from LAStools, compiled on Stampede with the Intel C++ compiler

18 p_points2grid Implementation: The points2grid application was extended with the MPI API so it can run in parallel on a cluster. The goal was an application that scales to arbitrarily large input, limited only by the amount of disk space needed to store the input and output files and by the number of processes available. No intermediate files are generated, and individual process memory needs are not determined by the size of the input or output.

19 p_points2grid Implementation - Comparison of native points2grid versus p_points2grid (a sketch of the reader-to-writer routing follows below).

Native points2grid algorithm:
- For each point: update the output raster cells when the point falls within a circle defined by the cell corner and a given radius
- Optionally fill null cells with adjacent cell values
- Write the output raster cells

p_points2grid algorithm:
- Processes are designated as readers or writers
- Reader processes are assigned a range of points; writer processes are assigned a range of rows
- Reader processes read LAS points from the input file and send each point to the appropriate writer processes, based on whether the point falls within a circle defined by the cell corner and a given radius
- Writer processes receive LAS points from reader processes and update cell contents with elevation values
- When all points have been sent and received, writer processes apply an optional window-filling parameter to fill null values (this involves writer-to-writer communication when the window size overlaps two writers)
- Writer processes determine and set write offsets and write their range of rows to the output file; the first writer rank is responsible for writing the output file header and, in the case of TIFF output, the TIFF directory contents
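A sketch (our illustration, not the authors' code) of the reader-side routing logic the slide describes: mapping a point's affected raster rows to the writer process responsible for those rows. All type and function names here are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

struct GridInfo {
    double  min_y;        // south edge of the grid, in the point coordinate system
    double  cell_size;    // e.g. 1.0 m
    int64_t total_rows;   // e.g. ~27,000
    int     num_writers;  // writer process count
    int64_t rows_per_writer() const {
        return (total_rows + num_writers - 1) / num_writers;  // ceiling division
    }
};

// Writer index (0-based among writers) that owns a given raster row.
int writer_for_row(const GridInfo& g, int64_t row) {
    return (int)std::min<int64_t>(row / g.rows_per_writer(), g.num_writers - 1);
}

// A point influences every cell whose corner lies within `radius` of it, so it
// may need to be sent to more than one writer when that circle spans a
// row-range boundary. Return the inclusive range of writer indices to notify.
std::pair<int, int> writers_for_point(const GridInfo& g, double y, double radius) {
    int64_t low_row  = (int64_t)((y - radius - g.min_y) / g.cell_size);
    int64_t high_row = (int64_t)((y + radius - g.min_y) / g.cell_size);
    low_row  = std::clamp<int64_t>(low_row, 0, g.total_rows - 1);
    high_row = std::clamp<int64_t>(high_row, 0, g.total_rows - 1);
    return { writer_for_row(g, low_row), writer_for_row(g, high_row) };
}
```

Partitioning by contiguous row blocks keeps each writer's memory bounded by its own rows and makes the final row-ordered write to the shared output file straightforward.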

20 p_points2grid Implementation: The high-level view of the p_points2grid application. The job flow is described by the boxes on the right side, while the processes along the left are the internal processes of the flow functions.

21 p_points2grid Results - Smoky Mountains (16 GB)
Twelve 1-meter-resolution DEMs totaling 70 GB of output for the p_points2grid runs; twelve 6-meter-resolution DEMs totaling 2 GB of output for the native run.

Number of Processes | Number of Readers | Number of Writers | Time: Reading, Communication (s) | Time: Writing (s) | Elapsed Time (s)
Native | 1   | 1   | NA | NA | 328
128    | 32  | 96  | 33 | 56 | 105
128    | 64  | 64  | 26 | 84 | 125
512    | 32  | 480 | 20 | 13 | 40
512    | 64  | 448 | 10 | 17 | 33
512    | 128 | 384 | 8  | 23 | 36
512    | 256 | 256 | 7  | 26 | 40
512    | 384 | 128 | 11 | 44 | 68
1024   | 64  | 940 | 10 | 11 | 32
1024   | 384 | 640 | 2  | 14 | 29
1024   | 768 | 256 | 6  | 28 | 46

22 p_points2grid Results - Grand Canyon (120 GB)
Twelve 1-meter-resolution DEMs totaling 71 GB of output for the p_points2grid runs; twelve 6-meter-resolution DEMs totaling 2 GB of output for the native run.

Number of Processes | Number of Readers | Number of Writers | Time: Reading, Communication (s) | Time: Writing (s) | Elapsed Time (s)
Native* | 1    | 1    | NA  | NA | 1548
512     | 64   | 448  | 104 | 15 | 135
512     | 128  | 384  | 55  | 21 | 90
512     | 256  | 256  | 60  | 30 | 110
1024    | 64   | 960  | 80  | 7  | 101
1024    | 128  | 896  | 39  | 11 | 62
1024    | 256  | 768  | 27  | 8  | 51
1024    | 384  | 640  | 24  | 15 | 53
1024    | 512  | 512  | 26  | 19 | 56
1024    | 768  | 256  | 47  | 24 | 90
1024    | 896  | 128  | 89  | 44 | 167
4096    | 256  | 3840 | 17  | 10 | 63
4096    | 512  | 3584 | 10  | 11 | 53
4096    | 1024 | 3072 | 8   | 8  | 46
4096    | 2048 | 2048 | 8   | 18 | 76

* Native: unmodified points2grid source code compiled on Stampede with the Intel C++ compiler

23 Conclusions
- Demonstrated a novel solution to the handling, exploitation, and processing of enormous lidar datasets, especially at a time when their availability is increasing rapidly in the natural sciences
- Expanded existing tools so that they now run in parallel processing modes; this improves upon previous attempts to exploit lidar data in a parallel processing environment using MapReduce
- Created parallel processing algorithms based on the open source las2las and points2grid code bases
- Greatly reduced run times for processing extremely large lidar point cloud datasets (over 100 GB in file size), both in classifying the points and in generating DEMs
- p_las2las and p_points2grid provide approximately two or more orders of magnitude reduction in processing time
- Demonstrated scalability up to 4,096 processes

24 References
The Apache Software Foundation (2014) Welcome to Apache™ Hadoop®! Internet at: http://hadoop.apache.org. Last accessed 24 November 2014.
Arrowsmith, J.R., N. Glenn, C.J. Crosby, and E. Cowgill (2008) Current Capabilities and Community Needs for Software Tools and Educational Resources for Use with LiDAR High Resolution Topography Data. Proceedings of the OpenTopography Meeting held in San Diego, California, on August 8, 2008. San Diego: San Diego Supercomputer Center.
ASPRS (American Society for Photogrammetry and Remote Sensing) (2008) LAS Specification, Version 1.2. Internet at: http://www.asprs.org/a/society/committees/standards/asprs_las_format_v12.pdf. Last accessed 24 November 2014.
ASPRS (American Society for Photogrammetry and Remote Sensing) (2011) LASer (LAS) File Format Exchange Activities. Internet at: http://www.asprs.org/Committee-General/LASer-LAS-File-Format-Exchange-Activities.html. Last accessed 05 March 2015.
Behzad, B., Y. Liu, E. Shook, M.P. Finn, D.M. Mattli, and S. Wang (2012) A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH.
CGR (Center for Geospatial Research), University of Georgia, Athens, GA. Internet at: http://www.crms.uga.edu/wordpress/?page_id=1212. Last accessed 05 March 2015.
Dean, J., and S. Ghemawat (2004) MapReduce: Simplified Data Processing on Large Clusters. Proceedings of OSDI '04: 6th Symposium on Operating System Design and Implementation, San Francisco, CA, Dec. 2004.
Dewberry (2011) Final Report of the National Enhanced Elevation Assessment (revised 2012). Fairfax, Va., Dewberry, 84 p. plus appendixes. Internet at: http://www.dewberry.com/services/geospatial/national-enhanced-elevation-assessment. Last accessed 24 November 2014.
Factor, M., K. Meth, D. Naor, O. Rodeh, and J. Satran (2005) Object storage: the future building block for storage systems. In LGDI '05: Proceedings of the 2005 IEEE International Symposium on Mass Storage Systems and Technology, pages 119-123, Washington, DC, USA. IEEE Computer Society.
Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2015) High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag.
Isenburg, Martin (2014) lasmerge: Merge Multiple LAS Files into a Single File. Internet at: http://www.liblas.org/utilities/lasmerge.html. Last accessed 03 March 2015.
Kosovich, John J. (2014) Vertical Forest Structure from Lidar Point-cloud Data for the Tennessee Portion of Great Smoky Mountains National Park. Abstract presented at the 2014 International Lidar Mapping Forum, Denver, CO. Internet at: http://www.lidarmap.org/international/images/conference/Program/ILMF_2014_Grid_with_Abstracts.pdf.
Krishnan, Sriram, Chaitanya Baru, and Christopher Crosby (2010) Evaluation of MapReduce for Gridding LIDAR Data. 2nd IEEE International Conference on Cloud Computing Technology and Science.
Piernas, J., J. Nieplocha, and E. Felix (2007) Evaluation of active storage strategies for the Lustre parallel file system. Proceedings of the ACM/IEEE Conference on Supercomputing. ACM, New York.
rapidlasso GmbH (2014) LAStools. Internet at: http://rapidlasso.com/lastools/. Last accessed 24 November 2014.
Rose, Eli T., John J. Kosovich, Alexa J. McKerrow, and Theodore R. Simons (2014) Characterizing Vegetation Structure in Recently Burned Forests of the Great Smoky Mountains National Park. Abstract presented at the ASPRS 2014 Annual Conference, Louisville, KY. Internet at: http://conferences.asprs.org/Louisville-2014/T1-Mar-26-3-30.
Sakr, Sherif, Anna Liu, and Ayman G. Fayoumi (2014) MapReduce Family of Large-Scale Data-Processing Systems. Chapter 2 in Large Scale and Big Data, Sherif Sakr and Mohamed Medhat Gaber, editors. CRC Press.
Towns, John, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew Grimshaw, Victor Hazlewood, Scott Lathrop, Dave Lifka, Gregory D. Peterson, Ralph Roskies, J. Ray Scott, and Nancy Wilkins-Diehr (2014) XSEDE: Accelerating Scientific Discovery. Computing in Science & Engineering, vol. 16, no. 5, pp. 62-74, Sept.-Oct. doi:10.1109/MCSE.2014.80.
US Army Corps of Engineers (2014) CRREL/points2grid. Internet at: https://github.com/CRREL/points2grid.
Yoo, Andy B., Morris A. Jette, and Mark Grondona (2003) SLURM: Simple Linux Utility for Resource Management. In Job Scheduling Strategies for Parallel Processing (pp. 44-60). Springer Berlin Heidelberg.

25 U.S. Department of the Interior | U.S. Geological Survey. Michael P. Finn. Briefing to a pre-conference workshop of the 27th International Cartographic Conference: Spatial data infrastructures, standards, open source and open data for geospatial (SDI-Open 2015), 20-21 August 2015, Brazilian Institute of Geography and Statistics (IBGE), Rio de Janeiro, Brazil. Using the Message Passing Interface (MPI) and Parallel File Systems for Processing Large Files in the LAS Format. Questions?

26 Backup slides

27 p_las2las Implementation - Detailed Explanation

Each process opens the LAS input file and reads the file header to determine the number of points in the input file, the point size, and the size of the header. Based on the point count, process rank, and process count, each process calculates the range of LAS points for which it will be responsible. Since the point size and header size are known, each process can calculate and set its file pointer to its beginning point. Each process then reads each point in its range and applies any filter passed to the program, keeping a count of the points that pass the filter.

After reading and filtering the last point, all processes gather from one another the number of points that have passed the filter, and thus will be written. Each process uses the results from this gather and its rank order to calculate and set its output file pointer. Each process then sets its read pointer back to the beginning of its range of points. A second read and filtering of its point range begins, but this time the points that pass the filter are written to the output file. It is this second read pass that allows the program to scale to arbitrary input and output size without allocating extra memory or writing temporary files.

The process with rank 0 is charged with writing the output file header with data gathered from the input header, and with gathering and reducing process-dependent data such as minx, maxx, miny, maxy, minz, and maxz from the other processes. To minimize the number of calls to MPI_File_write, each process allocates a buffer of configurable size and only calls MPI_File_write when its buffer is full, along with a final flush to disk after the last point is processed (a sketch of this buffered writing follows below).
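A sketch (our illustration, not the actual p_las2las code) of the buffered output described above: serialized point records are appended to a fixed-size buffer and flushed to the shared file with MPI_File_write_at only when the buffer is full. The class and member names are hypothetical.

```cpp
#include <mpi.h>
#include <vector>

class BufferedWriter {
public:
    BufferedWriter(MPI_File fh, MPI_Offset start_offset, size_t capacity_bytes)
        : fh_(fh), offset_(start_offset) { buf_.reserve(capacity_bytes); }

    // Append one serialized LAS point record; flush first if it would not fit.
    void append(const char* record, size_t record_size) {
        if (buf_.size() + record_size > buf_.capacity()) flush();
        buf_.insert(buf_.end(), record, record + record_size);
    }

    // Write the buffered bytes at this rank's current file offset.
    void flush() {
        if (buf_.empty()) return;
        MPI_File_write_at(fh_, offset_, buf_.data(), (int)buf_.size(),
                          MPI_BYTE, MPI_STATUS_IGNORE);
        offset_ += (MPI_Offset)buf_.size();
        buf_.clear();
    }

private:
    MPI_File fh_;
    MPI_Offset offset_;
    std::vector<char> buf_;
};
```

As the slide notes, one final flush() after the last point pushes any remaining bytes to disk.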

28 p_points2grid Implementation - Detailed Explanation

Initialization: Each reader process allocates a LAS point buffer for each writer process. These buffers are necessary to keep process-to-process communication at reasonable levels, since without them an MPI send/receive would occur for every LAS point. The size of these buffers depends on the writer count and is calculated and capped at run time so as not to exceed available memory. Each writer process allocates memory to hold the grid cell values for the rows for which it is responsible. The row count is determined by the number of rows in the grid divided by the writer process count. This introduces a memory dependency that our current implementation does not address; as a practical matter, the number of writer processes can be increased to address this limitation. Each writer process also allocates write buffers of configurable size to limit the number of calls to MPI_File_write. When a window-filling parameter is specified, writer processes allocate and fill two-dimensional raster cell buffers of up to three rows before and after their range. This is necessary to keep process-to-process communication at reasonable levels.

Reading and communication: Each reader process calculates and sets its input file pointer to the beginning of its range of points. It then reads each point and determines which raster cells overlap a circle defined by the cell corner and a radius given either by default or as a program input parameter. For each overlap, the point is added to the appropriate LAS point buffer, that is, the buffer corresponding to the writer responsible for processing that cell. When a buffer fills, it is sent with MPI_Send to the appropriate writer. The writer receives the buffer with MPI_Recv and updates its raster cell values with the point buffer data. When a reader process completes, it flushes its point buffers to all writers one last time (a sketch of these per-writer buffers follows below).

Writer processing: Once all points have been received from the readers, each writer iterates over its raster cells and calculates mean and standard deviation values. If a window-size parameter has been passed, each writer iterates over its cells and attempts to fill null cell values with weighted averages of values from adjacent cells up to three cells away. When cells fall near a writer's beginning or ending row, these values are retrieved from the rows of adjacent writer processes.

Writing: Each writer first determines the total number of bytes it will write for each DEM output type and each raster cell type. Since the output types supported are ASCII text, each writer must iterate over its cell values and sum their output lengths. Once all writers have determined their output length counts, each gathers these counts from all other writer processes and uses them, along with rank order and header size, to set its output file pointer position. Each process then iterates over its values again, but this time writes the ASCII form of each value to a buffer. When the buffer fills, it is written to disk with MPI_File_write. The first writer process is responsible for writing the file header.
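A sketch (our illustration, not the actual p_points2grid code) of the per-writer point buffers described above: a reader appends points destined for a given writer and issues a single MPI_Send when that writer's buffer fills. All names are hypothetical, and writer_rank is assumed to already be that writer's global MPI rank.

```cpp
#include <mpi.h>
#include <vector>

struct LasPointMsg {   // hypothetical compact point record sent reader-to-writer
    double x, y, z;
};

class ReaderBuffers {
public:
    ReaderBuffers(int num_writers, size_t points_per_buffer, MPI_Comm comm)
        : comm_(comm), capacity_(points_per_buffer), buffers_(num_writers) {}

    // Queue a point for a writer; send the whole buffer once it is full.
    void add(int writer_rank, const LasPointMsg& p) {
        std::vector<LasPointMsg>& buf = buffers_[writer_rank];
        buf.push_back(p);
        if (buf.size() == capacity_) send(writer_rank);
    }

    // Final flush to every writer after the last point has been read.
    void flush_all() {
        for (int w = 0; w < (int)buffers_.size(); ++w) send(w);
    }

private:
    void send(int writer_rank) {
        std::vector<LasPointMsg>& buf = buffers_[writer_rank];
        if (buf.empty()) return;
        // Send the buffer as raw bytes; the writer unpacks the same struct layout.
        MPI_Send(buf.data(), (int)(buf.size() * sizeof(LasPointMsg)), MPI_BYTE,
                 writer_rank, /*tag=*/0, comm_);
        buf.clear();
    }

    MPI_Comm comm_;
    size_t capacity_;
    std::vector<std::vector<LasPointMsg>> buffers_;
};
```

Batching sends this way trades a modest amount of reader memory for far fewer, larger messages, which is the communication-volume concern the slide raises.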

29

