Leila Talebi, Anika Kuczynski, Andrew Graettinger, and Robert Pitt Application of Image Processing and LiDAR Data in Stormwater Management Leila Talebi, Anika Kuczynski, Andrew Graettinger, and Robert Pitt Civil, Construction and Environmental Engineering Department, The University of Alabama, Tuscaloosa, Alabama
Why is urban land classification necessary? Water quality issues As cities grow and more development occurs, the natural landscape is replaced by roads, buildings, housing developments, and parking lots. Impervious areas are surfaces with low or no capacity for infiltration. Examples of impervious surfaces include concrete, pavement (roads and parking lots), building rooftops, and railroads. Pervious areas are permeable and allow for infiltration. Impervious areas are impermeable and include undeveloped land with vegetation cover such as grass or trees. Sediment-laden water from a construction site entering the Chattahoochee River, just west of Atlanta. It is important to distinguish between and define impervious and pervious surfaces to determine effects on water quality, infiltration rates, and stormwater runoff volumes. Impervious vs. pervious surfaces http://ga.water.usgs.gov/
WinSLAMM (Pitt and Voorhees, 1995) WinSLAMM: Source Loading and Management Model for Windows Estimate runoff quantity and quality Design stormwater controls Stormwater quality control examples include street cleaning, storm drain inlets with filters, infiltration, stormwater filtration, catchbasin cleaning, wet detention ponds, grass swales and filters, and porous pavements.
Objective Create a tool to facilitate urban surface classification, while maintaining reasonable accuracy and decreasing required analysis time. ArcGIS tool creation (ModelBuilder) Outcome Urban Classifications: Roofs Parking lots Streets Pervious areas The most exact method of classification is human analysis of an image in that different types of surfaces are visually interpreted and manually delineated based on color, size, texture, geometric shape, and feature context. This process takes a lot of time. We are using ArcGIS software to process data because it has a wide range of inbuilt image and geographic data analysis and editing functions.
Image Types Satellite images at high (1-4 m) and low resolution Landsat TM/ETM/ETM+, IKONOS, SPOT, Quickbird Aerial photos In reviewing the literature, we found that many different types of satellite images as well as aerial photos have been used to identify impervious areas. Some examples include the Landsat Thematic Mapper/Extended Thematic Mapper (Plus), IKONOS, SPOT, and Quickbird images at various resolutions. Landsat Chesapeake Bay area Photo Credit: NASA
Digital Image Processing Per-pixel classifiers Maximum likelihood classifier Nearest neighbor classification Object-based algorithms Artificial neural networks (ANN) Classification and regression tree (CART) algorithms/decision tree learning Normalized Difference Vegetation Index (NDVI) Support Vector Machines (SVM) Maximum likelihood classification assumes that cell values in a training sample are normally distributed and assigns a class value to each pixel based on the computed statistical probability. Artificial neural networks are based on taking a certain set of spectral input data from an image, applying a set of established rules, and returning output data assigning a level of imperviousness to each pixel. Classification and regression tree (CART) analysis is a form of machine learning that automatically establishes a set of rules concerning patterns and spectral response thresholds based on probability and statistics. The NDVI indicates vegetation cover in an image. Plants readily absorb visible light (400 to 900 nm wavelength) for photosynthesis but strongly reflect near-IR light (700 to 1100 nm)
Color Variables Combine RGB, YCbCr, and HSI models In addition to RGB, use: Luminance: Y Hue: H Saturation: S Intensity: I Advantage: more variables available for image processing more accurate result This approach is based on incorporating several different color models into the image processing aspect of the project. The available RGB bands from an image are used to calculate several more variables coming from the YCbCr model (commonly used as digital video formats) and the HSV model. Y represents the luminance (measurement unit of the energy amount that an observer perceives from a light source), Cb represents the difference between the B component and a reference value, and Cr represents the difference between the R component and a reference value. In the HSV model, H represents hue (a color attribute describing a pure color like yellow, orange, or red), S represents saturation (a measurement of the degree to which a pure color is diluted by white light), and I stands for intensity (average value of red, green, and blue at a specific location). (Rottensteiner et al., 2002)
Image Processing Problems Similar spectral responses: Roof and pavement Soil and concrete or asphalt Tree coverage Shadows Viewing angles Note: The tree coverage problem may be overcome for streets by adding transportation vector data.
LiDAR Digital Surface Model (DSM): generated directly from reflective LiDAR points (blue color in the Figure) Digital Terrain Model (DTM): depicts the pure terrain surface, filtering functions are applied to remove surface objects (red color in the Figure) NDSM = DSM – DTM LiDAR stands for Light Detection and Ranging and is an optical remote sensing technology that uses light pulses to measure the distance to objects. It has become especially popular in building extraction and in defining road networks. The nDSM is the difference model between the DSM and the DTM and provides an initial buildings and trees segmentation from the terrain surface. All objects in the nDSM stand on the surface and the influence of terrain topography is excluded.
Methodology Data: Aerial photo LiDAR Centerline of streets Solution shapefiles for UA campus Streets, parking, buildings Approach ArcGIS 10.0 tools Partially automated image processing of nDSM and color band rasters
Process This process flow diagram shows the essentially components and sequence of our approach. We use LiDAR and an aerial photo as the main inputs to create a nDSM and a saturation raster. These are analyzed using ArcGIS tools to classify roof.
ArcGIS Tools Raster Calculator Slope thresholds Curvature SetNull and IsNull ZonalGeometry Buffer Generalize Polygon Aggregate Polygon While/For loops The “solution” that we are comparing our pavement and building/roof classification to was obtained from available campus GIS data (streets shapefile of centerlines and edge of pavement, parking polygons, building polygons).
Case Study Institutional area Residential area UA campus south of University Blvd. 46.15 ac Residential area Off campus, Tuscaloosa 13th St (north), 10th Ave (east), Elmwood Dr. (south) 11.15 ac For our case study, we ran our model on two different areas using the same aerial photo and type of LiDAR data. The institutional area that we analyzed covers about 46 acres of campus and is located south of University Blvd (includes Rose Administration Building and the President’s Mansion). The residential area analyzed covers about 11 acres and is bounded by 13th St to the north, 10th Ave to the east, and Elmwood Dr to the south.
Institutional Area Aerial photo Residential Area
Model Parameters Threshold Institutional Residential Saturation 20 18 Slope 70 40 Curvature 120 30 Area 100 Thickness 4 3 Buffer The variables shown in this table are model parameters that were adjusted for each type of area of interest (institutional and residential) to yield better results. When a user applies our model, he should only have to provide the model input files (RGB bands of photo, DTM, DSM, centerlines) and then run the model with these recommended parameter values before making adjustments. The saturation value is adjusted to classify pavement (street or parking) and changes just a little bit. The slope raster results from taking the first derivative of the nDSM and the curvature raster represents the second derivative (the slope of the slope) of the nDSM. These are used to identify roof. The thickness and buffer variables are also adjusted to determine roof area. Thickness refers to the maximum width of any polygon and the buffer is necessary to “regrow” the roof polygons after previous analysis steps that shrink the polygons.
Roof Classification Once the nDSM is created, the model applies several ArcGIS tools to the nDSM raster. Applying a height threshold, the model classifies regions with tall obstructions (e.g. buildings) as above ground (potential roof) and regions of lower obstructions (e.g. cars) as ground level. An nDSM threshold of 1.8 m (6 ft) that could be adjusted by the user, was determined through iteration to provide the best binary separation height between above-ground and ground-level pixels. The height threshold value of 1.8 m (6ft) is smaller than the average height of a one story building and results in removing objects with lower heights such as cars and small bushes. Using LiDAR data, the Slope and Curvature tools in ArcGIS 10.0 determine the gradient of the z-value and curvature from cell to cell. Slope is a measure of change in surface value over distance. The slope of a cell in a raster is the maximum rate of change between the cell and its neighbors. The curvature is the second derivative of a surface which is also known as the slope of the slope. These outputs are combined with saturation to separate trees from roofs. To classify roof areas, the model creates a raster representing saturation. The Raster Calculator returns saturation (S) values from the RGB bands according to Equation 2 (Bong et al., 2009): [2] where R, G, and B are red, green, and blue bands, respectively, with normalized values ranging from 0 to 1. The saturation (S) values in this study range from 0 to 256. The Slope and Curvature tool outputs in combination with the saturation values help separate trees and roof because trees have a rough texture and much lower saturation values than roofs. Multiplication of the output raster that distinguishes trees (rough texture) from roofs (smooth texture) by the nDSM threshold raster that separates ground-level elevations from above-ground elevations results in a preliminary roof classification. The preliminary roof classification raster is converted to a polygon shapefile to be processed further using different available ArcGIS tools to manipulate and analyze feature classes. A variety of conversion tools (SetNull, IsNull, Zonal Geometry (area threshold and thickness threshold), Buffer, Generalize Polygon, Aggregate Polygon, and Polygon to Raster) produce the final roof area. While tall objects, like large trucks, could be a source of error as being classified as above ground in the first step, the area threshold excludes small areas, also eliminating stand-alone trees. Roofs that are entirely covered by trees cannot be identified, but roofs that are only partially covered by trees are correctly identified because the Buffer tool expands the roof area and Generalize Polygon tool smooths the roof area close to the actual geometric shape and size. In cases of greater tree coverage (greater than about 10% of the actual roof area), manual corrections are necessary. The result of these analysis steps is a polygon feature class with well-defined impervious roof areas.
Pavement Classification At the end of roof classification, the roof polygon feature class is converted back into a raster for further analysis. Pavement areas were obtained by adding the darker saturation raster areas to the buffered centerline areas and subtracting the previously defined roof areas. Note that by using the freely available TIGER centerline feature class as an input data source, tree coverage of streets is not a source of error as in the case of sole image processing.
Institutional Area Solution by Manual Delineation Model Output The “solution” that we are comparing our pavement and building/roof classification to was obtained from available campus GIS data (streets shapefile of centerlines and edge of pavement, parking polygons, building polygons) and manual delineation. Note that the parking lot area is underestimated due to some tree coverage and because parked cars are misclassified as landscape. The LiDAR data also does not align perfectly with the areal photo, causing the misclassification of pixels as parking around the edges of buildings. Model Output
Residential Area Solution by Manual Delineation Model Output The “solution” that we are comparing model classification output to was obtained from available campus GIS data (streets shapefile of centerlines and edge of pavement, parking polygons, building polygons) and manual delineation. Note that parking areas were underestimated. Model Output
Accuracy Assessment Overall accuracy : dividing the total number of correct pixels (the sum of the major diagonal) by the total number of pixels in the error matrix (Congalton, 1991). User’s accuracy : is a “measure of commission error” which is the probability that a pixel classified on the map correctly corresponds to the same category on the reference Producer's accuracy : is a “measure of emission” which represents the probability that a reference pixel is being correctly classified (Story and Congalton, 1986).
Error Matrix and Accuracy Assessment for Institutional Area Classification from manual delineation User's accuracy (%) Roof Street Parking Pervious Total Classification from model 1,384,123 109 6,228 134,441 1,524,900 90.8 4,356 820,779 17,435 70,199 912,768 89.9 56,473 47,385 16,95,687 170,218 1,969,764 86.1 83,173 49,703 280,406 3,221,886 3,635,168 88.6 1,528,124 917,976 1,999,756 3,596,744 8,042,600 Producer's accuracy (%) 90.6 89.4 84.8 89.6 Overall accuracy: 88.6% Matrix entries represent number of pixels. The model output consists of a raster defining the four general land surface categories: roofs, streets, parking areas, and pervious areas. For each category, areas were obtained from the attribute table and compared to corresponding areas found by manual delineation, as shown in Figure 5. For each land use, we compared the model classification results with the manual delineation measurements (as reference data) to assess the overall accuracy, as well as user’s- and producer’s accuracy. The accuracy assessment results for the institutional and residential areas are shown in Table 2 and Table 3 respectively. The overall classification accuracy is 89% for the institutional case study area, while noting that accuracy is lower (81%) for the residential area. The user’s and producer’s accuracies for roof areas are high, especially for the institutional area with taller/larger buildings (~ 90%) compared to the residential area (~80%) due to less tree coverage of roofs in the institutional area. As shown in Tables 2 and 3, street and pervious areas have good to high user’s and producer’s accuracies ranging from 77% to 90% in both land use studies. Parking area has the lowest user’s and producer’s accuracies among all the classification categories for both institutional (86% and 85% respectively) and residential (71% and 69% respectively). One major reason for reduced accuracy is that the nDSM and saturation rasters did not align exactly (LiDAR data is for 2005, while the aerial photo is for 2004). This, in turn, resulted in misclassification of roof edges as parking area which reduces the user’s accuracy. Another reason could be similar spectral variations among parking lots and some pervious areas with bare soil cover. Therefore, the model had a lower accuracy when detecting parking areas (low producer’s accuracy). Table 4 presents the compared area results and percent errors. Roof areas were slightly underestimated because of misclassification of roof edges as parking area. In defining street areas, the model overestimated the area in the institutional case (5.2%) and slightly underestimated the area in the residential case (-0.6%). Note that in the residential case, the model defined unpaved areas covered in gravel as pervious (although due to compacted conditions, gravel areas function hydraulically similarly to impervious areas). In all categories, errors were less than 6%, resulting in negligible area differences when used as input for calculations in stormwater models. This indicates adequate accuracy of the model results. Time savings were significant in running the model compared to manual delineation of roof, street, and parking areas, using a computer with the following specifications: 32-bit operating system, Intel(R) Core (TM) 2 Duo CPU, and 4.00 GB RAM memory. In the institutional case (~ 182,100 m2 or 45 ac), the model was prepared with input data and parameters in less than 2 minutes and ran for approximately 10 minutes, while manual delineation took approximately 80 minutes; thus, applying the model resulted in time savings of more than 80%. In the residential case (~ 44,500 m2 or 11 ac), model preparation (excluding LiDAR processing) again required less than 2 minutes and the model ran for approximately 2 minutes, while manual delineation took approximately 30 minutes; thus, applying the model resulted in time savings of more than 90%. These percent time savings are case specific and one has to consider that the user may have to make model parameter adjustments and run the model several times for adequate results. However, even an inexperienced user who has to determine case specific threshold values and run the model two or three times can expect to save a significant amount of time (> 40%).
Error Matrix and Accuracy Assessment for the Residential Area Classification from manual delineation User's accuracy (%) Roof Street Parking Pervious Total Classification from model 245,942 2,870 2,112 53,927 304,852 80.7 359,605 14,825 92,066 466,496 77.1 33,428 4,013 146,311 21,719 205,472 71.2 27,850 76,980 47,488 815,143 967,460 84.3 307,220 443,468 210,736 982,856 1,944,280 Producer's accuracy (%) 80.1 81.1 69.4 82.9 Overall accuracy: 80.6% Matrix entries represent number of pixels. The model output consists of a raster defining the four general land surface categories: roofs, streets, parking areas, and pervious areas. For each category, areas were obtained from the attribute table and compared to corresponding areas found by manual delineation, as shown in Figure 5. For each land use, we compared the model classification results with the manual delineation measurements (as reference data) to assess the overall accuracy, as well as user’s- and producer’s accuracy. The accuracy assessment results for the institutional and residential areas are shown in Table 2 and Table 3 respectively. The overall classification accuracy is 89% for the institutional case study area, while noting that accuracy is lower (81%) for the residential area. The user’s and producer’s accuracies for roof areas are high, especially for the institutional area with taller/larger buildings (~ 90%) compared to the residential area (~80%) due to less tree coverage of roofs in the institutional area. As shown in Tables 2 and 3, street and pervious areas have good to high user’s and producer’s accuracies ranging from 77% to 90% in both land use studies. Parking area has the lowest user’s and producer’s accuracies among all the classification categories for both institutional (86% and 85% respectively) and residential (71% and 69% respectively). One major reason for reduced accuracy is that the nDSM and saturation rasters did not align exactly (LiDAR data is for 2005, while the aerial photo is for 2004). This, in turn, resulted in misclassification of roof edges as parking area which reduces the user’s accuracy. Another reason could be similar spectral variations among parking lots and some pervious areas with bare soil cover. Therefore, the model had a lower accuracy when detecting parking areas (low producer’s accuracy). Table 4 presents the compared area results and percent errors. Roof areas were slightly underestimated because of misclassification of roof edges as parking area. In defining street areas, the model overestimated the area in the institutional case (5.2%) and slightly underestimated the area in the residential case (-0.6%). Note that in the residential case, the model defined unpaved areas covered in gravel as pervious (although due to compacted conditions, gravel areas function hydraulically similarly to impervious areas). In all categories, errors were less than 6%, resulting in negligible area differences when used as input for calculations in stormwater models. This indicates adequate accuracy of the model results. Time savings were significant in running the model compared to manual delineation of roof, street, and parking areas, using a computer with the following specifications: 32-bit operating system, Intel(R) Core (TM) 2 Duo CPU, and 4.00 GB RAM memory. In the institutional case (~ 182,100 m2 or 45 ac), the model was prepared with input data and parameters in less than 2 minutes and ran for approximately 10 minutes, while manual delineation took approximately 80 minutes; thus, applying the model resulted in time savings of more than 80%. In the residential case (~ 44,500 m2 or 11 ac), model preparation (excluding LiDAR processing) again required less than 2 minutes and the model ran for approximately 2 minutes, while manual delineation took approximately 30 minutes; thus, applying the model resulted in time savings of more than 90%. These percent time savings are case specific and one has to consider that the user may have to make model parameter adjustments and run the model several times for adequate results. However, even an inexperienced user who has to determine case specific threshold values and run the model two or three times can expect to save a significant amount of time (> 40%).
Results
Conclusions Land use classification is necessary to accurately calculate water quality and runoff volumes. This work presented an approach to classify land use as roofs, streets, parking lots, and pervious areas based on analysis of LiDAR data, aerial photographs, and TIGER line data using ArcGIS 10.0 tools in a ModelBuilder program. Two case studies in Tuscaloosa, Alabama, including an institutional land use, and a residential land use. The accuracy assessment shows high value of overall accuracy for both land uses; 89% and 81% for the institutional and residential land use test areas respectively.
Conclusions The comparison of output areas for each category (roofs, streets, parking lots, pervious areas) to known areas (manually delineated) showed highest result accuracy for roof areas. Therefore, this model is very suitable for determining roof areas for designing cisterns and drywells for roof runoff stormwater harvesting systems. Although the other three area estimates are less accurate than the roof result, they are sufficiently accurate (< 6% error) for most preliminary design purposes.