Integrating Online Geocoding Sources into Web-Based Household Travel Surveys
Introduction
- Most samples in Household Travel Surveys (HTS) complete the survey via the web
- Geocoding is an important element of HTS data collection
- Online geocoding services are commonly used in web-based instruments:
  - Google Maps (geocoding) + Google Places (point of interest, POI)
  - Bing Maps
- General assumption: these services are equivalent to traditional desktop GIS geocoding, including offsetting addresses from the street centerline…
Notes: Intro / background / objective
Household Travel Survey User Needs
[Figure: origin and destination locations]
The Typical Travel Survey Geocoding Process
- Web survey instrument uses Google APIs for geocoding locations
  - Searches done using geocoding and POI services
  - Real-time response
  - Familiar user interface (for many)
- High precision
  - Participants are encouraged to provide the nearest cross-streets or better
  - Ability to drag the location marker and click on the map
- Quality control
  - Automated checks verify the minimum geocode type (address, intersection, POI)
  - Consistency of travel reviewed using speed checks
  - Regular client deliverables and review
Notes: Discuss typical front-end and back-end processes for acquiring places, geocoding locations, and quality control.
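The two automated checks above could be sketched roughly as follows. This is a minimal Python illustration, not the instrument's actual code: the `Stop` record, its minute-based time fields, and the 75 mph ceiling are hypothetical. Only the accepted geocode types (address, intersection, POI) come from the slide.

```python
import math
from dataclasses import dataclass

# Minimum acceptable geocode types (from the slide); anything else is flagged.
ACCEPTED_TYPES = {"address", "intersection", "poi"}


@dataclass
class Stop:
    """One reported place in a person-day (hypothetical record layout)."""
    lat: float
    lon: float
    geocode_type: str   # e.g. "address", "intersection", "poi", "city"
    arrive_min: float   # minutes after midnight when the person arrived
    depart_min: float   # minutes after midnight when the person left


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def audit_day(stops, max_mph=75.0):
    """Return (stop_index, reason) flags for stops listed in time order.

    max_mph is a hypothetical threshold; straight-line distance understates
    network distance, so this only catches grossly inconsistent trips.
    """
    flags = []
    for i, s in enumerate(stops):
        if s.geocode_type not in ACCEPTED_TYPES:
            flags.append((i, "geocode below minimum type"))
    for i in range(1, len(stops)):
        prev, cur = stops[i - 1], stops[i]
        hours = (cur.arrive_min - prev.depart_min) / 60.0
        miles = haversine_miles(prev.lat, prev.lon, cur.lat, cur.lon)
        if hours <= 0:
            flags.append((i, "non-positive travel time"))
        elif miles / hours > max_mph:
            flags.append((i, f"implied speed {miles / hours:.0f} mph"))
    return flags


# Approximate Detroit and Lansing coordinates, 10 minutes apart (made-up day).
day = [
    Stop(42.3314, -83.0458, "address", 0, 480),
    Stop(42.7325, -84.5555, "poi", 490, 600),
    Stop(42.7400, -84.5500, "city", 610, 700),
]
# Flags stop 2 (type "city") and stop 1 (implausible implied speed).
print(audit_day(day))
```

Using straight-line distance keeps the check cheap enough to run in real time. The trade-off is that it can only catch clearly impossible speeds, not subtler routing inconsistencies.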
2015 MDOT Statewide (MTC III) + SEMCOG HTS
- Address-based sample
- Invitations delivered by USPS
- Focus on web self-completion (MITravelCounts.com)
- WebGeoSurvey + TripBuilder Web (online instrument)
- Telephone reporting and support available via a toll-free number
- Statewide survey, with additional interest from the SEMCOG region
- 3rd iteration (previous efforts in 2004 and 2009)
- Unusual geocoding requirements in the RFP:
  - SEMCOG addresses had to be matched to a point address file
  - Minimum offset from the MDOT Statewide model network (25’)
Notes: Discuss the size and significance of the MI travel survey and the multiple interests, parties, and GIS groups involved (and the effects that had on our own review and processes).
Study Area
[Map: MDOT statewide study area and SEMCOG region]
Notes: Continue discussion from previous slide.
SEMCOG Point Address Matching
- Loaded point addresses (~1.7 million) into a PostgreSQL table
- Created indexed geometries using the original coordinates
- Exposed the geocoding service to the online tools via a web service
- Developed a match query in PostgreSQL that matched online geocodes to point addresses in post-processing
  - Used built-in address parsing and spatial data extensions
  - Matched using location and address components (fuzzy street name match and house number match within 225’)
  - Used functions in the tiger and postgis extensions of PostgreSQL
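The matching logic can be illustrated outside the database. The pure-Python sketch below makes three assumptions: point addresses are already parsed into house number and normalized street name, they are stored in projected coordinates in feet, and a two-edit tolerance counts as a fuzzy street-name match. `PointAddress` and `match_point_address` are hypothetical names. In PostgreSQL one might express the same idea with `levenshtein()` from the fuzzystrmatch extension and PostGIS `ST_DWithin`, but that describes one possible query, not necessarily the one used in the study.

```python
import math
from dataclasses import dataclass


@dataclass
class PointAddress:
    """A row from the point address file (hypothetical, already parsed)."""
    house_number: str
    street_name: str  # normalized, e.g. "MAIN ST"
    x_ft: float       # projected coordinates in feet (assumed)
    y_ft: float


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def match_point_address(house_number, street_name, x_ft, y_ft, candidates,
                        max_dist_ft=225.0, max_edits=2):
    """Closest candidate with the same house number, a fuzzy street-name
    match, and a location within max_dist_ft of the online geocode."""
    best, best_d = None, math.inf
    for c in candidates:
        d = math.hypot(c.x_ft - x_ft, c.y_ft - y_ft)
        if d > max_dist_ft or c.house_number != house_number:
            continue
        if levenshtein(c.street_name.upper(), street_name.upper()) > max_edits:
            continue
        if d < best_d:
            best, best_d = c, d
    return best


points = [
    PointAddress("123", "MAIN ST", 1000.0, 1000.0),
    PointAddress("123", "MAIN ST", 1500.0, 1000.0),
    PointAddress("125", "MAIN ST", 1010.0, 1000.0),
]
# A misspelled street name 50' from the first point still matches it.
match = match_point_address("123", "Mian St", 1050.0, 1000.0, points)
print(match)
```

Requiring an exact house number while tolerating small street-name differences reflects the idea that online geocoders and the point file often spell or abbreviate street names differently. Under that assumption, house numbers are treated as a stable key.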
MDOT Model Network Distance Requirement
- Compared geocode locations against the MDOT network to look for cases where the geocode fell closer than 25’ to a link
- Our expectation was that most cases would fall around this distance, or at least at some consistent offset to the correct side of the street segments, which we could then extend to meet the minimum distance requirement
- Not exactly what we found…
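Here is a minimal sketch of the 25’ test. It assumes the network is available as polylines in a projected coordinate system measured in feet. The function names are hypothetical. The brute-force scan over every segment is for illustration only; a production check would normally rely on a spatial index (for example, PostGIS distance functions on indexed geometries).

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Planar distance from point P to segment AB (all in the same units)."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_network(px, py, links):
    """Minimum distance from P to any link; each link is a list of (x, y) vertices."""
    best = math.inf
    for line in links:
        for (ax, ay), (bx, by) in zip(line, line[1:]):
            best = min(best, point_segment_distance(px, py, ax, ay, bx, by))
    return best


def too_close_to_network(px, py, links, min_offset_ft=25.0):
    """True when a geocode falls closer than the required offset to a link."""
    return distance_to_network(px, py, links) < min_offset_ft


# Hypothetical L-shaped link in feet.
links = [[(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]]
print(too_close_to_network(50.0, 10.0, links))  # True: 10' from the first segment
print(too_close_to_network(50.0, 40.0, links))  # False: 40' from the nearest segment
```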
Pilot Study Results
[Chart: geocode distance to the MDOT Statewide model network]
Network distances – What we found (1/4)
- Google Maps does not really offset results from the centerline for address matches that don’t use parcel data
- The geocode_type we were saving did not include enough information to determine whether the coordinates were offset from the centerline
  - A new variable called location_type was added: rooftop*, range_interpolated, approximate, geometric_center
- The process the online instrument tools used to augment POI results with address components involved re-geocoding them, which replaced the original coordinates with the ones returned by the Google geocoder
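This sketch shows one way to capture `location_type` from a Google Geocoding API JSON response, where it appears under `results[].geometry.location_type`. Several choices here are assumptions: the helper name, the returned field names, and flagging only `RANGE_INTERPOLATED` for an offset check (which follows the finding above). The sample response is trimmed and made up.

```python
import json

# location_type values that indicate coordinates interpolated along a street
# segment; per the pilot findings these were likely not offset from centerline.
CENTERLINE_LIKELY = {"RANGE_INTERPOLATED"}


def summarize_google_geocode(response_text):
    """Extract coordinates and geometry.location_type from the first result
    of a Geocoding API JSON response; returns None if there is no result."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    geometry = data["results"][0]["geometry"]
    loc_type = geometry.get("location_type", "")
    return {
        "lat": geometry["location"]["lat"],
        "lng": geometry["location"]["lng"],
        "location_type": loc_type.lower(),  # e.g. "rooftop", "range_interpolated"
        "needs_offset_check": loc_type in CENTERLINE_LIKELY,
    }


# Trimmed, hypothetical response body for illustration.
sample = json.dumps({
    "status": "OK",
    "results": [{
        "formatted_address": "123 Main St, Anytown, MI, USA",
        "geometry": {"location": {"lat": 42.5, "lng": -83.5},
                     "location_type": "RANGE_INTERPOLATED"},
    }],
})
print(summarize_google_geocode(sample))
```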
Network distances – What we found (2/4)
- Out of about 3,000 geocodes in the pilot, close to 400 fell within 25’ of a link (excluding home locations and “intersection” geocodes)
- Data in the SEMCOG region was less affected because we were already matching geocodes to point addresses
- What was Google doing?
  - We re-ran the addresses in the pilot delivery that were neither home nor intersection locations through the Google geocoder to get its location_type indicator (not captured in the pilot)
  - This data element told us how the coordinate was obtained; it is available only on geocode results, not on POI results
  - Distances from the resulting coordinates to the MDOT network were then measured
  - For comparison, we ran the same addresses through the Bing Maps geocoder service
Notes: Now let’s look at some pretty charts…
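The comparison described above comes down to a tabulation: for each geocoder and location type, count how many coordinates fall inside the 25’ buffer. In the pilot, that was close to 400 of about 3,000 geocodes, or roughly 13%. Here is a minimal sketch, assuming each record is a hypothetical (source, location_type, distance_ft) tuple. The labels and distances in the example are made up.

```python
from collections import defaultdict


def share_within_buffer(records, buffer_ft=25.0):
    """records: iterable of (source, location_type, distance_ft) tuples.

    Returns {(source, location_type): (count, count_within, share_within)}.
    """
    tallies = defaultdict(lambda: [0, 0])
    for source, loc_type, dist_ft in records:
        t = tallies[(source, loc_type)]
        t[0] += 1
        if dist_ft < buffer_ft:
            t[1] += 1
    return {key: (n, k, k / n) for key, (n, k) in tallies.items()}


# Made-up labels and distances, only to show the shape of the output.
records = [
    ("google", "rooftop", 48.0),
    ("google", "range_interpolated", 6.0),
    ("google", "range_interpolated", 31.0),
    ("bing", "interpolated", 38.0),
]
for key, (n, k, share) in sorted(share_within_buffer(records).items()):
    print(key, n, k, f"{share:.0%}")
```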
Network distances – What we found (3/4)
[Chart: Google geocode distance to the MDOT network]
Network distances – What we found (4/4)
[Chart: Bing geocode distance to the MDOT network]
Post-Pilot Process Improvements (1/2)
- Changed the online tools so that the original offset coordinates from POI results were preserved
- Added geometry location type to the saved geocode attributes
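The first change can be expressed as a merge. Address components from the re-geocode are copied onto the POI record, but the POI's own coordinates are kept. The dictionary keys and the `augment_poi` name are hypothetical stand-ins for whatever the online tools actually saved.

```python
def augment_poi(poi, regeocode):
    """Attach address components from a re-geocode to a POI result while
    keeping the POI's own (offset) coordinates. Keys are hypothetical."""
    merged = dict(poi)  # shallow copy; the input POI record is not modified
    merged["address_components"] = regeocode.get("address_components", [])
    merged["formatted_address"] = regeocode.get(
        "formatted_address", poi.get("formatted_address"))
    # Record how the re-geocode's coordinate was obtained, without using
    # that coordinate in place of the POI's own lat/lng.
    merged["regeocode_location_type"] = regeocode.get("location_type")
    return merged


poi = {"name": "Hypothetical Cafe", "lat": 42.50012, "lng": -83.50034}
regeocode = {
    "lat": 42.50020, "lng": -83.50010,
    "formatted_address": "123 Main St",
    "address_components": [{"long_name": "123", "types": ["street_number"]}],
    "location_type": "RANGE_INTERPOLATED",
}
merged = augment_poi(poi, regeocode)
print(merged["lat"], merged["lng"], merged["formatted_address"])
```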
Post-Pilot Process Improvements (2/2)
- Created a new process that re-geocoded Google results using Bing
  - Targeted location type “range_interpolated”
  - Also re-geocoded locations that are neither “intersection” nor home and that fall within 25’ of the MDOT road network
  - Only replaced geocodes with Bing results if they fell close to the original (within 75’)
- Added checks that identified cases for review using ancillary data sources
- Expected only a small percentage (~5%) of Bing re-geocoded coordinates to need manual review and adjustment
Notes: And now, more pretty charts…
[Chart legend: Google, Bing, Analyst]
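The re-geocoding rules can be read as a small decision function. The 25’ and 75’ thresholds and the range_interpolated/home/intersection conditions come from the slide. The rest are assumptions: the `Geocode` record, the caller-supplied `dist_to_network` function, and sending rejected Bing results to analyst review.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]  # projected (x, y) coordinates in feet (assumed)


@dataclass
class Geocode:
    xy: Point
    location_type: str  # Google location_type, lower-case
    place_kind: str     # hypothetical: "home", "intersection", or "other"


def resolve_geocode(g: Geocode, bing_xy: Optional[Point],
                    dist_to_network: Callable[[Point], float],
                    min_offset_ft: float = 25.0, max_shift_ft: float = 75.0):
    """Return (coordinates, source) where source is "google", "bing", or "review"."""
    needs_regeocode = g.location_type == "range_interpolated" or (
        g.place_kind not in ("home", "intersection")
        and dist_to_network(g.xy) < min_offset_ft)
    if not needs_regeocode:
        return g.xy, "google"
    # Accept the Bing coordinate only if it stays close to the original.
    if bing_xy is not None and math.dist(g.xy, bing_xy) <= max_shift_ft:
        return bing_xy, "bing"
    # Assumption: anything not resolved by Bing goes to analyst review.
    return g.xy, "review"


def along_x_axis(p: Point) -> float:
    """Hypothetical network: a single road running along the x-axis."""
    return abs(p[1])


print(resolve_geocode(Geocode((0, 100), "rooftop", "other"), None, along_x_axis))
print(resolve_geocode(Geocode((0, 10), "range_interpolated", "other"), (0, 40), along_x_axis))
print(resolve_geocode(Geocode((0, 10), "rooftop", "other"), (0, 200), along_x_axis))
```

Passing the network distance in as a function keeps the rule independent of how distances are computed, whether in Python or by a spatial query.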
Main Study Adjustments - Examples
Main Study Adjustments - Examples
Notes: Discuss the pilot effort and results with graphics.
Network-aware Geocode Auditing
Final Remarks
- Using PostgreSQL made it easy to automate geocode checks:
  - Address parsing (tiger extension)
  - Fuzzy string matching
  - Spatial matching to point addresses
  - Distance to network
- There is a need to check Google’s returned location types
  - If it interpolated coordinates along its centerlines, it likely did not offset them
- Combining the strengths of multiple geocoding sources in order to maximize
- The final breakdown of geocodes was:
  - Google Rooftop: 17,462
  - Google POI: 77,878
  - Bing Re-geocode: 2,819
  - Analyst review: 2,835
Notes: Discuss closing observations, impressions, and conclusions.
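The final counts can be expressed as shares of the total with a few lines of arithmetic:

```python
# Final geocode counts by source, as listed on the slide.
final_counts = {
    "Google Rooftop": 17_462,
    "Google POI": 77_878,
    "Bing Re-geocode": 2_819,
    "Analyst review": 2_835,
}
total = sum(final_counts.values())
for source, n in final_counts.items():
    print(f"{source:<16} {n:>7,} {100 * n / total:5.1f}%")
print(f"{'Total':<16} {total:>7,}")
```

The four categories sum to 100,994 geocodes. Google POI results account for about 77% of them and Google Rooftop results for about 17%. The Bing re-geocode and analyst review categories each account for roughly 2.8%.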