Published by Kathlyn Morrison. Modified over 8 years ago.
1
ALLEGHENY COUNTY DEPARTMENT OF HUMAN SERVICES
You live where? Address and geocoding woes
Catherine, Amy, Melinda
2
ALLEGHENY COUNTY DEPARTMENT OF HUMAN SERVICES & GEOGRAPHIC INFORMATION SYSTEM DIVISION
Largest county department
Fund nearly 400 providers for 1,600 distinct services
DARE: Data, Analysis, Research, and Evaluation
Data Warehouse
April 9, 2015
Codefest 2016
3
Why are we here? We would like to standardize, geocode, and cache address data from multiple sources.
4
ADDRESS ID    SOURCE ADDRESS
123           Ross Park Mall, PGH
124           1 Smithfield St, Fl #4, PGH, PA 15222
A sample input file will be provided with 18,730 records from 8 different sources. All address information needed for geocoding will be contained in a single field.
5
1. Case Folding (all lower case)
2. Normalize Feature Words (e.g. “rd.” > “road”, “ste.” > “suite”, “PGH” > “Pittsburgh”)
3. Remove apartment information from the address field and put it in a different field
4. Standardize Intersections (e.g. “5th Ave and Smithfield St” > “5th Ave at Smithfield St”)
5. Strip punctuation
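The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Codefest solution: the `FEATURE_WORDS` table, the `UNIT_PATTERN` regex, and the `standardize` function are hypothetical names, and the abbreviation list is far smaller than a real one would need to be. The steps run in a slightly different order than listed, because unit information is easier to find while the "#" and "." characters are still present.

```python
import re

# Hypothetical abbreviation table; a production list would be much longer
# and would need to handle ambiguous cases such as "st" (street vs. saint).
FEATURE_WORDS = {
    "rd": "road",
    "st": "street",
    "ave": "avenue",
    "ste": "suite",
    "fl": "floor",
    "apt": "apartment",
    "pgh": "pittsburgh",
}

# Unit designators followed by an identifier, e.g. "Fl #4" or "ste. 200".
# Bare "#4" with no designator is not handled in this sketch.
UNIT_PATTERN = re.compile(
    r"\b(apartment|apt|suite|ste|floor|fl|unit)\b\.?\s*#?\s*(\w+)"
)


def standardize(raw):
    """Return (cleaned_address, address_line_2) for one raw address string."""
    text = raw.lower()  # step 1: case folding

    # Step 3: move apartment/suite/floor information into its own field.
    line2 = ""
    match = UNIT_PATTERN.search(text)
    if match:
        designator = FEATURE_WORDS.get(match.group(1), match.group(1))
        line2 = f"{designator} {match.group(2)}"
        text = text[:match.start()] + text[match.end():]

    # Step 4: write intersections as "X at Y".
    text = re.sub(r"\s+(and|&)\s+", " at ", text)

    # Step 5: strip punctuation (replace with spaces, then collapse below).
    text = re.sub(r"[^\w\s]", " ", text)

    # Step 2: expand feature words token by token.
    tokens = [FEATURE_WORDS.get(token, token) for token in text.split()]
    return " ".join(tokens), line2


cleaned, line2 = standardize("1 Smithfield St, Fl #4, PGH, PA 15222")
print(cleaned)  # 1 smithfield street pittsburgh pa 15222
print(line2)    # floor 4
```

A lookup table keeps the normalization rules easy to review and extend, which matters when addresses arrive from several systems with different habits.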
6
Some Geocoding Options:
ArcGIS
Google Geocoding API
Google Places API
OpenStreetMap Nominatim
Limitations:
Privacy
Solution must run on Windows 7 Professional
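Whichever geocoder is chosen, caching results (one of the goals stated earlier) keeps the same address from being sent out twice, which also helps with the privacy limitation. The sketch below uses only Python's standard `sqlite3` module, which runs on Windows. The `GeocodeCache` class and the `geocode_fn` callable are assumptions for illustration: `geocode_fn` stands in for whatever service is actually used and is expected to return a `(lat, lon)` tuple or `None`.

```python
import sqlite3


class GeocodeCache:
    """Store geocoding results in SQLite, keyed by the cleaned address."""

    def __init__(self, geocode_fn, db_path=":memory:"):
        # geocode_fn is a hypothetical callable: address -> (lat, lon) or None.
        self.geocode_fn = geocode_fn
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS geocode_cache ("
            "address TEXT PRIMARY KEY, lat REAL, lon REAL)"
        )

    def lookup(self, address):
        row = self.conn.execute(
            "SELECT lat, lon FROM geocode_cache WHERE address = ?", (address,)
        ).fetchone()
        if row is not None:
            # Cached: either a coordinate pair or a remembered failure (NULLs).
            return row if row[0] is not None else None

        # Not cached: call the geocoder once and remember the outcome.
        result = self.geocode_fn(address)
        lat, lon = result if result is not None else (None, None)
        self.conn.execute(
            "INSERT INTO geocode_cache (address, lat, lon) VALUES (?, ?, ?)",
            (address, lat, lon),
        )
        self.conn.commit()
        return result


# Stand-in geocoder that records how often it is called.
calls = []

def fake_geocoder(address):
    calls.append(address)
    return (40.437105, -80.001022)

cache = GeocodeCache(fake_geocoder)
first = cache.lookup("1 smithfield street pittsburgh pa 15222")
second = cache.lookup("1 smithfield street pittsburgh pa 15222")
print(len(calls))  # 1
```

Passing a file path as `db_path` instead of `":memory:"` would make the cache persist between runs.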
7
ADDRESS ID: 123
CLEANED ADDRESS: Ross Park Mall, Pittsburgh, PA
ADDRESS LINE 2: (blank)
GEOCODED ADDRESS: Ross Park Mall Dr, Pittsburgh, PA 15237, USA
MATCH ACCURACY*: GEOMETRIC CENTER
LAT: 40.543275
LONG: -80.010159
NEIGHBORHOOD: (blank)
MUNICIPALITY: Ross Township
ZIP CODE: 15237

ADDRESS ID: 124
CLEANED ADDRESS: 1 Smithfield Street, Pittsburgh, PA 15222
ADDRESS LINE 2: Floor 4
GEOCODED ADDRESS: 1 Smithfield St, Pittsburgh, PA 15222, USA
MATCH ACCURACY*: ROOFTOP
LAT: 40.437105
LONG: -80.001022
NEIGHBORHOOD: Downtown
MUNICIPALITY: (blank)
ZIP CODE: 15222

Optional Additional Fields: [county, census tract, school district]
*Alternatively, use a match score.
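Since the deck asks for the results as a .csv file, the output layout above could be written with Python's standard `csv` module. The snake_case column names and the `write_results` helper below are assumptions chosen for illustration; the slides do not prescribe exact header names.

```python
import csv
import io

# Hypothetical header names derived from the sample output fields.
OUTPUT_COLUMNS = [
    "address_id", "input_address", "cleaned_address", "address_line_2",
    "geocoded_address", "match_accuracy", "lat", "long",
    "neighborhood", "municipality", "zip_code",
]


def write_results(rows, handle):
    """Write result dicts as CSV; missing fields are written as empty cells."""
    writer = csv.DictWriter(handle, fieldnames=OUTPUT_COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)


sample = [{
    "address_id": 124,
    "input_address": "1 Smithfield St, Fl #4, PGH, PA 15222",
    "cleaned_address": "1 Smithfield Street, Pittsburgh, PA 15222",
    "address_line_2": "Floor 4",
    "geocoded_address": "1 Smithfield St, Pittsburgh, PA 15222, USA",
    "match_accuracy": "ROOFTOP",
    "lat": 40.437105,
    "long": -80.001022,
    "neighborhood": "Downtown",
    "zip_code": "15222",
}]

# An in-memory buffer keeps the example self-contained; for a real file,
# open it with newline="" as the csv module documentation recommends.
buffer = io.StringIO()
write_results(sample, buffer)
print(buffer.getvalue())
```

Keeping ZIP codes as strings avoids losing leading zeros if the file is later opened in a spreadsheet.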
8
What We Have: Address data from multiple internal and external information systems with varying quality and completeness.
What We Want: An open source solution (implemented in R or Python or ArcGIS) with documentation
– Solution Outputs (.csv file):
» Original Input (address ID, input address)
» Cleaned, Standardized Addresses
» Geographic elements
Why We Want It: Streamlines workflow for geocoding and geospatial analysis.
– Efficiency
– Fast
– SOOOOOO many addresses