ALLEGHENY COUNTY DEPARTMENT OF HUMAN SERVICES
You live where? Address and geocoding woes
Catherine, Amy, Melinda
ALLEGHENY COUNTY DEPARTMENT OF HUMAN SERVICES & GEOGRAPHIC INFORMATION SYSTEM DIVISION
Largest county department
Fund nearly 400 providers for 1,600 distinct services
DARE: Data, Analysis, Research, and Evaluation
Data Warehouse
April 9, Codefest 2016
Why are we here? We would like to standardize, geocode, and cache address data from multiple sources.
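The caching part of that goal can be sketched as below. This is a minimal illustration, not a prescribed design: the SQLite table layout, the function names `open_cache` and `cached_geocode`, and the injected `geocode` callable are all assumptions made for the example. The idea is that each distinct address is sent to a geocoder at most once, and repeat lookups are answered locally.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a local SQLite cache. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS geocode_cache ("
        "address TEXT PRIMARY KEY, lat REAL, lon REAL)"
    )
    return conn


def cached_geocode(conn, address, geocode):
    """Return (lat, lon) for an address, calling `geocode` only on a cache miss.

    `geocode` is any callable that takes an address string and returns a
    (lat, lon) pair; it stands in for whichever geocoding service is chosen.
    """
    # Collapse case and whitespace so trivially different spellings share a key.
    key = " ".join(address.lower().split())
    row = conn.execute(
        "SELECT lat, lon FROM geocode_cache WHERE address = ?", (key,)
    ).fetchone()
    if row is not None:
        return row
    lat, lon = geocode(address)
    conn.execute("INSERT INTO geocode_cache VALUES (?, ?, ?)", (key, lat, lon))
    conn.commit()
    return (lat, lon)
```

Passing a file path instead of `":memory:"` keeps the cache between runs, so results stay on the local machine.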
ADDRESS ID   SOURCE ADDRESS
123          Ross Park Mall, PGH
124          1 Smithfield St, Fl #4, PGH, PA

A sample input file will be provided with records from 8 different sources. All address information needed for geocoding will be contained in a single field.
1. Case folding (all lower case)
2. Normalize feature words (e.g. “rd.” > “road”, “ste.” > “suite”, “PGH” > “Pittsburgh”)
3. Remove apartment information from the address field and put it in a different field
4. Standardize intersections (e.g. “5th Ave and Smithfield St” > “5th Ave at Smithfield St”)
5. Strip punctuation
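The five steps above could be sketched in Python roughly as follows. This is an illustrative sketch only. The function name `standardize`, the regular expressions, and most of the feature-word table are assumptions; only “rd.”, “ste.” and “PGH” come from the slide. The unit is pulled out before punctuation is stripped so that forms like “Fl #4” are still recognizable.

```python
import re

# Feature-word table. Only rd, ste and pgh come from the slide; the other
# entries are assumed examples and would need review against real data.
FEATURE_WORDS = {
    "rd": "road",
    "ste": "suite",
    "pgh": "pittsburgh",
    "st": "street",
    "ave": "avenue",
    "fl": "floor",
    "apt": "apartment",
}

# Unit designator, optional "#", then the unit value (assumed pattern).
UNIT_PATTERN = re.compile(
    r",?\s*\b(apt|apartment|unit|ste|suite|fl|floor)\b\.?\s*#?\s*(\w+)"
)
INTERSECTION_PATTERN = re.compile(r"\s+(?:and|&)\s+")


def standardize(raw):
    """Return (cleaned_address, address_line_2) for one raw address string."""
    text = raw.lower()                                  # step 1: case folding

    line2 = ""
    match = UNIT_PATTERN.search(text)                   # step 3: split off unit
    if match:
        unit_word = FEATURE_WORDS.get(match.group(1), match.group(1))
        line2 = f"{unit_word} {match.group(2)}"
        text = text[:match.start()] + text[match.end():]

    text = INTERSECTION_PATTERN.sub(" at ", text)       # step 4: intersections
    text = re.sub(r"[^\w\s]", " ", text)                # step 5: punctuation
    words = [FEATURE_WORDS.get(w, w) for w in text.split()]  # step 2
    return " ".join(words), line2
```

For example, `standardize("1 Smithfield St, Fl #4, PGH, PA")` separates the floor from the street address. A plain lookup table like this will misread some inputs (“St” meaning “Saint”, or a street literally named “Floor”), so a real solution would need more context-aware rules.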
Some Geocoding Options:
– ArcGIS
– Google Geocoding API
– Google Places API
– OpenStreetMap Nominatim

Limitations:
– Privacy
– Solution must run on Windows 7 Professional
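As one illustration of the options above, here is a hedged sketch of a lookup against the Nominatim search endpoint using only the Python standard library, which runs on Windows. The function names, the `user_agent` string, and the choice of returned fields are assumptions made for this example. The `base_url` parameter is there because the privacy limitation may rule out sending addresses to a public server; it can point at a self-hosted Nominatim instance instead.

```python
import json
import urllib.parse
import urllib.request

PUBLIC_NOMINATIM = "https://nominatim.openstreetmap.org/search"


def build_search_url(address, base_url=PUBLIC_NOMINATIM):
    """Build a free-form Nominatim search URL for one address."""
    params = {"q": address, "format": "jsonv2", "limit": 1, "countrycodes": "us"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_first_result(body):
    """Pick the top hit out of a Nominatim JSON response, or None if empty."""
    results = json.loads(body)
    if not results:
        return None
    top = results[0]
    return {
        "lat": float(top["lat"]),
        "lon": float(top["lon"]),
        "geocoded_address": top["display_name"],
    }


def geocode(address, base_url=PUBLIC_NOMINATIM, user_agent="codefest-geocoder-sketch"):
    """Send one request. The public server expects an identifying User-Agent."""
    request = urllib.request.Request(
        build_search_url(address, base_url), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_first_result(response.read().decode("utf-8"))
```

The public Nominatim server is rate limited, so bulk runs would need throttling or a local instance.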
Output columns: ADDRESS ID, CLEANED ADDRESS, ADDRESS LINE 2, GEOCODED ADDRESS, MATCH ACCURACY*, LAT, LONG, NEIGHBORHOOD, MUNICIPALITY, ZIP CODE

ADDRESS ID: 123
CLEANED ADDRESS: Ross Park Mall, Pittsburgh, PA
GEOCODED ADDRESS: Ross Park Mall Dr, Pittsburgh, PA 15237, USA
MATCH ACCURACY*: GEOMETRIC CENTER
MUNICIPALITY: Ross Township

ADDRESS ID: 124
CLEANED ADDRESS: 1 Smithfield Street, Pittsburgh, PA
ADDRESS LINE 2: Floor 4
GEOCODED ADDRESS: 1 Smithfield St, Pittsburgh, PA 15222, USA
MATCH ACCURACY*: ROOFTOP
NEIGHBORHOOD: Downtown
ZIP CODE: 15222

Optional Additional Fields: [county, census tract, school district]
*alternatively, use a match score
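Writing results in this layout to a .csv file might look like the sketch below. The snake_case column names and the `write_results` helper are assumptions for illustration; the columns mirror the sample output fields. Fields a geocoder could not supply are written as empty cells.

```python
import csv
import io

# Column names are assumed spellings of the sample output fields.
OUTPUT_COLUMNS = [
    "address_id",
    "cleaned_address",
    "address_line_2",
    "geocoded_address",
    "match_accuracy",
    "lat",
    "long",
    "neighborhood",
    "municipality",
    "zip_code",
]


def write_results(rows, handle):
    """Write result dicts as CSV; missing fields become empty cells."""
    writer = csv.DictWriter(
        handle, fieldnames=OUTPUT_COLUMNS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


# Usage with an in-memory buffer. For a real file, open it with
# open(path, "w", newline="") so the csv module controls line endings.
buffer = io.StringIO()
write_results(
    [
        {
            "address_id": "123",
            "cleaned_address": "Ross Park Mall, Pittsburgh, PA",
            "match_accuracy": "GEOMETRIC CENTER",
            "municipality": "Ross Township",
        }
    ],
    buffer,
)
```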
What We Have:
Address data from multiple internal and external information systems with varying quality and completeness.

What We Want:
An open source solution (implemented in R or Python or ArcGIS) with documentation
– Solution Outputs (.csv file):
  » Original Input (address ID, input address)
  » Cleaned, Standardized Addresses
  » Geographic elements

Why We Want It:
Streamlines workflow for geocoding and geospatial analysis.
– Efficiency
– Fast
– SOOOOOO many addresses