Published by Brett Ralf Shields. Modified over 6 years ago.
1
Using Open Source Python Packages for Machine Learning on Vector Geodata
Richard Wen (rwen@ryerson.ca)
2
Overview
Introduction: Python, Machine Learning, Vector Geodata
Methods: Workflow and packages with code samples
Results: Example use with OpenStreetMap Data
Conclusion: Limitations, Summary and key points
3
Introduction
A brief overview of Python, machine learning, and vector geodata in the context of open source and GIS
4
Python
Readable code and rapid development
90,000+ open source packages [1]
375,000+ repositories [2]
    import arcpy, qgis
    print('Python is also used in ArcGIS and QGIS!')
[1] pypi.python.org, [2] coderstats.net
5
Python Packages
Often free with variety and abundance
Control and customization
Large open source online community [1]:
  62.5% of Python developers interested in continuing use of Python
  63% of ~800 data scientists use Python
  76% of ~42,000 use Stack Overflow to get help for their job
[1] stackoverflow.com/research/developer-survey-2016
6
Machine Learning
Find structure in example data
Data driven solutions
Enabled by recent technologies
“Field of study that gives computers the ability to learn without being explicitly programmed.” – Arthur Samuel, 1959
7
Machine Learning Example
Satellite Image Tile Search (terrapattern.com)
8
Vector Geodata
Objects defined by vertices and join rules
Exist in geographic space
Represent real-world features
Point, Line, Polygon
9
Objective
Using machine learning, implement a workflow for learning geometry and placement patterns to predict a variable of interest.
Potential Uses
Knowledge discovery of local geography
Geometry and placement error detection
Missing data completion
10
Methods
Demonstration of an automated workflow with selected Python packages and code samples
11
Automated Workflow
[Workflow diagram. Stages: Geodata, Train, Test, Geoprocess, Machine Learning, Output]
12
Automated Workflow (Cont.)
Data I/O: Read and Write Spatial Data
Geoprocessing: Geometry and Placement Variables
Machine Learning: Models and Scoring
13
1. Data I/O
pandas: data manipulation
geopandas: spatial extension of pandas
    import geopandas
    shp = geopandas.read_file('path/to/file.shp')  # read a vector layer into a GeoDataFrame
    shp.to_file('path/to/out.shp')                 # write the GeoDataFrame back to disk
14
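2. Geoprocessing
Projection: define, reproject
Geometry: area, length, vertices
Placement: spatial relations, coordinates
    shp = shp.to_crs('EPSG:4326')         # to_crs returns a new GeoDataFrame, so keep the result
    shp.area                              # per-feature area, in the units of the layer's CRS
    shp.intersects(shp.geometry.iloc[0])  # True where a feature intersects the first geometry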
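As a rough illustration of what the geometry variables on this slide measure, the sketch below computes area, length, and vertex count for a single polygon in plain Python. The function names and the square's coordinates are made up for this example; geopandas performs the equivalent calculations internally, and the results are only in metres when the coordinates come from a projected CRS.

```python
import math

def polygon_area(vertices):
    """Planar area of a simple polygon using the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def ring_length(vertices):
    """Perimeter of the closed ring formed by the vertices."""
    n = len(vertices)
    return sum(
        math.hypot(vertices[(i + 1) % n][0] - vertices[i][0],
                   vertices[(i + 1) % n][1] - vertices[i][1])
        for i in range(n)
    )

# Hypothetical 10 x 10 square, coordinates assumed to be in metres
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

geometry_variables = {
    "area": polygon_area(square),
    "length": ring_length(square),
    "vertices": len(square),
}
print(geometry_variables)
```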
15
3. Machine Learning
sklearn: machine learning models
Supervised classification models:
  Train Data: example data to learn from
  Test Data: data for model to predict against
variable1  variable2  target
1.5        yes        class1
7.8        no         class2
…
16
3. Machine Learning (Training)
Training a model
area length type point 100 40 polygon
    import pandas
    import sklearn.svm
    train = pandas.read_csv('path/to/train.csv')
    model = sklearn.svm.SVC()
    model.fit(train[['area', 'length']], train['type'])
17
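3. Machine Learning (Testing)
Predicting and scoring
area length type ? 300 150
    test = pandas.read_csv('path/to/test.csv')
    features = test[['area', 'length']]          # same columns the model was trained on
    model.predict(features)
    model.score(features, ['point', 'polygon'])  # accuracy against the known labels, one per test row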
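The training and testing steps can be tried end to end without any files. The sketch below is a minimal version under stated assumptions: every row in both tables is invented for illustration, and the test table carries its true labels so that scoring is possible.

```python
import pandas
import sklearn.svm

# Invented training rows, for illustration only
train = pandas.DataFrame({
    "area":   [0.0, 0.0, 0.0, 100.0, 250.0, 400.0],
    "length": [0.0, 0.0, 0.0, 40.0, 70.0, 80.0],
    "type":   ["point", "point", "point", "polygon", "polygon", "polygon"],
})

# Invented test rows; the "type" column holds the labels used only for scoring
test = pandas.DataFrame({
    "area":   [0.0, 300.0],
    "length": [0.0, 150.0],
    "type":   ["point", "polygon"],
})

features = ["area", "length"]

model = sklearn.svm.SVC()
model.fit(train[features], train["type"])

predicted = model.predict(test[features])            # one predicted class per test row
accuracy = model.score(test[features], test["type"])  # fraction of correct predictions

print(list(predicted), accuracy)
```

Passing only the feature columns to predict and score matters: the model expects the same columns, in the same order, that it saw during fit.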
18
Results
An example of using the automated workflow on OpenStreetMap data in Toronto, Ontario
19
Applied Workflow
[Workflow diagram. Stages: Toronto OSM, Train, Test, Geometry and Place, Random Forest, Output]
20
1. Geodata
OpenStreetMap (OSM) for Toronto, ON
~70,000 vector objects with 53 target classes
Class examples: library, subway, helipad
Roads: 46,812 objects, Line, 16 classes
Transport Points: 21,309 objects, Point, 13 classes
Amenities: 1,507 objects, 8 classes
Places: 760 objects
Aeroways: 438 objects, 2 classes
Transport Areas: 72 objects, Polygon, 6 classes
21
2. Geometry and Place
Variables                                        Examples
area                                             100, 500, 1200
length                                           500, 800, 1100
vertices                                         1, 2, 8
x, y
distances to nearest library, subway, etc.       100, 300, 2000
*Reprojected to NAD83/UTM Zone 17N
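A "distance to nearest" placement variable can be sketched in a few lines of plain Python. The helper name and all coordinates below are hypothetical; they are assumed to be in a projected CRS with metre units, so the Euclidean distance is also in metres.

```python
import math

def distance_to_nearest(origin, targets):
    """Smallest straight-line distance from origin to any target point."""
    ox, oy = origin
    return min(math.hypot(tx - ox, ty - oy) for tx, ty in targets)

# Hypothetical projected coordinates (metres), not taken from the OSM data
libraries = [(630300.0, 4833400.0), (631000.0, 4835000.0)]
feature_xy = (630000.0, 4833000.0)

nearest_library_m = distance_to_nearest(feature_xy, libraries)
print(nearest_library_m)
```

This brute-force scan compares every feature with every target; for tens of thousands of objects a spatial index would likely be preferable.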
22
3. Random Forest
Decision trees on random subsets
Majority vote as prediction
Interpretability and Scalability
[Decision tree diagram. Nodes: Near?, Vehicle?; outcomes: Walk, Drive, Bus]
See sklearn.ensemble.RandomForestClassifier
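The majority-vote idea can be shown in isolation. In this sketch the list of per-tree votes is invented, and the helper name is hypothetical; it only illustrates how individual tree outputs are combined into one prediction.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Invented votes from five hypothetical trees for a single feature
votes = ["library", "subway", "library", "library", "helipad"]

prediction = majority_vote(votes)
vote_share = votes.count(prediction) / len(votes)  # fraction of trees that agree

print(prediction, vote_share)
```

Note that sklearn.ensemble.RandomForestClassifier combines trees by averaging their predicted class probabilities rather than counting hard votes, so the two approaches can differ on close calls.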
23
4. Output
Data Tables:
  Predictions and Probabilities
  Performance Measures
  Outliers
  Variable Importance and Contributions
Plots and Summary Report
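Two of the listed tables, predictions with probabilities and variable importance, map directly onto attributes of a fitted RandomForestClassifier. The sketch below uses invented rows and arbitrary settings (n_estimators=50, random_state=0) purely for illustration; outliers and per-prediction contributions are not covered here.

```python
import pandas
from sklearn.ensemble import RandomForestClassifier

# Invented training rows, for illustration only
train = pandas.DataFrame({
    "area":     [0.0, 0.0, 0.0, 0.0, 100.0, 250.0, 400.0, 900.0],
    "length":   [0.0, 0.0, 0.0, 0.0, 40.0, 70.0, 80.0, 120.0],
    "vertices": [1, 1, 1, 1, 4, 6, 5, 8],
    "type":     ["point"] * 4 + ["polygon"] * 4,
})
features = ["area", "length", "vertices"]

# Invented rows to predict
new_rows = pandas.DataFrame({
    "area":     [0.0, 300.0],
    "length":   [0.0, 150.0],
    "vertices": [1, 7],
})

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(train[features], train["type"])

# Predicted class per row, plus one probability column per class
predictions = forest.predict(new_rows[features])
probabilities = pandas.DataFrame(
    forest.predict_proba(new_rows[features]), columns=forest.classes_
)

# Impurity-based importance of each variable, largest first
importance = pandas.Series(
    forest.feature_importances_, index=features
).sort_values(ascending=False)

print(predictions)
print(probabilities)
print(importance)
```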
24
Conclusion
A discussion of open source Python package limitations, ending with a summary and key points for this presentation
25
Limitations
Divided standards and management
Support and updates are not guaranteed
Skills and knowledge requirements
26
Summary
Demonstrated with:
  Geoprocessing (pandas, geopandas)
  Machine learning (sklearn)
Python packages effective for needs of:
  Low-cost and specialized workflows
  Modular and innovative tools
But requires:
  Time to learn and explore
  Managing and integrating mixed sources
27
Key Points
Open Source Python packages have:
  Large community support
  Inclusive development
  Permissive freedoms
Which enables:
  Skills and knowledge development
  Solutions to a wide variety of needs
  Collaboration and networking
But:
  Standards and support are not guaranteed
  Can be time expensive and inconsistent