Geographical Data Mining Stan Openshaw Centre for Computational Geography University of Leeds
BUT Ian Turton, CCG, Leeds University For the latest on Stan
Why would we want to do this? Geographical data explosion; public imperative; lack of geographically aware tools.
Mountains of Data
Swamps of Data
We know what you spend...
…where you spend it...
…who you talk to...
…where you live... LS2 9JT What your neighbours are like
...Crime data and...: crime type, crime location, insurance data
...Health data: environmental data, socio-economic data, admissions data
Geographical Hyperspace: Geography (x,y co-ordinates, postcodes); Time (days, hours, months); Attributes of place (pollution sources, soil type, distance to motorway) and of cases (type of disease, age, sex).
Data Mining
Turning data into knowledge: How do these data sets fit together? Is there anything important hidden in here? Does geography make a difference?
Data type / Nature of data interaction
1. spatial data
2. time data
3. multiple attribute data
4. geography and time data
5. time and multiple attribute data
6. geography and multiple attribute data
7. geography, time, and multiple attribute data
HISTORICALLY these effects have been hidden by research design BUT
The result is often data strangulation: the patterns are being destroyed or damaged by the research design.
What is needed is a geographic data mining technology that works
How can we do this? Developing new, smarter methods; testing them (HPC is vital to this process); disseminating them (Internet, Java).
Being SMART is not just a matter of methodology but also involves access, usability, relevancy, and result communication factors
The complete novice should be able to perform some sophisticated geographical analysis and get some useful and understandable results on the same day the work started
User Friendly Spatial Analysis: provides analysis that users need; simple to perform; highly automated, making it fast and efficient; readily understood, with results that are self-evident and can be communicated to non-experts; safe and trustworthy.
What we did in this study: comparison of techniques on the same data. Multiple techniques: GAM/K, GAM/K-T, MAPEX, GDM1/2, FLOCK, proprietary data mining tools.
Study Area
Stan’s Cases
Chris’ cases
How to search the geographic space: exhaustively (GAM, GEM) or smartly (genetic algorithm: MAPEX, GDM; flocking: boids).
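The exhaustive option can be illustrated with a minimal sketch of a GAM-style search: lay overlapping circles of several radii over the study area, count cases and population at risk in each, and keep the circles with significantly more cases than expected. This is an illustration only, not the code used in the study. The function names, the `overlap` and `alpha` parameters, the Poisson tail test, and the toy data are all assumptions; the slides do not state which significance test was used.

```python
import math

def poisson_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)      # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k        # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def exhaustive_circle_search(cases, population, radii, overlap=0.5, alpha=0.01):
    """Hypothetical GAM-style search.

    cases:      list of (x, y) case locations
    population: list of (x, y, count) population-at-risk points
    radii:      circle radii to try
    overlap:    grid step as a fraction of the radius (assumed value)
    alpha:      significance threshold (assumed value)
    """
    total_pop = sum(count for _, _, count in population)
    rate = len(cases) / total_pop   # overall incidence rate
    xs = [x for x, _, _ in population]
    ys = [y for _, y, _ in population]
    x0, y0 = min(xs), min(ys)
    hits = []
    for r in radii:
        step = r * overlap
        nx = int((max(xs) - x0) / step) + 1
        ny = int((max(ys) - y0) / step) + 1
        for i in range(nx):
            for j in range(ny):
                cx, cy = x0 + i * step, y0 + j * step
                pop_in = sum(c for x, y, c in population
                             if math.hypot(x - cx, y - cy) <= r)
                obs = sum(1 for x, y in cases
                          if math.hypot(x - cx, y - cy) <= r)
                if pop_in == 0 or obs == 0:
                    continue
                expected = rate * pop_in
                # Stand-in test: Poisson probability of seeing this many cases.
                if poisson_tail(obs, expected) < alpha:
                    hits.append((cx, cy, r, obs, expected))
    return hits

# Toy data (invented): uniform population, one tight cluster, a few stray cases.
population = [(x, y, 100) for x in range(10) for y in range(10)]
cases = [(2, 2)] * 12 + [(7, 7), (8, 1), (5, 9), (0, 6)]
hits = exhaustive_circle_search(cases, population, radii=[1.0])
```

The cost grows with the number of grid points times the number of radii, and again with every time slice or attribute class added to the search, which is why the smart searches on this slide matter.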
GAM & GEM
Mapex & GDM
FLOCK
And the Attributes... exhaustively (GAM, GEM) or smartly (genetic algorithm: MAPEX, GDM, boids).
GAM & GEM with time
Geology Map (legend: Rock A, Rock B, Rock C, Rock D)
Railway with 2 km buffer polygon
Combined Geology and Railway Buffer Map (legend: Rock A, Rock B, Rock C, Rock D; 2 km buffer)
Combinations of Attributes: if we have 8 attributes with 10 classes each, there are 3160 combinations of 2 classes from 80, compared with 24,040,016 if any 5 are used. Smart searches are essential: use a GA to generate possible combinations of interest.
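The two figures on this slide are binomial coefficients (unordered selections from the 80 classes), which can be checked in a few lines. The variable names are illustrative only.

```python
import math

n_attributes = 8
classes_per_attribute = 10
n_classes = n_attributes * classes_per_attribute   # 80 classes in total

# Unordered selections of 2 and of 5 classes from the 80.
pairs = math.comb(n_classes, 2)
fives = math.comb(n_classes, 5)

print(pairs, fives)
```

Going from 2 to 5 classes multiplies the search space by several thousand, so an exhaustive pass over attribute combinations quickly stops being practical.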
Proprietary Data Miners
Results How to visualise them?
Results: GAM/K did very well and was not put off by time or attributes. GAM/K-T worked well; time clusters found. MAPEX / GDM/1 worked well.
Results continued: FLOCK worked very well. Data mining tools didn't work at all well out of the box; we could have built a GAM inside them.
What next? Build a harder data set for more tests; re-run the analysis; put it all on the web.
Thanks to: European Research Office of the US Army; ESRC grant R for paying Ian's salary; ESRC/JISC for the Census data purchase; OS for the bits of the maps they own.
To find out more: "Web based Multi-engine spatial analysis tools", James Macgill, Openshaw and Turton (Session 1A, Sunday); "Smart Crime Pattern Analysis using GAM", Ian Turton, Openshaw and Macgill (Session 7A, Tuesday).
Contacts check out smart pattern analysis on the web Latest news on Stan