Interoperating with GIS and Statistical Environment for an Interactive Spatial Data Mining
Didier Josselin, THEMA, UPRESA 6049 du CNRS, Besançon, GDR CASSINI
Xlisp-Stat programming: D. Betz, L. Tierney, C. Brunsdon, D. Josselin, L. Guerre, B. Dancuo
French research group on GIS: GDR CASSINI
The spatial data mining quest: finding significant relations between geographical objects in order to cluster them
Examples of geographical purposes
Sub-objectives at the geographical entity scale
- 1st door: statistical dependency (some entities have common characteristics...)
- 2nd door: spatial relation (some entities are contiguous or close to each other...)
- 3rd door: the combination of spatial and statistical relations (some entities are similar and close...)
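The third door can be illustrated with a minimal sketch. It assumes one numeric attribute per entity, a contiguity table, and a similarity tolerance; the function name, the entity names and the threshold are hypothetical and only serve the illustration.

```python
from itertools import combinations


def similar_and_close(values, neighbours, tolerance):
    """Return pairs of entities that are both contiguous and similar.

    values:     dict entity -> numeric attribute (hypothetical data)
    neighbours: dict entity -> set of contiguous entities
    tolerance:  maximum attribute difference still called "similar"
    """
    pairs = []
    for a, b in combinations(sorted(values), 2):
        # Spatial relation: the two entities share a boundary.
        contiguous = b in neighbours.get(a, set()) or a in neighbours.get(b, set())
        # Statistical relation: the attribute values are close.
        similar = abs(values[a] - values[b]) <= tolerance
        if contiguous and similar:
            pairs.append((a, b))
    return pairs


# Hypothetical entities: A-B and A-C and C-D are contiguous.
values = {"A": 10.0, "B": 11.0, "C": 30.0, "D": 10.5}
neighbours = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
print(similar_and_close(values, neighbours, tolerance=2.0))
```

Here A and D are similar but not contiguous, and A and C are contiguous but not similar, so only the pair (A, B) passes both tests.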
Sub-objectives at the territory and geographical space scale
- 1st door: spatial partitioning and data aggregation, a succession of derivations: analysing spatial distribution, identifying gradients, detecting discontinuities...
- 2nd door: the measure of spatial autocorrelation, global and local
- 3rd door: the identification of composite (heterogeneous) geographical entities
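As an illustration of the global autocorrelation measure, here is a minimal sketch of Moran's I, one common global index (the slides do not say which index is used, so this choice is an assumption). It takes a list of attribute values and a spatial weight matrix; the example uses binary contiguity weights for four units aligned in a row, which is hypothetical data.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and spatial weights w_ij.

    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    where m is the mean of the values and W the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Four units in a row: each unit is contiguous to its direct neighbours.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
gradient = [1.0, 2.0, 3.0, 4.0]       # smooth gradient: positive I
alternating = [1.0, -1.0, 1.0, -1.0]  # checkerboard: negative I
print(morans_i(gradient, weights), morans_i(alternating, weights))
```

A positive value indicates that neighbouring units tend to carry similar values, a negative value that they tend to differ. Local indicators are not covered by this sketch.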
Geographical agricultural flows analysis
Agricultural flows between French communes
[Diagram: Commune A, Commune B]
Various flow statuses
Outgoing flows in Franche-Comté
What are we looking for?
- Commune aggregate with its key and boundary
- Commune described by an attribute
- Flow between a pair of communes
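The three objects above can be sketched with plain data structures. The record layout (origin, destination, volume), the commune names and the aggregate keys below are all hypothetical; the sketch only shows how outgoing flows per commune and flows between commune aggregates might be derived from commune-to-commune records.

```python
from collections import defaultdict


def outgoing_totals(flows):
    """Sum the volume leaving each commune (internal flows are ignored)."""
    totals = defaultdict(float)
    for origin, destination, volume in flows:
        if origin != destination:
            totals[origin] += volume
    return dict(totals)


def aggregate_flows(flows, commune_to_aggregate):
    """Sum flows between aggregates, dropping flows internal to an aggregate."""
    totals = defaultdict(float)
    for origin, destination, volume in flows:
        src = commune_to_aggregate[origin]
        dst = commune_to_aggregate[destination]
        if src != dst:
            totals[(src, dst)] += volume
    return dict(totals)


# Hypothetical commune-to-commune flow records.
flows = [("A", "B", 5.0), ("A", "C", 2.0), ("B", "A", 1.0), ("C", "C", 4.0)]
# Hypothetical aggregates identified by a key.
commune_to_aggregate = {"A": "Z1", "B": "Z1", "C": "Z2"}

print(outgoing_totals(flows))
print(aggregate_flows(flows, commune_to_aggregate))
```

Aggregating A and B into Z1 turns their mutual flows into internal flows, so only the A to C flow survives at the aggregate level.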
Which software may be available and convenient?
Geographical Information Systems
Strengths (+)
- Various structured query languages
- Existing tools to build clean structured databases
- Graphical and mapping functionalities
- Generally open to other software
Weaknesses (-)
- Poor in statistical functions
- Rarely integrate exploratory data analysis
- Queries must be written rather than executed graphically
ESDA Environment
Strengths (+)
- Numerous statistical functions
- Numerous graphic representations
- Easy selection of objects on screen
- Dynamic link between objects
- Generally open to development by programming
Weaknesses (-)
- Poor in geographical and semiological functionalities
- Does not integrate structured database functions
- Does not include geometrical or topological models
Any solutions?
Modifying existing software
First methodological choice: adding some mapping and relational functionalities to a statistical environment
ARPEGE’: a tool to Analyse Robustly in Practice and Explore Geographical Environment (XlispStat)
The « visioner » in ARPEGE’
Using ARPEGE’ to analyse flows
Strengths (+)
- Dynamic link between multiple objects
- Relatively fast, which supports expert decision making
- Facilities to implement relations and triggers between objects
- Possibility to focus on many crossed selections
Weaknesses (-)
- Multiscale analysis is difficult to manage
- Users may miss some synthetic statistical indicators or automatic methods
- The application must remain quite simple (RAM limitations)
- Risk of combinatorial explosion!
Coupling two complementary software packages
Second methodological choice: interoperating a GIS with a statistical environment
LAVSTAT: a dynamic Link between ArcView and XlispSTAT
Interaction
LAVSTAT principles
[Diagram: ArcView, XlispStat, Services, DDE Server]
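The idea of a dynamic link can be sketched as a shared selection propagated between two views. This is an in-process illustration only: it does not use DDE, ArcView or XlispStat, and the class and method names are hypothetical, not LAVSTAT's actual interface.

```python
class SelectionLink:
    """Propagates a selection made in one view to every other view."""

    def __init__(self):
        self._views = []

    def register(self, view):
        self._views.append(view)
        view.link = self

    def broadcast(self, source, selected_ids):
        # Forward the selection to all views except the one that made it.
        for view in self._views:
            if view is not source:
                view.receive(selected_ids)


class View:
    """Stand-in for a map window or a statistical plot (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.link = None
        self.selected = set()

    def select(self, ids):
        # User-driven selection: update locally, then notify the link.
        self.selected = set(ids)
        if self.link is not None:
            self.link.broadcast(self, self.selected)

    def receive(self, ids):
        # Selection coming from another view: update without re-broadcasting.
        self.selected = set(ids)


link = SelectionLink()
map_view = View("map")
plot_view = View("scatterplot")
link.register(map_view)
link.register(plot_view)

map_view.select({"A", "C"})          # select two communes on the map
print(sorted(plot_view.selected))    # the plot now highlights the same ones
```

Separating `select` from `receive` avoids an endless echo between views; a link between two separate programs would need a comparable safeguard.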
Strengths (+)
- Dynamic link between GIS and statistical software
- Access to the whole functionality of both systems
- Increases the ways to investigate spatial data
Weaknesses (-)
- A single screen is not enough to explore data
- Some time is lost making the two programs interoperate
- Not yet stable (memory conflicts)
CONCLUSION
Some advice on spatial analysis for making reliable decisions in order to shape the future...
If you have some objectives to reach with data to explore...
Choose the appropriate methods...
Keep a critical eye on tools and methods...
Choose the most robust methods to analyse your data...
Check hypotheses without overly tight assumptions
Try to master time during the analysis and stay inside the learning process...
Keep in touch with all individual data
Bring to light all aspects of your problem through multiple representations
Use dynamic links and interactivity
Study the fringe as well as the trend...
...and model deviations and residuals...
...and relations between geographical objects across different scales...
...which may be well defined (semantic, topological, structural, functional...)
Validate your results with mathematics and expertise
And remember that the “measurement density” is not constant