Published by Neal Horace Baldwin. Modified over 9 years ago.
1
Probabilistic Latent Semantic Analysis as a Potential Method for Integrating Spatial Data Concepts
R.A. Wadsworth (1), A.J. Comber (2), P.F. Fisher (2)
1. Centre for Ecology and Hydrology, Lancaster, UK
2. Dept of Geography, Leicester University, UK
2
Motivation
We want to understand how the environment is changing.
But natural resource inventories constantly develop new baselines.
Therefore we want some way to know how similar two categories are, so we can decide whether inconsistencies are change or error.
3
Earlier approaches
First we just asked people (domain experts): “are ‘a’ and ‘b’ similar, dissimilar, or are you not sure?”
But the domain expert has to make lots of choices, sometimes domain experts aren’t available, you don’t know why they think concepts are similar (or not), etc.
So... (very) simple text mining: the more words two categories have in common, the more similar they are.
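The "words in common" idea can be sketched in a few lines of Python. This is only an illustration: the slide does not say how the overlap was scored, so the Jaccard ratio used here (shared words divided by all distinct words) is an assumption, and the three category descriptions are invented.

```python
import re


def words(text):
    """Lower-case a description and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def shared_word_similarity(text_a, text_b):
    """Jaccard overlap: shared words divided by all distinct words.

    Assumption: the slide only says 'more words in common = more similar';
    the Jaccard normalisation is one possible way to score that.
    """
    a, b = words(text_a), words(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical category descriptions, invented for illustration only.
heath = "open land dominated by heather and dwarf shrubs"
moor = "open upland dominated by heather and grasses"
urban = "buildings roads and other sealed surfaces"

print(shared_word_similarity(heath, moor))   # 5 shared words out of 10 distinct
print(shared_word_similarity(heath, urban))  # only "and" is shared
```

A real analysis would probably drop stop words such as "and" and "by" first, since here they inflate the overlap.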
4
Case Study
In the proceedings we use land-cover categories, but we’re all here because of Andrew...
So, what does his writing tell us about the underlying concepts behind his work?
5
Case Study – the data
Used the English-language abstracts from the papers provided on his web site.
This is a biased sample: do the other papers contain concepts not covered by the English-language work? Do they contain collaborations I’ve missed?
However, I just want to illustrate the process...
6
Case Study – the data
[Figure legend: red dots are collaborators; blue squares are papers in this analysis]
7
Text Mining Andrew’s Abstracts
“A formal model of correctness in cadastre”
“Processes in cadastre”
“Surveying education for the future”
“Object orientated modelling in GIS”
“Surveying mapping and LIS education in the USA”
“Expert systems for GIS”
8
Guessing what the axes mean
1st axis: Education to Geometry?

Score   Title
High    Surveying, mapping, and land information systems education in the United States.
        Computer education for surveying engineers.
        Surveying Education for the Future
        Zur Einfuhrung eines LIS in der Schweiz (1)
        Macintosh: rethinking computer education for engineering students.
        ...
        Spatial concepts, geometric data models and data structures
        Formalization of Families of Categorical Coverages
        On the design of formal theories of Geographic space.
        Concepts and paradigms in spatial information: Are current geographic information systems truly generic?
Low     Neighbourhood Relations between Fields with Applications to Cellular Networks

Notes: (1) There is an abstract in English.
9
Guessing what the axes mean
Second axis: Object-orientated to Education?

Score   Title
High    Formalization of conceptual models for GIS using GOFER
        Finite-Resolution Simplicial Complexes
        Computer cartography for GIS: an object-oriented view on the display transformation.
        Object-orientated modelling in GIS: Inheritance and propagation
        Topology in raster and vector representations
        ...
        Geographic information science: new methods and technology
        Surveying, mapping, and land information systems education in the United States.
        Surveying Education for the Future
        Toward consensus on a European GIS curriculum: the international post-graduate course on GIS.
Low     Macintosh: rethinking computer education for engineering students.
10
Why latent analysis?
If we knew what the underlying (hidden, latent) concepts are, we might be able to understand why two categories are considered to be similar.
11
Probabilistic Latent Semantic Analysis
It is a “generative model”.
Assumes: documents describe themes, and words are associated with themes.
We observe the frequency of words in documents:
$$P(d,w) = P(d) \sum_{z \in Z} P(w \mid z)\, P(z \mid d)$$
Therefore, we try to model what latent variables (the z’s) exist.
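To make the formula concrete, here is a small worked example in Python. All of the numbers are hypothetical (two documents, two themes, three words) and were chosen only so that each distribution sums to one; none of them come from the case study.

```python
# Hypothetical toy distributions, invented for illustration only.
P_d = {"d1": 0.5, "d2": 0.5}                       # P(d)
P_z_given_d = {                                    # P(z|d), each row sums to 1
    "d1": {"z1": 0.9, "z2": 0.1},
    "d2": {"z1": 0.2, "z2": 0.8},
}
P_w_given_z = {                                    # P(w|z), each row sums to 1
    "z1": {"cadastre": 0.7, "education": 0.1, "gis": 0.2},
    "z2": {"cadastre": 0.1, "education": 0.6, "gis": 0.3},
}


def joint(d, w):
    """P(d, w) = P(d) * sum over z of P(w|z) * P(z|d)."""
    mixture = sum(P_w_given_z[z][w] * P_z_given_d[d][z] for z in P_z_given_d[d])
    return P_d[d] * mixture


vocabulary = list(P_w_given_z["z1"])
total = sum(joint(d, w) for d in P_d for w in vocabulary)

# 0.5 * (0.7 * 0.9 + 0.1 * 0.1) = 0.32, up to floating-point rounding
print(joint("d1", "cadastre"))
# The joint distribution over all (d, w) pairs sums to 1, up to rounding
print(total)
```

In the generative reading, a word in document d1 is produced by first picking a theme with probability P(z|d1) and then picking a word from that theme with probability P(w|z).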
12
Probabilistic Latent Semantic Analysis
In practice similar to clustering, but...
“Documents are not assigned to clusters, they are characterized by a specific mixture of factors with weights P(z|d). These mixing weights offer more modelling power and are conceptually very different from posterior probabilities in clustering models and (unsupervised) naive Bayes models.” (Thomas Hofmann, 1999)
13
PLSA – iterative, stochastic
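One common way to fit PLSA is expectation-maximisation (EM) from a random start, which is both iterative and stochastic. The sketch below is a minimal pure-Python version under that assumption; the slides do not say which implementation was used, and the function names, the tiny count matrix and the vocabulary are all hypothetical.

```python
import random


def normalise(row):
    """Scale a list of non-negative numbers to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM. counts[d][w] is the count of word w in document d.

    Returns (p_z_d, p_w_z): p_z_d[d][z] estimates P(z|d) and
    p_w_z[z][w] estimates P(w|z).
    """
    rng = random.Random(seed)  # the random start is the stochastic part
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalise([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalise([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d)
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0:
                    continue
                for z in range(n_topics):
                    share = n * post[z] / norm  # expected count for theme z
                    new_z_d[d][z] += share
                    new_w_z[z][w] += share
        # M-step: turn expected counts back into probabilities
        p_z_d = [normalise(row) for row in new_z_d]
        p_w_z = [normalise(row) for row in new_w_z]
    return p_z_d, p_w_z


# Hypothetical word counts: rows are documents, columns are words.
vocabulary = ["cadastre", "survey", "education", "course"]
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]

p_z_d, p_w_z = plsa(counts, n_topics=2)
for d, mix in enumerate(p_z_d):
    print("document", d, [round(p, 2) for p in mix])
```

Because the start is random, different seeds can give different theme orderings and, on real data, different solutions, so it is worth rerunning with several values of `seed` before reading much into any one set of themes.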
14
Nine Latent Themes in Andrew’s Work
Cadastral systems, metadata and cartography?
Theme “A”: cadastre (21, 1.0), models (20, 0.6), processes (12, 0.9), reality (9, 0.6), consistency (8, 0.5), constraints (7, 0.6), cadastral (6, 0.7), geometry (6, 0.7), world (6, 0.5), work (6, 0.5), ontology (5, 0.8), focus (5, 0.6)
Theme “B”: cartographic (9, 0.7), metadata (7, 1.0), scale (6, 0.8), perspective (6, 0.5)
Theme “C”: categorical (10, 0.9), properties (10, 0.5), geographical (9, 0.7), coverages (8, 1.0), tools (6, 0.5), form (6, 0.5), generalization (5, 0.6)
15
Latent Themes in Andrew’s work
Education and Technology?
Theme “D”: technology (17, 0.9), new (16, 0.6), development (12, 0.5), course (7, 1.0), intersection (7, 0.7), curriculum (7, 0.6), perspective (6, 0.5), simplicial (5, 1.0), field (5, 0.8), shows (5, 0.6)
Theme “E”: computer (16, 0.6), education (15, 0.7), surveying (14, 0.6), raster (9, 0.8), engineering (8, 0.7), vector (8, 0.5), representations (8, 0.5), management (7, 0.7), processing (6, 0.7), profession (5, 0.8), functions (5, 0.8), hardware (5, 0.6)
16
Latent Themes in Andrew’s work
Decisions and Directions?
Theme “F”: quality (11, 0.7), decision (8, 0.9), environment (8, 0.5), target (7, 1.0), decisions (7, 0.6), city (6, 1.0), interface (6, 0.5), uncertainty (6, 0.5), metaphor (5, 1.0), street (5, 1.0), strategy (5, 0.8), navigation (5, 0.6)
Theme “G”: direction (13, 0.6), directions (11, 0.9), distance (10, 0.9), reasoning (10, 0.9), approach (7, 0.6), fields (6, 0.8), point (6, 0.5), example (6, 0.5), cardinal (5, 1.0), pricing (5, 1.0), qualitative (5, 1.0), value (5, 1.0), algebraic (5, 0.8), geoinformation (5, 0.6)
17
Latent Themes in Andrew’s work
Data?
Theme “H”: design (13, 0.5), theories (9, 1.0), expert (9, 0.9), application (9, 0.5), implementation (8, 0.5), query (7, 0.9), examples (5, 0.6), discusses (5, 0.6), techniques (5, 0.6)
Theme “I”: structure (11, 0.5), concepts (11, 0.5), geometric (10, 0.8), conceptual (8, 0.5), describe (8, 0.5), specification (5, 0.8)
18
Conclusions
Simple text mining allows you to relate categories to each other, but it is not always easy to say why.
PLSA gives some indication of the underlying (fundamental?) themes, but how stable or useful are the results...?
19
Thank you