Probabilistic Latent Semantic Analysis as a Potential Method for Integrating Spatial Data Concepts R.A. Wadsworth 1, A.J. Comber 2, P.F. Fisher 2 1.Centre.

Presentation transcript:

Probabilistic Latent Semantic Analysis as a Potential Method for Integrating Spatial Data Concepts R.A. Wadsworth 1, A.J. Comber 2, P.F. Fisher 2 1.Centre for Ecology and Hydrology, Lancaster, UK 2.Dept of Geography, Leicester University, UK

Motivation We want to understand how the environment is changing, but natural resource inventories constantly develop new baselines. We therefore want some way to know how similar two categories are, so that we can decide whether an inconsistency is change or error.

Earlier approaches First we simply asked domain experts: “Are ‘a’ and ‘b’ similar, dissimilar, or are you not sure?” But the expert has to make a lot of choices, experts are not always available, and you don’t know why they think concepts are similar (or not). So we turned to (very) simple text mining: the more words two category descriptions have in common, the more similar the categories are.
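The word-overlap idea can be sketched in a few lines. This is a minimal illustration, not the authors' code: the tokenisation, the function names, and the three category descriptions are all invented here, and the Jaccard normalisation is an added assumption (the slide only says that more shared words means more similar).

```python
import re

def tokens(text):
    """Lower-case a description and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def shared_words(a, b):
    """Count the distinct words two category descriptions have in common."""
    return len(tokens(a) & tokens(b))

def jaccard(a, b):
    """Shared words divided by all distinct words; 0.0 if both are empty.

    Assumption: this normalisation is not in the slides. It stops long
    descriptions from looking similar just because they contain more words.
    """
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# Hypothetical category descriptions, invented for illustration.
bog = "wet peat soils with sphagnum moss"
fen = "wet peat soils with sedges and reeds"
urban = "buildings roads and car parks"

print(shared_words(bog, fen), shared_words(bog, urban))
print(round(jaccard(bog, fen), 3))
```

A score like this says that two categories are close, but not why, which is the gap the latent analysis later in the talk tries to fill.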

Case Study In the proceedings we use land-cover categories, but we’re all here because of Andrew... So what does his writing tell us about the concepts underlying his work?

Case Study – the data We used the English-language abstracts from the papers provided on his web site. This is a biased sample: do the other papers contain concepts not covered by the English-language work? Do they contain collaborations I’ve missed? However, we just want to illustrate the process...

Case Study – the data Red dots – collaborators Blue squares – papers in this analysis

Text Mining Andrew’s Abstracts “A formal model of correctness in cadastre” “Processes in cadastre” “Surveying education for the future” “Object orientated modelling in GIS” “Surveying mapping and LIS education in the USA” “Expert systems for GIS”

Guessing what the axes mean
First axis: Education ... Geometry?
Score  Title
High   Surveying, mapping, and land information systems education in the United States.
       Computer education for surveying engineers.
       Surveying Education for the Future
       Zur Einfuhrung eines LIS in der Schweiz [1]
       Macintosh: rethinking computer education for engineering students.
       .....
       Spatial concepts, geometric data models and data structures
       Formalization of Families of Categorical Coverages
       On the design of formal theories of Geographic space.
       Concepts and paradigms in spatial information: Are current geographic information systems truly generic?
Low    Neighbourhood Relations between Fields with Applications to Cellular Networks
Note 1: There is an abstract in English.

Guessing what the axes mean
Second axis: Object-orientated ... Education?
Score  Title
High   Formalization of conceptual models for GIS using GOFER
       Finite-Resolution Simplicial Complexes
       Computer cartography for GIS: an object-oriented view on the display transformation.
       Object-orientated modelling in GIS: Inheritance and propagation
       Topology in raster and vector representations
       Geographic information science: new methods and technology
       Surveying, mapping, and land information systems education in the United States.
       Surveying Education for the Future
       Toward consensus on a European GIS curriculum: the international post-graduate course on GIS.
Low    Macintosh: rethinking computer education for engineering students.

Why latent analysis? If we knew what the underlying (hidden, latent) concepts were, we might be able to understand why two categories are considered similar.

Probabilistic Latent Semantic Analysis It is a “generative model”. It assumes that documents describe themes and that words are associated with themes. We observe the frequency of words in documents:
$$P(d, w) = P(d) \sum_{z \in Z} P(w \mid z)\, P(z \mid d)$$
Therefore, we try to model which latent variables (the z’s) exist.
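The slides do not say how the model is fitted. One standard route, stated here as an assumption rather than as the authors' procedure, is expectation-maximisation (EM) on the log-likelihood of the observed counts. The symbol $n(d,w)$, the number of times word $w$ occurs in document $d$, is introduced here and does not appear in the slides.

```latex
\begin{align}
\mathcal{L} &= \sum_{d}\sum_{w} n(d,w)\,\log P(d,w) \\
\text{E-step:}\quad
P(z \mid d,w) &= \frac{P(w \mid z)\,P(z \mid d)}
                      {\sum_{z' \in Z} P(w \mid z')\,P(z' \mid d)} \\
\text{M-step:}\quad
P(w \mid z) &= \frac{\sum_{d} n(d,w)\,P(z \mid d,w)}
                    {\sum_{d}\sum_{w'} n(d,w')\,P(z \mid d,w')} \\
P(z \mid d) &= \frac{\sum_{w} n(d,w)\,P(z \mid d,w)}
                    {\sum_{w} n(d,w)}
\end{align}
```

The E-step splits each observed word occurrence among the themes in proportion to how well each theme explains it; the M-step re-estimates both sets of probabilities from those fractional counts.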

Probabilistic Latent Semantic Analysis In practice it is similar to clustering, but... “Documents are not assigned to clusters, they are characterized by a specific mixture of factors with weights P(z|d). These mixing weights offer more modelling power and are conceptually very different from posterior probabilities in clustering models and (unsupervised) naive Bayes models.” (Thomas Hofmann, 1999)

PLSA – iterative, stochastic
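As a rough illustration of what an iterative fit from a random start can look like, here is a minimal pure-Python sketch of PLSA fitted by EM. It is not the authors' implementation: the choice of EM, the function names, the small smoothing constant, and the toy count matrix are all assumptions made for this example.

```python
import random

def normalise(row):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]

def plsa(counts, n_themes, n_iter=50, seed=0):
    """Fit P(w|z) and P(z|d) to a document-by-word count matrix with EM.

    counts[d][w] is how often word w occurs in document d.
    Returns (p_w_z, p_z_d), indexed as p_w_z[z][w] and p_z_d[d][z].
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random starting point: a different seed can lead to a different solution.
    p_w_z = [normalise([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_themes)]
    p_z_d = [normalise([rng.random() + 1e-3 for _ in range(n_themes)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        # Tiny floor (an assumption of this sketch) keeps every entry positive.
        new_w_z = [[1e-12] * n_words for _ in range(n_themes)]
        new_z_d = [[1e-12] * n_themes for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) for this document-word pair.
                post = normalise([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_themes)])
                # Accumulate expected counts for the M-step.
                for z in range(n_themes):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalise the expected counts into probabilities.
        p_w_z = [normalise(row) for row in new_w_z]
        p_z_d = [normalise(row) for row in new_z_d]
    return p_w_z, p_z_d

# Hypothetical counts: 4 documents (rows) by 4 words (columns).
counts = [[4, 3, 0, 0],
          [3, 4, 0, 0],
          [0, 0, 5, 2],
          [0, 0, 2, 5]]
p_w_z, p_z_d = plsa(counts, n_themes=2)
for d, mix in enumerate(p_z_d):
    print(d, [round(p, 2) for p in mix])
```

Each row of `p_z_d` is a document's mixture over themes, which is the P(z|d) weighting Hofmann contrasts with hard cluster assignment. Rerunning with several `seed` values is one simple way to probe how stable the recovered themes are.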

Nine Latent Themes in Andrew’s Work
Cadastral systems, metadata and cartography?

Theme “A”
cadastre        21  1.0
models          20  0.6
processes       12  0.9
reality          9  0.6
consistency      8  0.5
constraints      7  0.6
cadastral        6  0.7
geometry         6  0.7
world            6  0.5
work             6  0.5
ontology         5  0.8
focus            5  0.6

Theme “B”
cartographic     9  0.7
metadata         7  1.0
scale            6  0.8
perspective      6  0.5

Theme “C”
categorical     10  0.9
properties      10  0.5
geographical     9  0.7
coverages        8  1.0
tools            6  0.5
form             6  0.5
generalization   5  0.6

Latent Themes in Andrew’s work
Education and Technology?

Theme “D”
technology      17  0.9
new             16  0.6
development     12  0.5
course           7  1.0
intersection     7  0.7
curriculum       7  0.6
perspective      6  0.5
simplicial       5  1.0
field            5  0.8
shows            5  0.6

Theme “E”
computer        16  0.6
education       15  0.7
surveying       14  0.6
raster           9  0.8
engineering      8  0.7
vector           8  0.5
representations  8  0.5
management       7  0.7
processing       6  0.7
profession       5  0.8
functions        5  0.8
hardware         5  0.6

Latent Themes in Andrew’s work
Decisions and Directions?

Theme “F”
quality         11  0.7
decision         8  0.9
environment      8  0.5
target           7  1.0
decisions        7  0.6
city             6  1.0
interface        6  0.5
uncertainty      6  0.5
metaphor         5  1.0
street           5  1.0
strategy         5  0.8
navigation       5  0.6

Theme “G”
direction       13  0.6
directions      11  0.9
distance        10  0.9
reasoning       10  0.9
approach         7  0.6
fields           6  0.8
point            6  0.5
example          6  0.5
cardinal         5  1.0
pricing          5  1.0
qualitative      5  1.0
value            5  1.0
algebraic        5  0.8
geoinformation   5  0.6

Latent Themes in Andrew’s work
Data?

Theme “H”
design          13  0.5
theories         9  1.0
expert           9  0.9
application      9  0.5
implementation   8  0.5
query            7  0.9
examples         5  0.6
discusses        5  0.6
techniques       5  0.6

Theme “I”
structure       11  0.5
concepts        11  0.5
geometric       10  0.8
conceptual       8  0.5
describe         8  0.5
specification    5  0.8

Conclusions Simple text mining allows you to relate categories to each other, but it is not always easy to say why they are related. PLSA gives some indication of the underlying (fundamental?) themes, but how stable or useful are the results...?

Thank you