The end of geographic theory? Prospects for model discovery in the geographic domain
Mark Gahegan
Centre for eResearch & Dept. Computer Science, University of Auckland, New Zealand

The holy grail of analytics
Analytical models that can explain their own reasoning:
- David Harvey
- Peter Gould
- Stan Openshaw
Computational Model Discovery (or Discovery Informatics)

Recap: there are two kinds of analytical models:
- Predictive models
- Descriptive models

In what way is this new?
Data mining & knowledge discovery:
- Does not emphasize model comprehensibility
- Does not take advantage of prior knowledge
- Produces predictive models that do not connect to existing knowledge
Computational Model Discovery:
- Focuses on the interpretability of models by humans
- Seeks explanations by connecting observations to theory

Explanation in Geography (Harvey 1969)
Examines the stages of geographic investigation and how together they support explanation, via:
- methodological frameworks: the nature of investigation, and
- philosophy: the nature of the science process and its various conceptual artifacts (including ontology),
- which determine representation: how we abstract and represent the world, and
- analysis: how we model and analyze the world,
- through to explanation: which uses theory to describe what our analysis reveals.

Inductive learning of models based on processes
- A process is a collection of related functions
  - In differential or algebraic form
  - Can be a single equation
- Can have unobserved variables
- Specifies a causal relationship between one or more input and output variables

Computational Model Discovery (figure: prey and predator)

Example Process Model (from SC-IPM, Bridewell et al., 2008)
Figure labels: prey growth; predation; predator loss; algebraic process to calculate grazing rate
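
The example model combines three processes (prey growth, predation, predator loss) with an algebraic process for the grazing rate. A minimal Python sketch of that idea follows, assuming each process contributes additive terms to the rates of change of the variables it touches and that the summed model is integrated with forward Euler steps. The saturating (Holling type II) grazing form, the parameter names, and the values are illustrative assumptions, not taken from SC-IPM or Bridewell et al.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]
Params = Dict[str, float]


@dataclass
class Process:
    """A named process that contributes rates of change to one or more variables."""
    name: str
    rates: Callable[[State, Params], Dict[str, float]]


def grazing_rate(s: State, p: Params) -> float:
    # Algebraic process (ASSUMED saturating, Holling type II form):
    # prey consumed per predator per unit time.
    return p["attack"] * s["prey"] / (1.0 + p["attack"] * p["handling"] * s["prey"])


def predation(s: State, p: Params) -> Dict[str, float]:
    # Predation removes prey and converts a fraction of it into predators.
    g = grazing_rate(s, p)
    return {"prey": -g * s["predator"], "predator": p["efficiency"] * g * s["predator"]}


PREDATOR_PREY: List[Process] = [
    Process("prey growth", lambda s, p: {"prey": p["growth"] * s["prey"]}),
    Process("predation", predation),
    Process("predator loss", lambda s, p: {"predator": -p["loss"] * s["predator"]}),
]


def simulate(processes: List[Process], init: State, params: Params,
             dt: float = 0.01, steps: int = 1000) -> List[State]:
    """Sum every process's contribution to each derivative, then take an Euler step."""
    state = dict(init)
    trajectory = [dict(state)]
    for _ in range(steps):
        deriv = {var: 0.0 for var in state}
        for proc in processes:
            for var, rate in proc.rates(state, params).items():
                deriv[var] += rate
        state = {var: state[var] + dt * deriv[var] for var in state}
        trajectory.append(dict(state))
    return trajectory


# Hypothetical parameter values, for illustration only.
params = {"growth": 1.0, "attack": 0.5, "handling": 0.2, "efficiency": 0.3, "loss": 0.4}
trajectory = simulate(PREDATOR_PREY, {"prey": 10.0, "predator": 2.0}, params)
```

Keeping each process as a separate object is what makes the model explainable: dropping or swapping one entry in the list changes one mechanism, which is also what a structure search manipulates.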

Inducing Process Models: Summary
- Input
  - Time-series data
  - Domain knowledge
  - Processes and constraints
- Structure search
  - Combine processes together using constraints and an evaluation strategy to limit the search
- Parameter search
  - For a given structure, fit parameters and evaluate
- Output
  - List of models ranked by score
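
The structure-search / parameter-search loop can be sketched in a few lines of Python. This is a simplified stand-in, not the IPM algorithm: it assumes one variable x, a hypothetical library of three candidate processes whose rates are linear in their parameters, derivatives estimated by forward differences, parameters fitted by ordinary least squares, and a score of squared error plus a fixed penalty per process. Every subset of the library is tried and the models are returned ranked by score.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

# HYPOTHETICAL process library for one variable x: each entry is the basis
# term whose fitted coefficient is that process's parameter.
CANDIDATES: Dict[str, Callable[[float], float]] = {
    "growth": lambda x: x,          # + r * x
    "crowding": lambda x: -x * x,   # - c * x^2
    "inflow": lambda x: 1.0,        # + k
}


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(structure: Sequence[str], xs: List[float], rates: List[float]):
    """Parameter search: least-squares coefficients (normal equations) for one structure."""
    basis = [[CANDIDATES[name](x) for name in structure] for x in xs]
    k = len(structure)
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * y for row, y in zip(basis, rates)) for i in range(k)]
    params = solve(ata, atb)
    sse = sum((sum(p * v for p, v in zip(params, row)) - y) ** 2
              for row, y in zip(basis, rates))
    return params, sse


def induce(series: List[float], dt: float, penalty: float = 0.01):
    """Structure search over every subset of processes; returns models ranked by score."""
    xs = series[:-1]
    rates = [(b - a) / dt for a, b in zip(series, series[1:])]  # forward differences
    ranked: List[Tuple[float, Tuple[str, ...], List[float]]] = []
    names = list(CANDIDATES)
    for size in range(1, len(names) + 1):
        for structure in itertools.combinations(names, size):
            params, sse = fit(structure, xs, rates)
            ranked.append((sse + penalty * size, structure, params))
    ranked.sort(key=lambda model: model[0])
    return ranked


# Synthetic logistic data (r = 0.8, c = 0.1), made up for illustration.
r, c, dt = 0.8, 0.1, 0.1
series = [0.5]
for _ in range(60):
    x = series[-1]
    series.append(x + dt * (r * x - c * x * x))

ranked = induce(series, dt)
best_score, best_structure, best_params = ranked[0]
```

On this synthetic series the top-ranked structure is growth plus crowding, with parameters close to 0.8 and 0.1; the three-process model fits equally well but is ranked lower because of the per-process penalty. Real systems such as IPM instead simulate each candidate and fit parameters against the observed trajectories, which handles nonlinear parameters and unobserved variables.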

Computational Model Discovery
Given:
- a methodology for the research;
- a meta-model for the process of the research;
- a set of representational forms for the observations (data);
- observations for a set of variables;
- a set of categories (entities) that the model may include;
- a set of generic processes that specify relations among entities;
- a set of constraints that indicate plausible relations among processes and entities.
Find:
- a specific process model and associated parameterization that not only predicts the observed values but also explains them.
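
How entities, generic processes, and constraints narrow the space of candidate models can be shown with a small Python sketch. Everything in it is hypothetical: the process names, the grouping of processes, and the constraint form (a minimum and maximum number of processes drawn from each group, plus the rule that a process may only be used if all of the entities it links are present). It enumerates every subset of the library and keeps only those that satisfy the constraints.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class GenericProcess:
    name: str
    group: str                 # HYPOTHETICAL grouping used by the constraints
    entities: Tuple[str, ...]  # entity types the process relates


LIBRARY: List[GenericProcess] = [
    GenericProcess("exponential_growth", "prey_growth", ("prey",)),
    GenericProcess("logistic_growth", "prey_growth", ("prey",)),
    GenericProcess("linear_predation", "predation", ("prey", "predator")),
    GenericProcess("saturating_predation", "predation", ("prey", "predator")),
    GenericProcess("exponential_loss", "predator_loss", ("predator",)),
]

# HYPOTHETICAL constraints: group -> (min, max) number of processes from that group.
CONSTRAINTS: Dict[str, Tuple[int, int]] = {
    "prey_growth": (1, 1),
    "predation": (1, 1),
    "predator_loss": (0, 1),
}


def satisfies(structure: Sequence[GenericProcess], entities: Set[str]) -> bool:
    """True if every process's entities are available and all group counts are in range."""
    counts = {group: 0 for group in CONSTRAINTS}
    for proc in structure:
        if not set(proc.entities) <= entities:
            return False
        counts[proc.group] += 1
    return all(lo <= counts[g] <= hi for g, (lo, hi) in CONSTRAINTS.items())


def candidate_structures(entities: Set[str]) -> List[List[str]]:
    """Enumerate all subsets of the library and keep the admissible ones."""
    valid = []
    for size in range(1, len(LIBRARY) + 1):
        for combo in itertools.combinations(LIBRARY, size):
            if satisfies(combo, entities):
                valid.append([proc.name for proc in combo])
    return valid


structures = candidate_structures({"prey", "predator"})
```

With both entities present, only 8 of the 31 possible subsets survive; with prey alone, none do, because every admissible model needs a predation process. Each surviving structure would then go to parameter fitting and evaluation against the observations.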

EVE, a bench robot for drug discovery? (Qi et al., 2010, Journal of Integrative Bioinformatics, 7(3):126)

GOES early fire detection system (Koltunov et al., 2012)

So, how close are we, in GIScience, to discovering process models?

Example domain model: OneGeology

Example library of analytical functions (PySAL)

One possible process for scientific investigation (Gahegan, 2005)
- Exploration: EXPLORING, DISCOVERING
- Analysis: GENERALIZING, MODELING
- Evaluation: EXPLAINING, TESTING, GENERALIZING
- Presentation: COMMUNICATING, CONSENSUS-BUILDING
- Synthesis: LEARNING, CATEGORIZING
Figure labels: Data; Map; Explanation confidence; Results; Theory; Category, relation; Model; Concept; Hypothesis

CyberGIS Grand Challenge
Create a ‘Geographical Process Model Discovery System’ that integrates:
- a science model
- a domain (data) model
- analysis software
- data
- (constraints)

Are there limits to what we can learn from data?
- Yes, but our learned models may still be useful.
- Yes, the model is, at best, as good as the data.
  - But this still might be better than current theory.
- Yes, but as data becomes ubiquitous, these limits will retreat.

End

CyberGIS Workflow: 5 simple (and also very complicated) steps
1. Discover, gain access to, and (to some extent) understand each dataset we intend to use, e.g. its semantics, provenance, and limitations.
2. Harmonize these datasets into a consistent form (data model), for example by re-projecting, converting from raster to vector, and harmonizing the semantics. (Data Model Integration)
3. Analyze the datasets via an analytical workflow of some kind. (Software Integration)
4. Validate the accuracy and suitability of the results.
5. Publish the results back into the infrastructure.
The results are of little value unless they maintain connections to the above steps.
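
The closing point, that results must stay connected to the steps that produced them, can be illustrated with a minimal Python sketch in which every artifact carries its own lineage. The function names, the catalog layout, and the use of a single unit conversion to stand in for "harmonize" are hypothetical simplifications; real harmonization (re-projection, raster-to-vector conversion, semantic alignment) is far more involved.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Artifact:
    """A value plus the chain of steps that produced it (its provenance)."""
    value: Any
    lineage: List[str] = field(default_factory=list)

    def derive(self, step: str, value: Any) -> "Artifact":
        # A new artifact copies and extends the lineage rather than replacing it.
        return Artifact(value, self.lineage + [step])


def discover(catalog: Dict[str, Dict[str, Any]], name: str) -> Artifact:
    # Step 1: fetch a dataset and record where it came from and its units.
    record = catalog[name]
    return Artifact(record["values"], [f"discover:{name} (units={record['units']})"])


def harmonize(art: Artifact, factor: float, target_units: str) -> Artifact:
    # Step 2 (simplified): convert to a common unit.
    return art.derive(f"harmonize:x{factor} -> {target_units}", [v * factor for v in art.value])


def analyze(art: Artifact) -> Artifact:
    # Step 3: a trivial stand-in analysis (the mean).
    return art.derive("analyze:mean", sum(art.value) / len(art.value))


def validate(art: Artifact, lo: float, hi: float) -> Artifact:
    # Step 4: reject results outside a plausible range.
    if not lo <= art.value <= hi:
        raise ValueError(f"result {art.value} outside plausible range [{lo}, {hi}]")
    return art.derive(f"validate:range [{lo}, {hi}]", art.value)


def publish(art: Artifact, infrastructure: List[Dict[str, Any]]) -> Artifact:
    # Step 5: publish the value together with its full lineage.
    infrastructure.append({"value": art.value, "lineage": list(art.lineage)})
    return art.derive("publish", art.value)


# HYPOTHETICAL catalog entry and store, for illustration only.
catalog = {"rainfall_cm": {"values": [1.0, 2.5, 4.0], "units": "cm"}}
store: List[Dict[str, Any]] = []
raw = discover(catalog, "rainfall_cm")
result = publish(validate(analyze(harmonize(raw, 10.0, "mm")), 0.0, 1000.0), store)
```

The published record holds the value 25.0 together with the four steps behind it, so a later user can trace the result back to the original dataset and its units.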

Learn a predictive model even when entire steps/states are missing?
Bayesian belief network learning
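
One standard way to learn network parameters when a node is never observed is expectation-maximization (EM). The slide does not say which learning method was used, so the Python sketch below is an illustrative stand-in. It assumes the simplest such network: one hidden binary node H with several observed binary children that are conditionally independent given H. The E-step computes P(H=1 | record) for each record, and the M-step re-estimates P(H) and each P(X_j = 1 | H) from those soft counts. The data are made up for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def em_latent_class(data: Sequence[Tuple[int, ...]], prior: float = 0.5,
                    theta: Optional[List[List[float]]] = None, iterations: int = 30):
    """Fit P(H=1) and P(X_j=1 | H=h) for a hidden binary parent H by EM.

    Returns (prior, theta, history), where history holds the log-likelihood
    of the data under the parameters used at each E-step.
    """
    m = len(data[0])
    if theta is None:
        # Asymmetric start so the two hidden states can separate.
        theta = [[0.3] * m, [0.7] * m]
    history = []
    for _ in range(iterations):
        # E-step: responsibility r = P(H=1 | row) for every record.
        resp = []
        loglik = 0.0
        for row in data:
            joint = []
            for h, p_h in ((0, 1.0 - prior), (1, prior)):
                p = p_h
                for j, x in enumerate(row):
                    p *= theta[h][j] if x else 1.0 - theta[h][j]
                joint.append(p)
            total = joint[0] + joint[1]
            loglik += math.log(total)
            resp.append(joint[1] / total)
        history.append(loglik)
        # M-step: maximum-likelihood estimates from the expected (soft) counts.
        n1 = sum(resp)
        n0 = len(data) - n1
        prior = n1 / len(data)
        theta = [
            [sum((1.0 - r) * row[j] for r, row in zip(resp, data)) / n0 for j in range(m)],
            [sum(r * row[j] for r, row in zip(resp, data)) / n1 for j in range(m)],
        ]
    return prior, theta, history


# HYPOTHETICAL observations: three binary indicators per record.
data = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 1, 1),
        (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 0)]
prior, theta, history = em_latent_class(data)
```

EM never decreases the log-likelihood from one iteration to the next, which is a useful sanity check on an implementation. It can, however, converge to a local optimum, so in practice it is restarted from several initial parameter settings.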

An example inferred model from GIScience
The consumer wants fit-for-purpose data, but the task and domain semantics are not given (latent variables). (Gahegan & Adams, 2014)

The education of the GIScientist?
- Better data custodian skills
- Better scientific computing skills, but you have to bring the geographic understanding too
- Deeper awareness of the processes/philosophy of our science
- A greater respect for data…
- An outward gaze…

Stages of scientific investigation, with types of inference and examples of visual and computational methods
- Exploration: EXPLORING, DISCOVERING
- Analysis: GENERALIZING, MODELING
- Evaluation: EXPLAINING, TESTING, GENERALIZING
- Presentation: COMMUNICATING, CONSENSUS-BUILDING
- Synthesis: LEARNING, CATEGORIZING
Figure labels: Data; Concept; Results; Theory; Explanation confidence; Category, relation; Map; Model; Hypothesis
Example methods shown in the figure:
- Scatterplot, grand tour, projection pursuit, parallel coordinate plot, iconographic displays
- Self-organizing map, k-means, clustering, geographical analysis machine, data mining, concept learning
- Interactive visual classification, parallel coordinate plot, separability plots, graphs of relationships
- Machine learning, maximum likelihood, decision trees, regression & correlation analysis
- Scene composition, information fusion, visual overlay
- Statistical modeling, uncertainty visualization
- Statistical testing, Monte Carlo simulation
- Maps, navigable worlds, charts, immersive visualizations
- Databases, digital libraries, clearinghouses

The Evolving Paths to Knowledge (George Djorgovski, Caltech)
- The First Paradigm: Experiment/Measurement
- The Second Paradigm: Analytical Theory
- The Third Paradigm: Numerical Simulations
- The Fourth Paradigm: Data-Driven Science?
  - Data fusion + data mining + synthesis/learning + explanation

Building Explanatory Models from Time-Series Data
- Process models are a natural choice
- There are many ways to define a process
- Processes are causal relations between one or more input and output variables
- Processes represent knowledge in a notation familiar to scientists
  - Helpful for explanation