Decision Support Tools for River Quality Management


Decision Support Tools for River Quality Management Martin Paisley, David Trigg and William Walley Centre for Intelligent Environmental Systems, Faculty of Computing, Engineering & Technology, Staffordshire University

Contents Background Our Aims Our Approach The River Pollution Diagnostic System (RPDS): pattern recognition – data exploration, diagnosis and classification The River Pollution Bayesian Belief Network (RPBBN): plausible reasoning – diagnosis, prognosis and scenario testing Summary © 2009 David Trigg

Our Aims Maximise the benefit gained from existing databases and information, and increase objectivity. Exploit the available technology to create sophisticated, flexible, multi-purpose tools. Make the technology easy to use. Provide expert support to those who need it to help them do their job.

Our Approach Our initial studies with the expert ecologist H.A. Hawkes led to the goal of trying to capture expertise. Expert systems are the branch of Artificial Intelligence (AI) that attempts to capture expertise in a computer-based system. Study of an expert is required to reveal what they do, how they do it, and what information and mental processes they use.

The Expert Ecologist Our early research discovered that expert ecologists tend to use two complementary techniques. Memory (pattern matching) – “I’ve seen this before, it was due to …” Scientific knowledge (plausible reasoning) – based on their knowledge of the system and the available evidence, they are able to reason about the likely state of other elements of the system. We set out to replicate these processes and produce software that would give people easy access to ‘expert’ interpretation.

The Modelling Tools After more than a decade of research in this field, the current modelling techniques we use are: our own clustering and visualisation system, known as MIR-Max (Mutual Information & Regression Maximisation), for pattern matching; and Bayesian Belief Networks (BBNs) for plausible reasoning. These techniques were used to produce the models on which our decision support software is based.

What the tools provide. Visualisation and exploration of large complex datasets. (RPDS) Classification of samples. (RPDS) Diagnosis of potential pressures. (RPDS & RPBBN) Prediction of biology from environmental and chemical parameters. (RPBBN) Scenario testing – impact of changing sample parameters. (RPBBN)

Pattern Recognition

Pattern Recognition – What is it? Recognition of patterns – a pattern implies multiple attributes, so pattern recognition is a multivariate technique. It classifies a new pattern (thing) as being of a particular type, based on its similarity to a set of attributes indicative of that type. Success relies on having the appropriate distinguishing features: enough features to discriminate clearly, and an appropriate set of features – orthogonal/uncorrelated.

Pattern Recognition – Why do it? A method of managing information – multiple instances are reduced to a single type or kind. Classification of situations allows us to cope with novel but similar situations. Exploitation of existing ‘information’: once something is identified as being of a type, its ‘unknown’ attributes can be inferred.

Pattern Recognition - Clustering To create a model we first need to cluster the training samples. The training samples contain both data on the training/clustering variables and additional ‘information’ variables (those that are to be predicted). In the case of RPDS, the training variables are the biology, and the information variables are the chemical and other stress parameters.

Pattern Recognition - Clustering Set of samples … grouped into ‘clusters’ … to provide templates/types in the model
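The grouping step above can be sketched with a generic k-means clusterer as a stand-in; MIR-Max is the authors' own mutual-information-based algorithm, so the function below is purely illustrative of "samples grouped into clusters", not of MIR-Max itself.

```python
def kmeans(samples, k, iters=20):
    """Group samples (lists of floats) into k clusters of similar samples.
    Initialises centroids from the first k samples for determinism."""
    centroids = [list(s) for s in samples[:k]]
    labels = [0] * len(samples)
    for _ in range(iters):
        # Assignment step: each sample joins its nearest centroid.
        for i, s in enumerate(samples):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(s, centroids[c])),
            )
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

In RPDS terms, each resulting centroid plays the role of a cluster 'template', and only the training (biological) variables would be fed to the clusterer.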

Pattern Recognition - Classification Classification involves matching a new sample to an existing cluster, based on the training variables. In this example the closest match for the new sample is cluster ‘A’; this is the ‘classification’ of the new sample. The quality class of the cluster is the one assigned to the new sample.
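A minimal sketch of this matching step, assuming clusters are represented by centroids and similarity is measured by squared Euclidean distance over the training variables (the representation is an assumption, not RPDS's actual matching rule):

```python
def classify(sample, centroids):
    """Return the index of the cluster whose centroid is nearest to the
    sample (squared Euclidean distance over the training variables)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(sample, centroids[c])),
    )
```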

Pattern Recognition - Diagnosis The diagnosis is derived from the values of the information variables (the blue bars) in the training samples grouped in the cluster. The predicted value of each variable is usually a statistic of those samples, such as the mean, median or a percentile.
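Assuming each training sample carries its information variables as a dictionary, the diagnosis statistic might be sketched as follows (the variable names BOD and NH4 in the usage are illustrative, and the choice of the median is one of the statistics the text mentions):

```python
from statistics import median

def diagnose(cluster_samples, info_vars):
    """Predict each information variable as the median of its values
    across the training samples grouped in the matched cluster."""
    return {v: median(s[v] for s in cluster_samples) for v in info_vars}
```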

Visualisation Classification can appear to be a black-box system; visualisation is a useful tool here. It opens the model up for inspection, helps to understand and validate the model, and helps to explore the data and discover new relationships. To aid visualisation, clusters can be ‘ordered’ in a map.

Ordering Ordering’s sole purpose is to help visualise the data and the cluster model – no more, no less. The process involves arranging the clusters in a space/map, usually based on similarity: similar clusters are placed close together, dissimilar ones far apart. Our algorithm, R-Max, uses the correlation coefficient r between distances in data space and the corresponding distances in output space.

Data Visualisation - Ordering Clusters Diagram: d = distance in data space; D = distance between clusters in the map. R-Max aims to maximise the correlation r between d and D
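The quantity R-Max maximises can be sketched as a score over one candidate arrangement; `ordering_score` is a hypothetical name, and the real algorithm would search over cluster positions rather than merely score them:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ordering_score(centroids, positions):
    """r between pairwise distances in data space (d) and in the map (D);
    an ordering algorithm seeks the positions that maximise this."""
    pairs = [(i, j) for i in range(len(centroids)) for j in range(i + 1, len(centroids))]
    ds = [math.dist(centroids[i], centroids[j]) for i, j in pairs]
    Ds = [math.dist(positions[i], positions[j]) for i, j in pairs]
    return pearson(ds, Ds)
```

An arrangement that preserves the data-space ordering scores r = 1; swapping two dissimilar clusters into adjacent map cells lowers the score.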

Pattern Recognition - Ordering Clusters templates/types … destination map … clusters ordered by similarity

Pattern Recognition - Visualisation Maps can be colour-coded to show the value of any chosen feature across all of the clusters ‘Feature maps’ and ‘templates’ form the basis of RPDS visualisation

RPDS 3.0 Primary uses are: Data exploration – the visual element of the clustered/organised data allows existing relationships in the data to be verified (model validation) and new ones to be identified (data mining). Classification – assignment of a sample to a cluster allows an estimated quality class to be defined. Diagnosis – the ‘known’ stress information associated with the other samples in the cluster can help diagnose potential problems.

RPDS 3.0 - Data Exploration

RPDS 3.0 - Classification

RPDS 3.0 - Diagnosis

RPDS 3.0 - Comparison

Plausible Reasoning

Reasoning Reasoning: Thinking that is coherent and logical. A set of cognitive processes by which an individual may infer a conclusion from an assortment of evidence or from statements of principles. Goal-directed thought that involves manipulating information to draw conclusions of various kinds. Use of available information combined with existing knowledge to derive conclusions for a particular purpose.

Reasoning with Uncertainty If reasoning is ‘coherent and logical’, how can it deal with unknowns, conflicting information and uncertainty? The ability to quantify uncertainty helps to resolve conflicts and provides ‘lubrication’ for the reasoning process; in humans this takes the form of beliefs. Probability theory provides a mathematical method of handling uncertainty.

Probability Theory Probability theory is robust and proven to be mathematically sound. It provides a method for representing and manipulating uncertainty, and is one of the principal methods used for handling uncertainty in computer-based systems. Bayesian Belief Networks (BBNs) are currently the most popular method for creating probabilistic systems.

Bayesian Belief Networks A BBN consists of two elements: a causal network and a set of probability matrices. A causal network is a graph of nodes (variables) and directed edges (relationships); the network defines the relationships between all the variables in a domain. The causal variables are often referred to as ‘parents’ and the effect variables as ‘children’. The network can be defined through data analysis, but is probably best specified by an expert.

Causal Network

Probability Matrix The probability matrices encode the relationships between variables; a probability is required for every combination of parent and child states. The number of combinations grows geometrically with the number of parents, meaning that the derivation of the probabilities is often better achieved via data analysis.
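A toy illustration of how a causal network plus probability matrices support plausible reasoning; the chain, state names and numbers below are invented for illustration and are not taken from the RPBBN:

```python
# Hypothetical two-state chain: Pollution -> Low oxygen -> Mayflies absent.
# All variable names and probabilities are illustrative only.
P_POLLUTION = {"high": 0.2, "low": 0.8}   # prior on the root cause
P_LOWOX = {"high": 0.9, "low": 0.1}       # P(low oxygen | pollution state)
P_ABSENT = {True: 0.8, False: 0.3}        # P(mayflies absent | low oxygen?)

def posterior_pollution_given_mayflies_absent():
    """P(pollution | mayflies absent), enumerating the hidden oxygen
    state and renormalising - the update a BBN performs when evidence
    is entered at any node."""
    joint = {}
    for pol, prior in P_POLLUTION.items():
        likelihood = 0.0
        for lowox in (True, False):
            p_lo = P_LOWOX[pol] if lowox else 1 - P_LOWOX[pol]
            likelihood += p_lo * P_ABSENT[lowox]
        joint[pol] = prior * likelihood
    z = sum(joint.values())
    return {pol: v / z for pol, v in joint.items()}
```

Entering the evidence "mayflies absent" raises the belief in high pollution above its prior, which is the diagnostic direction of reasoning; the same machinery run the other way gives prognosis.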

Outputs - Predictions The outputs of the system are the likelihoods of each of the states of the variables occurring. The whole system is updated every time evidence is entered, regardless of where it is entered. The most common way to present the values is a bar chart in which the bars depict the likelihood of each state, labelled with the variable name, state labels and probability values (0 - 100).

RPBBN 2.0 Primary uses are: Prediction of concentrations of common ‘chemical’ pollutants from biological sample data. Scenario testing – prediction of the new biological community and biological assessment ‘scores’ based on the modification of changeable environmental and chemical parameters for a site.

RPBBN 2.0 - Prediction

RPBBN 2.0 - Scenario Testing

Summary RPDS organises the EA dataset, allowing exploration and analysis, and provides the ability to classify new samples and diagnose potential problems. RPBBN allows prediction of the states of variables in a system based on any available evidence, making it useful for diagnosis, prognosis and scenario testing. Together these tools can help decision makers identify potential problems, suggest areas for further investigation, help develop programmes of remedial action and define targets.

Summary The models are based primarily on data analysis, making them more objective than expert opinion. The systems are robust and consistent in their operation. The software is easily reproduced and distributed, meaning that the valuable expertise it holds can easily be spread throughout an organisation.

The Future River quality – include more geographic information and move from site-based to river-basin management. Improvements in the algorithms – incorporation of sample bias and improved confidence measures. Major revision of the software – potentially rewritten as a web-based application.