An Ontology-Based Approach to Building BNs for the Weather Forecasting Domain
Tali Boneh, Ann Nicholson, Kevin Korb (Monash University)
John Bally (Bureau of Meteorology)
Monash Bayesian Reasoning Workshop, April 2006
The weather forecasting domain
The Australian Bureau of Meteorology (Bureau) is the national meteorological authority of Australia. Its role is to observe and understand Australian weather and climate and to provide weather services. A service is defined by its clients. The outputs of a service are products: weather reports in a variety of formats (text and graphics), delivered through several media such as newspapers, radio and the internet.
Traditional weather forecasting process
Forecasters:
–examine a large amount of data from different sources and in different formats;
–analyse and integrate these data to generate the weather products/reports;
–use several tools (Decision Support Systems), as well as their own judgment, to integrate information and make diagnosis and prediction decisions.
The weather products are created by typing text into formatted forms using a specialised text editor.
Characteristics of Decision Support Systems
The aim of a DSS is to display data digitally and graphically for forecasters.
The graphical representation lets forecasters interact with the data manually: adjust and change it, integrate it and create new data when necessary.
This manual interaction lets forecasters represent their thoughts digitally and graphically.
The digital representation enables automated text (product) generation.
Requirements and limitations - I
The information contains uncertainty:
–incomplete knowledge
–missing data
–uncertainty in observation
–uncertainty in guidance
Data quality is low and data are imprecise.
It is not always clear:
–how to combine the information
–how to weigh the different bits of information
–how to incorporate historical data
–how to include forecasters’ knowledge
Requirements and limitations - II
Existing DSSs focus only on data storage, the graphical user interface and automated product generation.
More advanced meteorological decisions are left to the subjective judgement, experience, knowledge and character of the forecaster:
–how to derive weather elements from other elements
–how to manipulate forecast data
–how to integrate data from different sources
The final representation is therefore still subjective.
Requirements and limitations - III
The domain is highly complex and involves several dimensions
–e.g. elements, locations and time issues (time of the year, time of the day and lead times)
The domain evolves and changes rapidly with better understanding of the atmosphere, better technology and better Numerical Weather Prediction (NWP) models.
Decision support systems must therefore be developed rapidly.
Requirements and limitations - IV
In some cases it is desirable to implement more than one technology using the same data.
Approaches to DSSs should support multiple technologies (e.g. BNs, ANNs, rule-based systems).
New DSS approach
Integration of information that captures complex meteorological concepts in ways that match the forecaster's knowledge.
DSSs that
–can derive forecast weather elements from local or synoptic-scale information
–modify forecast data using complex meteorological concepts while ensuring weather element consistency
–avoid comprehensive modelling and implementation
–deal with separate small decision steps.
Current new tools: 'state of the art'
The Australian Thunderstorm Interactive Forecast System (TIFS)
–the DSS is embedded in the software as code
–knowledge is not explicit and some may be lost
The National Oceanic and Atmospheric Administration's National Weather Service “Graphical Forecast Editor” (GFE)
–includes a framework called Smart Tools (based on Python)
–lets forecasters write their own tools
–tools can be documented at whatever level the forecaster finds appropriate
–code can be verified and made available to all forecasters
–code is kept in a central repository with its documentation
–procedures can be created and become operational quickly.
Disadvantages of current tools
Most of these DSSs are rule-based
–the tools do not deal appropriately with uncertainty
Knowledge is not explicit: it is captured directly in a programming language
–the domain knowledge is not easily recognisable in this representation, and may be lost as a result of the modelling decisions taken and of the representation itself
–the knowledge cannot be easily shared and reused
Possible Resolution: dealing with uncertainty
Bayesian Network technology deals with uncertainty, missing data and poor data quality.
Probability theory is one of the scientific ways of reasoning under uncertainty.
Applying formal statistics can yield better results than subjective judgment.
The final output of the process is objective and rests on a solid mathematical foundation.
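To illustrate how a BN copes with missing data, the following is a minimal sketch of exact inference by enumeration in a hypothetical three-node network (Fog with two children, a moisture reading and a guidance signal). The structure and every probability are invented for illustration and are not Bureau values. A missing observation is simply summed over, so the network still returns a posterior.

```python
from itertools import product

# Hypothetical toy network: Fog -> Moisture, Fog -> Guidance.
# All numbers are illustrative assumptions, not elicited values.
P_FOG = {True: 0.1, False: 0.9}
P_MOIST_HIGH = {True: 0.8, False: 0.3}   # P(Moisture = high | Fog)
P_GUIDE_YES = {True: 0.7, False: 0.2}    # P(Guidance = yes | Fog)


def joint(fog, moist_high, guide_yes):
    """Joint probability of one full assignment of the three variables."""
    pm = P_MOIST_HIGH[fog] if moist_high else 1 - P_MOIST_HIGH[fog]
    pg = P_GUIDE_YES[fog] if guide_yes else 1 - P_GUIDE_YES[fog]
    return P_FOG[fog] * pm * pg


def posterior_fog(moist_high=None, guide_yes=None):
    """P(Fog = True | evidence); None marks a missing observation,
    which is handled by summing over both of its values."""
    weights = {True: 0.0, False: 0.0}
    for fog, m, g in product([True, False], repeat=3):
        if moist_high is not None and m != moist_high:
            continue
        if guide_yes is not None and g != guide_yes:
            continue
        weights[fog] += joint(fog, m, g)
    return weights[True] / (weights[True] + weights[False])


print(round(posterior_fog(moist_high=True, guide_yes=True), 3))  # 0.509
print(round(posterior_fog(moist_high=True), 3))                  # 0.229 (guidance missing)
```

Enumeration is exponential in the number of variables, so it only suits small fragments like this; real BN tools use more efficient algorithms.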
Problems with Knowledge Engineering BNs
Capturing the knowledge directly in a BN may result in:
–a representation from which the domain knowledge is not easily recognisable. Example: raw information may need to be processed before it can be provided to the network, and the details will be buried as code in the implementation.
–loss of information as a result of modelling decisions. Example: a variable omitted from the network for efficiency reasons could be useful if different technologies are to be implemented.
Ontology-based approach
[Figure: Data, Knowledge Base, Bayesian Networks and Other Technology, linked by semi-automated construction]
Ontology-based approach
To overcome these potential disadvantages:
–knowledge should be represented in a form that enables re-use and sharing across software and people
–a knowledge-level model is needed that is independent of particular computer languages
Building DSSs in small steps requires that domain experts be able to develop their own networks
–the forecaster needs support in constructing BNs
A consensual conceptualisation of a domain, created so that knowledge can be shared and re-used, is called an ontology.
Ontology
In philosophy: a systematic explanation of being
In knowledge engineering: a formal, explicit specification of a shared conceptualisation
Ontologies aim to capture consensual knowledge in a generic way, for the purpose of re-use and sharing across machines and people.
Ontology design
Declarative knowledge
–knowledge about what objects, states and relations exist in the domain
–concepts: wind, temperature, fog
Procedural knowledge
–knowledge about how to find relevant facts and make inferences
–how to predict: wind, temperature, fog
[Figure: the ontology comprises declarative knowledge and procedural knowledge]
Forecasting Ontology – declarative knowledge
Weather services and products
–Service: aviation, disaster, marine, public
–Product: airport briefing, synoptic situation, recent events, media statement
Weather data sources: NWPs, radar, satellite, tracker, guidance
Weather phenomena/information
–weather elements: wind, temperature, fog, thunderstorm, inversion
–additional tool information: tracker length
–other environment information: time issues
Database schema
Forecasting Ontology – procedural knowledge
Procedure
–rule-based
–Bayesian network
–decision theory
–neural network
Procedure, working data, output, relation
Algorithm
–value-description
–description-description
–general algorithm
Knowledge elicitation
Bayesian Network
–input variables
–output variables
–type of connection between input and output (predictor/environment/guidance/network refinements)
–working data for learning the probabilities
–multiple working data sets describing the inputs and outputs at runtime
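One way to hold the elicited items listed above is as a small record per network. The sketch below is hypothetical: the class names, fields and the state lists in the usage lines are illustrative assumptions, not the ontology's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Connection(Enum):
    """Connection types between an input and an output, as listed on the slide."""
    PREDICTOR = "predictor"
    ENVIRONMENT = "environment"
    GUIDANCE = "guidance"
    REFINEMENT = "network refinement"


@dataclass
class Variable:
    name: str
    states: list[str]  # discretised states, e.g. ["yes", "no"]


@dataclass
class ElicitedBN:
    """Hypothetical record of the knowledge elicited for one Bayesian network."""
    outputs: list[Variable]
    inputs: dict[str, tuple[Variable, Connection]] = field(default_factory=dict)
    learning_data: str = ""                                # source of cases for learning CPTs
    runtime_data: list[str] = field(default_factory=list)  # sources used at runtime

    def add_input(self, var: Variable, connection: Connection) -> None:
        self.inputs[var.name] = (var, connection)

    def inputs_of_type(self, connection: Connection) -> list[str]:
        """Names of the inputs elicited with the given connection type."""
        return [v.name for v, c in self.inputs.values() if c is connection]


# Usage with hypothetical discretisations for three of the fog variables.
fog_bn = ElicitedBN(outputs=[Variable("Fog", ["yes", "no"])])
fog_bn.add_input(Variable("Moisture", ["low", "high"]), Connection.PREDICTOR)
fog_bn.add_input(Variable("Month", [str(m) for m in range(1, 13)]), Connection.ENVIRONMENT)
fog_bn.add_input(Variable("Stern-Parkyn", ["fog", "no fog"]), Connection.GUIDANCE)
print(fog_bn.inputs_of_type(Connection.PREDICTOR))  # ['Moisture']
```

Keeping the connection type with each input, rather than only the resulting arcs, is what lets the same record drive a BN or another technology later.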
Semi-automated Construction of BN
Extraction from ontology
–input and output variables → BN nodes
–direction of arcs:
Predictors – from output to input (sensors)
Environment – from input to output (background factors)
Guidance – from output to input (sensors)
Refinement of structure
–more arcs can go from the environment to the predictors
–intermediate variables to reduce the size of CPTs
–CPTs (from data, from experts, or a combination)
Updating ontology
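The arc-direction rules above are mechanical enough to sketch in a few lines. This is a hypothetical illustration, not the project's tool: the function names are invented, and treating all three meteorological variables from the fog case study as predictors is an assumption. The second part shows, with hypothetical state counts, why an intermediate variable shrinks CPTs.

```python
from math import prod

# Arc directions from the slide: predictors and guidance act as "sensors"
# of the output (output -> input); environment variables are background
# factors (input -> output).
ARC_FROM_OUTPUT = {"predictor", "guidance"}
ARC_TO_OUTPUT = {"environment"}


def extract_arcs(output, inputs):
    """Map {input name: connection type} to a list of (parent, child) arcs."""
    arcs = []
    for name, kind in inputs.items():
        if kind in ARC_FROM_OUTPUT:
            arcs.append((output, name))
        elif kind in ARC_TO_OUTPUT:
            arcs.append((name, output))
        else:
            raise ValueError(f"unknown connection type: {kind!r}")
    return arcs


def cpt_size(child_states, parent_states):
    """Entries in a CPT: child states times every combination of parent states."""
    return child_states * prod(parent_states)


fog_inputs = {
    "Moisture": "predictor",
    "Pressure Gradient": "predictor",
    "Lapse Rate": "predictor",
    "Rainfall": "environment",
    "Month": "environment",
    "Stern-Parkyn": "guidance",
    "Regano": "guidance",
}
for parent, child in extract_arcs("Fog", fog_inputs):
    print(parent, "->", child)

# Hypothetical refinement: a 2-state node with four 3-state parents, versus
# two 3-state intermediate variables that each summarise two of those parents.
direct = cpt_size(2, [3, 3, 3, 3])
with_intermediates = 2 * cpt_size(3, [3, 3]) + cpt_size(2, [3, 3])
print(direct, with_intermediates)  # 162 72
```

The refinement steps (extra environment-to-predictor arcs, intermediate nodes, CPT filling) remain manual in this sketch, matching the "semi-automated" label.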
Case study – forecasting fog
Different types of variables
–guidance: Stern-Parkyn, Regano
–meteorological variables (weather elements): Moisture, Pressure Gradient and Lapse Rate
–environment variables (background factors): Rainfall, Month
One or more possible BN structures can be constructed from this knowledge.
Fog – Ontology fragment
[Figure: ontology fragment for the output Fog Y/N Prob. Nodes: Pressure Gradient 3pm at Bendigo, Wonthaggi, East Sale and Hamilton; pYWON-pYBDG; Combined Pressure Gradient 3pm; Predicted Rainfall Amount Y/N 9am-9am; Moisture 6/9pm; Stern/Parkyn; Regano; Month. Relation labels: Predictor, Environment, Guidance. Data: Actual 6/9pm data, Actual MSLP data.]
Incremental prototyping development model
Construction in steps
–guidance only
–meteorology only
–combined network
Bayesian Network: fog – guidance only
Bayesian Network: fog – meteorology only
Bayesian Network: fog – combined
[Figure: combined network with Environment, Weather, Guidance and Meteorology parts]
ROC curve evaluation
Receiver Operating Characteristic (ROC) curves plot P(true positive) against P(false positive)
The area under the curve (AUC) is a global measure
–perfect test: AUC = 1
ROC curves can be used to find optimal cutoff values
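As a sketch of how an ROC curve and its AUC follow from a network's fog probabilities, the code below sweeps a cutoff over the scores, integrates with the trapezoidal rule, and picks a cutoff by Youden's J (TPR minus FPR). Youden's J is one common criterion chosen here as an assumption; the slides do not say which criterion was used. The scores and labels are made up.

```python
import math


def roc_points(scores, labels):
    """(FPR, TPR, cutoff) triples from sweeping a cutoff over forecast fog
    probabilities; labels are 1 for fog and 0 for no fog."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0, math.inf)]  # cutoff above every score: never forecast fog
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        points.append((fp / neg, tp / pos, c))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))


def youden_cutoff(points):
    """Cutoff maximising TPR - FPR (Youden's J statistic)."""
    return max(points, key=lambda p: p[1] - p[0])[2]


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]   # hypothetical P(fog)
labels = [1, 1, 0, 1, 1, 0, 0, 0]                    # hypothetical outcomes
pts = roc_points(scores, labels)
print(auc(pts), youden_cutoff(pts))  # 0.875 0.55
```

An operational cutoff would normally also weigh the relative cost of misses and false alarms, not only TPR minus FPR.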
Bureau Evaluation Measures
POD (True Positive Rate): $\mathrm{POD} = \frac{TP}{TP + FN}$, where $TP + FN$ = number of fog events
False Positive Rate: $\mathrm{FPR} = \frac{FP}{FP + TN}$, where $FP + TN$ = number of no-fog events
False Alarm Ratio (FAR): $\mathrm{FAR} = \frac{FP}{FP + TP}$, where $FP + TP$ = number of times fog was forecast
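The three measures follow directly from a 2x2 contingency table. The sketch below builds the table by forecasting fog whenever P(fog) is at least a cutoff (0.2 here, echoing the 20% TAF cutoff mentioned later in the slides) and then applies the definitions above. The function names and the sample probabilities are hypothetical; undefined ratios return NaN.

```python
def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def contingency(probs, observed, cutoff):
    """Counts (tp, fp, fn, tn) when fog is forecast whenever P(fog) >= cutoff."""
    tp = fp = fn = tn = 0
    for p, fog in zip(probs, observed):
        said_fog = p >= cutoff
        if said_fog and fog:
            tp += 1
        elif said_fog:
            fp += 1
        elif fog:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def bureau_scores(tp, fp, fn, tn):
    """POD, false positive rate and FAR as defined on the slide."""
    pod = _ratio(tp, tp + fn)   # denominator: number of fog events
    fpr = _ratio(fp, fp + tn)   # denominator: number of no-fog events
    far = _ratio(fp, fp + tp)   # denominator: number of times fog was forecast
    return pod, fpr, far


probs = [0.6, 0.3, 0.1, 0.25, 0.05, 0.15]              # hypothetical P(fog)
observed = [True, True, True, False, False, False]     # hypothetical outcomes
counts = contingency(probs, observed, cutoff=0.2)
print(counts)  # (2, 1, 1, 2)
print(bureau_scores(*counts))
```

Note that FPR and FAR share a numerator but differ in denominator: FPR is relative to no-fog days, FAR to fog forecasts.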
Evaluation
Stratified 10-fold cross-validation was used
–the dataset was randomly divided into 90% (training) and 10% (validation) fractions
–separately for fog and no-fog cases
The process was repeated for each of the 3 networks
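A minimal sketch of the stratified split, assuming fog and no-fog cases are shuffled separately and dealt round-robin into 10 folds; each fold serves once as the 10% validation set. The function names and the 20/180 class counts are hypothetical.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split case indices into k folds, shuffling each class (fog / no fog)
    separately and dealing it out round-robin so every fold keeps
    roughly the overall fog rate."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    start = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(start + j) % k].append(i)
        start += len(idx)  # continue the round-robin so fold sizes stay balanced
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train, validation) index lists; each fold (about 10% of the
    cases) is used once for validation and the rest (about 90%) for training."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, sorted(folds[f])


labels = [1] * 20 + [0] * 180   # hypothetical: 20 fog cases, 180 no-fog cases
for train, validation in cross_validation_splits(labels):
    fog_in_val = sum(labels[i] for i in validation)
    print(len(train), len(validation), fog_in_val)  # 180 20 2 on every fold
```

Fixing the seed lets the same folds be reused for each of the networks being compared, so their scores differ only because of the networks.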
Results: ROC evaluation of the three networks
ROC evaluation of the Melbourne Network
POD & FAR – operations versus network
[Table: Forecast | Operational POD (%) | Operational FAR (%) | Network POD (%) | Network FAR (%); rows: 3pm TAF, pm TAF and Code Grey, pm TAF, pm TAF and Code Grey; the numeric values and the Code Grey cutoff are not recoverable]
A 20% cutoff was used for TAF.
Ontology preferences
Forecast | Fog | No Fog
Say No Fog | -20 | 2
Say Code Grey - less than 5% chance of fog | 10 |
Say Code Grey - 5% chance of fog | 16 | -2
Say Code Grey - 10% chance of fog | 18 | -3
Say Code Grey - 20% chance of fog | 19 | -4
Say TAF – Prob Fog | 20 | -5
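Assuming these preference values act as utilities in the fog decision network, the forecast to issue is the one with the highest expected preference given the network's P(fog). The sketch below is hypothetical in its names; it uses the table values, and leaves out the "less than 5%" Code Grey option because its No Fog value is not recoverable from the slide.

```python
PREFERENCES = {
    # forecast: (preference if fog occurs, preference if no fog occurs)
    "Say No Fog": (-20, 2),
    "Say Code Grey - 5% chance of fog": (16, -2),
    "Say Code Grey - 10% chance of fog": (18, -3),
    "Say Code Grey - 20% chance of fog": (19, -4),
    "Say TAF - Prob Fog": (20, -5),
}


def expected_utility(p_fog, forecast):
    """EU = P(fog) * U(forecast, fog) + (1 - P(fog)) * U(forecast, no fog)."""
    u_fog, u_no_fog = PREFERENCES[forecast]
    return p_fog * u_fog + (1 - p_fog) * u_no_fog


def best_forecast(p_fog):
    """Forecast with the highest expected preference for a given P(fog)."""
    return max(PREFERENCES, key=lambda f: expected_utility(p_fog, f))


for p in (0.05, 0.2, 0.9):
    print(p, best_forecast(p))
# 0.05 Say No Fog
# 0.2 Say Code Grey - 5% chance of fog
# 0.9 Say TAF - Prob Fog
```

Because each expected utility is linear in P(fog), the preferences implicitly define probability thresholds at which the recommended forecast switches.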
Ontology (POD – FAR)
[Table: forecast outcome against Model POD and Model FAR, with rows No fog, <5% Code Grey, three further % Code Grey levels and Fog on TAF; the numeric values are not recoverable]
Fog decision network
Conclusions
Small fragments of Bayesian Networks are beneficial in the forecasting domain.
The incremental development model supports the acceptance of the Bayesian Networks.
The ontology was found to be useful
–for the explicit representation of all elicited knowledge, including background information (variables, discretisation, arcs and probabilities)
–for sharing information between domain experts and the knowledge engineer
–as a guide for further elicitation
–in supporting the domain experts in the construction of a Bayesian Network
Future Work
Further development of the ontology
More research on how to determine preferences
Other forecasting case studies
–Thunderstorms
Testing
Implementation issues