Published by Estella Farmer
Data Quality Processes in the MMEA Platform, 6.11.2013
Topics
- Quality control processing chain overview
- Real-time vs. non-real-time QC/AD
- Current state of QC/AD in the MMEA platform
- Planned work: SYKE water quality case
Quality control processing chain overview
Real-time vs. non-real-time QC and AD
- Real-time QC and AD
  - Usually computationally inexpensive tasks
  - Range checks, missing data detection, etc.
  - Complex event processing with Esper
- Non-real-time QC and AD
  - Missing value imputation, trend analysis, modeling, etc.
  - Large datasets, computationally heavy tasks
  - Batch jobs
  - QC/AD Library
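To show how cheap typical real-time checks are, here is a minimal Java sketch of a range check and of missing-data detection as gaps in a timestamped series. The class and method names, and the assumption that measurements arrive with timestamps in seconds at a fixed expected interval, are hypothetical and not taken from the MMEA platform.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lightweight real-time checks; names and thresholds are illustrative only. */
final class RealTimeChecks {

    /** Range check: true if the value is not NaN and lies within [min, max]. */
    static boolean inRange(double value, double min, double max) {
        return !Double.isNaN(value) && value >= min && value <= max;
    }

    /**
     * Missing data detection: returns every index i where the gap between
     * timestampsSec[i - 1] and timestampsSec[i] exceeds the expected interval.
     */
    static List<Integer> findGaps(long[] timestampsSec, long expectedIntervalSec) {
        List<Integer> gaps = new ArrayList<>();
        for (int i = 1; i < timestampsSec.length; i++) {
            if (timestampsSec[i] - timestampsSec[i - 1] > expectedIntervalSec) {
                gaps.add(i);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        System.out.println(inRange(3.2, 0.0, 10.0));        // true
        System.out.println(inRange(Double.NaN, 0.0, 10.0)); // false
        long[] ts = {0, 600, 1200, 2400, 3000};             // one 10-minute reading missing
        System.out.println(findGaps(ts, 600));              // [3]
    }
}
```

Both checks look at one value or one pair of neighbouring values at a time, which is what makes them suitable for streaming. In practice a small tolerance on the expected interval would absorb timestamp jitter.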
QC/AD Library
- A reusable set of Java classes for data quality control computations and anomaly detection.
- The library is independent of MMEA-specific schemas and components.
- Supports Java generics: computation parameters and return types can be simple types (wrapped primitives such as Double) as well as complex objects.
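A minimal sketch of how a generic computation type could look, assuming a single-method interface parameterised by a parameter type P and a result type R. The names Computation, RangeParams, RangeResult, RANGE_CHECK and MEAN are hypothetical; the presentation does not give the library's actual class names. Because Java type parameters must be reference types, "simple" values are passed as wrappers such as Double.

```java
import java.util.List;

/** Hypothetical illustration of generic QC/AD computations; not the library's real API. */
final class QcLibrarySketch {

    /** A computation with parameter type P and result type R. */
    interface Computation<P, R> {
        R compute(P parameters);
    }

    /** Complex parameter type: a series of values plus the allowed range. */
    record RangeParams(List<Double> values, double min, double max) {}

    /** Complex result type: how many values were checked and how many were out of range. */
    record RangeResult(int checked, int outOfRange) {}

    /** Range check using object-typed parameters and results. */
    static final Computation<RangeParams, RangeResult> RANGE_CHECK = p -> {
        int bad = 0;
        for (double v : p.values()) {
            if (v < p.min() || v > p.max()) bad++;
        }
        return new RangeResult(p.values().size(), bad);
    };

    /** Simple types work too, via wrappers: the mean of a list of Doubles. */
    static final Computation<List<Double>, Double> MEAN = xs ->
            xs.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);

    public static void main(String[] args) {
        RangeResult r = RANGE_CHECK.compute(new RangeParams(List.of(1.0, 5.0, 12.0), 0.0, 10.0));
        System.out.println(r);                                 // RangeResult[checked=3, outOfRange=1]
        System.out.println(MEAN.compute(List.of(1.0, 2.0, 3.0))); // 2.0
    }
}
```

Keeping the interface free of any MMEA-specific types is what lets such computations be reused outside the platform, in line with the independence goal stated above.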
Complex event processing with Esper
- Detects patterns in data streams.
- Queries are written in EPL (Event Processing Language), which resembles SQL.
- Data streams are run against the queries.
- A listener is attached to each query and reacts when a matching pattern is found.
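Running Esper requires the Esper library, so rather than guess at a specific Esper version's API, here is a standard-library-only Java analogue of the same idea: queries registered against a stream, each with a listener that fires when an event matches. This is not Esper's API, and it only handles single-event conditions, whereas EPL also supports windows, aggregation and multi-event patterns. The event type WaterLevelEvent and the 3.5 threshold are hypothetical; in EPL a comparable filter would read roughly `select * from WaterLevelEvent(level > 3.5)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Plain-Java analogue of the query/listener pattern; this is NOT the Esper API. */
final class StreamQuerySketch {

    /** Hypothetical event type: a water level reading from one station. */
    record WaterLevelEvent(String station, double level) {}

    /** A registered query: a condition on events plus the listener notified on a match. */
    record Query<E>(Predicate<E> condition, Consumer<E> listener) {}

    static final class EventStream<E> {
        private final List<Query<E>> queries = new ArrayList<>();

        /** Registers a query and attaches its listener. */
        void addQuery(Predicate<E> condition, Consumer<E> listener) {
            queries.add(new Query<>(condition, listener));
        }

        /** Runs each incoming event against every registered query. */
        void send(E event) {
            for (Query<E> q : queries) {
                if (q.condition().test(event)) {
                    q.listener().accept(event);
                }
            }
        }
    }

    public static void main(String[] args) {
        EventStream<WaterLevelEvent> stream = new EventStream<>();
        List<WaterLevelEvent> alerts = new ArrayList<>();
        stream.addQuery(e -> e.level() > 3.5, alerts::add); // listener collects matches
        stream.send(new WaterLevelEvent("A", 2.1));
        stream.send(new WaterLevelEvent("A", 3.9));
        System.out.println(alerts.size()); // 1
    }
}
```

The inversion is the key point shared with Esper: the data is pushed through standing queries, instead of queries being run over stored data.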
Current state of QC/AD in the MMEA platform
- Detection of anomalies in water level and pollen concentration forecasts could be implemented in the near future.
- The University of Oulu has been developing models that could be integrated with the platform.
- Planned: SYKE water quality case.
QC/AD in the MMEA platform
[Diagram: components labelled QC0, QC1, QC2 and Mediator]
Anomaly detection example
[Diagram: includes a component labelled Poller]
ComputationService
- A prototype was developed earlier this year.
- Runs in Tomcat.
- Web service interfaces for managing tasks:
  - Starting computation jobs
  - Terminating running jobs
  - Polling for job status
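The three task-management operations map naturally onto `java.util.concurrent`. The sketch below models them in-process: start returns a job id, terminate cancels the job's Future, and poll inspects it without blocking. It is hypothetical, not the ComputationService's actual interface; the prototype exposes these operations as web services in Tomcat, which this sketch does not attempt to reproduce. Termination assumes the job responds to thread interruption.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-process stand-in for start/terminate/poll; not the real service interface. */
final class JobManagerSketch {

    enum Status { RUNNING, COMPLETED, FAILED, TERMINATED, UNKNOWN }

    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private final Map<Long, Future<?>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** Starts a computation job and returns its id. */
    long start(Callable<?> computation) {
        long id = nextId.getAndIncrement();
        jobs.put(id, executor.submit(computation));
        return id;
    }

    /** Terminates a running job by cancelling it and interrupting its thread. */
    boolean terminate(long id) {
        Future<?> f = jobs.get(id);
        return f != null && f.cancel(true);
    }

    /** Polls a job's status without blocking. */
    Status poll(long id) {
        Future<?> f = jobs.get(id);
        if (f == null) return Status.UNKNOWN;
        if (f.isCancelled()) return Status.TERMINATED;
        if (!f.isDone()) return Status.RUNNING;
        try {
            f.get(); // already done, so this returns immediately
            return Status.COMPLETED;
        } catch (ExecutionException e) {
            return Status.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.UNKNOWN;
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) {
        JobManagerSketch manager = new JobManagerSketch();
        long slow = manager.start(() -> { Thread.sleep(10_000); return null; });
        System.out.println(manager.poll(slow));      // RUNNING
        System.out.println(manager.terminate(slow)); // true
        System.out.println(manager.poll(slow));      // TERMINATED
        manager.shutdown();
    }
}
```

Polling fits a web service setting well: a client can check a long batch job repeatedly without holding a connection open for the job's whole duration.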
Planned work: SYKE water quality case
Integration of the SYKE water quality measurement service into the MMEA platform. A user can ask the MMEA platform for the phosphorus and suspended solids content of water in a specified area. The data will be quality-controlled, and a quality estimate will be returned to the user.
Planned work: SYKE water quality case
QC tests:
- Missing data
- Missing value
- Variation
- Range
- Outlier detection
- Trend analysis
- Comparison with other relevant meteorological or hydrological data
Other computations:
- The result of the query, the phosphorus and suspended solids content of the water, is computed from turbidity information.
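Several of the listed statistical tests could be sketched in Java as follows: a missing value count, a variation test on successive differences, outlier detection with a z-score rule, and trend analysis as a least-squares slope. The presentation does not say which methods the SYKE case will use; the z-score rule, the thresholds and all names here are assumptions. Comparison with other data and the turbidity-based computation are left out because their models are not given.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical versions of some listed QC tests; methods and thresholds are assumptions. */
final class WaterQualityQc {

    /** Missing value test: number of NaN entries. */
    static int countMissing(double[] xs) {
        int n = 0;
        for (double x : xs) if (Double.isNaN(x)) n++;
        return n;
    }

    /** Variation test: indices i where |x[i] - x[i-1]| exceeds maxStep (pairs with NaN skipped). */
    static List<Integer> excessiveSteps(double[] xs, double maxStep) {
        List<Integer> out = new ArrayList<>();
        for (int i = 1; i < xs.length; i++) {
            double d = Math.abs(xs[i] - xs[i - 1]);
            if (!Double.isNaN(d) && d > maxStep) out.add(i);
        }
        return out;
    }

    /** Outlier detection: indices whose z-score (population std. dev.) exceeds zMax; NaNs ignored. */
    static List<Integer> zScoreOutliers(double[] xs, double zMax) {
        double sum = 0;
        int n = 0;
        for (double x : xs) {
            if (!Double.isNaN(x)) { sum += x; n++; }
        }
        List<Integer> out = new ArrayList<>();
        if (n < 2) return out;
        double mean = sum / n;
        double ss = 0;
        for (double x : xs) {
            if (!Double.isNaN(x)) ss += (x - mean) * (x - mean);
        }
        double sd = Math.sqrt(ss / n);
        if (sd == 0) return out;
        for (int i = 0; i < xs.length; i++) {
            if (!Double.isNaN(xs[i]) && Math.abs(xs[i] - mean) / sd > zMax) out.add(i);
        }
        return out;
    }

    /** Trend analysis: least-squares slope of the values against sample index (NaNs ignored). */
    static double trendSlope(double[] xs) {
        double sumT = 0, sumX = 0, sumTT = 0, sumTX = 0;
        int n = 0;
        for (int t = 0; t < xs.length; t++) {
            if (Double.isNaN(xs[t])) continue;
            sumT += t;
            sumX += xs[t];
            sumTT += (double) t * t;
            sumTX += t * xs[t];
            n++;
        }
        double denom = n * sumTT - sumT * sumT;
        return denom == 0 ? Double.NaN : (n * sumTX - sumT * sumX) / denom;
    }

    public static void main(String[] args) {
        System.out.println(countMissing(new double[]{1, Double.NaN, 3}));          // 1
        System.out.println(excessiveSteps(new double[]{1.0, 1.2, 5.0, 5.1}, 1.0)); // [2]
        double[] spike = {1, 1, 1, 1, 1, 1, 1, 1, 1, 10};
        System.out.println(zScoreOutliers(spike, 2.5));                            // [9]
        System.out.println(trendSlope(new double[]{1, 2, 3, 4}));                  // 1.0
    }
}
```

A z-score rule is easy to explain, but a single large outlier also inflates the standard deviation. With few samples, a median-based rule is often more robust.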