Published by Alisha Gregory. Modified over 8 years ago.
1
Dynamic, Rule-based Quality Control Framework for Real-time Sensor Data
Wade Sheldon
Georgia Coastal Ecosystems LTER
University of Georgia
2
Introduction
Quality Control of high volume, real-time data from automated sensors is an emerging challenge
  Traditional techniques (plotting, stats) often don’t scale well
  Data validation and Q/C can be limiting factor in getting data “online”
  Difficulties lead to release delays or posting provisional data
Software developed at Georgia Coastal Ecosystems LTER has proven useful for Q/C of real-time data
  Designed to automate GCE data processing and metadata generation, but very generalized and supports any tabular data
  Provides dynamic, rule-based Q/C framework for data processing, analysis and synthesis
3
Framework Components
Comprehensive data model
  Implemented as hierarchical MATLAB ‘structure’ arrays
  Package dataset & attribute metadata, data, Q/C rules, qualifier flags
Metadata-based MATLAB software (GCE Data Toolbox)
  Automatic (rule-based) and manual assignment of Q/C qualifier flags
  Transparent management of flags throughout all data manipulation
  Q/C-aware data management and analysis tools
  Q/C-aware data integration and synthesis tools
Modular implementation supports many scenarios
  Interactive (command-line API and GUI forms)
  Automated workflows (timed or triggered)
  End-to-end (logger-to-scientist) or part of larger workflow
Runs natively on multiple platforms (PC, *nix, MacOS)
4
GCE Data Toolbox Data Model
5
Quality Control Rules
Basic syntax: [logical expression]='[flag code]'
Logical Expressions:
  Any conditional statement or call to MATLAB function that returns logical array (0 = false, 1 = true)
  Dataset columns referenced in statements as:
    “x” – alias for current column (e.g. x<0)
    “col_[name]” – any dataset column by name (e.g. “col_Depth<0”)
Flag Codes:
  Alphanumeric character to assign when expression true (I, q, 9, *)
  Codes defined in the dataset metadata (I = invalid value, …)
Unlimited rules per attribute, multiple flags per value
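The rule syntax above can be illustrated with a small parser. This is a minimal Python sketch, not the GCE Data Toolbox's MATLAB code: the function name `parse_rules` is hypothetical, and it assumes that rules are separated by semicolons, that expressions contain no semicolons themselves, and that each flag code is a single character.

```python
import re

# Trailing ='<flag>' with a one-character flag code, as in x<0='I'.
# Curly quotes are accepted too (an assumption, for text pasted from slides).
_RULE = re.compile(r"^(?P<expr>.+?)\s*=\s*['‘’](?P<flag>.)['‘’]$")


def parse_rules(criteria):
    """Split a semicolon-delimited criteria string into (expression, flag) pairs."""
    rules = []
    for part in criteria.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        match = _RULE.match(part)
        if match is None:
            raise ValueError(f"not a valid rule: {part!r}")
        rules.append((match.group("expr"), match.group("flag")))
    return rules


print(parse_rules("x<0='I';col_Depth<0='I'"))
# [('x<0', 'I'), ('col_Depth<0', 'I')]
```

Anchoring the pattern on the trailing quoted flag keeps expressions that contain `==` intact, because only the last `='…'` is treated as the flag assignment.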
9
Quality Control Rule Examples
Numeric Comparisons:
  Simple:
    x<0='I' (flags negative values)
    x>100='I';x>80='Q' (overlapping bounds checks)
  Statistical:
    x>(mean(x)+3*std(x))='Q';x<(mean(x)-3*std(x))='Q' (flags values more than 3 standard deviations from column mean)
  Multi-column:
    col_DOC>col_TOC='I' (in column DOC; flags DOC exceeding TOC)
    col_Dry_Weight<(col_Wet_Weight-col_Ash_Weight)*0.90='I' (flags dry weights below 90% of wet weight – ash weight)
    col_Depth<0='I' (in column Salinity; flags Salinity when Depth < 0)
  Compound (Boolean operators):
    col_RH_Percent>100&col_Precip… (flags RH > 100% except during significant precipitation events)
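The numeric rule types on this slide can be sketched in Python as follows. This is an illustration under stated assumptions rather than toolbox code: `apply_rules` and `three_sigma` are hypothetical names, each rule is written as a Python callable instead of a MATLAB expression string, the bounds example assumes upper limits of 100 and 80, and all sample values are invented.

```python
from statistics import mean, stdev


def apply_rules(columns, target, rules):
    """Return one flag string per row of `target`.

    Each rule is (test, flag); test(x, cols) gets the target column and the
    dict of all columns and returns one boolean per row. Flags accumulate,
    so a single value can carry several codes.
    """
    x = columns[target]
    flags = [""] * len(x)
    for test, flag in rules:
        for i, hit in enumerate(test(x, columns)):
            if hit and flag not in flags[i]:
                flags[i] += flag
    return flags


def three_sigma(x, cols):
    # stdev() is the sample standard deviation (N-1), like MATLAB's default std().
    m, s = mean(x), stdev(x)
    return [v > m + 3 * s or v < m - 3 * s for v in x]


# Simple and overlapping bounds checks on the current column ("x").
bounds = [
    (lambda x, c: [v < 0 for v in x], "I"),
    (lambda x, c: [v > 100 for v in x], "I"),
    (lambda x, c: [v > 80 for v in x], "Q"),
]
print(apply_rules({"Temp": [-1.0, 20.0, 85.0, 120.0]}, "Temp", bounds))
# ['I', '', 'Q', 'IQ']

# Multi-column check: flag DOC where it exceeds TOC.
carbon = {"DOC": [2.0, 3.5, 9.0], "TOC": [4.0, 5.0, 6.0]}
doc_rule = [(lambda x, c: [d > t for d, t in zip(c["DOC"], c["TOC"])], "I")]
print(apply_rules(carbon, "DOC", doc_rule))
# ['', '', 'I']

# Statistical check: one spike in an otherwise constant series.
series = {"Level": [10.0] * 20 + [100.0]}
print(apply_rules(series, "Level", [(three_sigma, "Q")])[-1])
# Q
```

Note that a three-standard-deviation test needs a reasonably long column to be able to fire at all, since the outlier itself inflates both the mean and the standard deviation.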
12
Quality Control Rule Examples (cont.)
Text Comparisons:
  “IS”, “NOT” for string literals; “IN”, “NOT IN” for lists
  flag_notinlist(x,'Spartina,Juncus,Zizaniopsis')='Q'
Algorithmic Criteria (custom functions):
  fn(columns,parameters)='Q'
  Various included Q/C functions
    pattern checks, geographic checks, specialized algorithms (O2 saturation, etc.)
  User-defined functions:
    Any MATLAB code or “wrapped” calls to FORTRAN, Java, Python, etc.
    Unlimited scope
Full suite of MATLAB numeric analysis capabilities supported, and extensible to use other technology
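A custom Q/C function only has to return one true/false value per row. The Python sketch below shows a "not in list" check in that spirit. It is not the toolbox's `flag_notinlist` implementation: the name `not_in_list`, the comma-separated list argument, and the case-insensitive default are all assumptions made for illustration.

```python
def not_in_list(values, allowed, case_sensitive=False):
    """Return True for each value that is absent from the comma-separated list."""

    def normalize(text):
        text = text.strip()
        return text if case_sensitive else text.lower()

    lookup = {normalize(item) for item in allowed.split(",")}
    return [normalize(v) not in lookup for v in values]


species = ["Spartina", "juncus", "Typha"]
hits = not_in_list(species, "Spartina,Juncus,Zizaniopsis")
flags = ["Q" if hit else "" for hit in hits]
print(flags)
# ['', '', 'Q']
```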
13
Q/C Rule Management
Rule definitions can be stored in metadata “templates” and applied automatically to attributes when raw data are imported
Rules can also be created and managed using a GUI form
14
Q/C Flag Assignment
Q/C criteria evaluated to assign/clear flags when:
  Metadata template applied or Q/C criteria edited
  New data records, columns added
  Values edited (GUI) or columns updated (CLI)
  Evaluation function (dataflag) invoked directly
Flags can also be assigned/cleared manually by:
  Clicking/dragging on plots with the mouse
  Using a spreadsheet-like grid
  Importing from text attributes (e.g. 3rd-party codes)
  Propagating flags from source column(s) to dependent column(s)
Manual assignment locks flags by inserting a “manual” token in the criteria; removing “manual” restores automatic evaluation
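The locking behavior described on this slide might work along the following lines. This Python sketch is hypothetical: only the token name "manual" comes from the slide, while its placement as a semicolon-separated entry in the criteria string and the helper names `lock`, `unlock` and `refresh_flags` are assumptions.

```python
MANUAL = "manual"  # token name from the slide; its placement in the string is assumed


def lock(criteria):
    """Append the manual token so automatic evaluation skips this column."""
    parts = [p for p in criteria.split(";") if p]
    if MANUAL not in parts:
        parts.append(MANUAL)
    return ";".join(parts)


def unlock(criteria):
    """Remove the manual token, restoring automatic evaluation."""
    return ";".join(p for p in criteria.split(";") if p and p != MANUAL)


def refresh_flags(values, flags, criteria, evaluate):
    """Re-evaluate flags unless the criteria are locked."""
    if MANUAL in criteria.split(";"):
        return list(flags)  # locked: keep the hand-edited flags
    return evaluate(values, criteria)


def evaluate(values, criteria):
    # Stand-in evaluator for this example only: flags negative values as 'I'.
    return ["I" if v < 0 else "" for v in values]


locked = lock("x<0='I'")
print(locked)
# x<0='I';manual
print(refresh_flags([-1, 2], ["", "Q"], locked, evaluate))
# ['', 'Q']
print(refresh_flags([-1, 2], ["", "Q"], unlock(locked), evaluate))
# ['I', '']
```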
15
Q/C-Aware Data Management & Analysis
Q/C flags can be visualized in data editor grid and plots
Flagged values can be selectively removed from data sets
Statistics can be generated with/without flagged values
Flags can be instantiated as coded text columns for export
Flagged, missing values can be summarized by parameter and date for metadata
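Two of these operations, statistics with or without flagged values and a summary of flagged and missing values, are sketched below in Python. The function names, the use of `None` for missing values, and the sample data are assumptions for illustration and do not reflect the toolbox's own interface.

```python
from collections import Counter
from statistics import mean


def column_mean(values, flags, exclude_flagged=True):
    """Mean of non-missing values, optionally skipping any value that has a flag."""
    kept = [
        v
        for v, f in zip(values, flags)
        if v is not None and not (exclude_flagged and f)
    ]
    return mean(kept) if kept else None


def flag_summary(values, flags):
    """Count occurrences of each flag code plus the number of missing values."""
    counts = Counter(code for f in flags for code in f)
    counts["missing"] = sum(v is None for v in values)
    return dict(counts)


values = [10.0, 12.0, None, 200.0]
flags = ["", "", "", "I"]
print(column_mean(values, flags))
# 11.0
print(column_mean(values, flags, exclude_flagged=False))
# 74.0
print(flag_summary(values, flags))
# {'I': 1, 'missing': 1}
```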
16
Q/C-Aware Data Synthesis
Flagged, missing values summarized in re-sampled data (aggregated, binned, date-time resampled), with automatic Q/C rule creation
Flags automatically “locked” when merging multiple data sets (i.e. unions)
All Q/C operations logged to processing history, reported in metadata to document lineage
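One way to carry Q/C information through resampling is to report, for every bin, how many input values were missing or flagged. The Python sketch below does this for daily means. It is an illustration only: `daily_means`, the record layout, and the choice to exclude flagged values from the mean are assumptions, not documented toolbox behavior.

```python
from collections import defaultdict
from statistics import mean


def daily_means(records):
    """Aggregate (date_string, value_or_None, flag_string) records by date."""
    bins = defaultdict(list)
    for day, value, flag in records:
        bins[day].append((value, flag))

    out = {}
    for day, rows in sorted(bins.items()):
        # Assumption: only unflagged, non-missing values contribute to the mean.
        good = [v for v, f in rows if v is not None and not f]
        out[day] = {
            "mean": mean(good) if good else None,
            "n": len(rows),
            "n_missing": sum(v is None for v, _ in rows),
            "n_flagged": sum(bool(f) for v, f in rows if v is not None),
        }
    return out


records = [
    ("2008-01-01", 1.0, ""),
    ("2008-01-01", 3.0, ""),
    ("2008-01-01", 99.0, "I"),
    ("2008-01-02", None, ""),
]
for day, stats in daily_means(records).items():
    print(day, stats)
```

The per-bin counts could then drive rules on the aggregated column, for example flagging a daily mean when too many of its inputs were flagged or missing.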
17
Implementation Scenarios
End-to-End (logger-to-scientist)
  Acquire raw data from logger or file system (standard or custom import filters)
  Assign metadata from template or using forms to validate and flag data
  Review data and fine-tune flag assignments
  Generate distribution files & plots, archive data, index for searching
  Desktop data management solution
Data Pre-processing
  Acquire, validate and flag raw data (on demand or timed/triggered)
  Upload processed data files (e.g. csv) or value & flag arrays to RDBMS
Workflow Step
  Call toolbox functions as part of another workflow process, custom program
  Kepler MATLAB actor?
18
Suitability for Real-Time Sensor Data
Good scalability
  Data volumes only limited by computer memory (tested >2 GB data sets)
  Multiple instances can be run on high-end, 64-bit, clustered workstations
  Good flag evaluation performance in use, testing with diverse rule sets
Good scope for automation
  Timed and triggered workflow implementations easy to deploy
Support for multiple I/O formats, transport protocols
  Formats: ASCII, MATLAB, SQL, XML (partially implemented)
  Transport: local file system, UNC paths, HTTP, FTP, SOAP
Already used for real-time GCE data, USGS data harvesting service (LTER HydroDB, CWT)
19
Concluding Remarks
Benefits
  Flexible, modular design
  No qualifier vocabulary, semantics assumed – many purposes, standards
  Many operations on flagged values – supports different strategies for archiving and distributing data at different processing levels
Limitations
  Requires MATLAB
  Rule syntax environment-specific – a more open standard would be ideal
  Support for XML metadata immature (but more development planned)
More information and downloads at: http://gce-lter.marsci.uga.edu/public/im/tools/data_toolbox.htm
This work was supported by the National Science Foundation under grant numbers OCE-9982133 and OCE-0620959