Published by Marian Oliver. Modified over 8 years ago.
1
Big Data Quality Challenges for the Internet of Things (IoT)
Vassilis Christophides
INRIA Paris (MUSE team)
2
The Internet of Things (IoT)
Networks of physical objects (a.k.a. things) with embedded sensing and actuating capabilities that communicate with other objects and information systems:
– item identification (tagging things)
– sensors (feeling things)
– nanotechnology (shrinking things)
– …
“Things” are highly heterogeneous:
– Small (RFID tag) or big (car)
– Fixed (fridge) or mobile (activity tracker)
– Environment-oriented (thermostat) or person-oriented (body analyzer)
3
Urban Computing
4
Quantified Self
– physical exercise
– sleep quality
– biochemical markers
– psychological state
– genetics
5
Big Data = Transactions + Interactions + Observations
Increasing data variety, velocity & veracity (figure scale: Megabytes, Gigabytes, Terabytes, Petabytes)
IoT devices are reporting even more personal data than humans are!
hortonworks.com/blog/7-key-drivers-for-the-big-data-market
6
The 5+1 Vs of Big Data
Variability*: data in change (evolving data distributions, models, etc.)
Figure labels: Insights + Understanding, Automation + Optimization, Value
Adapted from www-05.ibm.com/fr/events/netezzaDM_2012/Solutions_Big_Data.pdf
7
IoT Data Value Chain
– Capture, track & monitor
– Transmit data to external environment
– Ingest, store & integrate data
– Analyze data, control & automate
Figure labels: PROCESS, ANALYSE
http://www.codit.eu/how-can-we-help/internet-of-things
8
IoT Data Quality (DQ)
Agility:
– Availability: almost real-time, any place
– Accessibility: mostly in verticals, privacy constrained
Relevance:
– Fitness: depends on the granularity of observations & measurements in thematic, spatial & temporal dimensions
Usability:
– Trustworthiness: observations or measurements in the wild
9
IoT Data Quality (DQ)
Reliability:
– Accuracy: depends on device calibration & sensing method
– Validity: depends on the resource constraints (connectivity, bandwidth, power, memory, storage & processing capabilities) of devices and data infrastructure
– Completeness: due to data variety, complete domain knowledge is infeasible; due to data variability, domain knowledge quickly becomes obsolete
– Integrity: usually relative to a collection of raw data series originating from different devices
Photo: Ben Stansall/Agence France-Presse/Getty Images
10
In Search of IoT DQ Solutions
Let the data speak for itself!
– Learn models (semantics) from the data that are robust to the presence of noise (and anomalies)
– Detect deviations of data from learned models
– Evolve learned models according to data deviations
Computing with Big Data!
– Volume: scalable algorithms (efficiency vs. accuracy)
– Variety: looking at the condition and context of data deviations
– Velocity: incremental and online algorithms
Data Quality: the “other” Face of Big Data, B. Saha, D. Srivastava, ICDE 2013
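The learn / detect / evolve loop above can be illustrated with a very small online detector. The sketch below is only one possible reading of those bullets, not the method of the talk: it assumes a single numeric stream, models "normal behavior" as a running mean and variance (Welford's algorithm), and flags a reading whose z-score exceeds a cut-off. The class name, the `threshold` and `warmup` parameters, and the sample readings are all hypothetical.

```python
import math


class OnlineDeviationDetector:
    """Running mean/variance (Welford's algorithm) with a z-score test."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # hypothetical cut-off, in standard deviations
        self.warmup = warmup        # readings absorbed before anything is flagged
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared differences from the mean

    def _learn(self, x):
        # Incremental update: O(1) time and memory per reading.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def update(self, x):
        """Return True if x deviates from the learned model; otherwise learn from it."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                return True  # deviation: reported and kept out of the model
        self._learn(x)
        return False


# Made-up temperature readings with one obvious spike.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0]
detector = OnlineDeviationDetector()
flags = [detector.update(r) for r in readings]
print(flags)
```

Excluding flagged readings keeps the model robust to isolated glitches, but it also means a genuine, lasting shift in the data would keep being flagged. Evolving the model in that case would need something extra, for example a forgetting factor or a sliding window, which this sketch leaves out.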
11
Towards DQ-aware IoT Analytics
Analyze a single data stream:
– How can we incrementally detect deviations from data regions of normal behavior?
– How can we distinguish between data glitches, meaningful events or even malicious attacks?
– What types of data deviations can be identified (distance, density, contextual) and at what granularity level?
Analyze multiple data streams:
– How can we compute online correlations across time/space in the case of missing or delayed data?
– How can we progressively evolve extracted knowledge patterns (motifs, episodes)?
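For the multi-stream question on online correlations, a minimal sketch is given below. It assumes two streams whose readings are already aligned by timestamp, with `None` standing for a missing reading, and it maintains a Pearson correlation incrementally by simply skipping incomplete pairs. This is one simple policy among many, not the approach of the talk; delayed readings would additionally need buffering before alignment, which is not shown. All names and sample values are hypothetical.

```python
import math


class OnlineCorrelation:
    """Incremental Pearson correlation over paired readings."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # sum of squared deviations of x
        self.m2_y = 0.0  # sum of squared deviations of y
        self.c_xy = 0.0  # co-moment of x and y

    def update(self, x, y):
        # Skip the pair if either stream has no reading at this time step.
        if x is None or y is None:
            return
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        dy = y - self.mean_y  # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def value(self):
        """Return the correlation so far, or None if it is undefined."""
        if self.n < 2 or self.m2_x == 0 or self.m2_y == 0:
            return None
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)


# Two made-up streams; each has one gap at a different time step.
stream_a = [1.0, 2.0, None, 4.0, 5.0]
stream_b = [2.0, 4.0, 6.0, None, 10.0]

corr = OnlineCorrelation()
for a, b in zip(stream_a, stream_b):
    corr.update(a, b)

r = corr.value()
print(corr.n, r)  # three complete pairs; r should be very close to 1.0
```

Skipping incomplete pairs keeps the estimate unbiased only if gaps are unrelated to the values themselves; if a device drops readings precisely when values are extreme, imputation or explicit modeling of the gaps would be needed instead.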
12
Key Analytics to Delivering Value in IoT
http://fr.slideshare.net/wclquang/the-analytics-value-chain-key-to-delivering-business-value-in-iot
13
Thank you!
14
The Three Domains of Information
Source: Barry Devlin, “The Big Data Zoo --- Taming the Beasts”
15
Computing with Things: Challenges
Things are different from servers in a data center: they are used in the wild, and they are often constrained by limited connectivity, bandwidth, power, memory, storage & processing capabilities.
Things are different from UI clients: they usually have no on-board UI, and they follow an M2M communication paradigm rather than a UI client-to-server interaction paradigm.
Things may communicate directly with peers: it isn't all thin-client communication to the parent server in the cloud, and the hub-and-spoke model presents serious limitations for very large numbers of devices.
16
re-workblog.tumblr.com
17
http://www.kdnuggets.com/2015/08/patterns-streaming-realtime-analytics.html