Big Data Quality Challenges for the Internet of Things (IoT) Vassilis Christophides INRIA Paris (MUSE team)


1 Big Data Quality Challenges for the Internet of Things (IoT) Vassilis Christophides INRIA Paris (MUSE team)

2 The Internet of Things (IoT)
• Networks of physical objects (a.k.a. things) with embedded sensing and actuating capabilities that communicate with other objects and information systems
– item identification (tagging things)
– sensors (feeling things)
– nanotechnology (shrinking things)
– …
• “Things” are highly heterogeneous:
– Small (RFID tag) or Big (car)
– Fixed (fridge) or Mobile (activity tracker)
– Environment (thermostat) or Person-oriented (body analyzer)

3 Urban Computing

4 Quantified Self
[Figure: physical exercise, sleep quality, biochemical markers, psychological state, genetics]

5 Big Data = Transactions + Interactions + Observations
[Figure: increasing data variety, velocity & veracity, on a scale from megabytes through gigabytes and terabytes to petabytes]
IoT devices are reporting even more personal data than humans are!
Source: hortonworks.com/blog/7-key-drivers-for-the-big-data-market

6 The 5+1 Vs of Big Data
Variability*: Data in Change (evolving data distributions, models, etc.)
Value: Insights + Understanding, Automation + Optimization
Adapted from www-05.ibm.com/fr/events/netezzaDM_2012/Solutions_Big_Data.pdf

7 IoT Data Value Chain
– Capture, track & monitor
– Transmit data to external env.
– Ingest, store & integrate data
– Analyze data, control & automate
[Figure stage labels: PROCESS, ANALYSE]
Source: http://www.codit.eu/how-can-we-help/internet-of-things

8 IoT Data Quality (DQ)
• Agility:
– Availability: almost real-time, any place
– Accessibility: mostly in verticals, privacy constrained
• Relevance:
– Fitness: depends on the granularity of observations & measurements in thematic, spatial & temporal dimensions
• Usability:
– Trustworthiness: observations or measurements in the wild

9 IoT Data Quality (DQ)
• Reliability:
– Accuracy: depends on device calibration & sensing method
– Validity: depends on the resource constraints (connectivity, bandwidth, power, memory, storage & processing capabilities) of devices and data infrastructure
– Completeness: due to data variety, complete domain knowledge is infeasible; due to data variability, domain knowledge quickly becomes obsolete
– Integrity: usually relative to a collection of raw data series originating from different devices
Image credit: Ben Stansall/Agence France-Presse/Getty Images

10 In Search of IoT DQ Solutions
• Let the data speak for itself!
– Learn models (semantics) from the data that are robust to the presence of noise (and anomalies)
– Detect deviations of data from learned models
– Evolve learned models according to data deviations
• Computing with Big Data!
– Volume: Scalable algorithms (efficiency vs accuracy)
– Variety: Looking at condition and context of data deviations
– Velocity: Incremental and online algorithms
Reference: Data Quality: the “other” Face of Big Data, B. Saha, D. Srivastava, ICDE 2013
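The learn / detect / evolve loop on this slide can be illustrated with a minimal sketch. It is not taken from the talk: it assumes a single numeric sensor stream, uses a running mean and variance (Welford's online algorithm) as the "learned model", and flags a reading as a deviation when its z-score exceeds a threshold. The class name `OnlineDeviationDetector`, the `threshold` and `warmup` parameters, and the policy of not learning from flagged readings are all illustrative assumptions.

```python
import math


class OnlineDeviationDetector:
    """Hypothetical sketch: running mean/variance model with a z-score test."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # assumed z-score cut-off
        self.warmup = warmup        # readings to absorb before testing
        self.n = 0                  # number of readings in the model
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def _update(self, x):
        # Welford's incremental update: O(1) time and memory per reading.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def observe(self, x):
        """Return True if x deviates from the model; otherwise learn from x."""
        is_deviation = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_deviation = True
        if not is_deviation:
            # Assumed policy: only non-deviating readings evolve the model.
            self._update(x)
        return is_deviation


detector = OnlineDeviationDetector()
readings = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 35.0, 20.3]
flags = [detector.observe(r) for r in readings]
print(flags)
```

Excluding flagged readings keeps glitches from polluting the model, but it also means a genuine shift in the data distribution is never absorbed. A forgetting factor or a windowed model would be one way to handle variability, at the cost of an extra tuning parameter.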

11 Towards DQ-aware IoT Analytics
• Analyze a single data stream:
– How can we incrementally detect deviations from data regions of normal behavior?
– How can we distinguish between data glitches, meaningful events, or even malicious attacks?
– What types of data deviations can be identified (distance, density, contextual) and at what granularity level?
• Analyze multiple data streams:
– How can we compute online correlations across time/space in case of missing or delayed data?
– How can we progressively evolve extracted knowledge patterns (motifs, episodes)?
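As one possible starting point for the online-correlation question, the sketch below maintains a Pearson correlation between two time-aligned streams incrementally and skips time steps where either reading is missing. It is an illustration under stated assumptions, not a method from the talk: readings are assumed to arrive already aligned by time step, missing values are represented as `None`, and the class name `OnlineCorrelation` is hypothetical. Delayed (out-of-order) data is not handled here.

```python
import math


class OnlineCorrelation:
    """Hypothetical sketch: incremental Pearson correlation of two streams."""

    def __init__(self):
        self.n = 0          # number of complete (x, y) pairs seen
        self.skipped = 0    # time steps dropped because a value was missing
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0     # sum of squared deviations of x
        self.m2_y = 0.0     # sum of squared deviations of y
        self.c_xy = 0.0     # co-moment of x and y

    def update(self, x, y):
        # Assumed policy: a pair with a missing reading is ignored entirely.
        if x is None or y is None:
            self.skipped += 1
            return
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        dy = y - self.mean_y  # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def correlation(self):
        """Return the correlation so far, or None if it is undefined."""
        if self.n < 2 or self.m2_x == 0 or self.m2_y == 0:
            return None
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)


corr = OnlineCorrelation()
stream_a = [1.0, 2.0, None, 4.0, 5.0]
stream_b = [2.0, 4.0, 6.0, None, 10.0]
for a, b in zip(stream_a, stream_b):
    corr.update(a, b)
print(corr.correlation(), corr.skipped)
```

Dropping incomplete pairs is the simplest policy and keeps the estimate unbiased only if values go missing independently of what they would have been. Imputation or buffering for late arrivals are alternatives with their own trade-offs.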

12 Key Analytics to Delivering Value in IoT
Source: http://fr.slideshare.net/wclquang/the-analytics-value-chain-key-to-delivering-business-value-in-iot

13 Thank you!

14 The Three Domains of Information
Source: Barry Devlin, “The Big Data Zoo --- Taming the Beasts”

15 Computing with Things: Challenges
• Things are different from servers in a data center: they are used in the wild, and they are often constrained by limited connectivity, bandwidth, power, memory, storage & processing capabilities
• Things are different from UI clients: they usually have no on-board UI, so they follow an M2M communication paradigm rather than a UI client-to-server interaction paradigm
• Things may communicate directly with peers: not all communication is thin-client communication to a parent server in the cloud, and the hub-and-spoke model presents serious limitations for very large numbers of devices

16 [Figure] Source: re-workblog.tumblr.com

17 [Figure] Source: http://www.kdnuggets.com/2015/08/patterns-streaming-realtime-analytics.html

