
Department of Computer Science Provenance-based Trustworthiness Assessment in Sensor Networks Elisa Bertino CERIAS and Department of Computer Science, Purdue University, USA (Joint work with Hyo-Sang Lim and Yang-Sae Moon)

Data Streams Everywhere
New computing environments
- Ubiquitous/mobile computing, embedded systems, and sensor networks
New applications
- Traffic control systems monitoring data from mobile sensors
- Location-based services (LBSs) based on the user's continuously changing location
- e-healthcare systems monitoring patient medical conditions
- Real-time financial analysis
What are we interested in?
- Data originates from multiple distributed sources
- Data is processed by multiple intermediate agents
Assessing data trustworthiness is crucial for mission-critical applications
- Knowing where the data comes from is crucial for assessing data trustworthiness
- Data provenance = where the data comes from
- Data trustworthiness = how much we can trust a data item to be correct

What is Provenance?
In general, the origin or history of something is known as its provenance.
In the context of computer science, data provenance refers to information documenting how data came to be in its current state: where it originated, how it was generated, and the manipulations it underwent since its creation.

Focus of Our Work
Data trustworthiness, assessed with data provenance, in data stream environments (especially in sensor networks)

Sensor Networks
Sensor networks collect large amounts of data that can convey important information for critical decision making.
- e.g., traffic monitoring, healthcare monitoring, environment/habitat monitoring, SCADA (supervisory control and data acquisition) systems, …
In the near future,
- a large number of cheap and tiny sensor nodes will be deployed everywhere (e.g., the Smart Dust project at UC Berkeley, which aims to create sensors the size of a grain of sand)
- multiple sensor nodes may monitor the same event (e.g., mobile sensors deployed in vehicles moving along the same street)
Data trustworthiness problems
- Sensor nodes deployed in hostile environments can be manipulated by enemies.
- Sensing accuracy can drop temporarily due to environmental changes such as severe weather.
- Malfunction, low battery, …

An Example Sensor Network: Battlefield Monitoring
[Figure: battlefield monitoring sensor network covering Region A, Region B, and Region C]

What Makes It Difficult to Solve?
Data stream nature
- Data arrives rapidly → real-time processing requirement → high-performance processing
- Unbounded in size → not possible to store the entire set of data items → data provenance can only be accessed sequentially
- Dynamic/adaptive processing
- Sometimes only approximate (or summary) data are available
Provenance nature
- Annotation → provenance grows as the data is transmitted from the source to the server (i.e., snowballing effect)
- Interpretation semantics differ from those of usual data
Sensor network nature
- Provenance processing at intermediate nodes (e.g., provenance information can be merged/separated/manipulated)
- Hierarchical structure for network and provenance

Our Solution: A Cyclic Framework for Assessing Data Trustworthiness

Modeling Sensor Networks and Data Provenance
A sensor network is modeled as a graph G(N, E)
- N = { n_i | n_i is a network node whose identifier is i }: a set of sensor nodes
  - a terminal node generates a data item and sends it to one or more intermediate or server nodes
  - an intermediate node receives data items from terminal or intermediate nodes and passes them to intermediate or server nodes
  - a server node receives data items and evaluates continuous queries based on those items
- E = { e_i,j | e_i,j is an edge connecting nodes n_i and n_j }: a set of edges connecting sensor nodes
A data provenance p_d
- p_d is a subgraph of G
[Figure: (a) a physical network with server, intermediate, and terminal nodes; (b) a simple path; (c) an aggregate path; (d) an arbitrary graph]
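The graph model above can be sketched in a few lines of Python. This is only an illustrative sketch: the class names (`Role`, `SensorNetwork`, `Provenance`), the use of directed edges pointing toward the server, and the integer node identifiers are assumptions made for the example, not part of the slides.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    TERMINAL = "terminal"          # generates data items
    INTERMEDIATE = "intermediate"  # receives and forwards data items
    SERVER = "server"              # evaluates continuous queries


@dataclass
class SensorNetwork:
    """G(N, E): node roles keyed by identifier, plus a set of edges.

    Assumption: edges are stored as directed pairs (i, j) pointing
    toward the server.
    """
    roles: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_node(self, node_id: int, role: Role) -> None:
        self.roles[node_id] = role

    def add_edge(self, src: int, dst: int) -> None:
        if src not in self.roles or dst not in self.roles:
            raise KeyError("both endpoints must be known nodes")
        self.edges.add((src, dst))


@dataclass(frozen=True)
class Provenance:
    """p_d: the nodes and edges that a data item d passed through."""
    nodes: frozenset
    edges: frozenset

    def is_subgraph_of(self, g: SensorNetwork) -> bool:
        # p_d must only use nodes and edges that exist in G.
        return self.nodes <= set(g.roles) and self.edges <= g.edges


# A simple path: terminal 2 -> intermediate 1 -> server 0.
network = SensorNetwork()
network.add_node(0, Role.SERVER)
network.add_node(1, Role.INTERMEDIATE)
network.add_node(2, Role.TERMINAL)
network.add_edge(2, 1)
network.add_edge(1, 0)
simple_path = Provenance(frozenset({0, 1, 2}), frozenset({(2, 1), (1, 0)}))
```

An aggregate path or an arbitrary graph would use the same `Provenance` type with more nodes and edges; only the subgraph check is needed to validate it against the network.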

A Cyclic Framework for Computing Trust Scores
Trust scores: quantitative measures of trustworthiness
- Range [0, 1]: 0 means totally untrustworthy, 1 means totally trustworthy
- We use 0.5 as the initial trust score, since it means "uncertain"
- A trust score indicates the trustworthiness of data items/sensor nodes and can be used for comparison or ranking
Two types of trust scores
- Data trust scores: indicate how much we can trust the correctness of the data items
- Node trust scores: indicate how much we can trust the sensor nodes to collect correct data
Interdependency between data and node trust scores
- The trust score of the data affects the trust score of the sensor nodes that created the data
- The trust score of a node affects the trust score of the data created by that node
- Data arrives incrementally in data stream environments

A Cyclic Framework for Computing Trust Scores
Trust score of a data item d
- The current trust score of d is the score computed from the current trust scores of its related nodes.
- The intermediate trust score of d is the score computed from a set D (d ∈ D) of data items of the same event.
- The next trust score of d is the score computed from its current and intermediate scores.
Trust score of a sensor node n
- The intermediate trust score of n is the score computed from the (next) trust scores of data items.
- The next trust score of n is the score computed from its current and intermediate scores.
- The current trust score of n is the score assigned to that node at the last stage.
[Figure: cycle linking the current, intermediate, and next trust scores of nodes with the current, intermediate, and next trust scores of data items, computed over a set of data items of the same event in the current window]

Intermediate Trust Scores of Data (in more detail)
Data trust scores are adjusted according to the data value similarities and the provenance similarities of a set of recent data items (i.e., history)
- The more data items have similar values, the higher the trust scores of these items are
- Different provenances of similar data values may increase the trustworthiness of data items

                     | Similar data value        | Different data value
Similar provenance   | score ↑                   | score ↓↓↓ (conflict)
Different provenance | score ↑↑↑ (cross-checked) | score ↓

Using Data Value and Provenance Similarities
Initiating with data value similarities
- With the mean μ and variance σ^2 of the history data set D, we assume the current input data follow a normal distribution N(μ, σ^2)
- Because the mean μ is determined by the majority of values in D, a value x close to the mean is more similar to the other values, and a value x far from the mean is less similar to them
- With this observation, we obtain the initial intermediate score of d (whose value is v_d) as an integral area of f(x), where f(x) is the probability density function of N(μ, σ^2) and x is the data value of a data item
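The value-similarity step can be illustrated with a short Python sketch. The transcript does not preserve the exact integral, so the sketch assumes one plausible reading: the score is the two-tailed area of N(μ, σ^2) lying at least as far from the mean as v_d, which gives 1 at the mean and approaches 0 far from it. The function name `value_similarity_score` and the handling of zero variance are assumptions for illustration.

```python
from statistics import NormalDist, fmean, pstdev


def value_similarity_score(history, v_d):
    """Initial intermediate score of a data item with value v_d.

    Assumption: the score is the two-tailed area of N(mu, sigma^2)
    beyond |v_d - mu|, where mu and sigma come from the history set D.
    """
    mu = fmean(history)
    sigma = pstdev(history)
    if sigma == 0.0:
        # Degenerate history: every past value is identical.
        return 1.0 if v_d == mu else 0.0
    dist = NormalDist(mu, sigma)
    upper_tail = 1.0 - dist.cdf(mu + abs(v_d - mu))
    return 2.0 * upper_tail


history = [9.0, 10.0, 11.0]
score_at_mean = value_similarity_score(history, 10.0)
score_near = value_similarity_score(history, 10.5)
score_far = value_similarity_score(history, 13.0)
```

With this reading, a value equal to the mean of the window gets the highest score, and the score falls off symmetrically on both sides, which matches the observation that values close to the majority are more similar to the other values.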

Using Data Value and Provenance Similarities
Adjusting with provenance similarities
- We define the similarity function between two provenances p_i and p_j as sim(p_i, p_j)
  - sim(p_i, p_j) returns a similarity value in [0, 1]
  - it can be computed with tree or graph similarity measuring algorithms
- From the observation of value and provenance similarities, given two data items d, t ∈ D, their values v_d, v_t, and their provenances p_d, p_t (here, '≅' means "is similar to" and '≇' means "is not similar to"):
  - if p_d ≅ p_t and v_d ≅ v_t, the provenance similarity has a small positive effect on the intermediate trust score of d;
  - if p_d ≅ p_t and v_d ≇ v_t, the provenance similarity has a large negative effect on the intermediate trust score of d;
  - if p_d ≇ p_t and v_d ≅ v_t, the provenance similarity has a large positive effect on the intermediate trust score of d;
  - if p_d ≇ p_t and v_d ≇ v_t, the provenance similarity has a small negative effect on the intermediate trust score of d.
- We first calculate the adjustable similarity between d and t, where dist(v_d, v_t) is the distance between the two values, δ_1 is the threshold below which v_d and v_t are treated as similar, and δ_2 is the threshold above which they are treated as not similar
- With the sum of the adjustable similarities of d, we then adjust v_d
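The four cases above can be made concrete with a small Python sketch. The transcript omits the actual formula, so everything here is an assumption chosen only to reproduce the stated sign and magnitude of each case: provenances are treated as sets of node identifiers, `jaccard_similarity` stands in for the tree or graph similarity measure, and the adjustable similarity is `1 - sim` for similar values, `-sim` for dissimilar values, and 0 in between.

```python
def jaccard_similarity(p_i, p_j):
    """Stand-in for sim(p_i, p_j), returning a value in [0, 1].

    Assumption: a provenance is represented as a set of node ids.
    """
    if not p_i and not p_j:
        return 1.0
    return len(p_i & p_j) / len(p_i | p_j)


def adjustable_similarity(v_d, v_t, p_d, p_t, delta1, delta2):
    """Hypothetical adjustable similarity between data items d and t.

    delta1: distance below which the values count as similar.
    delta2: distance above which the values count as not similar.
    """
    sim = jaccard_similarity(p_d, p_t)
    dist = abs(v_d - v_t)
    if dist < delta1:
        # Similar values: different provenance cross-checks the value,
        # so low sim gives a large positive effect.
        return 1.0 - sim
    if dist > delta2:
        # Dissimilar values: similar provenance signals a conflict,
        # so high sim gives a large negative effect.
        return -sim
    return 0.0  # values neither clearly similar nor clearly different


p_d = frozenset({0, 1, 3})
p_close = frozenset({0, 1, 3, 4})   # sim = 3/4
p_distant = frozenset({0, 2, 7})    # sim = 1/5

small_pos = adjustable_similarity(10.0, 10.1, p_d, p_close, 0.5, 2.0)
large_neg = adjustable_similarity(10.0, 15.0, p_d, p_close, 0.5, 2.0)
large_pos = adjustable_similarity(10.0, 10.1, p_d, p_distant, 0.5, 2.0)
small_neg = adjustable_similarity(10.0, 15.0, p_d, p_distant, 0.5, 2.0)
```

Summing `adjustable_similarity` over all other items t in the window would give the per-item adjustment mentioned in the last bullet; how that sum is then applied to v_d is not recoverable from the transcript and is left out of the sketch.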

Computing Next Trust Scores
The next trust score is computed as $c_d\, s_d + (1 - c_d)\, \hat{s}_d$, where $s_d$ is the current trust score, $\hat{s}_d$ is the intermediate trust score, and $c_d$ is a constant in [0, 1]
- If $c_d$ is small, trust scores evolve fast
- If $c_d$ is large, trust scores evolve slowly
- In the experiments we set it to 1/2
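One round of the cycle can be sketched in Python as follows. The weighted update in `blend` follows the formula on this slide; the rest is assumed for illustration only: the slides do not say how a data item's current score is derived from its related nodes or how a node's intermediate score is derived from its data items, so the sketch uses a plain mean in both places, and the names `blend` and `cycle_round` are hypothetical.

```python
from statistics import fmean


def blend(current, intermediate, c=0.5):
    """Next score = c * current + (1 - c) * intermediate."""
    return c * current + (1.0 - c) * intermediate


def cycle_round(node_scores, provenance, intermediate_data, c=0.5):
    """Run one round of the cycle and return (next_data, next_nodes).

    node_scores:       node id -> current node trust score
    provenance:        data item id -> node ids on its provenance
    intermediate_data: data item id -> intermediate data trust score
    """
    # Current data score from related nodes (assumption: their mean).
    current_data = {
        d: fmean([node_scores[n] for n in nodes])
        for d, nodes in provenance.items()
    }
    next_data = {
        d: blend(current_data[d], intermediate_data[d], c)
        for d in provenance
    }
    # Intermediate node score from the next scores of the data items
    # that passed through the node (assumption: their mean).
    next_nodes = dict(node_scores)
    for n, current in node_scores.items():
        touched = [next_data[d] for d, nodes in provenance.items() if n in nodes]
        if touched:
            next_nodes[n] = blend(current, fmean(touched), c)
    return next_data, next_nodes


# Two nodes start at the "uncertain" score 0.5; item "a" passed through both.
next_data, next_nodes = cycle_round(
    node_scores={1: 0.5, 2: 0.5},
    provenance={"a": [1, 2]},
    intermediate_data={"a": 1.0},
)
```

The next node scores returned by one round become the current node scores of the following round, which is what makes the framework cyclic. A smaller `c` moves scores toward the intermediate value faster: `blend(0.5, 1.0, c=0.1)` lands closer to 1.0 than `blend(0.5, 1.0, c=0.9)`.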

Experimental Evaluation
Simulation
- Sensor network as an f-ary complete tree whose fanout and depth are f and h, respectively
- Synthetic data with a single attribute whose values follow a normal distribution with mean μ_i and variance σ_i^2 for each event i (1 ≤ i ≤ N_event)
- Data items for an event are generated at N_assign leaf nodes, and the interval between the assigned nodes is N_interleave
- The number of data items in a window (for evaluating intermediate trust scores) is ω
Goal of the experiments: showing the efficiency and effectiveness of our cyclic framework
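The simulated topology can be sketched in Python as below. Several details are assumptions, since the slide gives only the parameters: the root is taken to be the server at depth 0 with leaves at depth h, nodes are numbered in level order, and the assigned leaves are picked at a fixed stride of N_interleave starting from the first leaf. All function names are hypothetical.

```python
from typing import Dict, List, Optional


def build_complete_tree(f: int, h: int) -> Dict[int, Optional[int]]:
    """Parent map of a complete f-ary tree numbered in level order.

    Assumption: node 0 is the server (depth 0) and leaves sit at depth h.
    """
    total = sum(f ** level for level in range(h + 1))
    parent: Dict[int, Optional[int]] = {0: None}
    for node in range(1, total):
        parent[node] = (node - 1) // f
    return parent


def leaf_ids(f: int, h: int) -> List[int]:
    """Identifiers of the f**h leaves (the terminal nodes)."""
    total = sum(f ** level for level in range(h + 1))
    return list(range(total - f ** h, total))


def assign_leaves(leaves: List[int], n_assign: int, n_interleave: int) -> List[int]:
    """Pick n_assign leaves spaced n_interleave positions apart."""
    last_index = (n_assign - 1) * n_interleave
    if last_index >= len(leaves):
        raise ValueError("not enough leaves for this assignment")
    return [leaves[k * n_interleave] for k in range(n_assign)]


def path_to_server(parent: Dict[int, Optional[int]], node: int) -> List[int]:
    """Provenance of a leaf's data item as the node path up to the root."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


tree = build_complete_tree(f=2, h=3)   # 15 nodes, 8 leaves (ids 7..14)
leaves = leaf_ids(f=2, h=3)
adjacent = assign_leaves(leaves, n_assign=2, n_interleave=1)  # leaves 7 and 8
spread = assign_leaves(leaves, n_assign=2, n_interleave=4)    # leaves 7 and 11
```

In this small tree, the two adjacent leaves share every ancestor on their paths to the server, while the two leaves spaced four apart share only the root, so a larger interleaving factor yields less similar provenance paths.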

Experiment 1: Computation Efficiency of the Cyclic Framework
- Measure the elapsed time for processing a data item with our cyclic framework
- To show scalability, we vary 1) the size of the sensor network (i.e., h) and 2) the number of data items used for evaluating data trust scores (i.e., ω)
- The results show a reasonable computation overhead and scalability with both the size of the sensor network and the number of data items in windows

Experiment 2: Effectiveness of the Cyclic Framework
- Inject incorrect data items into the sensor network, and then observe the change in the trust scores of data items
- To observe the effect of provenance similarities, we vary the interleaving factor (i.e., N_interleave) → if N_interleave increases, the provenance similarity decreases
- Figure (a) shows the changes in the trust scores when incorrect data items are injected, and Figure (b) shows the changes when correct data items are generated again
- In both cases, we can see that our cyclic framework evolves trust scores correctly
- The results also show that our principles are correct:
  - different values with similar provenance result in a large negative effect
  - similar values with different provenance result in a large positive effect

Conclusions
- We address the problem of assessing data trustworthiness based on its provenance in sensor networks
- We propose a solution providing a systematic approach for computing and evolving the trustworthiness levels of data items and network nodes
Future work
- more accurate computation of trust scores
- secure delivery of provenance information
- trust scores for aggregation and join in sensor networks