DEMS: A Data Mining Based Technique to Handle Missing Data in Mobile Sensor Network Applications
Le Gruenwald, Md. Shiblee Sadik, Rahul Shukla, Hanqing Yang


1 DEMS: A Data Mining Based Technique to Handle Missing Data in Mobile Sensor Network Applications
Le Gruenwald, Md. Shiblee Sadik, Rahul Shukla, Hanqing Yang
School of Computer Science, University of Oklahoma, Norman, Oklahoma, USA
ggruenwald@ou.edu

2 Outline
- Research objective
- Current approaches
- The proposed approach: DEMS
- Performance evaluation
- Conclusions and future work

3 Mobile Sensor Networks
- A typical mobile sensor network:
  - Sensor nodes are provided with motion capabilities
  - Sensor nodes can relocate themselves
  - Sensor nodes may move continuously or randomly
  - Sensor nodes may move periodically to make up for lost or missing sensors
  - Sensor nodes send data to a base station

4 Missing Sensor Data
- Missing sensor data = sensor readings that fail to reach the base station or are corrupted when they reach it
- Reasons for missing sensor data:
  - Power shortage (sensor nodes are battery-powered)
  - Malfunctioning of sensor nodes (hardware failure)
  - Networking issues
    - Connection failures
    - Data packet collisions
  - etc.

5 Research Objective
Goal: Develop an effective algorithm to estimate missing sensors’ readings in a mobile sensor network application.

6 Research Issues
- Issues common with static sensor networks:
  - Infiniteness, fast arrival rate, concept drifts
- Additional issues due to the mobility of mobile sensors:
  - Spatial relations:
    - The spatial relation between two sensors’ readings is distorted by the mobility of the sensors
  - Temporal relations:
    - The historical data of a mobile sensor, generated at different locations, may not have temporal relationships with the data in the current round of sensor readings
  - Frequent power failure:
    - Power outages are more common in mobile sensor networks than in static sensor networks because mobility requires extra power

7 Current Approaches
Fig 1. A taxonomy of techniques for handling missing data
- Available approaches
  - Ignore missing data
  - Ask the sensor again
  - Add redundant sensors
  - Estimate missing data
    - Statistics based techniques
      - Use average of other sensors
      - Use auto-regression model
      - Use some other statistics based model (e.g., Kalman filter)
    - Use data mining-based model

8 The Proposed Approach: DEMS
- DEMS: Data Estimation for Mobile Sensors
- Based on two important concepts:
  - Virtual Static Sensor (VSS)
    - A fictitious static sensor which mimics a real static sensor
    - Helps reconstruct the spatial and temporal relations among the sensors’ readings
  - Association Rule Mining
    - A popular method of discovering relationships among different items
    - Helps explore the relationships among sensors’ readings

9 DEMS Components
- DEMS has three major components:
  - Mapping Real Mobile Sensor (RMS) to Virtual Static Sensor (VSS)
    - Divides the entire area of coverage into small hexagons
    - A hexagon: the coverage area of a VSS, with the VSS at the center of the hexagon
    - Converts RMS readings into VSS readings
  - Association rule mining
    - Constructs a novel data structure called MASTER-tree to capture the association rules among VSSs
    - Updates MASTER-trees to capture the most recent association rules among VSSs
  - Data estimation
    - Uses the most recent association rules to estimate a missing VSS reading
    - Uses the estimated value of the missing VSS reading as the value of the missing RMS reading

10 DEMS: Mapping RMS to VSS
- What is a VSS?
  - A VSS is a fictitious static sensor
  - A VSS reading is based on one or more RMSs’ readings
  - A VSS has a unique identifier and a unique area of coverage
- Why do we need VSSs?
  - Each VSS has a fixed location; hence the spatial relations among VSS readings can be obtained
  - Each VSS reading is generated from a fixed location; hence historical readings might have strong temporal relations with the current reading

11 DEMS: Mapping RMS to VSS (cont.)
- How to construct a VSS?
  - We divide the entire monitoring area into small hexagons
  - A virtual static sensor is the center of a hexagon
  - Each hexagon is the coverage area of a virtual static sensor
Fig 2. Monitoring area, hexagons and virtual static sensor
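The hexagon lookup described on this slide can be sketched in a few lines. The slides do not say how the grid is oriented or indexed, so everything below is an assumption for illustration: pointy-top hexagons, axial coordinates (q, r) as the VSS identifier, and a hypothetical `size` parameter (center-to-corner distance). The function name `find_virtual_sensor` only echoes the pseudocode name findVirtualSensor; its signature is not from the slides.

```python
import math


def hex_center(q, r, size):
    """Center (x, y) of the pointy-top hexagon with axial coordinates (q, r)."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so that
    # q + r + s == 0 still holds after rounding.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def find_virtual_sensor(x, y, size):
    """Axial coordinates (q, r) of the hexagon (VSS) covering point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


if __name__ == "__main__":
    cx, cy = hex_center(2, -1, size=10.0)
    # A point slightly off the center still maps to the same hexagon.
    print(find_virtual_sensor(cx + 1.0, cy - 1.0, size=10.0))
```

Axial coordinates are only one possible identifier scheme; a flat VSS numbering such as V1, V2, ... could be layered on top with a lookup table.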

12 DEMS: Mapping RMS to VSS
Goal: map RMSs’ readings to VSSs’ readings so that spatial and temporal relations among the sensor readings can be restored.
- Two types of mapping:
  - Mapping of a non-missing RMS to VSS
  - Mapping of a missing RMS to VSS

13 DEMS: Mapping of a non-missing RMS to VSS
- If a VSS contains one RMS within its coverage area, the RMS’s reading is used as the VSS reading
- If a VSS contains more than one RMS, the average of the RMSs’ readings is used as the VSS reading
- If a VSS contains no RMS, the VSS is called inactive
Fig 3. RMSs and VSSs

14 DEMS: Mapping of a missing RMS to VSS
- Why is mapping a missing RMS difficult?
  - The RMS location is the key to RMS-to-VSS mapping
  - If an RMS is missing, it is very likely that its data and location are missing together
  - Hence mapping a missing RMS to a VSS requires intelligence
- The solution
  - A missing RMS is mapped to a VSS using a trajectory mining approach for location prediction [Morzy, 2007]

15 DEMS: Mapping of a missing RMS to VSS (cont.)
- What is a trajectory?
  - A trajectory is the sequence of hexagons that a mobile sensor traverses
  - If a mobile sensor is not missing, it reports its location, and that location is contained in exactly one hexagon
  - The sequence of these hexagons is the sensor’s trajectory
Fig 4. Trajectory of an RMS: (V14, V9, V11, V4, V3, V10) is the trajectory of M1

16 DEMS: Mapping of a missing RMS to VSS (cont.)
- Each RMS has a trajectory
- DEMS periodically stores the trajectories (collected from all RMSs) into a frequency pattern tree
- Frequency pattern tree
  - It has a root labeled null
  - Each node consists of an ID (hexagon ID) and a count (number of times it appears in the trajectories)
Example: 5 trajectories
  1. (V14, V9, V11, V4, V2, V8, V1)
  2. (V14, V9, V11, V4, V3, V10, V1)
  3. (V14, V9, V5, V4, V3, V10, V8)
  4. (V14, V9, V11, V4, V3, V10, V1, V8)
  5. (V2, V3, V6, V10, V8, V1)
Fig 5. A frequency pattern tree

17 DEMS: Mapping of a missing RMS to VSS (cont.)
- If an RMS is missing, it is mapped to a VSS using the frequency pattern tree and its own trajectory
- Consider the last known trajectory of M1: (V14, V9, V11, V4)
- V3: predicted next hexagon in the trajectory of M1
Fig 6. Trajectory of a missing RMS
Fig 7. Frequency pattern tree
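The prediction step can be illustrated with a small sketch. It is a simplification, not the trajectory mining method of [Morzy, 2007]: it assumes the frequency pattern tree is a plain prefix tree with a count per node, and that a missing sensor's last known trajectory is matched from the root. The names `Node`, `build_tree` and `predict_next` are hypothetical. The trajectories are the five from slide 16.

```python
class Node:
    """One tree node: a hexagon ID plus how many trajectories pass through it."""

    def __init__(self, hex_id=None):
        self.hex_id = hex_id      # None for the root (labeled null on the slide)
        self.count = 0
        self.children = {}


def build_tree(trajectories):
    """Insert every trajectory as a path from the root, incrementing counts."""
    root = Node()
    for trajectory in trajectories:
        node = root
        for hex_id in trajectory:
            node = node.children.setdefault(hex_id, Node(hex_id))
            node.count += 1
    return root


def predict_next(root, last_known):
    """Follow the last known trajectory and return the most frequent next hexagon.

    Returns None when the trajectory is not in the tree or has no continuation.
    """
    node = root
    for hex_id in last_known:
        node = node.children.get(hex_id)
        if node is None:
            return None
    if not node.children:
        return None
    return max(node.children.values(), key=lambda child: child.count).hex_id


trajectories = [
    ["V14", "V9", "V11", "V4", "V2", "V8", "V1"],
    ["V14", "V9", "V11", "V4", "V3", "V10", "V1"],
    ["V14", "V9", "V5", "V4", "V3", "V10", "V8"],
    ["V14", "V9", "V11", "V4", "V3", "V10", "V1", "V8"],
    ["V2", "V3", "V6", "V10", "V8", "V1"],
]
tree = build_tree(trajectories)
print(predict_next(tree, ["V14", "V9", "V11", "V4"]))  # V3, as on slide 17
```

After the prefix (V14, V9, V11, V4), two stored trajectories continue to V3 and one to V2, which is why V3 is returned.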

18 DEMS: Mapping RMS to VSS (cont.)
Fig 8. Mapping algorithm

Procedure mapReal2Virtual(RealSensorData listRSData, VirtualSensorData listVSData)
  for each real sensor rs
    if (rs is not missing)
      location ← listRSData(rs).Location
      vs ← findVirtualSensor(location)
      listVSData(vs).addReading(listRSData(rs).Reading)
    else
      location ← predictLocation(rs)
      vs ← findVirtualSensor(location)
      listVSData(vs).status ← missing
  end loop
  for each virtual static sensor vs
    if (listVSData(vs) has data)
      listVSData(vs).status ← active
      listVSData(vs).reading ← average(listVSData(vs).Readings)
    else
      if (listVSData(vs).status is not missing)
        listVSData(vs).status ← inactive
  end loop
end procedure
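A runnable reading of the mapping procedure above, as a sketch. The data layout is an assumption: `real_data` maps each RMS ID to a `(location, reading)` pair, or to `None` when the sensor is missing, and the two helpers `find_virtual_sensor` and `predict_location` are passed in as callables because the slides do not define their interfaces. The toy helpers in the usage part are hypothetical.

```python
from statistics import mean


def map_real_to_virtual(real_data, vss_ids, find_virtual_sensor, predict_location):
    """Return (status, reading) dictionaries keyed by VSS ID."""
    readings = {vs: [] for vs in vss_ids}
    status = {vs: None for vs in vss_ids}

    # First loop: route each RMS to the VSS that covers its (known or predicted) location.
    for rs, record in real_data.items():
        if record is not None:
            location, reading = record
            readings[find_virtual_sensor(location)].append(reading)
        else:
            vs = find_virtual_sensor(predict_location(rs))
            status[vs] = "missing"

    # Second loop: a VSS with data is active and reports the average reading;
    # a VSS with no data and no missing RMS mapped to it is inactive.
    value = {}
    for vs in vss_ids:
        if readings[vs]:
            status[vs] = "active"
            value[vs] = mean(readings[vs])
        elif status[vs] != "missing":
            status[vs] = "inactive"
    return status, value


# Toy usage with hypothetical helpers: two VSS strips split at x = 5, plus an empty V3.
real_data = {"M1": ((0, 0), 10.0), "M2": ((1, 2), 20.0), "M3": None}
status, value = map_real_to_virtual(
    real_data,
    vss_ids=["V1", "V2", "V3"],
    find_virtual_sensor=lambda loc: "V1" if loc[0] < 5 else "V2",
    predict_location=lambda rs: (7, 0),
)
print(status, value)
```

As in the pseudocode, a VSS that receives at least one real reading ends up active even if a missing RMS was also mapped to it.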

19 DEMS Components
- DEMS has three major components:
  - Mapping Real Mobile Sensor (RMS) to Virtual Static Sensor (VSS)
    - Divides the entire area of coverage into small hexagons
    - Each hexagon is the coverage area of a virtual static sensor, which is assumed to sit in the middle of the hexagon
    - Converts RMS readings into VSS readings
  - Association rule mining
    - Constructs a novel data structure called MASTER-tree to capture the association rules among VSSs
    - Updates MASTER-trees to capture the most recent association rules among VSSs
  - Data estimation
    - Uses the most recent association rules to estimate a missing VSS reading
    - Uses the estimated value of the missing VSS reading as the value of the missing RMS reading

20 DEMS: Association Rule Mining
Goal: mine and represent the potential association rules among the VSS readings.
- We propose a novel data structure (called MASTER-tree) to mine and represent the association rules among VSS readings
- MASTER-tree basics:
  - A MASTER-tree is capable of mining any kind of association rules among any number of VSSs
  - A MASTER-tree represents potential association rules among the VSS readings
  - A path in a MASTER-tree represents a potential association rule
Fig 8. A MASTER-tree

21 DEMS: Association Rule Mining (cont.)
- The potential number of association rules among VSSs grows exponentially with the number of VSSs
- To restrict the number of association rules, DEMS clusters the VSSs into small groups and constructs one MASTER-tree for each group
- DEMS uses agglomerative clustering:
  - Agglomerative clustering starts with every VSS as an individual cluster
  - At each step it merges the two closest clusters, based on their pairwise distances, into one if the total number of VSSs in the new cluster does not exceed a user-defined maximum number of VSSs in one cluster
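The size-capped clustering can be sketched as follows. Two points are assumptions, since the slide does not fix them: cluster distance is taken as single linkage (the smallest pairwise distance between members), and at each step the closest pair whose merged size still fits the cap is merged, stopping when no pair fits. The function name `cluster_vss` and the `max_size` parameter are hypothetical. The three distances are the ones given in the worked example on slide 37.

```python
from itertools import combinations


def cluster_vss(ids, dist, max_size):
    """Agglomerative clustering with a cap on the number of VSSs per cluster.

    dist maps frozenset({a, b}) to the distance between VSSs a and b.
    """
    clusters = [frozenset([i]) for i in ids]

    def linkage(a, b):
        # Single linkage: smallest pairwise distance between the two clusters.
        return min(dist[frozenset((x, y))] for x in a for y in b)

    while True:
        candidates = [
            (linkage(a, b), a, b)
            for a, b in combinations(clusters, 2)
            if len(a) + len(b) <= max_size
        ]
        if not candidates:
            return clusters
        _, a, b = min(candidates, key=lambda c: c[0])
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]


dist = {
    frozenset(("A", "B")): 4,
    frozenset(("A", "C")): 3,
    frozenset(("B", "C")): 5,
}
print(cluster_vss(["A", "B", "C"], dist, max_size=2))
print(cluster_vss(["A", "B", "C"], dist, max_size=10))
```

With a cap of 2, A and C (distance 3) merge first and B stays alone; with a cap of 10, all three end up in one cluster.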

22 DEMS: Association Rule Mining (cont.)
Fig 9. A MASTER-tree without the grid structure (root Ø; nodes labeled V1, V2 and V3)

23 DEMS: The MASTER-tree Projection Module (cont.)
Fig 10. MASTER-tree with grid structure (root Ø; each node V1, V2, V3 carries a grid of summary statistics)

24 DEMS: Association Rule Mining (cont.)
Fig 11. MASTER-tree with count
Rules shown in the figure:
- V2 [11, 20], V3 [11, 20] → V1 [1, 20]   Support: 60%   Confidence: 66%
- V2 [1, 10], V1 [1, 10] → V3 [11, 20]   Support: 40%   Confidence: 100%
Values shown in the figure: (2, 31, 485, 7657, 121937)

25 DEMS: Association Rule Mining (cont.)
- Let
  - The minimum support be 50%
  - The minimum confidence be 50%
- A typical association rule becomes
  - V2 [11, 20], V3 [11, 20] → V1 [1, 20]
- Meaning of the rule: if the VSS reading of V2 is within 11 to 20 and the VSS reading of V3 is within 11 to 20, the VSS reading of V1 is most likely within 1 to 20
- There exists a path from the root node to V1 [1, 20] via V2 [11, 20] and V3 [11, 20] in the MASTER-tree

26 DEMS Components
- DEMS has three major components:
  - Mapping Real Mobile Sensor (RMS) to Virtual Static Sensor (VSS)
    - Divides the entire area of coverage into small hexagons
    - Each hexagon is the coverage area of a virtual static sensor, which is assumed to sit in the middle of the hexagon
    - Converts RMS readings into VSS readings
  - Association rule mining
    - Constructs a novel data structure called MASTER-tree to capture the association rules among VSSs
    - Updates MASTER-trees to capture the most recent association rules among VSSs
  - Data estimation
    - Uses the most recent association rules to estimate a missing VSS reading
    - Uses the estimated value of the missing VSS reading as the estimated value of the missing RMS reading

27 DEMS: Data Estimation
Goal: estimate the missing VSS reading.
- The data estimation module estimates the missing VSS reading
- The estimated reading for the missing VSS is used as the estimated reading for the missing RMS

28 DEMS: Data Estimation (cont.)
Fig 12. Flowchart of the data estimation module (a step-by-step example follows slide 35)

29 Performance Evaluation
- Simulation model
  - We simulate the missing data for our datasets
  - A sensor goes missing at random (approximately 5-10%) for a random number (10-20) of consecutive rounds
  - Both data and location are missing for a missing sensor
- We use DEMS, TinyDB, SPIRIT and the Average method to estimate missing readings
  - TinyDB
    - An average based technique which estimates the missing data by taking the average of the readings from other sensors in the current round
  - SPIRIT
    - An auto-regression based technique which estimates the missing data based on the readings in the previous rounds
  - Average
    - The average of other sensor readings is used as the estimated reading
- We compare the techniques based on mean absolute error (MAE)
  - MAE = Σ|estimation error| / number of estimations
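The MAE formula and the simplest baseline on this slide can be written out directly. This is a sketch: the function names and the dictionary layout for one round of readings are assumptions, and the numbers in the usage lines are made up, not taken from the evaluation.

```python
def mean_absolute_error(estimates, actuals):
    """MAE = sum of |estimate - actual| divided by the number of estimations."""
    if not estimates or len(estimates) != len(actuals):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - a) for e, a in zip(estimates, actuals)) / len(estimates)


def average_of_others(current_round, missing_id):
    """Average-based baseline: mean of the other sensors' readings in this round.

    current_round maps sensor ID to reading, with None for a missing reading.
    """
    others = [
        value
        for sensor, value in current_round.items()
        if sensor != missing_id and value is not None
    ]
    if not others:
        raise ValueError("no other readings available in this round")
    return sum(others) / len(others)


# Made-up numbers, for illustration only.
print(mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))
print(average_of_others({"s1": 2.0, "s2": 4.0, "s3": None}, "s3"))
```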

30 Performance Evaluation (cont.)
- Datasets
  - DAPPLE Project Dataset: a real-life dataset
    - The carbon monoxide (CO) readings in the range [0, 6] were collected over a period of two weeks around Marylebone Road in London
    - The mobile sensors monitoring the atmospheric CO level are attached to PDAs which store these readings
    - We chose Thursday, 20th May 2004, when three sensors were simultaneously recording for about 32 minutes, resulting in 600 rounds (after disregarding the missing rounds) of CO readings
  - Factory Floor Temperature Dataset: a synthetic dataset
    - A simulation of a mobile sensor network for monitoring factory floor temperatures
    - Machines are placed on a floor
    - Some machines are turned on for a number of rounds; these machines reach a high constant temperature and the heat disperses across the floor
    - 100 mobile sensors roam in random directions to monitor the factory floor and report temperature readings in the range [0, 100] °C from different locations
    - The mobile sensor readings were sampled once per hour; in total there are 5000 rounds of readings from 100 mobile sensors

31 Performance Evaluation (cont.)
Fig 13. Impacts of number of rounds on MAE for DAPPLE project dataset
Table 1. Average MAE for DAPPLE project dataset
  Approach   Average MAE
  DEMS       0
  Average    1.2717
  TinyDB     0.6331
  SPIRIT     0.9437

32 Performance Evaluation (cont.)
Fig 14. Impacts of number of rounds on MAE for factory floor dataset
Table 2. Average MAE for factory floor dataset
  Approach   Average MAE
  DEMS       2.2538
  Average    14.778
  TinyDB     6.9621
  SPIRIT     4.7472

33 Conclusions and Future Work
- We proposed DEMS:
  - A novel data estimation technique for mobile sensor networks based on data mining and virtual static sensor concepts
  - Estimates missing sensor data with high accuracy
- Future work: extend DEMS to include
  - Multiple base stations
  - De-synchronized mobile sensor networks
  - Clustered sensor networks

34 Thanks
- Questions?

35 MASTER-tree Construction
[Figures: Fig (a) a pattern tree for S3; Fig (b) a pattern tree for S1; Fig (c) a pattern tree for S1; the merged tree for figures a and b; the merged tree for figures a, b and c. All trees are rooted at Ø with nodes labeled S1, S2 and S3.]

36 MASTER-tree projection and data estimation: An example

37 Assume…
- Three nodes (A, B, C)
- One dimension of data (temperature)
- Upper bound 30, lower bound 0, cell size = 10
- dis(A,B) = 4, dis(A,C) = 3 and dis(B,C) = 5
- MCSS = 10
- minSup = 25%
- minConf = 75%
[Figure: layout of nodes A, B and C]

38 Pattern trees
[Figures: pattern tree for A; pattern tree for B; pattern tree for C; final MASTER-tree without GS. All trees are rooted at Ø.]

39 Data Sequence
  A   4   14  11  18  6   8
  B   8       15  22  10  12
  C   7   17  14  21  9   ?

40 to 45 [Figure, animation frames: step-by-step projection of the MASTER-tree (root Ø, nodes A, B and C) onto the data sequence of slide 39]

46 [Figure: MASTER-tree projection with counts, over the data sequence of slide 39]
MCSS = 10
Rule: Ø → C = [0, 29]   Supp = 100%   Conf = 100%

47 [Figure: MASTER-tree projection with counts, over the data sequence of slide 39]
MCSS = 10
Rule: Ø → C = [0, 19]   Supp = 80%   Conf = 80%

48 [Figure: MASTER-tree projection with counts, over the data sequence of slide 39]
MCSS = 10
Rule: Ø → C = [0, 19]   Supp = 80%   Conf = 80%
Rule: A → C = [0, 9]   Supp = 40%   Conf = 100%
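The support and confidence figures in this example can be checked with a short sketch. It uses the standard definitions (support = rounds matching the whole rule / all rounds; confidence = rounds matching the whole rule / rounds matching the antecedent) and ignores the MASTER-tree itself. The A and C values are one reading of the flattened data sequence on slide 39 (five complete rounds; the sixth round, where C is unknown, is left out), and the rule A → C = [0, 9] is interpreted here as A in [0, 9] → C in [0, 9]. The function name `rule_stats` is hypothetical.

```python
def rule_stats(rounds, antecedent, consequent):
    """Support and confidence of a range rule over a list of rounds.

    rounds: list of dicts mapping sensor ID to reading.
    antecedent / consequent: dicts mapping sensor ID to an inclusive (low, high) range.
    An empty antecedent (Ø) matches every round.
    """
    def matches(reading_by_sensor, ranges):
        return all(lo <= reading_by_sensor[s] <= hi for s, (lo, hi) in ranges.items())

    ante = [r for r in rounds if matches(r, antecedent)]
    both = [r for r in ante if matches(r, consequent)]
    support = len(both) / len(rounds)
    confidence = len(both) / len(ante) if ante else 0.0
    return support, confidence


# Assumed reading of slide 39: rounds 1 to 5 for sensors A and C.
a_values = [4, 14, 11, 18, 6]
c_values = [7, 17, 14, 21, 9]
rounds = [{"A": a, "C": c} for a, c in zip(a_values, c_values)]

print(rule_stats(rounds, {}, {"C": (0, 29)}))
print(rule_stats(rounds, {}, {"C": (0, 19)}))
print(rule_stats(rounds, {"A": (0, 9)}, {"C": (0, 9)}))
```

Under these assumptions the three calls reproduce the slide values: 100%/100% for C in [0, 29], 80%/80% for C in [0, 19], and 40%/100% for the rule with antecedent A.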

