Data dissemination in wireless computing environments
Introduction Characteristics of wireless computing environments Limited wireless channel bandwidth Unreliable transmission Asymmetric communication environments Limited effective battery lifespan Threat to security High cost
Design issues in wireless data dissemination Efficient wireless bandwidth utilization Efficient and effective scheduling strategies at the server Energy-efficient data access for battery-powered portable devices Support for disconnection Support for secured and reliable transmission
Models for information dissemination Point to point Push-based Pull-based Hybrid Static dynamic
Data broadcast scheduling Organization of broadcast for push-based broadcast system Access time Tuning time Broadcast program Flat program Skewed non-flat program Regular non-flat program Broadcast cycle
Generation of broadcast programs Flat program The union of all objects needed by clients is broadcast The average access time is the same for all objects 20/80 rule Broadcast the frequently accessed objects more often than those that are less popular Naïve approach Probabilistically pick an object for transmission Problems?
Optimal broadcast program Copies of an object are equally spaced For any two objects x and y, fx/fy = sqrt(qx)/sqrt(qy), where fx is the number of copies of item x in a broadcast cycle and qx is the access probability of item x It’s not always possible to generate such a broadcast program
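The frequency condition on this slide can be illustrated with a small calculation. The sketch below assumes the ratio is the square-root rule, fx/fy = sqrt(qx)/sqrt(qy); the function name and the access probabilities are hypothetical and chosen only for illustration.

```python
import math

def sqrt_rule_frequencies(access_probs):
    """Relative copies per broadcast cycle, proportional to sqrt(q_x).

    The least popular object is normalised to one copy per cycle.
    """
    roots = {obj: math.sqrt(q) for obj, q in access_probs.items()}
    smallest = min(roots.values())
    return {obj: r / smallest for obj, r in roots.items()}

# Hypothetical access probabilities (they sum to 1).
q = {"a": 0.64, "b": 0.32, "c": 0.04}
f = sqrt_rule_frequencies(q)

for obj, copies in f.items():
    print(f"{obj}: {copies:.3f} copies per copy of the coldest object")
```

Object b comes out at a non-integer number of copies (about 2.83), which is one way to see why a program with exactly these ratios and equal spacing cannot always be built.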
Broadcast disk Data is split into n partitions Data with similar access frequency is put in the same partition Partitions with larger access frequencies will be broadcast more often than those with smaller access frequencies
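One way to turn partitions into a concrete broadcast cycle is the chunk-interleaving procedure commonly associated with broadcast disks. The sketch below assumes that procedure: each partition ("disk") has an integer relative frequency, is split into chunks, and one chunk per disk is sent in every minor cycle. The function name, the disk contents and the frequencies are illustrative assumptions, not taken from the slides.

```python
from functools import reduce
from math import gcd

def broadcast_disk_program(disks, rel_freqs):
    """Interleave chunks of each disk so disk i appears rel_freqs[i] times per cycle.

    disks     : list of lists of page ids, hottest disk first
    rel_freqs : list of positive integers, one per disk
    """
    # Number of minor cycles = least common multiple of the frequencies.
    max_chunks = reduce(lambda a, b: a * b // gcd(a, b), rel_freqs)

    chunked = []
    for pages, freq in zip(disks, rel_freqs):
        num_chunks = max_chunks // freq
        size = -(-len(pages) // num_chunks)  # ceiling division
        chunked.append([pages[i * size:(i + 1) * size] for i in range(num_chunks)])

    program = []
    for minor in range(max_chunks):
        for chunks in chunked:
            # Each disk contributes its next chunk, wrapping around.
            program.extend(chunks[minor % len(chunks)])
    return program

# Hypothetical example: three disks broadcast with relative frequencies 4:2:1.
program = broadcast_disk_program([["A"], ["B", "C"], ["D", "E", "F", "G"]], [4, 2, 1])
print("".join(program))
```

In this example page A appears four times per cycle, B and C twice, and D to G once, with the copies of each page evenly spaced.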
Scheduling strategies for pull-based broadcast system Methods FCFS LWF MRF R*W Performance metrics Responsiveness Scheduling overhead Robustness Fairness
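The four methods differ only in how they score a pending object. The sketch below assumes the usual readings of the acronyms: FCFS serves the object with the oldest outstanding request, MRF the object with the most outstanding requests, LWF the object with the largest total waiting time, and R*W the object with the largest product of request count and oldest wait. The data layout and names are hypothetical.

```python
def pick_next(pending, now, policy):
    """Choose the next object to broadcast.

    pending : dict mapping object id -> non-empty list of request arrival times
    now     : current time
    policy  : "FCFS", "MRF", "LWF" or "RxW"
    """
    def score(obj):
        arrivals = pending[obj]
        oldest_wait = now - min(arrivals)
        if policy == "FCFS":
            return oldest_wait
        if policy == "MRF":
            return len(arrivals)
        if policy == "LWF":
            return sum(now - t for t in arrivals)
        if policy == "RxW":
            return len(arrivals) * oldest_wait
        raise ValueError(f"unknown policy: {policy}")

    return max(pending, key=score)

# Hypothetical queue state at time 10.
pending = {"x": [0], "y": [5, 6, 7], "z": [3, 4]}
choices = {p: pick_next(pending, 10, p) for p in ("FCFS", "MRF", "LWF", "RxW")}
print(choices)
```

With this queue, FCFS picks x, MRF picks y, LWF picks z and R*W picks y, which shows how the same backlog can lead to different decisions under each method.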
Indexing on air Basic protocol for retrieving broadcast data
Flat broadcast programs with indexes (1,m) index A complete index is broadcast m times during a broadcast cycle All buckets have an offset to the beginning of the next index segment Discussion High average access time Good tuning time Consideration Is it necessary to replicate the complete index between successive data blocks?
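The (1,m) layout can be sketched as a list of buckets. The code below is a minimal illustration under simplifying assumptions: the index occupies a fixed number of buckets marked "IDX", data items are plain strings, and the offset is counted in buckets and wraps around to the next cycle. All names are hypothetical.

```python
def one_m_layout(data, m, index_buckets):
    """Place a full copy of the index in front of each of (up to) m data blocks."""
    block = -(-len(data) // m)  # ceiling division: data buckets per block
    layout = []
    for start in range(0, len(data), block):
        layout += ["IDX"] * index_buckets
        layout += data[start:start + block]
    return layout

def offset_to_next_index(layout, pos):
    """Buckets to skip from position pos to the start of the next index segment."""
    n = len(layout)
    for step in range(1, n + 1):
        i = (pos + step) % n
        # An index segment starts where an IDX bucket follows a non-IDX bucket.
        if layout[i] == "IDX" and layout[i - 1] != "IDX":
            return step

data = ["d0", "d1", "d2", "d3", "d4", "d5"]
layout = one_m_layout(data, m=3, index_buckets=2)
print(layout)
print(offset_to_next_index(layout, 2))  # from d0
```

The cycle grows from 6 to 12 buckets here, which reflects the access-time cost of replicating the index, while a client that tunes in at any bucket can doze until the next index segment.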
Tree-based index A data file is associated with a B+ tree index structure The broadcast channel is a sequential medium, so the data file and index must be flattened Preorder traversal The first k levels of the index are partially replicated in the broadcast, and the remaining levels are not replicated All non-replicated buckets contain pointers that direct the search to the next copy of their replicated ancestors
Hash-based index Data are hashed into a set of partitions Partitions may have different sizes (nonuniform distribution) Hpartition(k): determines the partition that object k belongs to Hhash: determines the hash bucket that contains the shift pointer The gap between hash buckets is given by the size of the smallest partition Hhash = 1+(Hpartition-1)*gap
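The formula for Hhash can be checked with a few numbers. The sketch below assumes 1-based bucket positions, partitions laid out back to back in the broadcast, and a shift pointer that stores how many buckets lie between the hash bucket and the true start of its partition. These layout details and all names are assumptions made for illustration.

```python
def hash_bucket(h_partition, gap):
    """Position of the hash bucket for a partition: 1 + (h_partition - 1) * gap."""
    return 1 + (h_partition - 1) * gap

def shift_pointers(partition_sizes):
    """Map partition number -> (hash bucket position, shift to partition start)."""
    gap = min(partition_sizes)  # gap is the size of the smallest partition
    table = {}
    start = 1  # 1-based position where the current partition really begins
    for p, size in enumerate(partition_sizes, start=1):
        hb = hash_bucket(p, gap)
        table[p] = (hb, start - hb)
        start += size
    return table

# Hypothetical partition sizes (in buckets).
table = shift_pointers([3, 2, 4])
print(table)
```

Because every partition is at least as large as the gap, a partition never starts before its hash bucket, so under these assumptions the shift is never negative.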
Signature-based index An abstraction of the information stored in an object or a file K-levels of signature The higher the level is, the coarser the granularity of the grouping is Each integrated signature is broadcast before the corresponding group of objects. To reduce the number of false drops, the hashing functions used in generating the signatures at different levels should be different.
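A signature can be sketched as a bit string built by superimposed coding. The code below assumes each object is described by a set of terms, that a term sets a few bits chosen by a hash, and that the level number is mixed into the hash so different levels use different hashing functions. The width, bits per term and all names are illustrative assumptions.

```python
import hashlib

def term_bits(term, level, width=32, bits_per_term=3):
    """Bit pattern for one term; the level salts the hash so levels differ."""
    sig = 0
    for i in range(bits_per_term):
        digest = hashlib.sha256(f"{level}:{i}:{term}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % width)
    return sig

def signature(terms, level):
    """Superimpose (bitwise OR) the patterns of all terms."""
    sig = 0
    for term in terms:
        sig |= term_bits(term, level)
    return sig

def may_contain(sig, term, level):
    """True if every bit of the term is set; a True result may be a false drop."""
    query = term_bits(term, level)
    return (sig & query) == query

# Hypothetical group of two objects.
objects = [{"stock", "ibm"}, {"weather", "paris"}]
object_sigs = [signature(terms, level=1) for terms in objects]
group_sig = signature(set().union(*objects), level=2)  # integrated signature

print(may_contain(group_sig, "stock", level=2))
print(may_contain(object_sigs[0], "stock", level=1))
```

A client first tests the integrated signature and skips the whole group on a mismatch. A match only means the group may contain the term, so the object-level signatures and finally the objects themselves still have to be checked.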
Flexible indexing scheme Split a sorted list of objects into several equal-sized segments At the beginning of each segment, there is a control index Global index Local index
Discussion
Broadcast program generation for skewed data access Access frequencies can be exploited to design index methods that further minimize the average number of index probes Two kinds of approach Imbalanced tree approach Non-flat broadcast programs with indexes
Non-flat broadcast programs with indexes Segment-level index Similar to the (1,m) index scheme Broadcast a full index at the beginning of each segment The broadcast program is generated under broadcast disks Problems The cost to find the index may be very large
Distributed indexing Each segment index is split into sub-indexes that are distributed within its corresponding segment