Improving Search Efficiency Using Bloom Filters in Partially Connected Ad Hoc Networks: A Node-centric Analysis. Wing Ho Yuen and Henning Schulzrinne, Department of Computer Science, Columbia University. www.cs.columbia.edu/IRT/projects
7DS Application. Motivation: Internet access is not ubiquitous. More Wi-Fi hotspots? An ad hoc network to extend the coverage of hotspots? Node density is insufficient to sustain connected networks. Instead of dropping packets, nodes should store them and forward them during encounters with other nodes and APs. Goal: to emulate Internet services such as email delivery and web access when users are disconnected.
3 Categories of 7DS Application. Download app: subway map download. Upload app: email delivery. P2P app: music exchanges.
Data Retrieval Problem. The data holder (DH) acts as server; the data querier (DQ) acts as client. Push-based: small query overhead, but summary overhead. Pull-based (query-based): large query overhead. [Figure: messages exchanged between DH and DQ. Push-based: summary, query, data. Pull-based: query, data.]
Bloom Filter. Used in supporting membership queries: the DH transmits a Bloom filter, and the DQ queries an object only if a match occurs. Data structure: a Bloom filter consists of a binary m-bit vector. The DH has n data objects x1,…,xn. Each object is hashed by k independent hash functions h1,…,hk with range {0,1,2,…,m-1}; if h(x1)=a, set BF[a]=1.
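The data structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the 7DS implementation: the class name `BloomFilter`, the method names, and the choice of salted SHA-256 as a stand-in for the k independent hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """m-bit Bloom filter with k hash functions (illustrative sketch)."""

    def __init__(self, m, k):
        self.m = m              # length of the bit vector
        self.k = k              # number of hash functions
        self.bits = [0] * m

    def _positions(self, item):
        # Assumption: the k hash functions h1..hk are emulated by salting
        # SHA-256 with the index i and reducing the digest modulo m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # DH side: set BF[a] = 1 for every a = h_i(item).
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # DQ side: a match requires all k probed bits to be 1.
        return all(self.bits[pos] for pos in self._positions(item))


# The DH builds the filter over its objects and transmits it.
bf = BloomFilter(m=12, k=3)
for obj in ("x1", "x2"):
    bf.add(obj)

# The DQ queries only the objects that match the received filter.
wanted = ["y1", "x2", "y2"]
to_query = [obj for obj in wanted if bf.might_contain(obj)]
```

Objects that were inserted always match, so `x2` is always in `to_query`. Whether `y1` or `y2` also match depends on the hash values, which is exactly the false-positive risk.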
Bloom Filter Example. Bloom filter length m=12; number of hash functions k=3. The data holder has {x1,x2}; the data querier wants y1, x2, y2. Testing with the Bloom filter: y1 is not available; x2 is available (false negative not possible); y2 tests as available (false positive possible). [Figure: 12-bit vector showing the bits set by x1 and x2 and the bits tested for y1, x2, y2.]
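The chance of a false positive such as y2 can be estimated with the standard Bloom filter approximation. It is not stated on the slide and is added here as background. It assumes the k hash functions are independent and uniform over the m positions, and it is only a rough guide for a vector as short as m=12.

```latex
% Probability that all k probed bits are already set after inserting n objects
\[
  P_{\mathrm{fp}} \approx \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}
  \approx \left(1 - e^{-kn/m}\right)^{k}
\]
% Parameters of the example: m = 12, k = 3, n = 2
\[
  P_{\mathrm{fp}} \approx \left(1 - e^{-0.5}\right)^{3} \approx 0.061
\]
```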
Single Neighbor Scenario. One DH; one DQ. Utilization: the fraction of time used for data transmission; it measures the inefficiency due to query overhead and Bloom filter overhead, assuming the query success probability is constant. Result: stated in terms of E[Z], the mean data Tx time, and E[Y], the mean query Tx time. [Figure: DQ and DH.]
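The definition of utilization can be made concrete with a small Monte Carlo sketch. The model below is an assumption made purely for illustration and is not the authors' analysis. It assumes that the connection time T, the query time Y and the data time Z are exponentially distributed, that the Bloom filter takes a fixed time `t_bf` at the start of each connection, and that each query succeeds independently with constant probability `p_success`. The function name and all parameter names are hypothetical.

```python
import random


def simulate_utilization(mean_T, t_bf, mean_Y, mean_Z, p_success,
                         runs=5000, seed=1):
    """Estimate the fraction of connection time spent on data transmission."""
    rng = random.Random(seed)
    data_time = 0.0
    total_time = 0.0
    for _ in range(runs):
        T = rng.expovariate(1.0 / mean_T)        # connection time (assumed exponential)
        total_time += T
        t = t_bf                                 # Bloom filter is sent first
        while t < T:
            t += rng.expovariate(1.0 / mean_Y)   # query transmission
            if t >= T:
                break                            # connection ended during the query
            if rng.random() < p_success:
                z = rng.expovariate(1.0 / mean_Z)
                data_time += min(z, T - t)       # data may be cut off at T
                t += z
    return data_time / total_time


u_short_bf = simulate_utilization(mean_T=60.0, t_bf=1.0, mean_Y=0.5,
                                  mean_Z=5.0, p_success=0.8)
u_long_bf = simulate_utilization(mean_T=60.0, t_bf=30.0, mean_Y=0.5,
                                 mean_Z=5.0, p_success=0.8)
```

Under these assumptions, a longer Bloom filter transmission leaves less of each connection for data, so `u_long_bf` comes out lower than `u_short_bf`.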
Utilization. [Figure: plots of utilization against E[T], for varying BF length and for varying # hash fcns.] E[T]: expected connection time. Small # hash functions, small complexity. High utilization. Small Bloom filter overhead.
Multiple Neighbors Scenario. Querying is more effective: more DHs to answer a query. Multiple Bloom filter transmissions in a single busy period of an observer node. Node-centric model. [Figure: DQ and DH nodes exchanging Bloom filters (BF).]
Search Efficiency. [Figure: timeline of 1 cycle (B, I) with BF transmissions, shown for K=0, K=1, K=2.] Fraction of effective busy time: the fraction of green-colored blocks over 1 cycle. Utilization gives the fraction of data transmission time in the green-colored blocks.
Queueing Formulation. The observer node is a server, providing service to every node in range. An arrival occurs when a node enters the observer's range. N(t) neighbors receive service at time t. Modeled by an M/M/∞ queue with arrival rate λ and per-node departure rate μ: with n neighbors, the departure rate is nμ. N(t) is the system state. ρ=λ/μ is the average number of nodes seen by the observer. [Figure: numbered nodes in and around the observer's range.]
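Standard M/M/∞ results give a feel for this model. The sketch below uses assumed symbols: λ (`lam`) for the rate at which nodes enter the observer's range and μ (`mu`) for the rate at which each neighbor leaves. The function names and the numeric rates are hypothetical. The formulas are textbook M/M/∞ facts and are not taken from the slides: the number of neighbors is Poisson with mean λ/μ in steady state, and the mean busy period follows from the idle fraction.

```python
import math


def mminf_stationary(lam, mu, n_max):
    """Steady-state P(N = n) for n = 0..n_max in an M/M/inf queue."""
    rho = lam / mu  # mean number of neighbors seen by the observer
    return [math.exp(-rho) * rho**n / math.factorial(n)
            for n in range(n_max + 1)]


def idle_fraction(lam, mu):
    """Long-run fraction of time the observer has no neighbor, P(N = 0)."""
    return math.exp(-lam / mu)


def mean_busy_period(lam, mu):
    """Mean length of a busy period (at least one neighbor in range).

    A cycle is an idle period (mean 1/lam) followed by a busy period, and
    the idle fraction equals exp(-rho), which gives (exp(rho) - 1) / lam.
    """
    rho = lam / mu
    return (math.exp(rho) - 1.0) / lam


# Hypothetical rates: one arrival every 2 time units, mean residence time 4.
lam, mu = 0.5, 0.25
pmf = mminf_stationary(lam, mu, n_max=40)
mean_neighbors = sum(n * p for n, p in enumerate(pmf))
busy = mean_busy_period(lam, mu)
```

With these rates ρ=2, so the observer sees two neighbors on average and is idle a fraction e^(-2) of the time.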
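Bloom Filter Based Scheme. [Figure: timeline of one busy period from t_0 to t_{K+1}, with epochs t_0, t_1, t_2, t_3, …, t_K, t_{K+1}; segments B_init, B_sub,1, B_sub,2, …, B_sub,K; blocks labeled T_BF and Data+Query.] A sub-busy period begins at a random N(t_k), e.g. N(t_1)=3. The busy period ends at t_{K+1}, when N(t_{K+1})=0.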
Efficiency vs. [Figure: two plots, low bandwidth scenario and high bandwidth scenario.]
Conclusion. Push is better than pull: suitable for web access, where query success probability is small. The node-centric model is more versatile than the location-centric model. Realistic mobility model: both node encounters and residence time can be measured online. Realistic interference model: Poisson field of interferers rather than collocated nodes.