Querying The Internet With PIER Nitin Khandelwal.

1 Querying The Internet With PIER Nitin Khandelwal

2 Motivation Inject a degree of distribution into databases Internet-scale systems vs. hundred-node systems Large-scale applications requiring database functionality

3 Applications P2P Databases Highly distributed and available data Network Monitoring Intrusion detection Fingerprint queries

4 Design Principles Relaxed Consistency: sacrifice consistency in favor of availability and partition tolerance Organic Scaling: growth with deployment Natural Habitats for Data: data remains in its original format with a DB interface Standard Schemas: achieved through common software

5 DHTs Implemented with CAN (Content Addressable Network). Each node is identified by a hyper-rectangle (zone) in d-dimensional space. A key is hashed to a point and stored at the node whose zone contains that point. Each node maintains a routing table of its neighbours, of size O(d).
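The key-to-node mapping on this slide can be sketched as follows. This is a minimal illustration, not CAN's actual implementation: the SHA-256 hash, the names `key_to_point` and `owner`, and the two-node zone layout are all assumptions made for the example, and a real CAN routes greedily through neighbours rather than scanning every zone.

```python
import hashlib


def key_to_point(key: str, d: int = 4) -> tuple:
    """Hash a key to a point in the unit cube [0, 1)^d (hypothetical scheme)."""
    if not 1 <= d <= 8:
        raise ValueError("this sketch supports 1 <= d <= 8")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    coords = []
    for i in range(d):
        # Use 4 bytes of the 32-byte digest for each coordinate.
        chunk = digest[4 * i: 4 * i + 4]
        coords.append(int.from_bytes(chunk, "big") / 2**32)
    return tuple(coords)


def owner(point, zones):
    """Return the id of the node whose hyper-rectangle contains the point.

    zones maps node id -> (lows, highs); each zone is half-open on its
    upper side so that zones tile the space without overlap.
    """
    for node_id, (lows, highs) in zones.items():
        if all(lo <= x < hi for lo, x, hi in zip(lows, point, highs)):
            return node_id
    raise KeyError("no zone covers this point")


# Two nodes splitting a 2-dimensional space along the first axis.
zones = {
    "A": ((0.0, 0.0), (0.5, 1.0)),
    "B": ((0.5, 0.0), (1.0, 1.0)),
}
point = key_to_point("R:42", d=2)
print(point, "is stored at node", owner(point, zones))
```

When a node joins, an existing zone is split and only the keys in the affected zone move, which is what keeps the mapping dynamic as nodes leave and join.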

6 DHT Design Routing Layer: mapping for keys (dynamic as nodes leave and join) Storage Manager: DHT-based data Provider: storage access interface for higher levels

7 Provider Couples the routing and storage layers namespace – relation resourceId – primary key namespace + resourceId >> key instanceId – distinguishes objects with the same namespace and resourceId lifetime – item storage duration LScan, Multicast, Newdata
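The naming scheme above can be sketched as a small soft-state store. Everything here is an assumption for illustration: the SHA-1 based `dht_key`, the `Item` record, and the `LocalStore` class are hypothetical names, not PIER's provider API. The sketch only shows how namespace and resourceId combine into a key, how instanceId separates items that share a key, and how lifetime makes items expire.

```python
import hashlib
import time
from dataclasses import dataclass


def dht_key(namespace: str, resource_id: str) -> int:
    """Combine namespace and resourceId into one DHT key (hypothetical hash)."""
    data = f"{namespace}\x00{resource_id}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


@dataclass
class Item:
    instance_id: int
    value: object
    expires_at: float  # absolute time after which the item is dropped


class LocalStore:
    """Soft-state storage held by a single node."""

    def __init__(self):
        self._items = {}  # (key, instance_id) -> Item

    def put(self, namespace, resource_id, instance_id, value, lifetime, now=None):
        now = time.time() if now is None else now
        key = dht_key(namespace, resource_id)
        self._items[(key, instance_id)] = Item(instance_id, value, now + lifetime)

    def get(self, namespace, resource_id, now=None):
        """Return the values of all unexpired items stored under the key."""
        now = time.time() if now is None else now
        key = dht_key(namespace, resource_id)
        return [
            item.value
            for (k, _), item in self._items.items()
            if k == key and item.expires_at > now
        ]


store = LocalStore()
store.put("R", "42", instance_id=1, value="a", lifetime=10, now=0)
store.put("R", "42", instance_id=2, value="b", lifetime=20, now=0)
print(store.get("R", "42", now=5))   # both items are still alive
print(store.get("R", "42", now=15))  # the first item has expired
```

Because items expire unless they are stored again, a node that fails simply stops refreshing its data and the stale entries age out on their own.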

8 PIER Query Processor Operators: selection, projection, joins, grouping, aggregation Operators push and pull data Relaxed consistency and reachable snapshot: - Ideally, work with the data on nodes reachable at query issue time. - Instead, each node uses the arrival of the query multicast message as its snapshot time.

9 Join Algorithm R, S – relations Nr, Ns – relation namespaces Nq – DHT-based temporary table Symmetric Hash Join: - Rehashes both relations on the join attribute - Scans and copies tuples into the new namespace Nq Fetch Matches: - One relation (S) is already hashed on the join attribute - Selections on non-join attributes of S cannot be pushed into the DHT
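A single-process sketch of the symmetric hash join may help. In PIER the rehashed tuples are spread across the nodes holding Nq; the code below stands in for what one such node might do with the tuples that arrive for its partition. The function name, the `("R", row)` stream encoding, and the key extractors are assumptions for the example.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key_r, key_s):
    """Join tuples of R and S as they arrive, in any interleaving.

    stream yields ("R", row) or ("S", row) pairs. Each arriving row is
    inserted into its own side's hash table (build) and then looked up
    in the other side's table (probe), so results are produced
    incrementally without waiting for either input to finish.
    """
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    key_of = {"R": key_r, "S": key_s}
    for side, row in stream:
        other = "S" if side == "R" else "R"
        k = key_of[side](row)
        tables[side][k].append(row)              # build
        for match in tables[other].get(k, []):   # probe
            # Always emit result pairs in (R row, S row) order.
            yield (row, match) if side == "R" else (match, row)


arrivals = [
    ("R", (1, "a")),
    ("S", (1, "x")),
    ("S", (2, "y")),
    ("R", (2, "b")),
    ("R", (1, "c")),
]
for pair in symmetric_hash_join(arrivals, key_r=lambda r: r[0], key_s=lambda s: s[0]):
    print(pair)
```

Each matching pair is emitted exactly once, when the later of its two rows arrives, which suits a setting where tuples trickle in from many nodes in no fixed order.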

10 Join Rewriting Aimed at lowering bandwidth utilization Symmetric semi-join: - Local projections to resourceId + join keys - Symmetric hash join on the two projections - Global fetch-matches join using the resourceIds of R and S Bloom join (hashed semi-join): - A Bloom filter is a hashing-based bit vector - Local Bloom filters are published into temporary namespaces - Filters are OR-ed and multicast to the opposite relation’s nodes
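The Bloom join steps can be sketched with a small Bloom filter. The parameters (`m` bits, `k` hash functions), the SHA-256 based hashing, and the class itself are assumptions for illustration, not the filter PIER uses. The sketch shows the two properties the rewrite relies on: filters built on different nodes can be OR-ed together, and the merged filter never rejects a join key that was actually inserted.

```python
import hashlib


class BloomFilter:
    """A bit vector with k hash functions; false positives are possible,
    false negatives are not."""

    def __init__(self, m=1024, k=3, bits=0):
        self.m, self.k, self.bits = m, k, bits

    def _positions(self, value):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def __contains__(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))

    def __or__(self, other):
        """Merge two filters built with the same m and k."""
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k to be OR-ed")
        return BloomFilter(self.m, self.k, self.bits | other.bits)


# Two nodes each summarise the join keys of their local R tuples.
node1, node2 = BloomFilter(), BloomFilter()
for join_key in (1, 2):
    node1.add(join_key)
node2.add(3)

# The OR-ed filter is what would be multicast to the nodes holding S.
r_keys = node1 | node2
s_tuples = [(1, "x"), (3, "y"), (7, "z")]
to_rehash = [t for t in s_tuples if t[0] in r_keys]
print(to_rehash)
```

Only the S tuples that pass the filter need to be rehashed, which is where the bandwidth saving comes from; an occasional false positive costs a wasted transfer but never a missing result.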

11 Workload Parameters CAN configuration: d = 4 R 10 times larger than S Constants provide 50% selectivity f(x,y) evaluated after the join 90% of R tuples match a tuple in S Result tuples are 1KB each Symmetric hash join used

12 Simulation Setup Up to 10,000 nodes Network cross-traffic, CPU and memory utilizations ignored Data shipped from source to computation node for every query operation 1. 100ms and 10Mbps fully connected links 2. GT-ITM transit-stub topology (similar results)

13 Join Algorithms Infinite bandwidth (observe the impact of propagation delay alone) 1024 data and computation nodes Core join algorithms: perform faster Rewrites: Bloom filter: two multicasts Semi-join: two CAN lookups

14 Join Algorithms -- 2 Limited Bandwidth Symmetric Hash Join: - Rehashes both tables Semi Joins: - Transfer only matching tuples At 40% selectivity, bottleneck switches from computation nodes to query sites

15 Conclusions Scalability of PIER derives from relaxed design principles - adoption of soft state - dilated snapshot semantics Limitation: Just equality predicates Directions: - Pushdown of selections into DHT - Caching and replication of DHT data - Catalog Manager – Stringent consistency and availability requirements.

16 Sophia: An Information Plane Nitin Khandelwal

17 Shared Information Plane Distributed system running throughout the network. - Collects information about network elements: local state (load, memory usage) and local perspective (reachability of other nodes) - Evaluates statements (questions) about the state - Reacts according to the conclusions, e.g. killing a misbehaving service

18 Challenges Information is widely distributed and dynamic Statements formulated at run-time, not a priori Centralized analysis not practical Push analysis to the nodes (push into the network)

19 Approach Use a logic programming model - The system is dynamic and distributed, so the logic is temporal and positional Why? - Expressivity: intuitive to make statements about the state of the system - Performance: logic expression transformation for efficient evaluation; caching of partial results

20 Time and Position in the Language Every term in the system has an environment containing time and location Eval( bandwidth( env (at(node(Node), time(Time), Time > 1032445465, BwVar), BwVar > 40000))

21 Performance Aggressive Caching: - Evaluation results are cached - Sometimes latency is more important than freshness - Time environment used to control freshness Scheduling: - Pre-scheduling results to be available when and where they may be needed - Cache can be refreshed with fresh values
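The trade-off between latency and freshness can be sketched as a cache whose lookups carry a maximum acceptable age. This is a Python illustration rather than Sophia's logic language, and the class name, the term encoding, and the `max_age` parameter are assumptions made for the example.

```python
class FreshnessCache:
    """Caches evaluation results together with the time they were computed."""

    def __init__(self):
        self._entries = {}  # term -> (value, evaluated_at)

    def store(self, term, value, evaluated_at):
        self._entries[term] = (value, evaluated_at)

    def lookup(self, term, now, max_age):
        """Return the cached value if it is at most max_age old, else None.

        A caller that values low latency passes a large max_age; a caller
        that needs fresh data passes a small one and re-evaluates on a miss.
        """
        entry = self._entries.get(term)
        if entry is None:
            return None
        value, evaluated_at = entry
        return value if now - evaluated_at <= max_age else None


cache = FreshnessCache()
cache.store(("bandwidth", "node17"), 52000, evaluated_at=100)
print(cache.lookup(("bandwidth", "node17"), now=130, max_age=60))  # recent enough
print(cache.lookup(("bandwidth", "node17"), now=130, max_age=10))  # too stale
```

Pre-scheduling, as described on the slide, would amount to calling `store` ahead of the time a result is expected to be needed, so that the later `lookup` is a hit.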

22 Evaluation Planning Given an expression, plan - where (close to the data) - when (the time at which dependencies are resolved) - what to evaluate Logic expressions can be transformed at runtime

23 Extensibility Users can add new functionality at run-time Capabilities: to protect modules, grant and revoke privileges. cap569354(Val) :- read sensor. cap435456(Val) :- cap569354(Val). bandwidth(Val) :- cap435456(Val). Module Protection: All predicates transformed into capabilities, shared through master key capability Danger in caching – different interfaces

24 PIER and Sophia Sophia: location of code execution is both explicit in the language and can be evaluated in the course of evaluation. PIER: details of query execution left to underlying implementation to optimize. Consequence: Sophia queries are more sophisticated: both user and system participate in evaluation planning.

