A Symmetric and Polyvalent Resource Location System. Candidate: Chuang Liu. Advisor: Ian Foster. University of Chicago.

2 Growth of the Internet l The broad deployment of the Internet and the emergence of service-oriented architectures have led to a remarkable increase in the number of resources to which a user, program, or community may have access.

3 Infrastructures of Resource Pools l Example pools: Condor, Globus, PlanetLab, Gnutella l Figure labels: Sites: ~100, CPUs: ~100,000; Sites: 298, CPUs: 629; Pools: 1079, CPUs: 105,146; Nodes: 1.5 M

4 Applications and Challenges l Applications –Scientific computing applications –Content distribution systems –On-demand and utility computing –… l Challenges –Applications need to run on a resource (or resource collection) with the desired individual and aggregation properties to achieve good performance or efficiency –Resources are heterogeneous and dynamic –Large number of resources → selection is expensive –Resource owners impose policies concerning, e.g., who can use a resource and for what purpose –In Internet environments, resources are distributed

5 We Hypothesize a Unifying Mechanism: Resource Location Service l Diagram: Applications, Resource Location Service (Algorithm, Structure), Resources l We want a declarative representation of resources and queries –Can we represent access policy and queries on resource sets? → Symmetric evaluation model l We need efficient algorithms for polyvalent queries, e.g.: –Resource sets based on their aggregation properties –Resource sets based on their network locations l In Internet environments, resources are distributed –Organization of information, distributed query evaluation → Scalable Internet resource location service

6 Outline l Resource and query description –Data Model –Syntax l Search algorithms l A computer location service l Summary

7 Requirements l Resource description –Resource properties –Access policies: constraints on user properties l Query description –Search condition: constraints on resource properties –User properties l Traditionally, resources and queries look different –MDS, UDDI, etc. l We want to treat resources and queries as symmetric –Condor pioneered such an approach (matchmaking) –But its features have many limitations

8 Symmetric Data Model l A (query or resource) description –Data section >attribute/value pairs (resource/query properties) –Constraint section >constraints on properties (access policy/search condition) –Rank section l Symmetric evaluation –A query and a resource (or set of resources) match each other if all constraints in their descriptions are satisfied –Focus here on 1-1 and 1-N matches –Have addressed N-N in other work [CCGrid 2005]
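
The symmetric evaluation rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's implementation: the `Description` class, its field names, and the `matches` and `best_match` helpers are hypothetical, and constraints are modeled as plain Python predicates instead of the XML-based syntax.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# A constraint sees the description's own data and the other party's data.
Constraint = Callable[[Dict[str, Any], Dict[str, Any]], bool]


@dataclass
class Description:
    """Hypothetical in-memory form of a query or resource description."""
    data: Dict[str, Any]                                          # data section
    constraints: List[Constraint] = field(default_factory=list)   # constraint section
    rank: Optional[Callable[[Dict[str, Any]], float]] = None      # rank section


def satisfied(desc: Description, other: Description) -> bool:
    """True if every constraint in `desc` holds against `other`."""
    return all(check(desc.data, other.data) for check in desc.constraints)


def matches(query: Description, resource: Description) -> bool:
    """Symmetric match: both sides' constraints must be satisfied."""
    return satisfied(query, resource) and satisfied(resource, query)


def best_match(query: Description, resources: List[Description]) -> Optional[Description]:
    """1-N match: return the matching resource with the highest query rank."""
    candidates = [r for r in resources if matches(query, r)]
    score = (lambda r: query.rank(r.data)) if query.rank else (lambda r: 0.0)
    return max(candidates, key=score, default=None)
```

A resource whose access policy admits only members of one group, and a query that asks for a Linux machine, would each be a `Description`; the pair matches only when the policy accepts the user and the search condition accepts the resource.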

9 Syntax l Description uses XML-based syntax –Extensibility l XML Schema –Defines the valid attributes and values that can appear in the description of a particular type of resource or query l Example: ”linux” … 100 … <rl:constraint name=”access time” errmsg=”not accessible”> rq.user:accesstime between (6:00PM, 6:00AM)

10 New Features (Relative to Previous Approaches) l Resources may show different properties or different access policies to different users –Condition structure: If (condition1) attribute = value1; If (condition2) attribute = value2 –Option structure: [attribute1 = value2, attribute2 = value3] or [attribute1 = value3, attribute2 = value4] l Queries for resource sets –Constraints on aggregation properties of a resource set: rs1 ISASET “computer”; sum(rs1.memorysize) > 100; rank = -count(rs1) l A Constraint Language Approach to Matchmaking. Liu, C., Foster, I., 14th Intl Workshop on Research Issues on Data Engineering (RIDE 2004), Boston, 2004.
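
To make the set query on this slide concrete, here is a brute-force Python sketch of its meaning: find a set of computers whose summed `memorysize` exceeds a bound, preferring the set with the fewest members (rank = -count). The function name `smallest_set` and the dictionary layout are assumptions for illustration, and exhaustive enumeration is not the search algorithm the talk proposes.

```python
from itertools import combinations
from typing import Dict, List, Optional

Computer = Dict[str, float]


def smallest_set(computers: List[Computer], min_total_memory: float) -> Optional[List[Computer]]:
    """Return a smallest subset with sum(memorysize) > min_total_memory.

    Subsets are tried in order of increasing size, so the first hit
    maximizes rank = -count(rs1). Returns None if no subset qualifies.
    """
    for size in range(1, len(computers) + 1):
        for subset in combinations(computers, size):
            if sum(c["memorysize"] for c in subset) > min_total_memory:
                return list(subset)
    return None


pool = [{"memorysize": 40}, {"memorysize": 30}, {"memorysize": 50}, {"memorysize": 20}]
chosen = smallest_set(pool, 100)
```

The cost of this enumeration grows exponentially with the pool size, which is the motivation for the dedicated search algorithms in the following slides.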

11 Outline l Resource and query description l Search algorithms l A computer location service l Summary and future work

12 Search Algorithms l Locating one resource with desired properties –MDS, Condor Matchmaker, RGIS, Gnutella, UDDI –Relational and other databases l Locating a resource set with desired properties (polyvalent queries) –a) Resource sets with required aggregation properties –b) Resource sets with required network connections

13 Queries with Aggregation Properties: Extending Relational Databases l A query for a resource set with aggregation properties can be represented by a database query requiring the simultaneous satisfaction of arithmetic constraints on multiple attributes (ACMA) from different relations l Database search engines solve ACMA queries with join operations. Unfortunately, current algorithms have poor performance. → Introduce an ACMA join operator and an ACMA query evaluation plan l Example: SELECT * FROM T as A, T as B, T as C, T as D WHERE A.price + B.price + C.price + D.price <= 5 AND A.cpuSpeed + B.cpuSpeed + C.cpuSpeed + D.cpuSpeed > 100 AND A.memory + B.memory + C.memory + D.memory > 100

14 Execution Plan of ACMA Join l Query: SELECT * FROM T as A, T as B, T as C, T as D WHERE A.price + B.price + C.price + D.price <= 5 AND A.cpuSpeed + B.cpuSpeed + C.cpuSpeed + D.cpuSpeed > 100 AND A.memory + B.memory + C.memory + D.memory > 100 l Problems with the traditional plan –Reads in too many tuples –Unable to remove intermediate results l Add selection operators l Use a new join operator, called the constrained join operator

15 Implementation of ACMA Join l Selection operators –Use a consistency algorithm to initialize the selection conditions, which are range constraints on single attributes l Constrained join operator –Extends the nested-loop join operator –Uses a consistency algorithm to predict whether an intermediate result can lead to any final query result l Data flow: ACMA query → consistency algorithm → range constraints on single attributes; ACMA query + intermediate result → consistency algorithm → yes/no
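
The idea of a constrained join can be sketched as a nested-loop join that discards an intermediate result as soon as a bound check shows it cannot be completed. The sketch below is an assumption-laden illustration, not the operator from the talk: the consistency check is a simple interval bound (partial sum plus the best case for the remaining tuples), and the names `feasible` and `constrained_join` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]
# (attribute, operator, bound); operator is "<=" or ">" as in the example query.
SumConstraint = Tuple[str, str, float]


def feasible(sums: Dict[str, float], remaining: int,
             ranges: Dict[str, Tuple[float, float]],
             constraints: List[SumConstraint]) -> bool:
    """Bound check: can `remaining` more rows still satisfy every constraint?"""
    for attr, op, bound in constraints:
        low, high = ranges[attr]
        if op == "<=" and sums[attr] + remaining * low > bound:
            return False   # even the smallest remaining values overshoot
        if op == ">" and sums[attr] + remaining * high <= bound:
            return False   # even the largest remaining values fall short
    return True


def constrained_join(table: List[Row], k: int,
                     constraints: List[SumConstraint]) -> List[Tuple[Row, ...]]:
    """Self-join `table` k times, pruning infeasible intermediate results."""
    if not table:
        return []
    attrs = {attr for attr, _, _ in constraints}
    ranges = {a: (min(r[a] for r in table), max(r[a] for r in table)) for a in attrs}
    results: List[Tuple[Row, ...]] = []

    def extend(chosen: List[Row], sums: Dict[str, float]) -> None:
        if len(chosen) == k:
            results.append(tuple(chosen))
            return
        for row in table:
            new_sums = {a: sums[a] + row[a] for a in attrs}
            # With zero rows remaining this is exactly the final constraint test.
            if feasible(new_sums, k - len(chosen) - 1, ranges, constraints):
                extend(chosen + [row], new_sums)

    extend([], {a: 0.0 for a in attrs})
    return results
```

The first level of the recursion plays the role of a selection operator: a row is rejected outright when no combination of the remaining rows could bring the sums within bounds.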

16 Evaluation of Our Method l Traditional plan l Plan with selection operators and constrained join operator

17 Performance Experiments: Example Results l Plan I reads 10^4 to 10^6 times more tuples than the other two plans l Plan III performs ten times fewer tuple reads than Plan II l Efficient Combinatorial Search in Relational Databases, Liu, C., Yang, L., Foster, I., 9th International Database Applications and Engineering Symposium (IDEAS 2005), Montreal, 2005

18 Outline l Resource and query description l Search algorithms –Resource sets with required aggregation properties –> Resource sets with required network connections l A computer location service l Summary

19 Resource Set with Required Network Connection l Locate a set of resources with particular network connections in the Internet. l Q1: Find a set of R resources close to each other: –The network latency between any pair of those resources is less than L milliseconds –Useful for e.g. computational applications l Q2: Find a set of R resources far from each other: –The network latency between any pair of those resources is more than L milliseconds –Useful for e.g. content distribution applications
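
The two query types can be stated as predicates over pairwise latency measurements. The following Python sketch assumes a hypothetical data layout, a dictionary keyed by unordered resource pairs with latencies in milliseconds, and treats a missing measurement as a failed check, which is a conservative choice made here and not something the slide specifies.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional

Latency = Dict[FrozenSet[str], float]


def pair_latency(latency: Latency, a: str, b: str) -> Optional[float]:
    """Return the measured latency in ms between a and b, or None if unmeasured."""
    return latency.get(frozenset((a, b)))


def is_close_set(members: Iterable[str], latency: Latency, limit_ms: float) -> bool:
    """Q1: every pair has a known latency below limit_ms."""
    for a, b in combinations(list(members), 2):
        d = pair_latency(latency, a, b)
        if d is None or d >= limit_ms:
            return False
    return True


def is_far_set(members: Iterable[str], latency: Latency, limit_ms: float) -> bool:
    """Q2: every pair has a known latency above limit_ms."""
    for a, b in combinations(list(members), 2):
        d = pair_latency(latency, a, b)
        if d is None or d <= limit_ms:
            return False
    return True
```

Checking one candidate set takes R(R-1)/2 lookups; the difficulty lies in choosing which of the many candidate sets to check.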

20 Challenges l Direct computation –Such as a tree search algorithm l Challenges –It is an NP-hard problem –It may require a large number of measurements –Unstable networks and resources may cause individual measurements to fail → only partial data –Network latency data is noisy because network resources are shared among users

21 Intuition of Our Heuristic Method l Clustering –We partition resources into clusters based on end-to-end network latency –A cluster is a set of resources that have much smaller latency with each other than with other resources l Search based on the cluster structure –Q1. Search for resources in a cluster –Q2. Search for resources from different clusters

22 Outline l Resource and query description l Search algorithms –Resource sets with required aggregation properties –Resource sets with required network connection >Cluster Algorithms l Cluster Algorithm I l Cluster Algorithm II >Search Algorithm l A computer location service l Summary

23 Cluster Algorithm I – Resource Pool l Resource pools such as OSG, PlanetLab, etc. –Hundreds of resources –Resources are relatively stable –Latency measurements between resources exist l Available latency measurements are only a subset of all possible measurements l Figure: latency data on PlanetLab, collected by Stribling

24 Cluster Algorithm I l Markov cluster algorithm [Dongen 2000] l If there are many short paths between two resources, the two resources likely have a small latency between them and therefore belong to the same cluster l Details in –S. Dongen, “A cluster algorithm for graphs”, 2000
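
For readers unfamiliar with the Markov cluster algorithm, the following pure-Python sketch shows its standard loop: add self-loops, normalize columns, then alternate expansion (matrix squaring) and inflation (element-wise power followed by renormalization). It is a minimal textbook-style version, not the code used in this work. The input is assumed to be a symmetric matrix of edge weights where a larger weight means a lower latency; how latencies are turned into weights is not given on the slide, and the membership threshold of 1e-3 is an arbitrary choice.

```python
from typing import FrozenSet, List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def markov_cluster(weights: Matrix, inflation: float = 2.0,
                   iterations: int = 50, eps: float = 1e-6) -> List[FrozenSet[int]]:
    """Cluster node indices of a weighted graph with a basic MCL loop."""
    n = len(weights)
    # Self-loops keep the random walk from oscillating.
    m = [[float(weights[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(iterations):
        # Expansion: flow along paths of length two.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        converged = all(abs(inflated[i][j] - m[i][j]) < eps
                        for i in range(n) for j in range(n))
        m = inflated
        if converged:
            break
    clusters: List[FrozenSet[int]] = []
    for i in range(n):
        # A row with remaining flow is an attractor; its columns form a cluster.
        members = frozenset(j for j in range(n) if m[i][j] > 1e-3)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two triangles joined by one weak edge between nodes 2 and 3.
example = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
example_clusters = markov_cluster(example)
```

Raising the `inflation` parameter generally yields more, smaller clusters, which is one way to obtain cluster structures of different granularity.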

25 Effectiveness of the Cluster Algorithm l Compute cluster structures using 10-90% of data. l Quantify, as fraction of changes D, the difference between each structure and the structure obtained with all data → We conclude that the cluster algorithm is still effective when running on an incomplete set of data
Frac   90%    80%    70%    60%    50%    40%    30%    20%    1%
D      0.06   0.145  0.152  0.161  0.198  0.228  0.336  0.38   0.46

26 Variation of the Cluster Structure l Compare each clustering structure with the one based on data from one, two, and four hours earlier l ~30% of cluster structures change less than 10% from one hour earlier l >60% of cluster structures change between 10% and 15% from one hour earlier l The difference does not increase over time l Efficient and Robust Computation of Resource Clusters in the Internet, Liu, C., Foster, I., 6th IEEE International Conference on Cluster Computing (Cluster 2005), Boston, 2005

27 Outline l Resource and query description l Search algorithms –Resource sets with required aggregation properties –Resource sets with required network connection >Cluster Algorithms l Cluster Algorithm I l Cluster Algorithm II >Search Algorithm l A computer location service l Summary

28 Cluster Algorithm II – Resource Pool l Resource pools such as Gnutella, Kazaa, etc. –Resources join the resource pool incrementally –Very large number of resources –Very expensive to measure and store latency between all resources l Requirements –Incrementally modify cluster structure when resources leave and join the resource pool –Only a modest number of latency measurements –Need small storage space

29 Hierarchical Cluster Structure l Storage space: O(N) l Figure labels: average, standard deviation

30 Incremental Cluster Algorithm l Number of measurements: log(N) l Cases (figure labels): –N is much closer to C than any member of R –N is much closer to A than any member of R –The distance between N and members of R is much bigger than the average distance in R –The distance between N and members of R is similar to the average distance in R

31 Outline l Resource and query description l Search algorithms –Resource sets with required aggregation properties –Resource sets with required network connections >Cluster Algorithms >> Search Algorithm l A computer location service l Summary

32 Modified Tree Search Algorithm l Tree search algorithm –Starts with an empty set –Repeatedly picks from available resources one resource that has required connections with current members in the set, and adds it to the set –Rolls back the addition in previous step if no such resource exists –Finishes when the set contains all required resources l Modified tree search algorithm –Q1: pick resources from the same clusters –Q2: pick resources from different clusters
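
The two searches can be sketched with one backtracking routine plus cluster-aware wrappers. This Python sketch is written under assumptions: latencies are a dictionary keyed by unordered pairs (in ms), clusters are given as sets of resource names, and the function names are hypothetical. The Q1 wrapper searches inside one cluster at a time; the Q2 wrapper only combines resources from different clusters.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set

Latency = Dict[FrozenSet[str], float]


def tree_search(candidates: List[str], needed: int,
                compatible: Callable[[str, str], bool]) -> Optional[List[str]]:
    """Plain tree search: grow a set one resource at a time, backtracking on dead ends."""
    def extend(chosen: List[str], start: int) -> Optional[List[str]]:
        if len(chosen) == needed:
            return list(chosen)
        for idx in range(start, len(candidates)):
            cand = candidates[idx]
            if all(compatible(cand, member) for member in chosen):
                chosen.append(cand)
                found = extend(chosen, idx + 1)
                if found is not None:
                    return found
                chosen.pop()  # roll back the addition
        return None
    return extend([], 0)


def find_close_set(clusters: List[Set[str]], needed: int,
                   latency: Latency, limit_ms: float) -> Optional[List[str]]:
    """Q1: look for `needed` mutually close resources within a single cluster."""
    def close(a: str, b: str) -> bool:
        d = latency.get(frozenset((a, b)))
        return d is not None and d < limit_ms
    for cluster in sorted(clusters, key=len, reverse=True):
        if len(cluster) >= needed:
            found = tree_search(sorted(cluster), needed, close)
            if found is not None:
                return found
    return None


def find_far_set(clusters: List[Set[str]], needed: int,
                 latency: Latency, limit_ms: float) -> Optional[List[str]]:
    """Q2: look for `needed` mutually distant resources, at most one per cluster."""
    cluster_of = {name: i for i, cluster in enumerate(clusters) for name in cluster}
    def far(a: str, b: str) -> bool:
        d = latency.get(frozenset((a, b)))
        return cluster_of[a] != cluster_of[b] and d is not None and d > limit_ms
    return tree_search(sorted(cluster_of), needed, far)
```

Because the cluster restriction is a heuristic, a wrapper can return `None` even when a valid set exists that mixes clusters (Q1) or reuses one (Q2); that is the price paid for the smaller search space.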

33 Evaluation of Performance l Cumulative distribution of execution time l Our algorithm answers 70% of queries within a few milliseconds
Algorithm   70%      90%
tree        0.6 s    26 s
modified    1.6 ms   0.4 s

34 Outline l Resource and query description l Search algorithms –Resource sets with required aggregation properties –Resource sets with required network connections l > A computer location service l Summary

35 Computer Location Service l Build a resource location service for computers connected by the Internet l Requirements –Support polyvalent queries for computer sets –Support queries for one computer with requirements on multiple properties –Support queries based on network locations –Support resource access policy –Scale to a large number of computers and queries

36 Related Work l Systems compared: MDS, Condor, Kazaa, SWORD, Meridian l Table rows (column alignment not preserved): Polyvalent queries: yes; Queries on network location: yes; Queries on multiple properties: yes; Access policy: yes; Scalability: poor, high, medium, high l We need a new service

37 System Structures l Centralized structure –Short response time –Poor scalability –E.g., MDS2, Napster, UDDI l Peer-to-peer structure –Good scalability –Long response time –Poor support of queries for resource sets –E.g., Gnutella 0.4, SWORD l Super-peer structure –Medium response time –Good scalability –Good support of queries for resource sets –E.g., Gnutella 0.6, Kazaa

38 Super-peer Structure l Partition computers based on the latency hierarchy l One computer in each group acts as the super-peer l Advantages –Answer polyvalent queries locally –Support queries for computers based on their network location –Low network traffic l Limitation: cannot find solutions that span groups

39 Load Balance l Update of computer information –Each computer reports to the super-peer in its group l Query processing –Each computer knows about K super-peers and sends queries to them randomly

40 Fault Tolerance l Restart of a super-peer –A super-peer periodically sends out a backup list to each computer managed by it –If a super-peer fails, all related computers report to the first computer in the backup list l Recovery of data in a super-peer –Each computer reports to the new super-peer its clusterID that will be used to reconstruct the cluster structure
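
The load-balancing and failover rules on this slide and the previous one can be sketched from the point of view of a single computer. Everything named here (the `Computer` class, its methods, the tuple format of a report) is hypothetical; the sketch models only the bookkeeping, not any network protocol.

```python
import random
from typing import Dict, List, Set, Tuple

Report = Tuple[str, str, str]  # (super-peer addressed, reporting computer, clusterID)


class Computer:
    """Hypothetical client-side state of one computer in a group."""

    def __init__(self, name: str, cluster_id: str, super_peer: str,
                 known_super_peers: List[str]) -> None:
        self.name = name
        self.cluster_id = cluster_id
        self.super_peer = super_peer                      # receives this computer's updates
        self.known_super_peers = list(known_super_peers)  # the K super-peers used for queries
        self.backup_list: List[str] = []

    def receive_backup_list(self, backups: List[str]) -> None:
        """Store the backup list periodically sent by the super-peer."""
        self.backup_list = list(backups)

    def pick_super_peer_for_query(self, rng=random) -> str:
        """Load balance: send each query to a randomly chosen known super-peer."""
        return rng.choice(self.known_super_peers)

    def on_super_peer_failure(self) -> Report:
        """Fail over to the first backup and build the report sent to it."""
        if not self.backup_list:
            raise RuntimeError("no backup list received yet")
        self.super_peer = self.backup_list[0]
        return (self.super_peer, self.name, self.cluster_id)


def rebuild_clusters(reports: List[Report]) -> Dict[str, Set[str]]:
    """New super-peer: reconstruct cluster membership from the reported clusterIDs."""
    clusters: Dict[str, Set[str]] = {}
    for _, name, cluster_id in reports:
        clusters.setdefault(cluster_id, set()).add(name)
    return clusters
```

Since every computer in a group holds the same backup list, all of them converge on the same replacement without coordinating with one another.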

41 Work Remaining to be Done l Measure: –Query success rates –Query response times –Average and maximum input/output traffic l For: –Our super-peer structure and algorithm –Random super-peer structure and our algorithm –Others? l Using: –Workloads TBD l Assuming –Computer characteristics change randomly

42 Outline l Description of resources and queries l Matchmaking algorithms –An algorithm to locate resource sets with required aggregation properties –Algorithms for locating resource sets with required network connection l A matchmaking service l Summary

43 My Contributions l A matchmaking language to describe resources and queries –Symmetric mechanism that enables both resource owners and requesters to control matching between resources and queries –Supports polyvalent queries l Fast algorithms to solve polyvalent queries that search for a resource set with desired aggregation properties and network connections –Order(s) of magnitude faster than other approaches l Scalable resource location service that supports a large set of queries for networked computers –Evaluation in progress

44 Publications
1. Efficient and Robust Computation of Resource Clusters in the Internet, Liu, C., Foster, I., 6th IEEE International Conference on Cluster Computing (Cluster 2005), Boston, 2005.
2. Matchmaking Systems: A Survey, Liu, C., Foster, I., unpublished document, 2005.
3. Efficient Combinatorial Search in Relational Databases, Liu, C., Yang, L., Foster, I., 9th International Database Applications and Engineering Symposium (IDEAS 2005), Montreal, 2005.
4. Online Resource Matching in a Heterogeneous Grid Environment, Naik, V., Liu, C., Yang, L., Wagner, J., 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2005), Cardiff, UK, 2005.
5. DB_CSP: A Framework and Algorithms for Applying Constraint Solving within Relational Databases, Liu, C., Foster, I., 19th Workshop on (Constraint) Logic Programming (WLP 2005), Ulm, Germany, 2005.
6. A Constraint Language Approach to Matchmaking, Liu, C., Foster, I., 14th International Workshop on Research Issues on Data Engineering (RIDE 2004), Boston, 2004.
7. Scheduling in the Grid Application to Grid Resource Selection, Dail, H., Sievert, O., Berman, F., Casanova, H., Yarkhan, A., Vadhiyar, S., Dongarra, J., Liu, C., Yang, L., Angulo, D., Foster, I., in Grid Resource Management, Kluwer Publishing, 2003.
8. Design and Evaluation of a Resource Selection Framework, Liu, C., Yang, L., Foster, I., Angulo, D., 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, 2002.
9. The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in Grid Environments, Allen, G., Angulo, D., Foster, I., Lanfermann, G., Liu, C., Radke, T., Seidel, E., Shalf, J., International Journal of Supercomputer Applications, Winter 2001, v15(4).

45 l Questions? l Thank you

46 Consistency algorithm l ACMA constraint: built from attributes, constants, and logic operators such as >, <, =, etc. l Consistency algorithm: computes an upper bound H_k and a lower bound L_k (case shown: I_k > 0) l Figure: formulas for the constraint and the bounds

47 To Do l Refine slide 23.

48 Infrastructures l Figure labels: Sites: 27, Users: ~100, CPUs: ~2,700; Sites: ~100, Users: ~25,000, CPUs: ~100,000, Data: ~10 PB; Sites: 298, CPUs: 629; Pools: 1079, CPUs: 105,146; Nodes: 1.5 M; Condor

49 Protocol

50 System Structures l Centralized vs. peer-to-peer vs. super-peer structure l Reasons to choose super-peer structure –It is necessary to aggregate computer information to process polyvalent queries efficiently –Balance between scalability and efficiency –Suitable for queries with high selectivity

51 Incremental Cluster Algorithm l Number of measurements: N log(N) l Cases (figure labels): –N is closer to C than current members of R –N is closer to A than current members of R –The distance between current members of R is smaller than the distance between N and these members –The distance between N and members of R is similar to the distance between current members of R

52 Benchmark l Three relations A, B, and C with two attributes, K1000 and K10000 –Values of K1000 (K10000) are distributed uniformly from 1 to 1000 (10000) (Wisconsin benchmark) –Values of K1000 (K10000) follow a normal distribution with mean 500 and standard deviation 250 (mean 5000 and standard deviation 2500) l Query SELECT * FROM A, B, C WHERE A.K1000 + B.K1000 + C.K1000 > N1 AND A.K1000 + B.K1000 + C.K1000 < N2 AND A.K10000 + B.K10000 + C.K10000 > N3

53 New Features l Resources may show different properties or different access policies to different users –Condition structure –Option structure l Queries for resource set –constraints on aggregation properties of resource set, such as connected(), etc A Constraint Language Approach to Matchmaking. Liu, C., Foster, I., Proceedings of the 14 th International Workshop on Research Issues on Data Engineering (RIDE 2004), Boston, 2004.

54 Cluster Structure of Resources on PlanetLab l Cluster structures with different granularity –East America, West America, Central America, East Asia, South Europe, etc. –California, Texas, China, Korea, etc. –San Jose (HP, UCB, Stanford), Boston (BU, MIT), etc.
G     # of clusters   Median latency
1.2   39              17 ms
1.3   87              4 ms
1.4   107             1 ms
