Auspice: AUtomatic Service Planning in Cloud/Grid Environments David Chiu Dissertation Defense May 25, 2010 Committee: Prof. Gagan Agrawal, Advisor Prof.

1 Auspice: AUtomatic Service Planning in Cloud/Grid Environments David Chiu Dissertation Defense May 25, 2010 Committee: Prof. Gagan Agrawal, Advisor Prof. Hakan Ferhatosmanoglu Prof. Christopher Stewart

2 2 Explosion of Scientific Data Sources The amount of scientific data has increased dramatically over the years In just one example, ‣ Large Hadron Collider (LHC) ‣ 15 petabytes annually ‣ 60 petabytes overall Management and processing have become challenging

3 3 Data Sources A Live Cyber Infrastructure

4 4 Computing & Storage Resources A Live Cyber Infrastructure

5 5 Shared/Proprietary Web Services = Web Service A Live Cyber Infrastructure

6 6... A Live Cyber Infrastructure

7 7 Service Interaction with Cyber Infrastructure... invoke results

8 8 Current GUI for Creating Workflows

9 9 Scientific Workflow Challenges ??? ‣ Difficulties for the scientist: ‣ How to identify which data sets to use, and from where to get them? ‣ Which services are available to me to use? ‣ What resources to utilize? ‣ How can I accelerate workflow execution? ‣ Do I really have to do all this myself?

10 10 Contributions Workflow System-- with the following support High-level scientific user querying ‣ D. Chiu and G. Agrawal. A Keyword Querying Interface for Invoking Scientific Workflows. (OSU-TR, submitting to ACM-GIS’10) ‣ D. Chiu and G. Agrawal. Enabling Ad Hoc Queries over Low-Level Scientific Data Sets. (SSDBM'09) Automatic workflow planning ‣ D. Chiu and G. Agrawal. Enabling Ad Hoc Queries over Low-Level Scientific Data Sets. (SSDBM'09) ‣ D. Chiu and G. Agrawal. Ad Hoc Scientific Workflows through Data-driven Service Composition. (eScience'07)

11 11 Contributions (continued) Quality of Service ‣ D. Chiu, S. Deshpande, G. Agrawal, and R. Li. A Dynamic Approach toward QoS-Aware Service Workflow Composition. (ICWS’09) ‣ D. Chiu, S. Deshpande, G. Agrawal, and R. Li. Cost and Accuracy Sensitive Dynamic Workflow Composition over Grid Environments. (GRID'08) ‣ D. Chiu, S. Deshpande, G. Agrawal, and R. Li. Composing Geoinformatics Workflows with User Preferences. (GIS’08) Accelerating Workflow Execution ‣ D. Chiu and G. Agrawal. Evaluating Caching and Storage Options on the Amazon Web Service Cloud. (OSU-TR, submitted to GRID’10) ‣ D. Chiu, A. Shetty, and G. Agrawal. Elastic Cloud Caches for Derived Data Reuse. (OSU-TR, submitted to SC’10) ‣ D. Chiu and G. Agrawal. Hierarchical Caches for Grid Workflows. (CCGrid’09)

12 12 Presentation Outline Motivation & Introduction Our Service Composition System: Auspice ‣ Metadata Framework ‣ Cost-Aware Service Planning ‣ Supporting Keyword Queries ‣ Elastic Cache Deployment Conclusion Auspice

13 13 Auspice System

14 14 Auspice System D. Chiu & G. Agrawal, eScience ’07 D. Chiu & G. Agrawal, SSDBM ’09

15 15 Systematic Way to Plan Workflows? Goal-Driven, Recursive Concept Derivation. Example User Goal: Coastline Extraction. We are targeting some coastline concept in the geospatial domain: what known data or services can derive a Coast Line?

16 16 Systematic Way to Plan Workflows? [Diagram: the Coast Line target is derivable from the available services Coast Extract 1 ... Coast Extract K and the available data Coast Data 1 ... Coast Data N] What are a service's parameters? [Diagram: Coast Extract K takes the available data types Water Level and CTM] What known data or services can derive water level? What known data or services can derive a CTM?

17 17 Systematic Way to Plan Workflows? [Diagram: derivation tree rooted at Coast Line, with branches Coast Extract 1 ... Coast Extract K and Coast Data 1 ... Coast Data N; Coast Extract K expands further into Water Level and CTM subtrees]

18 18 Systematic Way to Plan Workflows? [Diagram: the Coast Line derivation tree enumerated into candidate plans Workflow 1, Workflow 2, Workflow 3, ...]
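
The goal-driven, recursive derivation walked through on these slides could be sketched roughly as below. This is only an illustration under stated assumptions: the three dictionaries stand in for the ontology, every service and data-set name is hypothetical, and cycle detection is omitted for brevity.

```python
from itertools import product

# Hypothetical ontology fragment (all names are illustrative, not from Auspice):
# which services and data sets can derive a concept, and which concepts a
# service needs as input.
DERIVED_BY_SERVICE = {"coastline": ["extract_coast"]}
DERIVED_BY_DATA = {
    "coastline": ["coast_data_1"],
    "water_level": ["gauge_readings"],
    "ctm": ["ctm_file"],
}
SERVICE_INPUTS = {"extract_coast": ["water_level", "ctm"]}


def plan(concept):
    """Return every workflow (as a nested tuple) that can derive `concept`."""
    # Base case: an existing data set already provides the concept.
    plans = [("data", d) for d in DERIVED_BY_DATA.get(concept, [])]
    # Recursive case: a service derives the concept once each of its
    # input concepts has itself been derived.
    for service in DERIVED_BY_SERVICE.get(concept, []):
        input_plans = [plan(c) for c in SERVICE_INPUTS[service]]
        for combo in product(*input_plans):
            plans.append(("service", service, combo))
    return plans


workflows = plan("coastline")
for w in workflows:
    print(w)
```

Each returned tuple is one candidate workflow; a service whose inputs cannot be derived contributes no plans, because the product over an empty list of input plans is empty.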

19 19 Ontology for Applying Domain Information Domain concepts can be derived from executing a service Domain concepts can be derived from retrieving existing data Service parameters can be represented by certain domain concepts

20 20 Example Subset of Some Ontology

21 21 Auspice Metadata Registration Given a data set or service, ‣ Ontology is applied to new resources ‣ Resources are indexed and immediately usable in workflow planner ‣ Non-intrusive

22 22 Registering Data Sets

23 23 Registering Services

24 24 Subset of Ontology, with Shoreline Target

25 25 Service Planning: An Example A Derived Execution Plan for shoreline concept

26 26 What Users Want: "Do what you can to provide me results in under 20 minutes." "I want the fastest results with at least 75% accuracy." - Execution time prediction - Online data reduction - Domain-specific error modeling

27 27 Presentation Outline Motivation & Introduction Our Service Composition System: Auspice ‣ Metadata Framework ‣ Cost-Aware Service Planning ‣ Supporting Keyword Queries ‣ Elastic Cache Deployment Conclusion Auspice

28 28 Auspice System

29 29 Auspice System D. Chiu, S. Deshpande, G. Agrawal, & R. Li, GRID ’08 D. Chiu, S. Deshpande, G. Agrawal, & R. Li, ACM-GIS ’08 D. Chiu, S. Deshpande, G. Agrawal, & R. Li, ICWS ’09

30 30 Challenges We wish to project workflow execution time and workflow accuracy costs at planning time Allow input models per service We should prune all workflows unlikely to meet the user’s demands
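
A minimal sketch of the pruning step, assuming each candidate plan already carries a projected time and error. The `PlanEstimate` type and `prune` function are hypothetical names; the two sets of figures are the projected totals shown on the Water Level Workflow Example slide.

```python
from dataclasses import dataclass


@dataclass
class PlanEstimate:
    name: str
    time: float   # projected workflow execution time
    error: float  # projected workflow error


def prune(plans, max_time=None, max_error=None):
    """Drop plans that violate a user constraint; return the rest, fastest first."""
    kept = []
    for p in plans:
        if max_time is not None and p.time > max_time:
            continue
        if max_error is not None and p.error > max_error:
            continue
        kept.append(p)
    return sorted(kept, key=lambda p: p.time)


plans = [
    PlanEstimate("plan 1 (nearest gauge stations)", time=3.251, error=0.004),
    PlanEstimate("plan 2 (water level model)", time=1.674, error=2.4997),
]
print([p.name for p in prune(plans, max_error=1.0)])
print([p.name for p in prune(plans, max_time=2.0)])
```

With an error bound of 1.0 only the slower, more accurate plan survives; with a time bound of 2.0 only the faster plan does.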

31 31 Estimating Workflow Execution Time Service execution time (t_x) ‣ A model for each service is trained beforehand on inputs of various sizes Data output size (d_size) ‣ Known for files, but models are again trained for service output Network transmission time (t_net) ‣ Bandwidth between nodes is typically known Recall the workflow structure:
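
One way to read this estimate is as a recursion over the workflow tree. The sketch below assumes a step's total is its own execution time, plus its transfer time, plus the slowest of its upstream steps (inputs are assumed to run concurrently); that combination rule and the `Step` type are assumptions for illustration. The t_x values are the ones annotated on the Water Level Workflow Example slide.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    t_x: float                 # predicted service execution time
    t_net: float = 0.0         # predicted time to ship this step's output onward
    inputs: list = field(default_factory=list)


def total_time(step):
    """Recursive estimate: own cost plus the slowest upstream sub-workflow."""
    upstream = max((total_time(s) for s in step.inputs), default=0.0)
    return step.t_x + step.t_net + upstream


stations = Step("GetGSListGreatLakes", t_x=2.0)
nearest = Step("getKNearestStations", t_x=0.5, inputs=[stations])
water_level = Step("getWL", t_x=1.0, inputs=[nearest])
print(total_time(water_level))
```

With transfer times left at zero the chain sums to 3.5, close to the t_total=3.5001 annotated for the same plan on the slide.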

32 32 Estimating Workflow Error/Accuracy The recursive sum is similar for error propagation. The error models attributed to services and data are implemented by domain scientists; each takes an accuracy parameter, e.g., sampling rate, resolution, ...

33 33 Cost Models Declared per Operation

34 34 Water Level Workflow Example
Workflow Plan 1
  [t_total=3.5001 t_x=1 t_d=0 o=47889 e=0.004]
  SRVC.getWL(
    X=482593
    Y=4628522
    StnID=
      [t_total=2.5 t_x=0.5 t_d=0 o=0 e=0.004]
      SRVC.getKNearestStations(
        Longitude=482593
        Latitude=4628522
        ListOfStations=
          [t_total=2 t_x=2 t_d=0 o=47889 e=0]
          SRVC.GetGSListGreatLakes()
        RadiusKM=100
        K=3
      )
    time=00:06
    date=01/30/2008
  )
  Total Projected Costs: Workflow Execution Time = 3.251, Workflow Error = 0.004
Workflow Plan 2
  [t_total=2 t_x=2 t_d=0 o=47889 e=2.4997]
  SRVC.getWLfromModel(
    X=482593
    Y=4628522
    time=00:06
    date=01/30/2008
  )
  Total Projected Costs: Workflow Execution Time = 1.674, Workflow Error = 2.4997

35 35 On Meeting QoS? Users specify QoS accuracy with respect to domain, not data quality ‣ For instance, what does +/- 3 meters mean in terms of image resolution or sampling rate? But service planner is interested in data quality ‣ Inverse the error model? ‣ Adaptive precision logic

36 36 Adaptive Precision Logic [Diagram: a sampling-rate axis from 0.0 to 1.0; at each probed rate the time and error models are queried, and the next probe moves toward "sample more" or "sample less"] ‣ Often, the error model is read-only ‣ Suggest a new value for parameters via binary search for the best possible value by repeatedly invoking the model
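
The binary search over a read-only model might look like the sketch below. It assumes the predicted error decreases monotonically as the sampling rate grows, and it looks for the smallest rate that still meets the error bound (sampling less is cheaper). The `error_model` here is a made-up stand-in, not a model from the talk.

```python
def error_model(sampling_rate):
    """Stand-in for a read-only, domain-supplied model (assumed monotonic)."""
    return 10.0 * (1.0 - sampling_rate)


def suggest_sampling_rate(model, max_error, lo=0.0, hi=1.0, tol=1e-3):
    """Smallest rate in [lo, hi] whose predicted error is within max_error."""
    if model(hi) > max_error:
        return None  # even full sampling cannot meet the bound
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if model(mid) <= max_error:
            hi = mid  # bound met: try sampling less
        else:
            lo = mid  # bound missed: sample more
    return hi


rate = suggest_sampling_rate(error_model, max_error=3.0)
print(rate)
```

The loop only ever calls the model, never inverts it, which matches the read-only constraint; for this stand-in the search settles near a rate of 0.7.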

37 37 System Configuration Computing Environment ‣ Auspice (local) - Linux - Pentium IV 3.0GHz Dual Core - 1GB RAM ‣ Service Node - Across OSU campus in Dept of Civil Engg and Geodetic Science - 10MBps Interconnection ‣ Data Storage Node - Across state at Kent State University Dept of Computer Science

38 38 Cost Model Overheads

39 39 Experimented Workflow Shoreline Extraction Users can specify the following QoS Parameters: Allowed execution time Allowed error

40 40 On Meeting Time Constraints

41 41 On Meeting Error Constraints

42 42 Presentation Outline Motivation & Introduction Our Service Composition System: Auspice ‣ Metadata Framework ‣ Cost-Aware Service Planning ‣ Supporting Keyword Queries ‣ Elastic Cache Deployment Conclusion Auspice

43 43 Current GUI for Creating Workflows

44 44 Auspice System

45 45 Auspice System D. Chiu & G. Agrawal, SSDBM’09 D. Chiu & G. Agrawal, (submitting to GIS’10)

46 46 Supporting Keyword Querying Planning workflows is hard, while keyword search has become an extremely popular interface for information retrieval ‣ No need to know underlying structure of data ‣ No need to understand structured query languages like SQL Goal: Given set of key terms in the scientific domain, return ranked list of workflow plans to the user for execution

47 47 Keyword Decomposition [Diagram: the query terms coast, line, CTM, 7/8/2003, and (41.30, -82.4) pass through a filter (stopping / stemming / pattern-match) and are then mapped to concepts]
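
A rough sketch of the filtering stage, assuming a tiny stop-word list and two regular expressions for dates and coordinate pairs. The `decompose` function, the tag names, and the stop-word list are all hypothetical, and stemming is left out to keep the example short.

```python
import re

# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "of", "for", "at", "on", "in", "a", "an"}
DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")
COORD = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def decompose(query):
    """Split a keyword query into tagged terms: coordinates, dates, keywords."""
    terms = []
    # Pattern matching: pull out coordinate pairs first, then dates.
    for m in COORD.finditer(query):
        terms.append(("coordinate", m.group(1)))
        terms.append(("coordinate", m.group(2)))
    query = COORD.sub(" ", query)
    for m in DATE.finditer(query):
        terms.append(("date", m.group(0)))
    query = DATE.sub(" ", query)
    # Stopping: whatever words remain are keywords unless they are stop words.
    for word in query.lower().split():
        if word not in STOP_WORDS:
            terms.append(("keyword", word))
    return terms


terms = decompose("the coast line CTM 7/8/2003 (41.30, -82.4)")
print(terms)
```

Deciding which coordinate is the latitude and which the longitude is deliberately left to the later concept-mapping step.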

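48 48 Keyword Maximization [Diagram: each query term is mapped to ontology concepts (C). The terms 7/8/2003, 41.30, and -82.4 map to the concepts date, longitude, and latitude, are backed by data (D), and are labeled Data-Substantiated Concepts: potential query parameters. The terms coast, line, and CTM map to Unsubstantiated Concepts.] Any combination of these is potentially what the query is targeting!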

49 49 Keyword Querying [Diagram: the concepts (C) for coast, line, and CTM are grouped into a Merged Super Concept, the Query Target Candidate, and Requisite Concepts; the data-substantiated (D) values 7/8/2003 (date), 41.30, and -82.4 (longitude, latitude) become the Query Parameters]

50 50 Keyword Querying [Diagram: same concept grouping as the previous slide]

51 51 Keyword Querying [Diagram: same concept grouping as the previous slide, followed by the step Enumerate Workflows]

52 52 Ranking Workflow Plans by Relevance Method: ‣ Let be the set of input keyword-concepts ‣ Rank workflow plans on

53 53 A Case Study The following keyword queries were submitted to Auspice

54 54 Search Time

55 55 Precision

56 56 Result Set for QueryID 3 “(41.48335,-82.687778) 7/8/2003 wind CTM”

57 57 Presentation Outline Motivation & Introduction Our Service Composition System: Auspice ‣ Metadata Framework ‣ Cost-Aware Service Planning ‣ Supporting Keyword Queries ‣ Elastic Cache Deployment Conclusion Auspice

58 58 Problem: Query Intensive Circumstances...

59 59 Caching Intermediate Results Shoreline Extraction Time consuming! Can’t we cache the result from when it was last computed??

60 60 Caching Intermediate Results

61 61 Auspice System

62 62 Auspice System D. Chiu & G. Agrawal, CCGrid’09 D. Chiu, A. Shetty, & G. Agrawal, (submitted to SC’10) D. Chiu & G. Agrawal, (submitted to GRID’10)

63 63 Cloud Computing Pay as you go computing Elasticity ‣ Cloud applications can stretch and relax their resource requirements “Infinite” compute and storage resources

64 64 A Workflow Cache Compute Cloud... A B

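65 65 Consistent Hashing [Diagram: cache nodes A and B on a hash ring with bucket points 75, 25, and 8]

66 66 Consistent Hashing [Diagram: the same ring with nodes A and B and bucket points 75, 25, and 8] invoke: service(35). Which proxy has the page? h(k) = (k mod 100), so h(35) = (35 mod 100) = 35

67 67 Our algorithm for Scaling up: GBA (Greedy Bucket Allocation) [Diagram: node C joins the ring at bucket point 50, alongside A and B with bucket points 75, 25, and 8] Only records hashing into (25,50] need to be moved from A to C!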
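
The ring lookup and the effect of adding node C can be sketched as follows. This is only the generic consistent-hashing mechanics, not the GBA algorithm itself; the `HashRing` class is a hypothetical name, and the assignment of bucket points 8 and 25 to node B is an assumption (the slide only implies that bucket point 75 belongs to A).

```python
import bisect

RING = 100  # h(k) = k mod 100, as on the slide


def h(k):
    return k % RING


class HashRing:
    def __init__(self, buckets):
        self.buckets = dict(buckets)       # bucket point -> node name
        self.points = sorted(self.buckets)

    def lookup(self, k):
        """A key belongs to the first bucket point at or after h(k), wrapping around."""
        i = bisect.bisect_left(self.points, h(k))
        return self.buckets[self.points[i % len(self.points)]]

    def add_bucket(self, point, node):
        self.buckets[point] = node
        bisect.insort(self.points, point)


# Assumed ownership: 8 and 25 on B, 75 on A.
ring = HashRing({8: "B", 25: "B", 75: "A"})
before = {k: ring.lookup(k) for k in range(RING)}
ring.add_bucket(50, "C")
after = {k: ring.lookup(k) for k in range(RING)}

moved = [k for k in range(RING) if before[k] != after[k]]
print(before[35], after[35])   # A C
print(moved[0], moved[-1])     # 26 50
```

Only the keys in (25,50] change owner, and all of them go from A to C; every other record stays where it was.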

68 68 Experimental Configuration Workload: ‣ Shoreline Extraction Workflow ‣ Takes 23 seconds to complete without benefits of cache ‣ Executed on a miss Amazon EC2 Cloud: ‣ Each Cloud node: - Small Instances (single core 1.2 GHz, 1.7 GB, 32-bit) - Ubuntu Linux ‣ Caches start out cold ‣ Cache stored in memory only

69 69 Experimental Configuration Our approach exploits a dynamic Cloud environment: ‣ Consistent Hashing: Greedy Bucket Allocation (GBA) ‣ Elastic number of nodes We compare GBA against statically allocated Cloud environments: ‣ 2 fixed nodes (static-2) ‣ 4 fixed nodes (static-4) ‣ 8 fixed nodes (static-8) ‣ Cache overflow --> LRU eviction
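
For the static baselines, LRU eviction on overflow can be illustrated with a small in-memory cache. This is a generic sketch built on `collections.OrderedDict`, not the code used in the experiments; the `LRUCache` class name and the record keys are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self.items:
            return None  # miss: the caller would execute the workflow
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used record


cache = LRUCache(capacity=2)
cache.put("shoreline:a", "result-a")
cache.put("shoreline:b", "result-b")
cache.get("shoreline:a")              # touch a, so b is now the oldest
cache.put("shoreline:c", "result-c")  # overflow evicts b
print(list(cache.items))
```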

70 70 Relative Speedup Querying Rate: 255 invocations/sec Cost Savings

71 71 Maximum Execution Times (intensive rate) Querying Rate: 255 invocations/sec

72 72 That’s Not Completely Elastic What about relaxing the number of nodes to help save Cloud costs? First, we need an eviction scheme

73 73 Exponential Decay Eviction At eviction time: ‣ A score is calculated for each data record in the evicted slice ‣ The score is higher: - if the record was accessed more recently - if the record was accessed frequently ‣ If the score is lower than some fixed threshold, evict the record
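
One plausible form of such a score, offered as an assumption rather than the exact formula from the slide, weights the hits in each time slice of the sliding window by a decay factor raised to the slice's age. The function names, the decay factor, and the sample access histories below are all hypothetical.

```python
def decay_score(hits_per_slice, alpha=0.9):
    """Sum of per-slice hit counts, discounted by alpha ** age.

    hits_per_slice[0] is the most recent time slice, so recent and
    frequent accesses both push the score up.
    """
    return sum(hits * alpha ** age for age, hits in enumerate(hits_per_slice))


def records_to_evict(history, threshold, alpha=0.9):
    """Keys whose decayed score falls below the fixed threshold."""
    return [key for key, hits in history.items()
            if decay_score(hits, alpha) < threshold]


# Hypothetical access histories over a 4-slice window (newest slice first).
history = {
    "recent": [1, 0, 0, 0],
    "frequent": [0, 1, 1, 1],
    "stale": [0, 0, 0, 1],
}
print(records_to_evict(history, threshold=0.5, alpha=0.5))
```

With a decay factor of 0.5 the three records score 1.0, 0.875, and 0.125, so only the stale record falls below the threshold.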

74 74 Experimental Configuration Amazon EC2 Cloud: ‣ Each Cloud node: - Small Instances (single core 1.2 GHz, 1.7 GB, 32-bit) - Ubuntu Linux ‣ Caches start out cold ‣ Data stored in memory ‣ When 2 nodes fall below 30% capacity, merge them Sliding Window Configuration: ‣ Time Slice: 1 sec ‣ Size: 100 Time Slices
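
The merge rule could be sketched as below, assuming capacity is counted in records per node. The helper names and the sample cache contents are hypothetical, and reassigning the released node's hash buckets is left out.

```python
def pick_merge_pair(used, capacity, threshold=0.30):
    """Return the two emptiest nodes if both are below threshold * capacity."""
    light = sorted((n for n in used if used[n] < threshold * capacity),
                   key=lambda n: used[n])
    return tuple(light[:2]) if len(light) >= 2 else None


def merge(caches, src, dst):
    """Move every record from src into dst and release src."""
    caches[dst].update(caches.pop(src))


# Hypothetical contents: three nodes, each able to hold 10 records.
caches = {
    "A": {1: "r1"},
    "B": {2: "r2", 3: "r3"},
    "C": {k: "r%d" % k for k in range(10, 18)},
}
used = {name: len(records) for name, records in caches.items()}
pair = pick_merge_pair(used, capacity=10)
if pair is not None:
    merge(caches, src=pair[0], dst=pair[1])
print(pair, sorted(caches))
```

Because both nodes are under 30% full, their combined contents are guaranteed to fit on a single node.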

75 75 Data Eviction: 50/255/50 queries per sec. Sliding Window Size = 100 sec [Plot phases: 50 q/sec, 255 q/sec, 50 q/sec]

76 76 Cache Contraction: 50/255/50 queries per sec

77 77 Cache Contraction: 50/255/50 queries per sec

78 78 Cache Contraction: 50/255/50 queries per sec. Sliding Window Size = 100 sec [Plot phases: 50 q/sec, 255 q/sec, 50 q/sec]

79 79 Cache Hits over Varying Decay Sliding Window Size = 100 sec

80 80 Presentation Outline Motivation & Introduction Our Service Composition System: Auspice ‣ Metadata Framework ‣ Cost-Aware Service Planning ‣ Supporting Keyword Queries ‣ Elastic Cache Deployment Conclusion Auspice

81 81 Future Work Dynamic sliding window size Evaluate and model various Cloud infrastructure options to optimize cost for sustaining the cache Transparent remote data analysis over Clouds Deep Web Integration into querying framework

82 82 Summary and Conclusion Auspice is a workflow system, which ‣ Supports high-level keyword/NLP user queries ‣ Automatically composes workflows, and adapts to QoS Constraints ‣ Caches workflow results to accelerate workflow execution Questions? Auspice

83 83 Capturing Concept Derivability Domain concepts can be derived from executing a service Domain concepts can be derived from retrieving existing data Service parameters represent different domain concepts

84 84 Indexing Data Sets

85 85 Applying Domain Information Domain concepts can be derived from executing a service Domain concepts can be derived from retrieving existing data Service parameters represent different domain concepts

86 86 A Case for Semantics Service Identification: ‣ Assume the following service retrieves a satellite image pertaining to (x,y) at a resolution given by r: get_image(double x, double y, double r) [Diagram: the concepts latitude, longitude, and grid_size are linked to the service by inputsTo; the service is linked to the concept satellite image by outputsTo] Questions to ask the system: ‣ How to deduce that this service can be used? ‣ How to determine what information is needed for input? ‣ Did the user provide enough information to invoke this service?
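
The three questions can be answered mechanically once the service is annotated with concepts. The sketch below uses a plain dictionary as the registry; the registry layout, the helper names, and the pairing of x with longitude and y with latitude are assumptions for illustration.

```python
# Hypothetical registry entry: parameter -> concept (inputsTo), plus the
# concept the service produces (outputsTo).
SERVICES = {
    "get_image": {
        "inputs": {"x": "longitude", "y": "latitude", "r": "grid_size"},
        "output": "satellite_image",
    }
}


def services_deriving(concept):
    """Which registered services output the requested concept?"""
    return [name for name, s in SERVICES.items() if s["output"] == concept]


def missing_inputs(service, provided):
    """Parameters whose concept the user has not supplied a value for."""
    inputs = SERVICES[service]["inputs"]
    return [param for param, concept in inputs.items() if concept not in provided]


user_values = {"longitude": -82.4, "latitude": 41.30}
print(services_deriving("satellite_image"))
print(missing_inputs("get_image", user_values))
```

Here the user supplied a location but no grid size, so the planner would either have to derive `grid_size` from other data or services or report that `r` is still needed.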

87 87 Indexing Services Services (inputs, outputs) are also registered in much the same way

88 88 Systematic Service Planning Ontology, O Compose workflows in this form: data derivation service derivation

89 89 Presentation Outline Motivation & Introduction Our Service Composition System: Auspice ‣ Metadata Framework ‣ Cost-Aware Service Planning ‣ Supporting Keyword Queries ‣ Caching Intermediate Results ‣ Elastic Cache Deployment Conclusion Auspice

90 90 Caching Intermediate Results

91 91 A Hierarchical Cache

92 92 Cache Access Types [Diagram labels: Misses, Fast, Slow, Hits (Slow)] Wouldn’t it be faster to centralize the index on the broker node? Do we really need the broker index? Isn’t hashing faster?

93 93 Experimental Workflows Against Heterogeneous Bandwidths

94 94 Centralized on Broker vs. Hierarchical Out-of-core! In-core

