Designing a Federated Testbed as a Distributed System
Robert Ricci, Jonathon Duerig, Gary Wong, Leigh Stoller, Srikanth Chikkulapelly, Woojin Seok

Emulab

ProtoGENI

Why Federate?
Diversity
  – Geographical
  – Physical Resource
  – Approach
  – Expertise
Scale

Federation Challenges
Multiple Administrative Domains
  – Establish Trust
  – Maintain Autonomy
Local Policy Decisions
  – Resource Control
Coordination Across Federation
  – Pre-existing Infrastructure
  – Single Interface

Federation Challenges
Distributed Administration
  – Establish Trust
  – Maintain Autonomy
Local Policy Decisions
  – Resource Control
Coordination Across Federation
  – Pre-existing Infrastructure
  – Single Interface

Federation Challenges
Distributed Administration
  – Establish Trust
  – Maintain Autonomy
Distributed Policy
  – Resource Control
Coordination Across Federation
  – Pre-existing Infrastructure
  – Single Interface

Federation Challenges
Distributed Administration
  – Establish Trust
  – Maintain Autonomy
Distributed Policy
  – Resource Control
Distributed Framework
  – Pre-existing Infrastructure
  – Single Interface

Key Principles
Partitioned trust
Distributed knowledge
Minimal abstraction
Decentralized architecture
Minimal dependencies

Partitioned Trust
Federates operate within trust domains

Distributed knowledge
No global consistency

Minimal abstraction
Low-level API providing platform for tools

Decentralized architecture
No single point of…
  – Failure
  – Policy

Minimal dependencies
Self-contained API calls
Minimize online communication

Outline
Motivation
Architecture
Allocation API
Federation Identifiers
Resource Specification
Slice Lifetime
Failure Scenarios
Conclusion

GENI Architecture
Aggregate Manager (AM)
  – Allocates and provisions resources (PCs, VMs, VLANs, etc.)
Slice
  – Global container for resources
Sliver
  – Instantiation of a single resource

ProtoGENI Architecture
Slice Authority (SA)
  – Authorizes Users (Identity Provider)
  – Creates Slices
Clearinghouse (CH)
  – Facilitates Trust
  – Central Directory: Aggregate Manager List, History

ProtoGENI Architecture
[Diagram. Labels at the SA: Register User, Receive Certificate, Create Slice, Receive Credential. Labels at the AM: Create Sliver, Receive Manifest.]

Slices Span AMs
[Diagram. A Create Sliver / Receive Manifest exchange is shown with each of several AMs.]

Simple Allocation
[Diagram. Create Sliver is sent to an AM; Receive Manifest comes back.]

Simple Failure
[Diagram. Create Sliver is sent to an AM; Failed Allocation comes back.]

Distributed Allocation
[Diagram. Create Sliver is sent to several AMs; two exchanges end in Receive Manifest and one ends in Failed Allocation.]

Handling Failure
Possible Strategies
  – All or nothing?
  – Take what you can get?
  – Change plans?
Depends on the tool and user

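The first two strategies above can be sketched in a few lines. This is only an illustration under stated assumptions: `FakeAM`, `AllocationFailed`, `allocate`, and the strategy names are hypothetical stand-ins, not the ProtoGENI API. The point is that the choice between rolling back and keeping partial results lives in the client tool, not in any aggregate manager.

```python
from dataclasses import dataclass, field


class AllocationFailed(Exception):
    """Raised by the hypothetical AM when it cannot satisfy a request."""


@dataclass
class FakeAM:
    """Stand-in for an aggregate manager (illustrative, not a real API)."""
    name: str
    capacity: int
    slivers: set = field(default_factory=set)

    def create_sliver(self, slice_name: str) -> str:
        if len(self.slivers) >= self.capacity:
            raise AllocationFailed(self.name)
        self.slivers.add(slice_name)
        return f"manifest:{self.name}:{slice_name}"

    def delete_sliver(self, slice_name: str) -> None:
        self.slivers.discard(slice_name)


def allocate(ams, slice_name, strategy="all_or_nothing"):
    """Ask every AM for a sliver, then apply the client's failure strategy."""
    manifests, failed = {}, []
    for am in ams:
        try:
            manifests[am.name] = am.create_sliver(slice_name)
        except AllocationFailed:
            failed.append(am.name)

    if failed and strategy == "all_or_nothing":
        # Roll back: release the slivers that did succeed.
        for am in ams:
            if am.name in manifests:
                am.delete_sliver(slice_name)
        manifests = {}
    # "take_what_you_can_get" keeps whatever succeeded.
    return manifests, failed
```

A "change plans" strategy would go further and resubmit a modified request to a different AM; that logic depends on the tool and is left out here.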
API State Machine
[Diagram. States: Start, Creating, Allocated, Updating. Transition labels: Begin, Commit, Update, Abort, Delete.]

Named Federation Entities
Authorities (AM, SA, CH)
Resources, Slivers
Slices
Users

Named Federation Entities
Authorities (AM, SA, CH)
  – Named by Themselves
Resources, Slivers
  – Named by Aggregate Managers
Slices
  – Named by Slice Authority
Users
  – Named by Slice Authority

URNs
urn:publicid:IDN+emulab.net+user+jay

Hierarchical
urn:publicid:IDN+emulab.net+user+jay

Typed
urn:publicid:IDN+emulab.net+user+jay

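As a rough illustration of why such identifiers are useful, the sketch below splits the example URN into its parts. It assumes the layout `urn:publicid:IDN+<authority>+<type>+<name>`, read off the example above; `GeniUrn` and `parse_urn` are hypothetical helper names, not part of any ProtoGENI library.

```python
from typing import NamedTuple

PREFIX = "urn:publicid:IDN+"


class GeniUrn(NamedTuple):
    authority: str  # who issued the name, e.g. "emulab.net"
    kind: str       # what is being named, e.g. "user"
    name: str       # the name within that authority, e.g. "jay"


def parse_urn(urn: str) -> GeniUrn:
    """Split a URN of the assumed form into authority, type, and name."""
    if not urn.startswith(PREFIX):
        raise ValueError(f"not a publicid IDN URN: {urn!r}")
    parts = urn[len(PREFIX):].split("+")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority+type+name: {urn!r}")
    return GeniUrn(*parts)


urn = parse_urn("urn:publicid:IDN+emulab.net+user+jay")
print(urn.authority, urn.kind, urn.name)  # emulab.net user jay
```

Because the authority is embedded in the name, a recipient can tell which federate issued an identifier without consulting a central registry.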
Resource Specification
Advertisement
  – Spec Sheet
  – AM describes available resources
Request
  – Shopping Cart
  – User selects resources to use
Manifest
  – User Manual
  – AM describes resources obtained

Basic Shape … … 38
Extensions
Resource specification as a platform
  – New resource types
  – Different measurements
  – New kinds of entities
Build on xsi:schemaLocation
  – Choose schema based on namespace
  – Core schema
  – Extension schemata

Properties of Extensions
Safely ignored
  – Unknown namespaces are passed through intact
Modular
  – Multiple extensions can co-exist
Validated
  – Every extension has its own schema

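The "safely ignored" property can be sketched with the Python standard library. This is a minimal illustration, not the actual RSpec schema: both namespace URIs, the element names, and the helper functions are hypothetical. A tool that understands only the core namespace sets extension elements aside untouched instead of rejecting the document.

```python
import xml.etree.ElementTree as ET

CORE_NS = "http://example.org/rspec/core"      # hypothetical core namespace
EXT_NS = "http://example.org/rspec/ext/cost"   # hypothetical extension namespace

DOC = f"""<rspec xmlns="{CORE_NS}" xmlns:cost="{EXT_NS}">
  <node name="pc1">
    <cost:price units="credits">5</cost:price>
  </node>
</rspec>"""


def namespace_of(elem):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def split_children(node, understood):
    """Partition children into those this tool understands and the rest."""
    known, passthrough = [], []
    for child in node:
        target = known if namespace_of(child) in understood else passthrough
        target.append(child)
    return known, passthrough


root = ET.fromstring(DOC)
node = root.find(f"{{{CORE_NS}}}node")

# A tool that knows only the core schema leaves the extension element intact.
known, passthrough = split_children(node, understood={CORE_NS})
for elem in passthrough:
    print("passing through:", elem.tag)
```

A second tool that also lists `EXT_NS` in `understood` would receive the same element in `known`, which is how multiple extensions can co-exist in one document.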
Simple Extension 41
Simple Extension (XML) … 42
Slice Lifetime
No definitive list of all slivers in a slice
  – Distributed Knowledge
Cannot delete slices
Slices have a renewable lifetime
Sliver lifetime cannot exceed slice lifetime

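The lifetime rules above amount to a small amount of date arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that an over-long sliver request is clamped to the slice expiry rather than rejected, which the slides do not specify.

```python
from datetime import datetime, timedelta, timezone


def clamp_sliver_expiry(requested: datetime, slice_expiry: datetime) -> datetime:
    """A sliver may not outlive its slice (assumed here: clamp, don't reject)."""
    return min(requested, slice_expiry)


def renew_slice(current_expiry: datetime, extension: timedelta) -> datetime:
    """Slices are never deleted; they are renewed or left to expire."""
    return current_expiry + extension


def is_expired(expiry: datetime, now: datetime) -> bool:
    return now >= expiry


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
slice_expiry = now + timedelta(days=7)

# A request for 30 days is limited by the 7-day slice lifetime.
sliver_expiry = clamp_sliver_expiry(now + timedelta(days=30), slice_expiry)

# After renewal, a longer sliver lifetime becomes possible.
slice_expiry = renew_slice(slice_expiry, timedelta(days=7))
```

Because no one holds a definitive list of a slice's slivers, expiry is what eventually reclaims them: each AM only needs the slice expiry time, not knowledge of the other AMs.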
Slice Authority Fails
Failed Operations
  – Create Slice
  – Renew Slice
  – Obtain a Slice Credential

Slice Authority Fails
Successful Operations
  – Create Slivers
  – Start and Stop Slivers
  – Delete Slivers
  – Sliver Login

Conclusion
Federation as a distributed system
Designed and implemented
Growing user base
  – More than 200 users
  – More than 3000 slices

Annotation (Advertisement)
Multiple Knowledge Domains
  – Availability (dynamic)
  – Technical Specifications (static)
  – Usage and Reliability (dynamic)
  – Compatibility (static)
Different Sources
  – Aggregate Manager
  – Measurement Services
  – Others

Progressive Annotation (Base)... 53
Progressive Annotation (AM) 54
Progressive Annotation (AM) 55
Progressive Annotation (Inference Service) 56
Progressive Annotation (Measurement Service) 57