1
Improving Robustness in Distributed Systems
Jeremy Russell
Software Engineering Honours Project
2
Overview
- Introduction
  - What is a distributed system?
  - What is robustness?
  - Aims of study
- Method
  - Development of simulated model
  - Investigation of model
- Results & Findings
3
Introduction: What is a distributed system?
- Network of connected entities (agents)
- Entities communicate via message passing
- Agents can engage with any other agent
- Decentralised control
  - Contrast to P2P systems with centralised indexes
- Behaviour of individual agents conforms to the goals of the system
4
Introduction: Example, web services
- Services are offered by a collection of remote computers in a network
- Services are combined to build complex super-services
- Super-services appear to be provided by a single interface agent (web server)
5
Introduction: Insurance policy comparison service
(Figure: insurance broker)
6
Introduction: What is robustness?
- Correct operation under varied conditions
- High tolerance of failure (extreme conditions)
- Graceful in defeat
7
Introduction: Aim of study
- To improve robustness in distributed systems
- Implementation and comparison of two alternative systems
- How is robustness achieved?
  - Redundancy
  - Regeneracy (adaptation)
  - Load balancing
8
Method: Simulation model
- Offers services (tasks)
- Responds to service requests
- Services are highly coupled
- Agent-based network of 20 agents
- Measures:
  - Success
  - Response time
9
Method: Model framework
- Components
- Network mechanics
- Sequence of execution
- Representation of time
- Agent communication
  - Messages
10
Method: Components
- Object oriented
- Simulator object
  - Controls timing of events
- Agent objects
  - Provide services
  - Communicate with other agents
- Message objects
  - Method of communication
11
Method: Components
12
Method: Components
13
Method: Network mechanics
- Underlying interconnection network (Internet)
  - An agent can engage any other agent
- Agents form a subset of all possible relationships
- Routing and propagation latencies are abstracted by the Simulator
- Message types:
  - Service
  - Capability sharing
  - Agent information
14
Method: Network mechanics (underlying network)
15
Method: Network mechanics (agent relationships)
16
Method: Sequence of execution
- Simulation is sliced into a sequence of time steps (an abstraction of real time)
- In each time step:
  - Messages are forwarded by the Simulator
  - Agents are prompted sequentially (in any order) to execute the time step:
    - Execute scheduled services
    - Respond to received messages
  - Messages are received and ordered by the Simulator
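The per-step loop above can be sketched in Python. This is a minimal sketch and not the project's simulator. The `Agent` and `Simulator` classes, the `execute_step` method and the ping traffic are all hypothetical. Each message is simply delivered at the start of the following step.

```python
class Agent:
    """Hypothetical agent whose only 'scheduled service' is pinging one peer."""

    def __init__(self, name: str, peer: str):
        self.name = name
        self.peer = peer
        self.inbox: list[str] = []
        self.handled: list[str] = []

    def execute_step(self, step: int) -> list[tuple[str, str]]:
        outgoing = []
        # Execute scheduled services (here: send one ping per step).
        outgoing.append((self.peer, f"ping from {self.name} @ {step}"))
        # Respond to received messages (here: just record them).
        self.handled.extend(self.inbox)
        self.inbox.clear()
        return outgoing


class Simulator:
    def __init__(self, agents: list[Agent]):
        self.agents = {a.name: a for a in agents}
        self.pending: list[tuple[str, str]] = []  # (destination, message)

    def run(self, steps: int) -> None:
        for step in range(steps):
            # 1. Forward the messages collected during the previous step.
            for dest, msg in self.pending:
                self.agents[dest].inbox.append(msg)
            self.pending = []
            # 2. Prompt each agent in turn; the ordering is arbitrary.
            for agent in self.agents.values():
                # 3. Receive the agent's outgoing messages for the next step
                #    (a fuller version would also order them by delivery time).
                self.pending.extend(agent.execute_step(step))
```

Running two agents that ping each other for three steps shows the pattern. Each agent handles the pings sent in steps 0 and 1. The pings sent in step 2 are still pending when the run ends.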
17
Method: Sequence of execution
18
Method: Representation of time
- Floating-point number
- Global time (GT): the current time step
- Local time (LT): a function of the total number of processing cycles used prior to the event
- Result: Time = GT.LT
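One way to read "Time = GT.LT" is as a float with the step number as its integer part and the consumed share of that step as its fraction. The sketch below assumes a fixed, hypothetical cycle budget per step (`CYCLES_PER_STEP`) and a linear mapping from cycles to LT. The slides only say that LT is a function of the cycles used.

```python
CYCLES_PER_STEP = 1000  # hypothetical processing-cycle budget of one time step


def event_time(global_step: int, cycles_used: int) -> float:
    """Combine global time (GT) and local time (LT) into GT.LT.

    The integer part is the current time step; the fractional part is the
    share of the step's cycle budget consumed before the event.
    """
    if not 0 <= cycles_used < CYCLES_PER_STEP:
        raise ValueError("cycles_used must fall within a single time step")
    return global_step + cycles_used / CYCLES_PER_STEP
```

For example, an event after 250 cycles in step 7 gets time 7.25. Times within a step therefore sort before every time in the next step.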
19
Method: Agent communication
- Messages
  - Passed between agents
  - Forwarded via the Simulator
- Types: Request, Response, Forward, Receipt
- 11 message types in total across the 3 areas:
  - Service
  - Capability sharing
  - Agent information
20
Method: Agent communication (role of the Simulator)
- Accepts messages and calculates delivery time
  - Applies latency
- Orders messages according to time of delivery
- Forwards all messages that reach their destination within the current time step
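The Simulator's four duties map naturally onto a priority queue keyed by delivery time. The following is a minimal sketch built on Python's standard `heapq`. The `MessageQueue` class and its methods are hypothetical. Latency is passed in directly because the slides do not say how it is computed.

```python
import heapq
import itertools


class MessageQueue:
    """Hypothetical Simulator mailbox ordered by delivery time."""

    def __init__(self):
        self._heap: list[tuple[float, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker keeps equal times FIFO

    def accept(self, send_time: float, latency: float, dest: str, payload: str) -> float:
        """Accept a message, apply latency and return its delivery time."""
        delivery = send_time + latency
        heapq.heappush(self._heap, (delivery, next(self._seq), dest, payload))
        return delivery

    def forward(self, current_step: int) -> list[tuple[float, str, str]]:
        """Pop, in delivery order, every message due before the step ends."""
        due = []
        while self._heap and self._heap[0][0] < current_step + 1:
            delivery, _, dest, payload = heapq.heappop(self._heap)
            due.append((delivery, dest, payload))
        return due
```

A message sent at 2.5 with latency 1.0 is delivered at 3.5. It is therefore held back when step 2 is forwarded and released with step 3.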
21
Method: Agent communication (sending)
22
Method: Agent communication (sending)
23
Method: Agent communication (receiving)
- Messages in the inbox are processed according to delivery time
- At the start of a time step, the inbox contains all messages for that time step
- An agent only reads a message when the agent's time reaches the message's delivery time
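The receiving rule interleaves an agent's own work with its messages. The sketch below assumes that each unit of work advances the agent's local clock by a fixed, hypothetical `cost`. Before starting each unit, the agent reads every message whose delivery time has already been reached. `process_step` is an illustrative name.

```python
def process_step(step: int, inbox: list[tuple[float, str]],
                 work: list[str], cost: float) -> list[tuple[str, str]]:
    """Interleave work and message reads within one time step.

    inbox: (delivery_time, message) pairs due in this step.
    work:  units of work the agent performs, each taking `cost` of the step.
    A message is read only once the agent's local time has reached its
    delivery time.
    """
    agent_time = float(step)
    pending = sorted(inbox)  # process in delivery-time order
    trace = []
    i = 0
    for item in work:
        while i < len(pending) and pending[i][0] <= agent_time:
            trace.append(("read", pending[i][1]))
            i += 1
        trace.append(("work", item))
        agent_time += cost
    # Messages still unread: the agent idles until each one is delivered.
    for _, message in pending[i:]:
        trace.append(("read", message))
    return trace
```

Here is an example from step 4 with work costing 0.25 per unit. A message delivered at 4.3 is read after the second unit of work, once the agent's clock has reached 4.5.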
24
Method: Service
- Unit of work provided by an agent
- Can be requested by external users or by agents within the system
- Complex workflows (dependencies)
  - A dependency means the results of one service are used by another service
- Virtual knowledge communities
25
Method: Service
Example, task 1:
1: do part1
2: do part2
3: request task2, task3, task4; wait task2
4: do part3; wait task3, task4
5: do finish
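Python generators offer a compact way to express the suspend-and-resume behaviour of task 1. In this minimal sketch, `task1` and `run_service` are hypothetical names. Results are assumed to arrive as plain task names, and the requests are only logged rather than sent to other agents.

```python
from typing import Callable, Generator, Iterable

Service = Generator[set[str], None, list[str]]


def task1() -> Service:
    """Task 1 from the slide, written as a generator that yields when it blocks."""
    log = ["part1", "part2"]                 # steps 1 and 2
    log.append("request task2,task3,task4")  # step 3: send requests ...
    yield {"task2"}                          # ... and wait for task2
    log.append("part3")                      # step 4: do part3 ...
    yield {"task3", "task4"}                 # ... then wait for task3 and task4
    log.append("finish")                     # step 5
    return log


def run_service(service: Callable[[], Service],
                arrivals: Iterable[str]) -> tuple[list[str], int]:
    """Run `service`, resuming it once every awaited result has arrived.

    arrivals: names of dependency results in the order they arrive.
    Returns the service's log and how many times it was suspended.
    """
    done: set[str] = set()
    incoming = iter(arrivals)
    gen = service()
    suspensions = 0
    try:
        waiting = next(gen)
        while True:
            if not waiting <= done:
                suspensions += 1
            while not waiting <= done:
                result = next(incoming, None)
                if result is None:
                    raise RuntimeError(f"still waiting on {waiting - done}")
                done.add(result)
            waiting = next(gen)
    except StopIteration as stop:
        return stop.value, suspensions
```

If task4, task3 and task2 arrive in that order, task 1 is suspended only once. Both later dependencies are already available when it reaches step 4.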
26
Method: Virtual knowledge communities
- Groups of agents with similar interests
  - Coupled services
- Benefits
  - More advantageous agent relationships
  - Priority treatment for agents within a community
  - Improvements in reliability and response times
27
Method: Virtual knowledge communities
28
Method: Agent implementation
- Service advertising
- Knowledge
- Validation
- Scheduling
- Failure
- Performance metrics
- Entropy
- Strategy
29
Method: Service advertising (indexing)
- Performance metrics mitigate the risk of inefficient routing
30
Method: Knowledge
- Two forms:
  - Neighbour relationships
  - Capabilities (services)
- Fixed storage allocation
  - A service requires twice the storage of a relationship
- Initialised to:
  - 8 relationships (50%)
  - 4 services (50%)
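The initial allocation can be checked by measuring storage in relationship-sized units. Eight relationships use 8 units, and 4 services at 2 units each also use 8 units. Each form of knowledge therefore takes 50% of a 16-unit store. The unit size and the function names below are hypothetical.

```python
RELATIONSHIP_UNITS = 1                   # storage cost of one neighbour relationship
SERVICE_UNITS = 2 * RELATIONSHIP_UNITS   # a service needs twice as much


def storage_used(relationships: int, services: int) -> int:
    return relationships * RELATIONSHIP_UNITS + services * SERVICE_UNITS


CAPACITY = storage_used(8, 4)  # initial allocation: 8 + 8 = 16 units


def fits(relationships: int, services: int) -> bool:
    """True if the combination fits in the agent's fixed storage."""
    return storage_used(relationships, services) <= CAPACITY
```

Because of the 2:1 cost ratio, giving up one service frees room for exactly two more relationships. For example, (10, 3) and (12, 2) also fill the 16 units.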
31
Method: Knowledge
- Neighbour relationships
  - Services offered by each neighbour are recorded
  - Each agent stores a service directory: a ranked collection of agents that provide each service
  - Ranking is based on a weighted average of past results
  - Performance and utilisation metrics are recorded
- Capabilities (services)
  - Storage space is not actually used by the simulation
  - The allocation represents the demands of data carried by a service (e.g. databases)
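The slides say providers are ranked by a weighted average of past results but do not give the weighting. The sketch below assumes an exponentially weighted moving average with a hypothetical weight `ALPHA`, so recent results count more than old ones. It also scores each result as 1.0 for success and 0.0 for failure. `ServiceDirectory` is an illustrative name.

```python
from collections import defaultdict

ALPHA = 0.5  # hypothetical weight of the newest result in the running average


class ServiceDirectory:
    """Ranked providers per service, scored by an exponentially weighted average."""

    def __init__(self) -> None:
        self.scores: dict[str, dict[str, float]] = defaultdict(dict)

    def record(self, service: str, agent: str, result: float) -> None:
        """result: 1.0 for a successful request, 0.0 for a failure (assumed scale)."""
        table = self.scores[service]
        old = table.get(agent)
        table[agent] = result if old is None else ALPHA * result + (1 - ALPHA) * old

    def ranked(self, service: str) -> list[str]:
        table = self.scores.get(service, {})
        return sorted(table, key=table.get, reverse=True)
```

Consider an agent that succeeded once and then failed once. It scores 0.5, which places it between a neighbour that has only succeeded and one that has only failed.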
32
Method: Knowledge validation
- Occurs at set intervals
- Updates the services advertised by neighbours
- Updates neighbour utilisation
33
Method: Routing
- Agents have multiple options
- Choose the best route based on knowledge
- Reroute requests upon failure
- Investigations:
  - Limiting the number of hops
  - Limiting the number of routing options
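Routing with rerouting, a hop limit and an option limit can be sketched as a bounded depth-first search. The sketch assumes that each agent's neighbour list is already ranked best-first from its knowledge. Trying the next option after a failure then plays the role of rerouting. `route`, `offers`, `neighbours` and `failed` are hypothetical names.

```python
from typing import Optional


def route(agent: str, service: str,
          offers: dict[str, set[str]],
          neighbours: dict[str, list[str]],
          failed: set[str],
          max_hops: int = 3,
          max_options: int = 2,
          _path: frozenset = frozenset()) -> Optional[list[str]]:
    """Depth-first search for a provider of `service`, starting at `agent`.

    neighbours[agent] is assumed to be ranked best-first, so the first option
    tried is the preferred route; a failed or fruitless option causes a
    reroute to the next one, up to `max_options` options per agent.
    """
    if agent in failed or agent in _path:
        return None                      # non-responsive, or a routing loop
    if service in offers.get(agent, set()):
        return [agent]                   # this agent provides the service
    if max_hops == 0:
        return None                      # hop limit reached
    for option in neighbours.get(agent, [])[:max_options]:
        path = route(option, service, offers, neighbours, failed,
                     max_hops - 1, max_options, _path | {agent})
        if path is not None:
            return [agent] + path
    return None                          # every allowed option failed
```

Suppose B's provider D has failed. A request from A is rerouted through its second option C. If only one option is allowed, or only one hop, the request fails. The two investigations probe exactly this trade-off.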
34
Method: Routing
35
Method: Scheduling
- Agent receives a request for a service it offers
- Agent schedules the service by appending it to the service schedule
- Services are suspended if blocked
- Resumed services are prepended to the service schedule
- Fairness
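The append/prepend rule fits a double-ended queue. Here is a minimal sketch using `collections.deque`, in which `ServiceSchedule` and its method names are hypothetical.

```python
from collections import deque
from typing import Optional


class ServiceSchedule:
    """Hypothetical per-agent schedule: new work at the back, resumed work at the front."""

    def __init__(self) -> None:
        self.queue: deque[str] = deque()
        self.suspended: set[str] = set()

    def request(self, service: str) -> None:
        self.queue.append(service)          # new requests are appended

    def next_service(self) -> Optional[str]:
        return self.queue.popleft() if self.queue else None

    def suspend(self, service: str) -> None:
        self.suspended.add(service)         # blocked: held out of the queue

    def resume(self, service: str) -> None:
        if service in self.suspended:
            self.suspended.remove(service)
            self.queue.appendleft(service)  # resumed services are prepended
```

Prepending resumed services means a partly finished service completes before newer requests start. This is one plausible reading of the "fairness" point, not a statement of the author's exact policy.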
36
Method: Failure
- A period of time during which an agent is non-responsive
- Randomly generated for each time step
  - Based on the reliability (a parameter) of the network
- Implemented as a failure schedule
  - 100 time steps, looped
  - Ensures identical conditions when comparing the systems under analysis
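A reproducible, looped failure schedule can be generated once from a seeded random number generator and then indexed modulo its length. The sketch assumes that each agent fails independently in each step with probability 1 − reliability. The function names and the seed are hypothetical.

```python
import random


def failure_schedule(agents: list[str], reliability: float,
                     length: int = 100, seed: int = 0) -> list[set[str]]:
    """Pre-generate which agents are down at each step of a looped schedule.

    Each agent fails in a given step with probability 1 - reliability.
    A fixed seed makes the schedule reproducible, so both strategies can be
    compared under identical failures.
    """
    rng = random.Random(seed)
    return [{a for a in agents if rng.random() >= reliability}
            for _ in range(length)]


def is_failed(schedule: list[set[str]], agent: str, step: int) -> bool:
    return agent in schedule[step % len(schedule)]  # loop every len(schedule) steps
```

Both systems can be run against the same schedule object, or against schedules built with the same seed. Step 107 then fails exactly the same agents as step 7.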
37
Method: Performance metrics
- Weighted measures of ability
  - Response times
  - Failures
  - Used to assess and rank neighbours
- Usage records
  - Popularity of relationships and services
  - Used to reallocate storage
38
Method: Entropy
- Perceived unreliability of the system
- Maintained by each agent
- Formed through interactions with neighbours and subsequent analysis
- Evolves over time
- Triggers a strategic response
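The slides do not define how entropy is computed. As one plausible reading, each agent could keep a running failure rate over its interactions with neighbours and trigger a response when that rate crosses a threshold. In the sketch below, `EntropyTracker`, the update weight and the thresholds are all hypothetical.

```python
from typing import Optional


class EntropyTracker:
    """Hypothetical per-agent estimate of perceived unreliability in [0, 1]."""

    def __init__(self, weight: float = 0.2, high: float = 0.3, low: float = 0.1) -> None:
        self.value = 0.0
        self.weight = weight  # how strongly each interaction moves the estimate
        self.high = high      # above this, the system is perceived as unreliable
        self.low = low        # below this, the system is perceived as reliable

    def observe(self, failed: bool) -> None:
        """Fold the outcome of one interaction with a neighbour into the estimate."""
        outcome = 1.0 if failed else 0.0
        self.value = (1 - self.weight) * self.value + self.weight * outcome

    def trigger(self) -> Optional[str]:
        """Return the strategic response the current entropy calls for, if any."""
        if self.value > self.high:
            return "low reliability"
        if self.value < self.low:
            return "high reliability"
        return None
```

The gap between the two thresholds gives a neutral band where no response is triggered. This keeps an agent from flipping strategy after every single interaction.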
39
Method: Strategy
- Response to environmental conditions
- Aims to improve service delivery
- Two alternatives implemented and tested:
  - Standard approach
  - Adaptive memory approach
40
Method: Standard approach
- Fixed allocations of storage
- Aims to store the most frequently used relationships and services
  - Weighted usage records
- Swaps the least popular allocated knowledge with the most popular unallocated knowledge
- Swapping occurs at set intervals
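One swapping interval of the standard approach can be sketched as below. It assumes the swap is applied separately to relationships and to services, so a swap never changes the storage used. The slides do not say whether cross-kind swaps occur. Usage records are assumed to be precomputed weighted counts, and `swap_step` is a hypothetical name.

```python
def swap_step(allocated: set[str], unallocated: set[str],
              usage: dict[str, float]) -> tuple[set[str], set[str]]:
    """One swapping interval of the standard approach (fixed allocation)."""
    if not allocated or not unallocated:
        return allocated, unallocated

    def popularity(item: str) -> float:
        return usage.get(item, 0.0)

    weakest = min(allocated, key=popularity)      # least popular stored item
    strongest = max(unallocated, key=popularity)  # most popular item not stored
    if popularity(strongest) > popularity(weakest):
        allocated = (allocated - {weakest}) | {strongest}
        unallocated = (unallocated - {strongest}) | {weakest}
    return allocated, unallocated
```

In a simulation this would be called once every set interval of time steps. The allocation stops changing once every stored item is at least as popular as every unstored one.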
41
Method: Adaptive memory approach
- Dynamic allocations of storage
- Manipulation is triggered by entropy
- Tied to strategy:
  - Low reliability: increase neighbours
  - High reliability: increase services
- Limited to avoid extreme reactions
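Adaptive reallocation can be sketched using the storage costs from the Knowledge slides, where one service equals two relationships. The sketch moves one service's worth of storage per call in the direction the entropy indicates. Two assumptions stand in for "limited to avoid extreme reactions": a single small step per call, and hypothetical floors (`MIN_RELATIONSHIPS`, `MIN_SERVICES`). The thresholds and the function name are also hypothetical.

```python
MIN_RELATIONSHIPS = 4  # hypothetical floors that stop the adaptation
MIN_SERVICES = 2       # from emptying either form of knowledge


def reallocate(relationships: int, services: int, entropy: float,
               high: float = 0.3, low: float = 0.1) -> tuple[int, int]:
    """Shift one service's worth of storage (two relationship slots) per call.

    Total storage (relationships + 2 * services) is unchanged by every move.
    """
    if entropy > high and services > MIN_SERVICES:
        # Low reliability: give up a service to keep two more neighbours.
        return relationships + 2, services - 1
    if entropy < low and relationships - 2 >= MIN_RELATIONSHIPS:
        # High reliability: drop two neighbours to offer one more service.
        return relationships - 2, services + 1
    return relationships, services
```

Start from the initial (8, 4) allocation. Persistently high entropy drives it to (12, 2) and stops there. Persistently low entropy drives it to (4, 6). The total stays at 16 units either way.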
42
Method: Adaptive memory approach
- Expectations
  - High reliability
    - More services offered by agents
    - Shorter response times to global requests
    - Agents specialise in services according to how they are employed
  - Low reliability
    - More contingencies or routing options maintained by agents
    - Higher probability of success
  - Brokers and workers
  - Run-time evolution
43
Method: Investigation of model
- Effect of varying indexing depth
- Effect of limiting hops from the source of a request
- Effect of limiting the routing options explored by agents
- Effect of enforcing a minimum level of redundancy
44
Method: Effect of varying indexing depth
45
Results & Findings: Effect of varying indexing depth on success (depth 2)
46
Results & Findings: Effect of varying indexing depth on success (depth 3)
47
Results & Findings: Effect of varying indexing depth on success (depth 4)
48
Results & Findings: Effect of varying indexing depth on response time (depth 2)
49
Results & Findings: Effect of varying indexing depth on response time (depth 3)
50
Results & Findings: Effect of varying indexing depth on response time (depth 4)
51
Results & Findings: Conclusions
- Comparison of the two systems indicates occasional improvements from the adaptive memory technique
  - Improved through optimisations
- Results indicate that increasing indexing depth does not improve robustness
- Influencing factors:
  - Limits on hops and routing options
  - Service dependencies
52
Questions