Distributed Systems. Sukumar Ghosh, Department of Computer Science, University of Iowa
Definition? A distributed system is one in which I can’t do my work because some computer that I’ve never even heard of has failed. (Leslie Lamport)
Distributed Systems: A network of processes communicating with one another to meet some objective. Growth and innovation are fueled by: declining hardware cost and improved device functionality; better networking facilities; our dreams.
Distributed Systems: Traditional client-server systems; peer-to-peer networks; communicating micro-robots; sensor networks; vehicular networks.
A client-server system: [Figure: several clients and one server S] (boring …)
Communicating micro-robots Courtesy: the iSwarm project at the University of Karlsruhe
Numerous Challenges: Processes have local views, but the goals are global. Failures and perturbations are expected events, not catastrophic exceptions! Clocks are not perfectly synchronized. The topology may change from time to time.
Replicated servers: [Figure: a client-server system with a single server S, beside a replicated client-server system in which servers S0, S1, S2, and S3 serve the clients] Not so easy.
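One way to see why replication is "not so easy" is a minimal sketch in Python. Everything here is hypothetical and for illustration only: the `Replica` class, the `write_all` and `read_any` helpers, and the naive "write to whoever is up" policy are assumptions, not a scheme taken from the slides. The sketch shows a replica that misses an update while it is down and then serves stale data after it recovers.

```python
class Replica:
    """One hypothetical in-memory server copy; `up` simulates a crash."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data.get(key)


def write_all(replicas, key, value):
    """Naive replication: try every replica and skip the ones that are down."""
    reached = []
    for replica in replicas:
        try:
            replica.write(key, value)
            reached.append(replica.name)
        except ConnectionError:
            pass  # The update is silently lost for this replica.
    return reached


def read_any(replicas, key):
    """Read from the first replica that answers."""
    for replica in replicas:
        try:
            return replica.read(key)
        except ConnectionError:
            continue
    raise ConnectionError("no replica available")


def demo():
    replicas = [Replica(f"S{i}") for i in range(4)]
    write_all(replicas, "x", 1)   # every replica stores x = 1
    replicas[0].up = False        # S0 crashes
    write_all(replicas, "x", 2)   # the update reaches S1, S2, S3 only
    replicas[0].up = True         # S0 recovers with its old copy
    value_seen = read_any(replicas, "x")
    return value_seen, [r.data.get("x") for r in replicas]
```

In this sketch the client reads the old value from S0 even though a newer write succeeded elsewhere. Avoiding that outcome takes extra machinery (for example, quorums or recovery protocols), which is the kind of difficulty the slide alludes to.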
Vehicular Networks. Applications: accident alerts/prevention; dynamic route planning; entertainment. Communications: cellular network; vehicle to roadside; vehicle to vehicle. [Figure labels: roadside infrastructure, Internet, cellular, vehicle-to-vehicle]
Topics to explore: Designing fault-tolerant distributed systems. (The term “fault” has a wide scope. It does not necessarily mean a crash, but also includes selfishness, malicious behavior, node mobility, environmental changes, etc.)
Topics to explore: To prevent disruptions caused by failures and perturbations, distributed systems must learn to manage themselves without external intervention (which is often costly and sometimes impractical). This means that most non-trivial distributed systems must satisfy one or more of the following properties: self-organization, self-healing, self-stabilization, self-optimization, etc. (These are yardsticks of “smartness”.)
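As one concrete illustration of self-stabilization, here is a minimal sketch of Dijkstra's K-state token ring, a classic example chosen for this note and not something the slides themselves present. The function names, the random scheduler, and the choice K = n are assumptions of the sketch. Each process holds a counter; a process is "privileged" (holds a token) when a local test on its own counter and its predecessor's counter succeeds. Starting from an arbitrary, possibly corrupted state, the ring is expected to converge to exactly one privilege and then keep exactly one.

```python
import random


def privileged(x):
    """Return the indices of processes that currently hold a privilege."""
    holders = [0] if x[0] == x[-1] else []
    holders += [i for i in range(1, len(x)) if x[i] != x[i - 1]]
    return holders


def step(x, k, rng):
    """Let one privileged process, picked by a random scheduler, move."""
    i = rng.choice(privileged(x))  # at least one process is always privileged
    if i == 0:
        x[0] = (x[0] + 1) % k      # process 0 increments modulo k
    else:
        x[i] = x[i - 1]            # every other process copies its predecessor


def stabilize(x, k, rng, max_steps=10_000):
    """Run until exactly one privilege remains; return the steps taken."""
    steps = 0
    while len(privileged(x)) != 1 and steps < max_steps:
        step(x, k, rng)
        steps += 1
    return steps


if __name__ == "__main__":
    rng = random.Random(2012)
    n = 6
    state = [rng.randrange(n) for _ in range(n)]  # arbitrary starting state
    print("start:", state, "privileged:", privileged(state))
    print("steps to stabilize:", stabilize(state, n, rng))
    print("after:", state, "privileged:", privileged(state))
```

No process ever inspects the whole ring: each move uses only a process's own counter and its predecessor's, yet the global property (a single circulating token) is restored without outside intervention.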
Topics to explore: “Scalable algorithms” for distributed systems. Some large-scale systems have millions of nodes. Will your solution be practical at that scale? Dealing with “big data” in distributed systems (cloud computing, MapReduce, Hadoop, etc.).
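To make the MapReduce idea concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce phases for word counting. It uses plain Python, not the Hadoop API, and the function names are hypothetical. In a real deployment the map and reduce calls would run in parallel on many nodes; this sketch only shows the data flow.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big systems handle big data"]))
```

Because each `map_phase` call touches one document and each `reduce_phase` call touches one key, both phases can be spread across machines, which is what lets the pattern scale to large inputs.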
Topics to explore: The goal is to guarantee that the system will work in real life. If it does not, then you have to question and revisit the model assumptions, algorithm correctness, etc. [Figure labels: theory, practice]
Graduate courses: If you are interested in such topics, then consider taking: (Fall 2012) 22C:166 Distributed Systems and Algorithms (Sukumar Ghosh); 22C:196 Sensing the world (Octav Chipara). (Other semesters) 22C:196 Parallel and Distributed Programming: Forms and Limits; Cloud Computing (Ted Herman); Sensor Networks (Ted Herman); Advanced Distributed Algorithms (Sriram Pemmaraju).