Scalable and Secure Architectures for Online Multiplayer Games Thesis Proposal Ashwin Bharambe May 15, 2006.

2 Online Games are Huge!
[Chart: number of subscribers (1–8 million) over time for World of Warcraft, Final Fantasy XI, Everquest, and Ultima Online]
Some more facts:
1. These MMORPGs have client-server architectures
2. They accommodate ~0.5 million players at a time

3 Why MMORPGs Scale
Role-playing games have been slow-paced
- Players interact with the server relatively infrequently
- Maintain multiple independent game-worlds, each hosted on a different server
Not true of other game genres
- FPS or First-Person Shooters (e.g., Quake) demand high interactivity and need a single game-world

4 FPS Games Don’t Scale
[Chart: Quake II server bandwidth (kbps)]
Bandwidth and computation both become bottlenecks

5 Goal: Cooperative Server Architecture
Focus on fast-paced FPS games

6 Distributing Games: Challenges
Tight latency constraints
- As players or missiles move, updates must be disseminated very quickly (< 150 ms for FPS games)
High write-sharing in the workload
Cheating
- Execution and state maintenance are spread over untrustworthy nodes

7 Talk Outline
Problem
Background: Game Model, Related Work
Colyseus Architecture
Expected Contributions

8 Game Model
Mutable state: players, monsters, ammo, game status
Immutable state: interactive 3-D environment (maps, models, textures)
[Screenshot of Serious Sam]

9 Game Execution in Client-Server Model

    void RunGameFrame()   // called once per game frame
    {
        // every object in the world thinks once every game frame
        foreach (obj in mutable_objs) {
            if (obj->think)
                obj->think();
        }
        send_world_update_to_clients();
    }

10 Object Partitioning Player Monster

11 Distributed Game Execution

    class CruzMissile {
        // every object in the world thinks once every game frame
        void think() {
            update_pos();
            if (dist_to_ground() < EPSILON)
                explode();
        }
        void explode() {
            foreach (p in get_nearby_objects()) {
                if (p.type == "player")
                    p.health -= 50;
            }
        }
    };

Required pieces: Object Discovery, Replica Synchronization
[Diagram: missile, monster, item]

12 Talk Outline
Problem
Background: Game Model, Related Work
Colyseus Architecture
Expected Contributions

13 Related Work
Distributed designs
- Distributed Interactive Simulation (DIS), e.g., HLA, DIVE, MASSIVE, etc.: use region-based partitioning, IP multicast
- Butterfly, Second-Life, SimMUD [INFOCOM 04]: use region-based partitioning, DHT multicast
Cheat-proofing
- Lock-step synchronization with commitment

14 Related Work: Techniques Region-based Partitioning Parallel Simulation Area-of-Interest Management with Multicast

15 Related Work: Techniques
Region-based Partitioning
- Divide the game-world into a fixed number of regions; assign objects in one region to one server
+ Simple to place and discover objects
– High migration rates, especially for FPS games
– Regions exhibit very high skews in popularity, which can result in severe load imbalance
Parallel Simulation
Area-of-Interest Management with Multicast

16 Related Work: Techniques
Region-based Partitioning
Parallel Simulation
- Peer-to-peer: each peer maintains full state; writes to objects are sent to all peers
+ Point-to-point links, so updates travel fastest
– Needs lock-step + bucket synchronization
– No conflict resolution, so inconsistency never heals
Area-of-Interest Management with Multicast

17 Related Work: Techniques
Region-based Partitioning
Parallel Simulation
Area-of-Interest Management with Multicast
- Players only need updates from the nearby region
- 1 region == 1 multicast group; use one shared multicast tree per group
– Bandwidth load-imbalance due to skews in region popularity
– Updates need multiple hops: bad for FPS games

18 Talk Outline
Problem
Background
Colyseus Architecture
- Scalability [NSDI 2006]
- Evaluation
- Security
Expected Contributions

19 Colyseus Components
[Diagram: servers S1–S3 holding primaries P1–P4 and replicas R3, R4; components: Object Discovery, Replica Management, Object Placement; get_nearby_objects()]

20 Object Placement
Flexible and dynamic object placement
- Permits use of clustering algorithms; not tied to "regions"
Previous systems use region-based placement
- Frequent, disruptive migration for fast games
- Regions in a game have very skewed popularity
[Chart: region popularity vs. region rank]

21 Replication Model
Single primary, read-only replicas (primary-backup replication)
- Writes are serialized at the primary
- Primary is responsible for executing think code
- Replica trails the primary by one hop
Weakly consistent: low latency is critical
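The single-primary model above can be sketched in a few lines. This is a minimal illustration, not the Colyseus implementation: all class and field names are assumptions, and the "update" is a full-state snapshot for brevity rather than a delta.

```cpp
#include <cassert>
#include <map>
#include <string>

// Sketch: one primary serializes writes and assigns versions; a
// read-only replica applies updates and may trail the primary
// (weak consistency) until the next update arrives.
struct ObjectState {
    int version = 0;
    std::map<std::string, int> fields;  // e.g., {"health": 100}
};

class Primary {
public:
    // All writes go through the primary, which orders them.
    void write(const std::string& key, int value) {
        state_.fields[key] = value;
        state_.version++;
    }
    // The update a replica would receive (full state for brevity).
    ObjectState snapshot() const { return state_; }
private:
    ObjectState state_;
};

class Replica {
public:
    // Apply an update only if it is newer than what we hold.
    void apply(const ObjectState& update) {
        if (update.version > state_.version) state_ = update;
    }
    const ObjectState& state() const { return state_; }
private:
    ObjectState state_;
};
```

Because only the primary assigns versions, replicas never see conflicting write orders; the cost is that a replica's view is stale by one update hop.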

22 Object Discovery
Most objects only need other "nearby" objects for executing think functions: get_nearby_objects()

23 Distributed Object Discovery
Publication: "My position is x=x1, y=y1, z=z1. Located on …"
Subscription: "Find all objects with obj.x ∈ [x1, x2], obj.y ∈ [y1, y2], obj.z ∈ [z1, z2]"
Use a structured overlay to achieve this
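The publication/subscription matching described above amounts to a 3-D range predicate. A minimal local sketch (the struct names and brute-force scan are assumptions; in the real system the matching is distributed over the overlay):

```cpp
#include <cassert>
#include <vector>

// An object publishes its position; a subscription names an
// axis-aligned box and should discover every publication inside it.
struct Publication { double x, y, z; int object_id; };
struct Subscription { double x1, x2, y1, y2, z1, z2; };

inline bool matches(const Subscription& s, const Publication& p) {
    return p.x >= s.x1 && p.x <= s.x2 &&
           p.y >= s.y1 && p.y <= s.y2 &&
           p.z >= s.z1 && p.z <= s.z2;
}

// Brute-force stand-in for overlay-routed matching.
std::vector<int> find_objects(const Subscription& s,
                              const std::vector<Publication>& pubs) {
    std::vector<int> ids;
    for (const auto& p : pubs)
        if (matches(s, p)) ids.push_back(p.object_id);
    return ids;
}
```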

24 Mercury: Range-Queriable DHT [SIGCOMM 2004]
- Supports range queries, not just exact matches: no need for partitioning into "regions"
- Places data contiguously: can utilize spatial locality in games
- Dynamically balances load: control traffic does not cause hotspots
- Provides O(log n)-hop lookup: about 200 ms for 225 nodes in our setup
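"Places data contiguously" is what makes range queries cheap: each node owns a contiguous interval of an attribute's value space, so a range subscription maps onto the few adjacent nodes whose intervals overlap it. A toy sketch of that routing decision (the node layout and linear scan are illustrative assumptions, not Mercury's actual O(log n) routing):

```cpp
#include <cassert>
#include <vector>

// Each node owns values in [lo, hi) of one attribute's value space.
struct Node { double lo, hi; int id; };

// Deliver a range subscription [qlo, qhi) to every node whose
// owned interval overlaps the queried range.
std::vector<int> route_range(const std::vector<Node>& ring,
                             double qlo, double qhi) {
    std::vector<int> hit;
    for (const auto& n : ring)
        if (n.lo < qhi && qlo < n.hi)  // half-open interval overlap test
            hit.push_back(n.id);
    return hit;
}
```

With contiguous placement, a query touches only neighboring nodes; a hash-based DHT would scatter the same range across the whole ring.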

25 Object Discovery Optimizations
- Pre-fetch soon-to-be-required objects: use game physics for prediction
- Pro-active replication: piggyback object creation on update messages
- Soft-state subscriptions and publications: add object-specific TTLs to pubs and subs

26 Colyseus Design: Recap
[Diagram: Mercury resolves "find me nearby objects"; replicas are then synchronized over direct point-to-point connections]

27 Putting It All Together

28 Talk Outline
Problem
Background
Colyseus Architecture
- Scalability
- Evaluation [NSDI 2006]
- Security
Expected Contributions

29 Evaluation Goals
- Bandwidth scalability: per-node bandwidth usage should scale with the number of nodes
- View inconsistency due to object discovery latency should be small
- Discovery latency and pre-fetching overhead: in [NSDI 2006]

30 Experimental Setup
- Emulab-based evaluation
- Synthetic game; workload based on Quake III traces
- P2P scenario: 1 player per server, unlimited bandwidth, modeled end-to-end latencies
- More results, including a Quake II evaluation, in [NSDI 2006]

31 Per-node Bandwidth Scaling
[Chart: mean outgoing bandwidth (kbps) vs. number of nodes]

32 View Inconsistency
[Chart: average fraction of mobile objects missing vs. number of nodes, for no delay, 100 ms delay, and 400 ms delay]

33 Planned Work
Consistency models
- Game operations demand differing levels of consistency and latency response
- Causal ordering of events
- Atomicity
Deployment
- Performance metrics depend crucially on the workload
- A real game workload would be useful for future research

34 Talk Outline
Problem
Background
Colyseus Architecture
- Scalability
- Evaluation
- Security [Planned Work]
Expected Contributions

35 Cheating in Online Games
Why do cheats arise?
- Distributed system (client-server or P2P)
- Bugs in the game implementation
Possible cheats in Colyseus
- Object discovery: map-hack, subscription-hijack
- Replication: god-mode, event-ordering, etc.
- Object placement: god-mode

36 Object Discovery Cheats
map-hack cheat [information overexposure]
- Subscribe to arbitrary areas in the game
- Discover all objects, which may be against game rules
subscription-hijack cheat
- Incorrectly route subscriptions of your enemy
- Enemy cannot discover (see) players; other players can see her and can shoot her

37 Replication Cheats
god-mode cheat
- Primary node has arbitrary control over writes to the object
timestamp cheat
- Primary node decides the serialized write order
["You die!" / "No, I don't!" exchange between Node A and Node B]

38 Replication Cheats
suppress-update cheat
- Primary does not send updates to the replicas ("hide from this guy")
inconsistency cheat
- Primary sends incorrect or conflicting updates to the replicas
[Example: Player A tells Player C "I am dead" but tells Player D "I moved to another room"]

39 Related Work
NEO protocol [GauthierDickey 04]
Lock-step synchronization with commitment
- Send encrypted update in round 1
- Send decryption key in round 2, only after you receive updates from everybody
+ Addresses the suppress-update and timestamp cheats
– Lock-step synchronization increases "lag"
– Does not address the god-mode cheat, among others

40 Solution Approach
Philosophy: detection rather than prevention
- Preventing cheating ≈ Byzantine fault tolerance
- Known protocols emphasize strict consistency and assume weak synchrony
- Multiple rounds make them unsuitable for game-play
High-level decisions
1. Make players leave an audit-trail
2. Make peers police each other
3. Keep detection out of the critical path always

41 Distributed Audit Log
[Diagram: a randomly chosen witness vs. a centralized auditor]

42 Logging Using Witnesses
[Diagram: player node (think code, player log) and witness node (witness log); optimistic update path and serialized updates]
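For detection rather than prevention, the logs only need to be tamper-evident after the fact. One common way to get that is a hash-chained log, which a witness can audit offline; this sketch is an assumption about how such a log could look, not the proposal's protocol, and `std::hash` is a stand-in for a real cryptographic hash:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Each appended update is chained to the hash of the previous entry,
// so altering, dropping, or reordering any entry breaks the chain.
struct LogEntry { std::string update; size_t chain; };

size_t chain_hash(size_t prev, const std::string& update) {
    // NOT cryptographic; illustrative only.
    return std::hash<std::string>{}(std::to_string(prev) + update);
}

void append(std::vector<LogEntry>& log, const std::string& update) {
    size_t prev = log.empty() ? 0 : log.back().chain;
    log.push_back({update, chain_hash(prev, update)});
}

// Witness-side audit: recompute the whole chain and compare.
bool audit(const std::vector<LogEntry>& log) {
    size_t prev = 0;
    for (const auto& e : log) {
        if (e.chain != chain_hash(prev, e.update)) return false;
        prev = e.chain;
    }
    return true;
}
```

Auditing stays off the critical path: updates flow optimistically, and the witness verifies the chain lazily.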

43 Using Witnesses: Good and Bad
+ Player and witness logs can be used for audits: potentially address the timestamp, god-mode and inconsistency cheats
+ Witness can generate pubs + subs: addresses the map-hack cheat
– Bandwidth overhead
– Does not handle the suppress-update cheat and the subscription-hijack cheat

44 Using Witnesses: Alternate Design
Move the primary directly to the witness node
- Code execution and writes are directly applied at the witness
– Primary-to-replica updates go through the witness
– Witness gets arbitrary power; player cannot complain to anybody
[Diagram: witness node holds the primary copy of the player]

45 Challenges
- Balance power between player and witness: use cryptographic techniques
- How do players detect somebody is cheating? Extraction of rules from the game code
- Securing the object discovery layer: leverage DHT security research
- Keep bandwidth overhead minimal

46 Talk Outline
Problem
Background
Colyseus Architecture
- Scalability
- Evaluation
- Security
Expected Contributions

47 Expected Contributions Mercury range-queriable DHT Design and evaluation of Colyseus Real-world measurement of game workloads Anti-cheating protocols

48 Expected Contributions Mercury range-queriable DHT First structured overlay to support range queries and dynamic load balancing Implementation used in other systems Design and evaluation of Colyseus Real-world measurement of game workloads Anti-cheating protocols

49 Expected Contributions Mercury range-queriable DHT Design and evaluation of Colyseus First distributed design to be successfully applied for scaling FPS games Demonstrated that low-latency game-play is feasible Flexible architecture for adapting to various types of games Real-world measurement of game workloads Anti-cheating protocols

50 Expected Contributions Mercury range-queriable DHT Design and evaluation of Colyseus Real-world measurement of game workloads Deployment of Quake III Anti-cheating protocols

51 Expected Contributions Mercury range-queriable DHT Design and evaluation of Colyseus Real-world measurement of game workloads Anti-cheating protocols Encourage real-world deployments Lead towards lighter-weight fault-tolerance protocols

52 Summary of Thesis Statement
Design of scalable, secure architectures for games utilizing key properties:
- Game workload is predictable
- Players tolerate loose, eventual consistency

53 Differences from Related Work
Avoid region-based object placement
- Frequent migration when objects move
- Load-imbalance due to skewed region popularity
1-hop unicast update path between primaries and replicas
- Previous systems used overlay multicast
Replication model with eventual consistency
- Avoid parallel execution

54 Timeline
- Development of newer consistency and anti-cheat protocols: May 06 – Jul 06
- Integration of Colyseus with Quake III: May 06 – Jul 06
- Implementation of consistency and anti-cheat protocols: Jul 06 – Sep 06
- Deployment and evaluation: Jul 06 – Dec 06
- Thesis writing: Dec 06 – Mar 07

55 Thanks

56 Object Discovery Latency
[Chart: mean object discovery latency (ms) vs. number of nodes]

57 Object Discovery Latency
Observations:
1. Routing delay scales similarly for both types of DHTs: both exploit caching effectively (low median hop-count)
2. The DHT gains a small advantage because it does not have to "spread" subscriptions

58 Bandwidth Breakdown
[Chart: mean outgoing bandwidth (kbps) vs. number of nodes]

59 Bandwidth Breakdown
Observations:
1. Object discovery forms a significant part of the total bandwidth consumed
2. A range-queriable DHT scales better than a normal DHT (with linearized maps)

60 Goals and Challenges
1. Relieve the computational bottleneck. Challenge: partition code execution effectively
2. Relieve the bandwidth bottleneck. Challenge: minimize bandwidth overhead due to object replication
3. Enable low-latency game-play. Challenge: replicas should be updated as quickly as possible

61 Key Design Elements
- Primary-backup replication model: read-only replicas
- Flexible object placement: allow objects to be placed on any node
- Scalable object lookup: use structured overlays for discovering objects

62 View Consistency
Object discovery should succeed as quickly as possible: missing objects result in an incorrectly rendered view
Challenges
- O(log n) hops for the structured overlay: not fast enough for fast games
- Objects like missiles travel fast and are short-lived

63 Distributed Architectures: Motivation
Server farms? $$$: a significant barrier to entry
Motivating factors
- Most game publishers are small
- Games grow old very quickly
What if you are ~1000 university students wanting to host and play a large game?

64 Colyseus Components
[Diagram: server s1 (object store: P3, P4) and server s2 (P1, P2, replicas R3, R4) interacting via Mercury; components: Object Location, Replica Management, Object Placement]
1. Specify predicted interests: (5 < x < 60 & 10 < y < 200), TTL 30 sec
2. Locate remote objects: P3 on s2, P4 on s2
3. Register replicas: R3 (to s2), R4 (to s2)
4. Synch replicas: R3, R4
5. Optimize placement: migrate P1 to server s2

65 Object Pre-fetching
On-demand object discovery can cause stalls or render an incorrect view
- Use game physics for prediction
- Predict which areas objects will move to
- Subscribe to object publications in those areas
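The prediction step above can be as simple as dead reckoning: extrapolate an object's position from its velocity and subscribe to a box around the predicted position, so replicas arrive before they are needed. A minimal sketch (the function names, the linear motion model, and the fixed subscription radius are illustrative assumptions):

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };

// Simple dead reckoning: position after dt seconds at constant velocity.
Vec3 predict(const Vec3& pos, const Vec3& vel, double dt) {
    return {pos.x + vel.x * dt, pos.y + vel.y * dt, pos.z + vel.z * dt};
}

// Axis-aligned subscription box centered on the predicted position.
struct Box { Vec3 lo, hi; };

Box prefetch_region(const Vec3& pos, const Vec3& vel,
                    double dt, double radius) {
    Vec3 p = predict(pos, vel, dt);
    return {{p.x - radius, p.y - radius, p.z - radius},
            {p.x + radius, p.y + radius, p.z + radius}};
}
```

Subscribing to `prefetch_region(...)` instead of the current position trades a little extra subscription traffic for fewer discovery stalls.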

66 Pro-active Replication
Normal object discovery and replica instantiation are slow for short-lived objects
- Piggyback object-creation messages onto updates of other objects
- Replicate a missile pro-actively wherever its creator is replicated

67 Soft-state Storage
Objects need to tailor publication rate to speed
- Ammo or health-packs don't move much
Add TTLs to subscriptions and publications
- Stored pubs act like triggers for incoming subs
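The TTL mechanism above can be sketched as a small soft-state store: each stored publication carries an expiry time and silently disappears once it lapses, so slow-moving objects can publish rarely without leaving stale entries behind. This is an illustrative sketch (class name, plain-double timestamps, and the linear sweep are all assumptions):

```cpp
#include <cassert>
#include <vector>

struct StoredPub { int object_id; double expires_at; };

class SoftStateStore {
public:
    // Store a publication with an object-specific TTL (seconds).
    void put(int object_id, double now, double ttl) {
        pubs_.push_back({object_id, now + ttl});
    }
    // Drop expired entries; return ids of publications still alive.
    std::vector<int> alive(double now) {
        std::vector<int> out;
        std::vector<StoredPub> keep;
        for (const auto& p : pubs_) {
            if (p.expires_at > now) {
                keep.push_back(p);
                out.push_back(p.object_id);
            }
        }
        pubs_ = keep;  // expired pubs vanish without explicit deletion
        return out;
    }
private:
    std::vector<StoredPub> pubs_;
};
```

A health-pack might publish with a long TTL and a missile with a very short one; neither ever needs an explicit "unpublish" message.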

68 Per-node Bandwidth Scaling
Observations:
1. Colyseus bandwidth costs scale well with the number of nodes
2. Feasible for P2P deployment (compare single-server or broadcast)
3. In aggregate, Colyseus bandwidth costs are 4-5 times higher; there is overhead

69 View Inconsistency
Observations:
1. View inconsistency is small and gets repaired quickly
2. Missing objects are on the periphery
(Curves: no delay, 100 ms delay, 400 ms delay)

70 Cheating in Games
Examples of some cheats
- Information overexposure (maphack)
- Get arbitrary health, weapons (god-mode)
- Precise and automatic weapons (aimbot)
- Event ordering: did I shoot you first, or did you move first?
- Exploiting bugs inside the game (duping)

71 Distributed Design Components
[Diagram: object and replica; Object Discovery; Instantiate Replicas]