Multi-user Extensible Virtual Worlds Increasing complexity of objects and interactions with increasing world size, users, numbers of objects and types.

Multi-user Extensible Virtual Worlds: Increasing complexity of objects and interactions with increasing world size, users, numbers of objects and types of interactions.
Sheldon Brown, Site Director, CHMPR, UCSD
Daniel Tracy, Programmer, Experimental Game Lab
Erik Hill, Programmer, Experimental Game Lab
Todd Margolis, Technical Director, CRCA
Kristen Kho, Programmer, Experimental Game Lab

Current schemes using compute clusters break virtual worlds into small “shards”, each with a few dozen interacting objects. Compute systems with large amounts of coherent addressable memory avoid jumping between cluster nodes and can create worlds with several orders of magnitude more data complexity: tens of thousands of entities vs. dozens per shard. This approach also takes advantage of hybrid compute techniques for richer object dynamics.

A central server manages world state changes. The number of clients and the amount of activity determine the world's size and shape.

City road schemes are computed for each player when they enter a new city, using hybrid multicore compute accelerators.

Each player has several views of the world:
– Partial view of one city
– Total view of one city
– Partial view of two cities
– View of the entire globe
Within a city are several thousand objects. The dynamics of these objects are computed on the best available resource, balancing computability and coherency and alleviating world sharding.
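The slides do not say how the "best available resource" is chosen. As a hedged illustration only, one way to balance computability against coherency is a cost model that adds estimated compute time to a weighted latency penalty for resources far from the authoritative state. The `Resource` fields, the throughput and latency numbers, and the `coherency_weight` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    throughput: float  # hypothetical: objects simulated per millisecond
    latency_ms: float  # hypothetical: round-trip cost to reach this resource


def choose_resource(resources, object_count, coherency_weight):
    """Pick the resource with the lowest estimated cost for one batch of objects.

    cost = compute time + coherency_weight * latency
    A high coherency_weight favours resources close to the authoritative state.
    """
    def cost(r):
        return object_count / r.throughput + coherency_weight * r.latency_ms

    return min(resources, key=cost)


# Illustrative numbers only, not measurements from the project.
resources = [
    Resource("server-cpu", throughput=50.0, latency_ms=0.0),
    Resource("gpu-accelerator", throughput=2000.0, latency_ms=4.0),
    Resource("client", throughput=20.0, latency_ms=40.0),
]
```

With these made-up figures, a large batch of loosely coupled objects goes to the accelerator, while a small batch that must stay tightly coherent stays on the server CPU.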

Many classes of computing devices are used:
– Multi-core portable devices (e.g. Snapdragon-based cell phones)
– Computing cloud data storage
– z10 mainframe: transaction processing and state management
– Server-side compute accelerators: NVIDIA Tesla, Cell processor and x86
– Varied desktop computation, including hybrid multicore

Increasing complexity of objects and interactions with increasing world size, users, numbers of objects and types of interactions.
– Server services are distributed across cloud clusters, and redistributed across clients as performance or local work requires.
– Coherency with the overall system is pursued, managed by the centralized server.
– Virtual world components have dynamic tolerance levels for incoherency and latency.
– Cell Processor, x86 and GPU compute accelerators handle asset transformation, physics and behaviors.
– Multiple 10 Gb interfaces connect to compute accelerators, storage clusters and the compute cloud.
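To make "dynamic tolerance levels for incoherency and latency" concrete, here is a minimal sketch, assuming each replicated component carries two tolerances: how far the clients' copy may drift from the server's state, and how long it may go without an update. The class and field names are hypothetical; the slides do not describe the actual mechanism.

```python
import math
from dataclasses import dataclass


@dataclass
class ReplicatedComponent:
    position: tuple       # authoritative server-side position (x, y, z)
    sent_position: tuple  # last position the clients were told about
    sent_time: float      # time (s) of the last update sent
    max_drift: float      # hypothetical tolerance for positional incoherency
    max_staleness: float  # hypothetical tolerance for update latency (s)

    def needs_update(self, now):
        """True once either tolerance is exceeded."""
        drift = math.dist(self.position, self.sent_position)
        return drift > self.max_drift or (now - self.sent_time) > self.max_staleness

    def mark_sent(self, now):
        """Record that clients now hold the current state."""
        self.sent_position = self.position
        self.sent_time = now
```

Because the tolerances are ordinary fields, the server could tighten them at run time (for example, for objects near a player) and loosen them for distant scenery.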

Development Server Framework
– 5/ gb interfaces to compute accelerators
– z10 mainframe computer at San Diego Supercomputer Center: 2 IFLs with 128mb RAM, z/VM virtual OS manager with Linux guests
– 6 TB fast local storage (15K disks)
– 4 SR and 2 LR 10 Gb Ethernet interfaces
– 2 QS22 blades: 4 Cell Processors
– 2 HS22 blades: 4 Xeons
– 1 10 Gb interface to the internet
– 4 QS20 blades
– NVIDIA Tesla accelerator: 4 GPUs on a Linux host, external dual PCI connection
– Many clients

SDSC View

Producing a multi-user networked virtual world from a single-player environment

Goals
Feasibility
– Transformation from single-player program to client/server multi-player networking is non-trivial
– Structured methodology for transformation required
Scalability
– Support large environments, massively multi-player
– After working version, iteratively tackle bottlenecks
Multi-platform server
– Explore z10, x86, CellBE, Tesla accelerators
– Cross-platform communication required

Evaluate “drop-in” solutions: benefits and liabilities of client/server-side schemes such as OpenSim and Darkstar.

The (Original) Scalable City Technology Infrastructure
– ERSATZ: custom virtual reality engine
– Ogre3D: real-time 3D rendering engine (OpenGL, Direct3D)
– ODE, Newton: open source physics libraries
– NVIDIA FX Composer, ATI RenderMonkey: IDEs for HLSL and GLSL GPU programming
– Autodesk Maya, 3ds Max: procedural asset creation through our own plug-ins
– Loki, Xerces, Boost: utility libraries
– CGAL: computational geometry library
– Intel OpenCV: real-time computer vision
– fmod: sound library
– Chromium, DMX, Sage: distributed rendering libraries
Serial pipeline: increase performance by increasing CPU speed.

Computational gains from Moore’s law have not been achievable via faster clock speeds for the past 8 years. Multicore computing is the tactic:
– New computing architectures
– New algorithmic methods
– New software engineering
– New systems designs
Examples:
– Sony/Toshiba/IBM Cell BE Processor: 1 PPU, 8 SPUs per chip
– Intel Larrabee Processor: 32 x86 cores per chip
– IBM System z processor: 4 cores, 1 service processor
– NVIDIA Fermi GPGPU: 16 units with 32 cores each

The Scalable City Next Stage Technology Infrastructure
– Ogre3D scene graph
– Open source libraries: need work for adding data-level parallelism
– Abstract physics to use multiple physics libraries (ODE, Bullet, etc.); replace computational bottlenecks in these libraries with data-parallel operations
– Computational Geometry Library; Intel OpenCV (real-time computer vision); Fmod (sound library)
– Cell Processors compute dynamic assets: input data → data parallel (n threads + SIMD) → thread barrier → output data, passed to the ERSATZ engine
– Converting assets to data-parallel meshes after the physics transformation boosts rendering by ~33%
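The "n threads + thread barrier" pattern in the diagram can be sketched as follows. This is a minimal Python illustration of the structure only: the vertex-scaling transform is a hypothetical stand-in for the real asset transformation, there is no SIMD, and because of Python's global interpreter lock it shows the shape of the technique rather than its speedup.

```python
import threading


def transform_parallel(vertices, n_threads, scale):
    """Scale every vertex using n worker threads, then synchronise at a barrier.

    Each thread writes only its own slice of `output`, so no locking is needed.
    """
    output = [None] * len(vertices)
    barrier = threading.Barrier(n_threads + 1)  # workers + the coordinating thread

    def worker(start, stop):
        for i in range(start, stop):
            x, y, z = vertices[i]
            output[i] = (x * scale, y * scale, z * scale)
        barrier.wait()  # signal that this slice is done

    chunk = (len(vertices) + n_threads - 1) // n_threads
    threads = [
        threading.Thread(target=worker,
                         args=(k * chunk, min((k + 1) * chunk, len(vertices))))
        for k in range(n_threads)
    ]
    for t in threads:
        t.start()
    barrier.wait()  # output is complete only after every slice has arrived here
    for t in threads:
        t.join()
    return output
```

The barrier marks the point after which the output data is safe to hand to the next stage (here, the rendering engine).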

With a DarkStar server added to this infrastructure, the system maxes out at about 12 clients for a world as complex as Scalable City.

OpenSim Server / ERSATZ Engine / realXtend or Linden Client
These systems are not designed for interactions among tens of thousands of dynamic objects; even a handful of complex objects overloads the dynamics computation. Extensive re-engineering would be needed to provide this capability and use the hybrid multicore infrastructure, defeating the purpose of their general-purpose platform.

Challenges & Approach
Software engineering challenges:
– SC (Scalable City): large, complex, with many behaviors.
– Code consisted of tightly coupled systems not conducive to separation into client and server.
– Multi-user support takes time, and features will be expanded by others simultaneously!
Basic approach, agile methodology:
– Incrementally evolve single-user code into a system that can be trivially made multi-user in the final step.
– Always have a running and testable program.
– Test for unwanted behavioral changes at each step.
– Allows others to expand features simultaneously.

Step by Step Conversion
1. Data-structure focused: is it client or server?
– Some data structures may have to be split.
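As a hedged example of splitting one data structure, suppose a single-player `Player` record mixes authoritative state with presentation state. The sketch below divides it into a server half and a client half. The field names and the legacy dictionary layout are hypothetical; they only illustrate the kind of split step 1 calls for.

```python
from dataclasses import dataclass


@dataclass
class PlayerServerState:
    """Authoritative data: lives only on the server."""
    player_id: int
    position: tuple = (0.0, 0.0, 0.0)
    city_id: int = 0


@dataclass
class PlayerClientState:
    """Presentation data: lives only on the client."""
    player_id: int
    camera_offset: tuple = (0.0, 2.0, -5.0)
    animation: str = "idle"


def split_player(legacy):
    """Split one single-player record (a dict here) into server and client halves."""
    server = PlayerServerState(legacy["id"], legacy["position"], legacy["city"])
    client = PlayerClientState(legacy["id"], legacy["camera_offset"], legacy["animation"])
    return server, client
```

Both halves keep the shared `player_id`, which is what later messages use to refer to the same player across the boundary.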

Data Structures BlackBoard (Singleton) Player House Piece Landscape Manager Camera Audio Clouds Rendering User Input Physics Inverse Kinematics Road Animation House Lots Visual Component MeshHandler

Abstracting Client & Server Object Representations
Server: Visual Component
– Visual asset representation on the server side
– Consolidates task of updating clients
– Used for house pieces, cyclones, landscape, roads, fences, trees, signs (animated, static, dynamic)
– Dynamic, run-time properties control update behavior
Client: Mesh
– Mesh properties communicated from Visual Component
– Used to select rendering algorithm
– Groups assets per city for quick de-allocation
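A minimal sketch of the Visual Component / Mesh pairing, assuming updates are plain dictionaries and a `dirty` flag decides whether a component needs sending. The `dirty` flag, the message layout, and the `renderer()` selection rule are hypothetical; the slide only states that server-side properties drive updates and that client meshes select a rendering algorithm and are grouped per city.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VisualComponent:
    """Server side: visual asset state plus the bookkeeping for updating clients."""
    asset_id: int
    city_id: int
    kind: str  # e.g. "house_piece", "road", "tree"
    properties: dict = field(default_factory=dict)
    dirty: bool = True  # hypothetical run-time flag: clients need an update

    def set(self, key, value):
        self.properties[key] = value
        self.dirty = True

    def make_update(self):
        """Return the message for clients, or None if nothing changed."""
        if not self.dirty:
            return None
        self.dirty = False
        return {"asset_id": self.asset_id, "city_id": self.city_id,
                "kind": self.kind, "properties": dict(self.properties)}


@dataclass
class Mesh:
    """Client side: built from the properties a VisualComponent sent."""
    asset_id: int
    kind: str
    properties: dict

    def renderer(self):
        # Hypothetical rule: animated assets need a per-frame path, others can be batched.
        return "animated" if self.properties.get("animated") else "static_batch"


class MeshStore:
    """Groups meshes per city so a whole city can be de-allocated at once."""

    def __init__(self):
        self.by_city = defaultdict(dict)

    def apply(self, update):
        mesh = Mesh(update["asset_id"], update["kind"], update["properties"])
        self.by_city[update["city_id"]][mesh.asset_id] = mesh

    def drop_city(self, city_id):
        self.by_city.pop(city_id, None)
```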

Step by Step Conversion
2. All data access paths must be segmented into c/s
– Cross-boundary calls recast as buffered communication.

Data Access Paths
– Systems access world state via the Blackboard (singleton pattern).
– After separating it into a Client Blackboard and a Server Blackboard, server systems must be weaned off the Client Blackboard and vice versa.
– Cross-boundary calls recast as buffered communication.
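The following sketch shows what "cross-boundary calls recast as buffered communication" can look like once the Blackboard is split. The class names, the `("set", key, value)` message shape, and the `Channel` helper are assumptions made for this illustration, not the project's code.

```python
from collections import deque


class ServerBlackboard:
    def __init__(self):
        self.world = {}  # authoritative world state


class ClientBlackboard:
    def __init__(self):
        self.view = {}  # the client's copy of what it has been told


class Channel:
    """Buffered, one-way message queue that replaces a direct cross-boundary call."""

    def __init__(self):
        self._buffer = deque()

    def send(self, message):
        self._buffer.append(message)

    def drain(self):
        while self._buffer:
            yield self._buffer.popleft()


# Before: server code wrote the client's blackboard directly, e.g.
#   client_bb.view["player"] = server_bb.world["player"]
# After: the server only posts a message; the client applies it on its own schedule.
def server_publish(server_bb, to_client, key):
    to_client.send(("set", key, server_bb.world[key]))


def client_apply(client_bb, from_server):
    for op, key, value in from_server.drain():
        if op == "set":
            client_bb.view[key] = value
```

The important change is that neither side holds a reference to the other's blackboard; the channel is the only link, which is what later gets replaced by network code.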

Step by Step Conversion
3. Initialization & run loop separation
– Dependencies on order must be resolved.

Initialization & Run-loop
Original single program: Initialize Graphics → Initialize Physics → Init Loading Screen → Load Landscape Data → Initialize Clouds → Create Roads → Place Lots → Place House Pieces → Place Player → Get Camera Position
After separation:
– Client: Initialize Graphics, Init Loading Screen, Initialize Clouds, Get Camera Position
– Server: Initialize Physics, Load Landscape Data, Create Roads, Place Lots, Place House Pieces, Place Player
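Resolving order dependencies can be sketched as a topological sort per side, plus a report of any dependency that crosses the client/server boundary (which then has to become a message). The step names and the client/server split come from the slide; the dependency edges in `DEPENDS_ON` are assumptions made for illustration. The code uses Python's standard `graphlib` module (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies between the init steps above (assumed, not from the slides).
DEPENDS_ON = {
    "Init Loading Screen": {"Initialize Graphics"},
    "Initialize Clouds": {"Initialize Graphics"},
    "Create Roads": {"Load Landscape Data"},
    "Place Lots": {"Create Roads"},
    "Place House Pieces": {"Place Lots", "Initialize Physics"},
    "Place Player": {"Load Landscape Data", "Initialize Physics"},
    "Get Camera Position": {"Place Player"},
}
CLIENT = {"Initialize Graphics", "Init Loading Screen", "Initialize Clouds", "Get Camera Position"}
SERVER = {"Initialize Physics", "Load Landscape Data", "Create Roads", "Place Lots",
          "Place House Pieces", "Place Player"}


def split_init(side):
    """Order one side's steps; report dependencies that cross the client/server boundary."""
    local, crossing = {}, []
    for step in side:
        deps = DEPENDS_ON.get(step, set())
        local[step] = {d for d in deps if d in side}
        crossing += [(step, d) for d in sorted(deps - side)]
    order = list(TopologicalSorter(local).static_order())
    return order, crossing
```

Under these assumed edges, the client's "Get Camera Position" depends on the server's "Place Player", so the client must wait for a message carrying the player's placement rather than calling into server code.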

Step by Step Conversion
4. Unify cross-boundary comm. to one subsystem.
– This will interface with network code in the end.

Unify Communication
– Single buffer, common format, ordered messages
– Communicate in one stage: solve the addiction to immediate answers
Loop stages shown in the diagram: MovePlayer, Animations, ReadClient, Physics/IK, WriteClient, Transforms, Render, ReadServer, UserInput, WriteServer
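A minimal sketch of "single buffer, common format, ordered messages", assuming a fixed binary header of message type, sequence number, and payload length in network byte order. The header layout and message type numbers are hypothetical. All subsystems post into one buffer, and the receiver decodes everything in a single stage of its loop instead of expecting immediate answers.

```python
import struct

# Hypothetical common header: message type, sequence number, payload length.
HEADER = struct.Struct("!HIH")


class MessageBuffer:
    """One buffer per direction; every subsystem appends messages in a common format."""

    def __init__(self):
        self._data = bytearray()
        self._seq = 0

    def post(self, msg_type, payload: bytes):
        self._data += HEADER.pack(msg_type, self._seq, len(payload)) + payload
        self._seq += 1

    def flush(self) -> bytes:
        data, self._data = bytes(self._data), bytearray()
        return data


def read_all(data: bytes):
    """Decode every message in arrival order, in one stage of the run loop."""
    offset, messages = 0, []
    while offset < len(data):
        msg_type, seq, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        messages.append((seq, msg_type, data[offset:offset + length]))
        offset += length
    return messages
```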

Step by Step Conversion
5. Final separation of client & server into two programs
– Basic networking code allows communication

Separate
– Two programs, plus basic synchronous networking code
– Loops truly asynchronous (previously one called the other)
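"Basic synchronous networking code" could be as simple as length-prefixed framing over a stream socket, so that the buffer flushed by one program arrives as one message in the other. This sketch assumes a 4-byte big-endian length prefix (a hypothetical wire format) and uses only the standard `socket` and `struct` modules.

```python
import socket
import struct


def send_frame(sock, payload: bytes):
    """Send one message prefixed with its 4-byte big-endian length."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Block until exactly n bytes have arrived (TCP may deliver them in pieces)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def recv_frame(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

For a local test, `socket.socketpair()` gives two connected sockets that can stand in for the client and server ends.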

Step by Step Conversion
6. Optimize!
– New configuration changes behavior even for single player

Experience
Positives
– Smooth transition to multi-user possible
– All features/behaviors retained or explicitly disabled
– Feature development continued successfully during transition (performance, feature, and behavioral enhancements on both client and server side, CAVE support, improved visuals, machinima engine, etc.)
Negatives
– Resulting code structure not ideal for a client/server application (no MVC framework, some legacy structure)
– Feature development and client/server work sometimes clash, requiring re-working in client/server fashion

Initial Optimizations: basic issues addressed in converting to a massively multi-user networked model

Multi-User Load Challenges
– Communications
– Graphics rendering: geometry processing, shaders, rendering techniques
– Dynamics computation: physics, AI or other application-specific behaviors, animation

Communication
In a unified system, subsystems can share data and communicate quickly. In a client/server model, subsystems on different machines have to rely on messages sent over the network:
– Data marshalling overhead
– Data unmarshalling overhead
– Bandwidth/latency limitations

New Client Knowledge Model
Stand-alone version had all cities in memory
– All clients received updates for activity in all cities
– Increased memory & bandwidth use as environment scales
Now: clients are only given cities they can see
– City assets dynamically loaded onto client as needed
– Reduces the updates the clients need
Further challenge: dynamically loading cities without server or client hiccups.
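Giving clients only the cities they can see is a form of interest management. A minimal sketch, assuming the server tracks a set of visible city ids per client: changing that set yields the cities to start loading and the cities the client can drop, and updates for a city go only to clients that see it. Class and method names are hypothetical.

```python
class InterestManager:
    """Tracks which cities each client can see and filters updates accordingly."""

    def __init__(self):
        self.visible = {}  # client_id -> set of city ids

    def set_visible(self, client_id, cities):
        """Return (cities to load, cities to unload) for this client."""
        old = self.visible.get(client_id, set())
        new = set(cities)
        self.visible[client_id] = new
        return new - old, old - new

    def recipients(self, city_id):
        """Clients that should receive an update about activity in city_id."""
        return [c for c, cities in self.visible.items() if city_id in cities]
```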

Communication Challenges
More clients lead to:
– More activity: physics object movements, road/land animations, house construction
– More communication: per client, due to the increase in activity, and more clients for the server to keep up to date
– Server communication = activity × clients!
Dynamically loading large data sets (cities in this case) without server or client hiccups

Communication Subsystem
– Code generation for data marshalling
– Fast data structure serialization
– Binary transforms for cross-platform use (token- or text-based formats too slow)
– Endian issues resolved during serialization
– Tested on z10, Intel
Asynchronous reading and writing
– Dedicated threads perform communication
– Catch up on all messages each game cycle
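Resolving endian issues during serialization can be done by always writing a fixed byte order. In this sketch, a transform update is packed in network (big-endian) order, so a big-endian z10 and a little-endian Intel host exchange identical bytes. The message layout is hypothetical, and it is written by hand here, whereas the slides describe generating marshalling code.

```python
import struct
from dataclasses import dataclass


@dataclass
class TransformUpdate:
    asset_id: int
    x: float
    y: float
    z: float

    # "!" = network (big-endian) byte order with no padding, so a z10 (big-endian)
    # and an x86 host (little-endian) produce and accept identical bytes.
    # No type annotation, so dataclass treats this as a class attribute, not a field.
    _FORMAT = struct.Struct("!Ifff")

    def to_bytes(self) -> bytes:
        return self._FORMAT.pack(self.asset_id, self.x, self.y, self.z)

    @classmethod
    def from_bytes(cls, data: bytes) -> "TransformUpdate":
        return cls(*cls._FORMAT.unpack(data))
```

Note that `f` is a 32-bit float, so values that are not exactly representable in single precision will come back slightly rounded.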

Reducing Data Marshalling Time
Reduce use of per-player queues:
– Common messages sent to a queue associated with the event’s city
– Players receive buffers of each city they see, in addition to their player-specific queue
– Perform buffer allocation, data marshalling, & copy once for many players
– Significantly reduces communication overhead for server
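The saving from per-city queues can be shown by counting marshalling calls. In this hedged sketch, `Marshaller` is a hypothetical stand-in for real serialization. The old scheme marshals each event once per interested player; the new scheme marshals it once into its city's buffer and lets every player who sees that city share the buffer. Player-specific queues are left out for brevity.

```python
from collections import defaultdict


class Marshaller:
    """Stand-in for real serialization that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, event):
        self.calls += 1
        return repr(event).encode()


def per_player(events, visible, marshal):
    """Old scheme: marshal and queue every event once per interested player."""
    queues = defaultdict(list)
    for city, event in events:
        for player, cities in visible.items():
            if city in cities:
                queues[player].append(marshal(event))
    return queues


def per_city(events, visible, marshal):
    """New scheme: marshal each event once into its city's buffer; players share buffers."""
    city_buffers = defaultdict(list)
    for city, event in events:
        city_buffers[city].append(marshal(event))
    return {player: [city_buffers[c] for c in sorted(cities)]
            for player, cities in visible.items()}
```

With three players watching the same city, every event in that city is marshalled once instead of three times.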

Preventing Stutters
Send smaller chunks of data
– Break up large messages
Incrementally load cities as a player approaches them
– Space out sending assets over many cycles
– Large geometry (landscape) subdivided
– If player arrives, finish all transfers
Prevent disk access on client
– Pre-load resources
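Incremental city loading can be sketched as a queue of fixed-size chunks drained under a per-cycle byte budget. Large assets are subdivided, and the whole remainder is sent once the player arrives. The chunk size, the budget, and the rule of always sending at least one chunk per cycle (so progress never stalls) are assumptions for this illustration.

```python
from collections import deque


class IncrementalTransfer:
    """Spreads a city's assets over many cycles under a per-cycle byte budget."""

    def __init__(self, assets, chunk_size, budget_per_cycle):
        self.pending = deque()
        for name, data in assets:
            # Large assets (e.g. landscape) are subdivided into fixed-size chunks.
            for offset in range(0, len(data), chunk_size):
                self.pending.append((name, offset, data[offset:offset + chunk_size]))
        self.budget = budget_per_cycle

    def step(self, player_arrived=False):
        """Return the chunks to send this cycle; send everything if the player has arrived."""
        sent, used = [], 0
        while self.pending:
            size = len(self.pending[0][2])
            # Always send at least one chunk so the transfer keeps making progress.
            if not player_arrived and sent and used + size > self.budget:
                break
            chunk = self.pending.popleft()
            sent.append(chunk)
            used += size
        return sent

    def done(self):
        return not self.pending
```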