OS Support for Teraflux: A Prototype
Avi Mendelson, Doron Shamia
Jan 17-18 2011, Rome, Italy
System and Execution Models
- The data-flow based system is made out of clusters.
- Each cluster contains 16 cores (may change).
- Each cluster is controlled by a single "OS kernel", e.g. Linux or L4.
- Execution is made up of tasks; each task (a descriptor sketch follows below):
  - has no side effects,
  - is scheduled together with its data (may use pointers),
  - may return results,
  - if it fails to complete, can be rescheduled on the same core or on another core.
- Tasks can be executed on any (service) cluster and have a unified view of system memory.
- All resource allocation/management is done at two levels, a local one and a global one.
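As a minimal sketch of this execution model, the C fragment below shows a task descriptor that bundles a side-effect-free body with its input data and an optional result buffer, and a local retry on failure. All names and fields here are illustrative assumptions, not the project's actual data structures.

/* Hypothetical task descriptor: a pure function of its input, scheduled
 * together with its data, optionally returning a result, restartable. */
#include <stddef.h>
#include <stdint.h>

typedef struct task {
    uint32_t    id;          /* task identifier                        */
    const void *input;       /* input data (or pointers, see fixups)   */
    size_t      input_len;   /* input size in bytes                    */
    void       *output;      /* optional result buffer                 */
    size_t      output_len;  /* result buffer size                     */
    /* Task body: no side effects, so it can safely be re-executed on
     * the same core or on any other core. */
    int       (*run)(const void *in, size_t in_len, void *out, size_t out_len);
} task_t;

/* Local-level scheduling sketch: retry once on failure before handing the
 * task back to the global level for rescheduling on another core. */
static int execute_task(task_t *t)
{
    int rc = t->run(t->input, t->input_len, t->output, t->output_len);
    if (rc != 0)
        rc = t->run(t->input, t->input_len, t->output, t->output_len);
    return rc;
}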
System Overview
[Diagram: target vs. prototyped system, shown as a cores view and a memory view; cores run Linux or L4, memory holds a configuration page and message buffers; in the prototype, CPU == cluster]
Target System OS Requirements
Linux (full OS, FOS):
- Manages the jobs on the uKernel (uK) cores.
- Proxies the uKs' I/O requests.
- Remote-debugs the uKs and itself.
- Runs the high-level (system) FT, managing uK/self faults.
L4 (uKernel), sketched in the loop below:
- Each uK runs a job.
- Jobs are sent by the full OS (FOS).
- Jobs have no side effects.
- Failed jobs are simply restarted.
- Runs low-level FT, reporting to the FOS.
[Diagram: a single chip with multiple cores; one core runs Linux, the remaining cores run L4]
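The uKernel side of this split can be pictured as a simple job loop: receive a job from the FOS, run it, restart it once on failure, and report the outcome. The sketch below uses hypothetical placeholder helpers (uk_receive_job, uk_run, uk_send_result, uk_report_fault), not the prototype's actual API.

#include <stdbool.h>

struct job;                                    /* opaque job descriptor        */
extern struct job *uk_receive_job(void);       /* blocks until the FOS sends a job */
extern bool        uk_run(struct job *j);      /* side-effect free, may fail   */
extern void        uk_send_result(struct job *j);
extern void        uk_report_fault(struct job *j);   /* low-level FT report to FOS */

void uk_main_loop(void)
{
    for (;;) {
        struct job *j = uk_receive_job();
        if (uk_run(j) || uk_run(j))            /* failed jobs are simply restarted */
            uk_send_result(j);
        else
            uk_report_fault(j);                /* let the FOS reschedule elsewhere */
    }
}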
Communications (1)
[Diagram: Linux and L4 exchange message buffers through the shared configuration page; each buffer carries Ownership (L4/Linux), a Ready flag, Type, Length (bytes), Data, and optional Fixups]
Communications (2)
Message-buffer fields (a struct sketch follows below):
- Ownership: who currently uses the buffer (L4/Linux).
- Ready: signals that the buffer is ready to be transferred to the other side (the inverse owner).
- Type: the message type.
- Length: the payload length in bytes.
- Data: simply the raw data (interpreted according to the type).
- Fixups: a list of fixups in case we pass pointers.
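A minimal C rendering of this buffer layout is shown below. The field widths, the fixup encoding, and the array sizes are assumptions made for illustration and are not taken from the prototype's headers.

#include <stdint.h>

enum buf_owner { OWNER_L4 = 0, OWNER_LINUX = 1 };

struct fixup {
    uint32_t offset;        /* offset inside data[] that holds a pointer      */
    uint64_t base;          /* base address to rebase that pointer against    */
};

struct msg_buffer {
    uint32_t owner;         /* who currently uses the buffer (L4/Linux)       */
    uint32_t ready;         /* set when the buffer may pass to the inverse owner */
    uint32_t type;          /* message type                                   */
    uint32_t length;        /* payload length in bytes                        */
    uint32_t n_fixups;      /* number of entries in fixups[] (0 if none)      */
    struct fixup fixups[8]; /* pointer fixups, applied by the receiver        */
    uint8_t  data[4096];    /* raw payload, interpreted according to type     */
};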
Current Prototype
- Goal: quick development of OS support and applications (later to move to the COTSon full prototype).
- Quick prototyping via VMs, with Linux on both ends (Fedora 13):
  - Main node = Linux (host).
  - Service nodes = Linux (VMs).
- Shared memory is used between the host and the VMs, and between VMs.
- The shared memory uses a kernel driver (ivshmem); a host-side mapping sketch follows below.
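On the host side, the ivshmem region is typically backed by a POSIX shared-memory object that both QEMU and the host application map. A minimal host-side sketch under that assumption; the object name and size are hypothetical.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/teraflux_shm"          /* hypothetical name */
#define SHM_SIZE (16 * 1024 * 1024)       /* hypothetical size */

int main(void)
{
    /* Create or open the shared-memory object backing the ivshmem region. */
    int fd = shm_open(SHM_NAME, O_RDWR | O_CREAT, 0666);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, SHM_SIZE) < 0) { perror("ftruncate"); return 1; }

    void *shm = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (shm == MAP_FAILED) { perror("mmap"); return 1; }

    /* The configuration page and message buffers live at the start of this
     * region; the host-side message-queue API operates on it. */
    printf("shared region mapped at %p\n", shm);

    munmap(shm, SHM_SIZE);
    close(fd);
    return 0;
}

On older glibc versions, shm_open requires linking with -lrt.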
Prototype Architecture
[Diagram: a Fedora 13 host running the application in user space; several Fedora 13 guests under QEMU, all connected through the IVSHMEM driver in the host kernel space]
Inter-VM (IV) Shared Memory Architecture
- QEMU maps the shared memory into guest RAM.
- It is exposed to the guest as a PCI BAR.
- It is mmap'ed to user level inside the guest (a guest-side sketch follows below).
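Inside a guest, the BAR of the ivshmem PCI device can be mapped straight into a user-space process through its sysfs resource file. A minimal sketch; the PCI address below is hypothetical (look up the real one with lspci), and BAR2 is assumed to be the shared-memory BAR of the device.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define BAR_PATH "/sys/bus/pci/devices/0000:00:04.0/resource2"  /* hypothetical address */

int main(void)
{
    int fd = open(BAR_PATH, O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* The resource file's size equals the shared-memory size set on the host. */
    void *shm = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (shm == MAP_FAILED) { perror("mmap"); return 1; }

    printf("ivshmem BAR mapped at %p (%lld bytes)\n",
           shm, (long long)st.st_size);

    munmap(shm, st.st_size);
    close(fd);
    return 0;
}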
Communications
[Diagram: the host app logic and the data-flow apps inside the QEMU guests communicate through a message-queue API layered on the shared RAM, which holds the message buffers (sketched below)]
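A minimal sketch of such a message-queue API, using the ownership/ready handshake described earlier over buffers placed in the shared region. The function names, the queue shape, and the memory-barrier choice are assumptions made for illustration, not the prototype's actual API.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct msg_buffer {                 /* compact form of the layout shown earlier */
    uint32_t owner;                 /* side identifier of the current user      */
    uint32_t ready;                 /* set when the buffer is handed over       */
    uint32_t type;
    uint32_t length;
    uint8_t  data[4096];
};

/* The queue is simply an array of buffers inside the shared region; it must
 * be pointed at the mapped region before either side uses the API. */
static struct msg_buffer *queue;
static int                queue_len;

int msg_send(uint32_t self, uint32_t peer, uint32_t type,
             const void *payload, uint32_t len)
{
    for (int i = 0; i < queue_len; i++) {
        struct msg_buffer *b = &queue[i];
        if (b->owner == self && !b->ready && len <= sizeof(b->data)) {
            b->type   = type;
            b->length = len;
            memcpy(b->data, payload, len);
            __sync_synchronize();   /* publish the payload before the flags     */
            b->ready = 1;
            b->owner = peer;        /* hand the buffer to the inverse owner     */
            return 0;
        }
    }
    return -1;                      /* no free buffer owned by this side        */
}

struct msg_buffer *msg_poll(uint32_t self)
{
    for (int i = 0; i < queue_len; i++)
        if (queue[i].owner == self && queue[i].ready)
            return &queue[i];       /* caller consumes, clears ready, reuses    */
    return NULL;
}

The same code can run on both sides because the host maps the POSIX object and the guests map the ivshmem BAR onto the same physical pages.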
Demo (Toy) Apps
- Distributed sum app:
  - a single work dispatcher (host),
  - multiple sum engines (VMs).
- Distributed Mandelbrot (a per-line sketch follows below):
  - a single work dispatcher that dispatches lines (host),
  - multiple compute engines that compute the pixels of each line (VMs).
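To make the work split concrete, here is a sketch of the per-line computation a compute engine would run for the Mandelbrot demo: the dispatcher sends a line index, the engine returns an iteration count per pixel. The image dimensions, the complex-plane window, and the iteration cap are illustrative choices, not taken from the demo code.

#include <stdint.h>

#define WIDTH    800
#define HEIGHT   600
#define MAX_ITER 256

/* Compute one scan line; out[] must hold WIDTH entries. */
void mandelbrot_line(int line, uint16_t out[WIDTH])
{
    double ci = -1.2 + 2.4 * line / (HEIGHT - 1);     /* imaginary part of c */
    for (int x = 0; x < WIDTH; x++) {
        double cr = -2.0 + 3.0 * x / (WIDTH - 1);     /* real part of c      */
        double zr = 0.0, zi = 0.0;
        uint16_t it = 0;
        while (it < MAX_ITER && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + cr;        /* z = z^2 + c         */
            zi = 2.0 * zr * zi + ci;
            zr = t;
            it++;
        }
        out[x] = it;                                  /* escape count = pixel value */
    }
}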
Futures
- Single boot:
  - A TeraFlux chip boots a FOS.
  - The FOS boots the uKs on the other cores.
  - Looks like a single boot process.
- Distributed fault tolerance:
  - Allow the uKs/FOS to test each other's health.
  - One step beyond FOS-centric FT.
- Core repurposing:
  - If the FOS cores fail, uK cores reboot as a FOS.
  - The new FOS takes over using the last valid data snapshot.