Status of the vector transport prototype Andrei Gheata 12/12/12.

Current implementation
[Diagram: the main scheduler injects transportable and priority baskets into a deque; worker threads loop over tracks, call Stepping(tid, &tracks), and push crossing tracks (itrack, ivolume) into per-volume baskets; a dispatch & garbage-collect thread recycles baskets, full track collections and recycled track collections; a digitize & I/O thread consumes priority baskets and hits (Generate(N events), Digitize(iev)) and flushes to disk.]
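The basket pipeline on the slide above can be sketched with a shared deque and a pool of worker threads. This is a minimal illustration, not the prototype's code: the basket encoding {ivolume, ntracks}, the poison-basket shutdown and all names are assumptions, and the counter stands in for the real Stepping(tid, &tracks) call.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Worker threads pull transportable baskets from a shared deque, "step" their
// tracks, and stop on a poison basket. Priority baskets could be injected at
// the front of the deque with putFirst().
public class BasketPipeline {

    static int runWorkers(BlockingDeque<int[]> baskets, int nworkers)
            throws InterruptedException {
        AtomicInteger transported = new AtomicInteger();
        Thread[] workers = new Thread[nworkers];
        for (int i = 0; i < nworkers; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        int[] b = baskets.takeFirst();         // {ivolume, ntracks}
                        if (b[0] < 0) {                        // poison basket
                            baskets.putFirst(b);               // re-post it for the other workers
                            return;
                        }
                        transported.addAndGet(b[1]);           // stand-in for Stepping(tid, &tracks)
                    }
                } catch (InterruptedException ignored) { }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return transported.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingDeque<int[]> baskets = new LinkedBlockingDeque<>();
        baskets.add(new int[]{0, 16});
        baskets.add(new int[]{1, 16});
        baskets.add(new int[]{-1, 0}); // poison: no more baskets
        System.out.println(runWorkers(baskets, 2)); // prints 32
    }
}
```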

Current prototype: extending the transport data flow
- Bottlenecks in scheduling?
- Can other types of work and resources share the same concurrency model?
[Diagram: the scheduler feeds baskets with tracks through a deque; Transport() produces crossing tracks and hit blocks; Digitize(block) and ProcessHits(vector) turn buffered events (Ev 0 … Ev n) into digits data and I/O buffers written to disk; priority events are marked in the event buffer.]

Runnable and executor (Java)
A simple task concurrency model based on a Run() interface:
- Single queue management for all the different processing tasks
  - Minimizes the overhead of work balancing
  - Priority management at the level of the work queue
- In practice, our runnables are transport, digitization, I/O, …
  - Lower-level splitting is possible: geometry, physics processes, …
- Flow = a runnable producing other runnables that can be processed independently
Further improvements:
- Scheduling executed by the worker threads themselves (no need for a separate scheduler)
- When workers are busy, the same thread processes its own runnable's result(s)
[Diagram: Runnable (Data, Run()) submitted to an Executor via a concurrent queue of tasks, returning a Future.]
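The single-queue model above can be sketched in plain Java with java.util.concurrent. This is an illustration only: the stage names (transport, digitize) and the fixed pool size are assumptions, not the prototype's actual classes.

```java
import java.util.*;
import java.util.concurrent.*;

// One executor queue serves all task types, and a runnable can produce the
// next runnable of the flow and resubmit it to the same queue.
public class RunnableFlow {

    static List<String> runFlow() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(1);

        Runnable digitize = () -> { log.add("digitize"); done.countDown(); };
        // transport produces digitize and pushes it into the same work queue
        Runnable transport = () -> { log.add("transport"); pool.submit(digitize); };

        pool.submit(transport);
        done.await();          // wait until the flow has completed
        pool.shutdown();
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runFlow()); // prints [transport, digitize]
    }
}
```

Because every stage goes through the one queue, priorities and load balancing are handled in a single place, which is the point the slide makes.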

GPU-friendly tasks
- Task = code with clearly defined input/output data which can be executed in a flow
- Independent GPU tasks: fully mapped on the GPU
- Mixed CPU/GPU tasks: the GPU kernel result is blocking for the CPU code
- How to deal with the blocking part, which has the overhead of the memory-bus latency?
[Diagram: a Run() sequence alternating CPU code and GPU kernel segments.]
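A mixed CPU/GPU task of the kind described above can be modeled with a future for the kernel result. There is no real GPU here: launchKernel() only simulates an asynchronous kernel on a separate executor, and the doubling "kernel" is a made-up stand-in.

```java
import java.util.concurrent.*;

public class GpuFriendlyTask {

    // Stand-in for a GPU kernel launch: runs asynchronously and returns a future.
    static CompletableFuture<int[]> launchKernel(int[] in, Executor gpu) {
        return CompletableFuture.supplyAsync(() -> {
            int[] out = new int[in.length];
            for (int i = 0; i < in.length; i++) out[i] = 2 * in[i]; // pretend GPU work
            return out;
        }, gpu);
    }

    static int runMixedTask(int[] data) {
        ExecutorService gpu = Executors.newSingleThreadExecutor();
        // CPU part 1: prepare the input and launch the kernel.
        CompletableFuture<int[]> kernel = launchKernel(data, gpu);
        // CPU part 2 depends on the kernel result: join() is the blocking point
        // the slide asks about. Chaining the continuation with thenApply()
        // instead would free this thread to do other work meanwhile.
        int[] out = kernel.join();
        gpu.shutdown();
        int sum = 0;
        for (int v : out) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(runMixedTask(new int[]{1, 2, 3})); // prints 12
    }
}
```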

Scheduling work for GPU-like resources
- Resource pool of idle CPU threads, controlled by a CPU broker (messaging wake-up)
  - CPU broker policy: resource balancing (N cores -> N active threads)
- Some of the runnables are "GPU friendly", i.e. a part of their Run() processing has both CPU and GPU implementations
- A CPU thread taking a GPU-friendly runnable asks the GPU broker if resources are available
  - If yes, it scatters the work and pushes it to the GPU, then goes to wait/notify; otherwise it just runs on the CPU
  - … but not before notifying the CPU resource broker, which may decide to wake up a thread from the pool
- When the result comes back from the GPU, the thread resumes processing
- At the end of a runnable cycle, the CPU broker corrects the workload
- Goal: keep both CPU and GPU busy, avoiding hyperthreading
[Diagram: a runnable queue feeding active CPU threads with embedded GPU work; the GPU broker handles scatter/gather with low latency; the CPU broker puts threads to sleep or wakes them (Notify()/Resume) from the idle CPU thread pool.]
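The broker policy above (N cores -> N active threads, with a thread handing its core back while it sleeps on the GPU) can be modeled with a counting semaphore. This is a simplified sketch: CpuBroker, runWorker and the semaphore accounting are assumptions of the illustration, not the prototype's design.

```java
import java.util.concurrent.*;

// At most ncores threads are "active" at any time; a thread that blocks on
// the GPU releases its core (so the broker can wake a sleeping worker) and
// reclaims one when the kernel result arrives.
public class CpuBroker {
    final Semaphore cores;

    CpuBroker(int ncores) { cores = new Semaphore(ncores); }

    // cpuWork models the CPU part of a GPU-friendly runnable;
    // gpuWait models blocking until the GPU result comes back.
    void runWorker(Runnable cpuWork, Callable<Void> gpuWait) throws Exception {
        cores.acquire();   // become one of the N active CPU threads
        cpuWork.run();     // CPU part of the Run() processing
        cores.release();   // notify the broker before sleeping on the GPU
        gpuWait.call();    // wait for the kernel result
        cores.acquire();   // resume: reclaim a core to post-process the result
        cores.release();   // end of the runnable cycle
    }
}
```

Releasing the permit before the GPU wait is what keeps the core count balanced: the broker can hand that core to another worker instead of leaving it idle behind a blocked thread.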