OCR hints
All of Mark’s suggestions are on the mark (no pun intended)
Scheduling hints
– Temporal affinity
– Device affinity
– Priority
– Concurrency generation
– Memory consumption
Mapping to a hierarchy [instead of a two-level APGAS]
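One way to picture the scheduling hints listed on this slide is as a small record attached to each EDT. The sketch below is illustrative only: `EdtHints`, its field names, and `pick_next` are hypothetical and are not part of the OCR API. It assumes a scheduler that first prefers an EDT sharing the temporal affinity of the last EDT run, then falls back to priority.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EdtHints:
    """Hypothetical hint record; names are illustrative, not OCR API."""
    temporal_affinity: Optional[str] = None  # EDTs sharing a key should run close in time
    device_affinity: Optional[str] = None    # preferred device, e.g. "cpu" or "gpu"
    priority: int = 0                        # larger value runs first
    spawns_edts: int = 0                     # concurrency generation: EDTs this task creates
    net_datablocks: int = 0                  # memory consumption: datablocks created minus freed


def pick_next(ready: List[EdtHints], last_affinity: Optional[str]) -> EdtHints:
    """Prefer an EDT with the same temporal affinity as the last one run,
    then the highest priority (assumed policy, for illustration)."""
    def key(h: EdtHints):
        same_affinity = (h.temporal_affinity is not None
                         and h.temporal_affinity == last_affinity)
        return (same_affinity, h.priority)
    return max(ready, key=key)
```

The concurrency and memory fields are carried here but not used by `pick_next`; they are the inputs a balancing policy would need.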
Concurrency And Memory Balancing
OCR knows at runtime
– # outstanding ready EDTs
– # outstanding datablocks
Programmer knows
– Which EDTs generate more EDTs
– Which EDTs generate more datablocks
– Which EDTs consume unused datablocks
Together, this can allow the system to control concurrency and memory usage around a setpoint
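A minimal sketch of the setpoint idea, assuming the runtime exposes its two counters and each EDT carries two programmer hints. The names `Edt`, `spawns`, `db_delta`, and `choose` are hypothetical, and the cost (sum of absolute distances from the two setpoints) is just one simple choice, not a proposed OCR policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edt:
    name: str
    spawns: int    # programmer hint: number of EDTs this task will create
    db_delta: int  # programmer hint: datablocks created minus datablocks freed


def choose(ready: List[Edt], n_ready: int, n_dbs: int,
           edt_setpoint: int, db_setpoint: int) -> Edt:
    """Pick the ready EDT whose hinted effect leaves the runtime counters
    closest to the setpoints."""
    def error(e: Edt) -> int:
        # Running e removes it from the ready pool and adds whatever it spawns.
        next_ready = n_ready - 1 + e.spawns
        next_dbs = n_dbs + e.db_delta
        return abs(next_ready - edt_setpoint) + abs(next_dbs - db_setpoint)
    return min(ready, key=error)
```

With too many datablocks outstanding, this rule tends to favor EDTs that consume them; with too little ready work and memory to spare, it tends to favor EDTs that generate more EDTs and datablocks.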
Mapping to the Hierarchy
I don’t know how to do this, but I know the experiment to teach us:
– Take Cholesky (or another problem that is simple but still sufficiently complex)
– Treat the system as a hierarchy
  My current experience has been simple distributed systems: here / not here.
  Let’s make it: here, on this rack, somewhere else
    Here: shared memory
    On this rack: distributed memory, but not too expensive: low latency, high bandwidth
    Somewhere else: higher latency, lower bandwidth
– Questions to answer:
    How to distribute the tiles (datablocks)?
    Which EDTs should run where?
    How do we keep the system busy?
– This could either be a real OCR problem, or a simulation target for the next level communication simulator.
The answers to these exercises will start to tell us what the right hints are for hierarchical memory
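As a starting point for the simulation variant of this experiment, the three levels can be sketched as a small cost model. Everything here is an assumption for illustration: the latency and bandwidth numbers are placeholders, not measurements, and the cyclic tile-to-node mapping in `owner` is only one candidate distribution. The last function estimates what a tiled Cholesky update of tile (i, j) at step k pays to fetch its two inputs, tiles (i, k) and (j, k).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    rack: int
    slot: int


# Placeholder (latency in seconds, bandwidth in bytes/second) per level.
LEVELS = {
    "here": (1e-7, 5e10),       # shared memory
    "rack": (2e-6, 1e10),       # distributed, low latency, high bandwidth
    "elsewhere": (2e-5, 1e9),   # higher latency, lower bandwidth
}


def level(a: Node, b: Node) -> str:
    """Classify the distance between two nodes: here, on this rack, somewhere else."""
    if a == b:
        return "here"
    return "rack" if a.rack == b.rack else "elsewhere"


def transfer_time(src: Node, dst: Node, nbytes: int) -> float:
    """Latency plus size over bandwidth for the level separating src and dst."""
    latency, bandwidth = LEVELS[level(src, dst)]
    return latency + nbytes / bandwidth


def owner(i: int, j: int, n_racks: int, slots_per_rack: int) -> Node:
    """Candidate distribution: tile row picks the rack, tile column picks the slot."""
    return Node(rack=i % n_racks, slot=j % slots_per_rack)


def update_fetch_cost(i: int, j: int, k: int, tile_bytes: int,
                      n_racks: int, slots_per_rack: int) -> float:
    """Cost of fetching tiles (i, k) and (j, k) to the owner of tile (i, j)."""
    dst = owner(i, j, n_racks, slots_per_rack)
    inputs = [owner(i, k, n_racks, slots_per_rack),
              owner(j, k, n_racks, slots_per_rack)]
    return sum(transfer_time(src, dst, tile_bytes) for src in inputs)
```

Swapping in a different `owner` function, or running the update EDT somewhere other than the owner of its output tile, gives a cheap way to compare answers to the tile distribution and EDT placement questions before trying them on a real system.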