Directed Reading 2
Key issues for the future of software and hardware for large-scale parallel computing, and the approaches to address them
Submitted by: KAPIL CHOGGA, CAO JIANFENG, RAMASESHAN KANNAN
Hardware issues for large-scale parallel computing

Cost, Power and Processor Challenge
- Power consumption is now a critical issue.
- Power and the cooling it requires also affect density and floor space.
- For example, a 10-petaflop Opteron-based system was estimated to cost $1.8 billion and to require 179 megawatts to operate. This kind of approach is not feasible.
- Use smaller processors [Chandrakasan et al. 1992]:
  - They are more energy efficient.
  - Large processors are limited in clock speed.
  - They give the highest performance per unit area for parallel codes.
  - They are easier to manage; a defective small processor may be easier to deal with.
- FPGAs are another option.
- Different or identical processors? Amdahl's Law [Hennessy and Patterson 2007] suggests that heterogeneous many-core systems yield better performance.

The Memory and Storage Challenge
- This is a major consequence of the power challenge.
- Currently available main memories (DRAM) and disk drives (HDD) consume far too much power.
- New technologies are needed.

Communication
- Exascale requires higher bandwidth.
- Higher bandwidth can be achieved with point-to-point connections between cores, so new ways to connect cores are required.
- Chip-scale multiprocessors (CMPs) provide greater inter-core bandwidth and lower inter-core latencies.
- Synchronization using transactional memory (to avoid locks).
- The problem with tightly coupled designs is that a delay in moving information from any node to any other node can delay all the nodes. In other words, small delays can quickly add up to large drops in performance.

Resiliency Challenge
- Resilience is the ability of a system with such a huge number of components to continue operating in the presence of faults.
- An exascale system must be truly autonomic: constantly aware of its status, and optimizing and adapting itself to rapidly changing conditions, including failures of its individual components.
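The point that Amdahl's Law favors heterogeneous many-core designs can be made concrete with the multicore extension of Amdahl's Law by Hill and Marty ("Amdahl's Law in the Multicore Era", 2008), a model that is not among this deck's references. The Python sketch below assumes their simplified model: a chip budget of n base-core equivalents (BCEs), a core built from r BCEs runs sequential code sqrt(r) times faster than one BCE, and a fraction f of the work is perfectly parallel. The values f = 0.975 and n = 256 are illustrative assumptions, not figures from the slides.

```python
import math


def perf(r: int) -> float:
    """Sequential speed of one core built from r BCEs.
    Assumption (Hill & Marty's model): perf(r) = sqrt(r)."""
    return math.sqrt(r)


def speedup_symmetric(f: float, n: int, r: int) -> float:
    """n / r identical cores of r BCEs each (homogeneous many-core)."""
    serial = (1 - f) / perf(r)          # serial part runs on one core
    parallel = f * r / (perf(r) * n)    # parallel part spread over n / r cores
    return 1 / (serial + parallel)


def speedup_asymmetric(f: float, n: int, r: int) -> float:
    """One large core of r BCEs plus n - r single-BCE cores (heterogeneous)."""
    serial = (1 - f) / perf(r)          # serial part runs on the large core
    parallel = f / (perf(r) + n - r)    # parallel part uses every core
    return 1 / (serial + parallel)


if __name__ == "__main__":
    f, n = 0.975, 256                   # illustrative values, not from the slides
    sizes = [2 ** k for k in range(9)]  # r = 1, 2, 4, ..., 256
    best_sym = max(speedup_symmetric(f, n, r) for r in sizes)
    best_asym = max(speedup_asymmetric(f, n, r) for r in sizes)
    print(f"best symmetric: {best_sym:.1f}x, best asymmetric: {best_asym:.1f}x")
```

With these assumed values, the best symmetric design (cores of 8 BCEs) reaches roughly 51x, while the best asymmetric design (one 64-BCE core plus 192 small cores) reaches 125x. The gain comes from running the serial fraction on one fast core while still using many small, power-efficient cores for the parallel part.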
Software issues for large-scale parallel computing

Handling I/O
- Data input/output is a considerable problem on petascale machines. As a trivial example, imagine a 100K-processor machine in which all processors try to open the same file for reading. The resulting file-system storm would probably swamp any single-interface storage server. Furthermore, without intelligent file-system semantics, 100K copies of exactly the same file could be pushed through the network.
- The amount of data a petascale machine can generate is staggering. There should be one dedicated I/O node for every 8 compute nodes.
- No single file server can currently handle data input at rates around 100 GB/s, so file I/O must be parallelized. A dedicated parallel file system has become a standard component of leadership-class architectures.

Synchronicity
- The TLB (translation lookaside buffer) is a cache that speeds up virtual-to-physical address translation. A major challenge is avoiding TLB thrashing.
- Cache pollution occurs when multiple programs use the same processor core's cache and evict each other's useful data. Cache pollution hurts performance, and techniques to avoid it should be developed.

Fault Tolerance
- The crash of one component in a browser, such as the Acrobat Reader or the Flash Player, should not bring down the entire browser or, worse, the entire machine.
- Also important is how the petascale OS coordinates its fault response with other parts of the system. The most common and robust method for providing fault tolerance in scientific applications is checkpoint/restart (CPR).

Security
- Protection is increasingly needed because of the enormous number of users, whose privacy and security must be preserved.
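Checkpoint/restart can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not how a petascale OS or an MPI library implements CPR: the "application" is a hypothetical loop that sums step indices, its state is a small dict saved as JSON, and `run`, `save_checkpoint`, `load_checkpoint`, and the file name `solver.ckpt` are invented for the example. The write-to-temp-then-rename pattern keeps a crash during checkpointing from corrupting the last good checkpoint.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then rename it over the old checkpoint,
    so a crash during the write never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())            # push the bytes to stable storage
    os.replace(tmp_path, path)          # swap in the new checkpoint in one step


def load_checkpoint(path: str) -> Optional[dict]:
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(path: str, total_steps: int, interval: int,
        fail_at: Optional[int] = None) -> dict:
    """Hypothetical iterative computation (sums the step indices) that
    checkpoints every `interval` steps; `fail_at` simulates a node failure."""
    state = load_checkpoint(path) or {"step": 0, "total": 0}  # restart or cold start
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError(f"simulated failure at step {fail_at}")
        state["total"] += state["step"]  # one unit of "work"
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "solver.ckpt")
    try:
        run(ckpt, total_steps=100, interval=10, fail_at=57)
    except RuntimeError as err:
        print("crashed:", err)
    # The restart resumes from the step-50 checkpoint rather than from step 0.
    print(run(ckpt, total_steps=100, interval=10))  # {'step': 100, 'total': 4950}
```

With a failure injected at step 57, only steps 50 to 56 are lost and recomputed on restart. Choosing `interval` trades checkpoint I/O cost against the amount of recomputation after a failure; on a machine with 100K processors, those checkpoint writes feed straight back into the parallel I/O problem described above.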
References:
- Operating System Issues for Petascale Systems. Argonne National Laboratory (beckman, iskra, kazutomo, ...)
- Software Challenges for Extreme Scale Computing: Going from Petascale to Exascale Systems. Michael A. Heroux, Sandia National Laboratories
- The Landscape of Parallel Computing Research: A View from Berkeley
- Irving Wladawsky-Berger
- Productive Petascale Computing: Requirements, Hardware, and Software