Multiclustered and Multithreaded Architecture


1 Multiclustered and Multithreaded Architecture

2 Multithreading
The ability of a CPU to run multiple threads at the same time, with proper support from the computer's operating system.
Multithreading is a major way of increasing a system's throughput, and therefore its performance.
Differs from multiprocessing (another throughput-increasing method) in that all threads share the same set of resources.
Often used in conjunction with multiprocessing: multithreading optimizes utilization of a single core, while multiprocessing runs multiple cores in concert with each other.
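
As a minimal software-level illustration (added here, not from the slides), the Java sketch below starts two threads that operate on the same in-memory resource; the class name, thread names, and counter are purely illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

public class SharedCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        // Both threads share the same heap object (the counter), unlike
        // separate processes, which would each get their own copy.
        AtomicLong counter = new AtomicLong();

        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                counter.incrementAndGet();
            }
        };

        Thread t1 = new Thread(work, "worker-1");
        Thread t2 = new Thread(work, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println("Final count: " + counter.get()); // 2,000,000
    }
}
```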

3 Advantages
Other threads can continue to utilize unused resources if one thread stalls.
Maximizes usage of CPU resources that would otherwise have been idle.
If multiple threads work on the same data, sharing the same cache can improve cache utilization as well as data synchronization.

4 Disadvantages
Threads can interfere with each other when sharing hardware resources.
Performance gains vary from system to system.
Hand-crafted assembly programs can actually see performance degradation.
Requires software support at both the operating system and application level to work properly.

5 Types
Temporal multithreading (two main sub-categories that differ in their granularity):
Coarse-grained
Fine-grained (interleaved)
Simultaneous multithreading (SMT)
The distinction between temporal and simultaneous multithreading is how many threads can occupy a given pipeline stage in a cycle:
Temporal: allows only one thread per execution cycle
Simultaneous: allows more than one thread per execution cycle

6 Coarse-Grained Architecture
When a thread is stalled due to some event, the CPU switches to a different hardware context; in practice the CPU switches to a different thread every few cycles.

7 Fine-Grained Architecture
Also called cycle-by-cycle interleaving.
One core with separate sets of registers to manage multiple threads.
The core can context-switch from one thread to another at every cycle.
When a long cache miss leaves the current thread idle, the core can still run another thread.
Tolerates control and data dependency latencies by overlapping them with useful work from other threads.
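
Purely as a software analogy (not part of the slides), the sketch below interleaves several "threads" of queued work one step per simulated cycle, much as a fine-grained core issues from a different hardware context each cycle; the class name and work items are made up for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class InterleavingSketch {
    public static void main(String[] args) {
        // Each "hardware context" is modeled as a queue of single-cycle work items.
        Deque<String> threadA = new ArrayDeque<>(List.of("A1", "A2", "A3"));
        Deque<String> threadB = new ArrayDeque<>(List.of("B1", "B2"));
        Deque<String> threadC = new ArrayDeque<>(List.of("C1", "C2", "C3", "C4"));
        List<Deque<String>> contexts = List.of(threadA, threadB, threadC);

        int cycle = 0;
        boolean workLeft = true;
        while (workLeft) {
            workLeft = false;
            // Round-robin: each non-empty context gets one "cycle" in turn,
            // so no single stalled or finished thread blocks the others.
            for (Deque<String> ctx : contexts) {
                if (!ctx.isEmpty()) {
                    System.out.println("cycle " + (cycle++) + ": issue " + ctx.poll());
                    workLeft = true;
                }
            }
        }
    }
}
```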

8 Fine-Grained Architecture

9 Simultaneous Multithreading (SMT)
Used specifically to increase the efficiency of superscalar CPUs.
Initial research dates back to IBM's supercomputer project in the 1960s.
Allows multiple threads to issue instructions in the same CPU cycle.
Enabled without major changes to a processor's architecture; it mainly requires:
The ability to accept instructions from multiple threads
A larger-than-normal register file to accommodate the state of the extra threads

10 Simultaneous Multithreading (SMT)

11 Simultaneous Multithreading (Cont.)
Advantages:
Increased processor performance (varies; see below)
Increased power efficiency
Hides memory latency by keeping the core busy with work from other threads
Disadvantages:
Can actually decrease performance on some processor architectures if there are resource bottlenecks
Makes software development harder: testing is needed to determine whether an application benefits or suffers from the feature, followed by logic to disable it if necessary (see the sketch below)
Potential security issues with shared resources
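
One hedged, illustrative way to do that kind of testing in software (added here, not from the slides) is to time the same workload with different thread counts and only enable the extra parallelism when it actually helps. The workload, class name, and decision threshold below are placeholders, and this measures overall thread scaling on the machine at hand rather than SMT in isolation:

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadCountProbe {
    // Volatile sink keeps the JIT from discarding the benchmark work.
    static volatile double sink;

    // Placeholder CPU-bound workload; a real test would use the application's own hot path.
    static double busyWork(int iterations) {
        double x = 0;
        for (int i = 1; i <= iterations; i++) {
            x += Math.sqrt(i);
        }
        return x;
    }

    static long timeWithThreads(int threads, int iterationsPerThread) throws InterruptedException {
        List<Thread> pool = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            Thread th = new Thread(() -> sink = busyWork(iterationsPerThread));
            th.start();
            pool.add(th);
        }
        for (Thread th : pool) {
            th.join();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int logical = Runtime.getRuntime().availableProcessors(); // counts SMT "logical" CPUs
        long oneThread = timeWithThreads(1, 50_000_000);
        long manyThreads = timeWithThreads(logical, 50_000_000);

        // Each thread does the same amount of work, so a many-thread time close to the
        // single-thread time means the extra hardware threads scaled well.
        System.out.printf("1 thread: %d ms, %d threads: %d ms%n",
                oneThread / 1_000_000, logical, manyThreads / 1_000_000);
        boolean useExtraThreads = manyThreads < oneThread * 2; // arbitrary illustrative threshold
        System.out.println("Enable extra parallelism: " + useExtraThreads);
    }
}
```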

12 Multithreading architecture summary

13 How do we increase computing power?
Increasing Performance: A farmer seeks to increase the performance of his ox and plow. Should the farmer try to breed a stronger ox?

14 How do we increase computing power?
Increasing Performance:

15 How do we increase computing power?
Increasing Performance: Or should the farmer use more oxen yoked together?

16 How do we increase computing power?
Increasing Performance: Processors have become faster, smaller, and denser in transistors, but these advances are diminishing while production costs rise rapidly.
Limitations of increasing single-processor performance:
Transistor density is limited by electromagnetic and heat interference
The performance gained per unit of cost diminishes, compared to simply adding additional processors

17 Cluster Computing What is a cluster?
Commodity computers using customized operating systems, connected by network interconnects, managed by an application

18 Cluster Computing What is cluster computing used for?
Distributed computing: a network of computers that communicate with each other to achieve a common goal.
A job to be processed is split into tasks, and the tasks are processed by individual computers, or nodes.
Amdahl's Law: every algorithm has a portion that must execute serially, and this limits the speedup that can be achieved through distributed computing (see the formula below).
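
For reference (added here; the symbols are the conventional ones, not from the slides), Amdahl's Law says that if a fraction p of a job can be parallelized across N nodes, the overall speedup is:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}},
\qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For example, if 95% of a job parallelizes (p = 0.95), no number of nodes can speed it up by more than 1 / 0.05 = 20x.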

19 Multicluster Architectures
Grid computing: loosely coupled and geographically dispersed clusters.
Generally used by institutions for scientific research.
Utilizes thousands to hundreds of thousands of processor cores spread across many institutions.
Connected via a Storage Area Network (SAN).

20 Multicluster Architectures
Grid Computing: Tommy Minyard, TACC

21 Multicluster Architectures
Grid computing limitations:
Suitable for computationally intensive jobs, but ill-equipped for handling and transferring large amounts of data.
The SAN becomes a bottleneck when large amounts of data must be transferred to multiple clusters.

22 Multicluster Architectures
Supercomputers and High Performance Computing (HPC): Highly tuned computer clusters using commodity processors, with customized network interconnects and operating systems

23 Multicluster Architectures
Supercomputers and High Performance Computing (HPC):
FLOPS: floating-point operations per second.
Currently the fastest supercomputers operate at peta-scale: quadrillions of FLOPS, i.e. 1,000,000,000,000,000 (10^15) FLOPS.

24 Multicluster Architectures
China’s Supercomputer Sunway TaihuLight: 93 petaFLOPS (2016) = 93,000,000,000,000,000 FLOPS

25 Multicluster Architectures
Hadoop Clusters for Big Data:
Data locality: data is stored locally on the nodes themselves, which is very fast.
Unlike grid architectures, there is no bottleneck from transferring data over a SAN.
Unlike an RDBMS, Hadoop clusters stream through data at the disk transfer rate, rather than issuing point queries at the slower disk "seek" rate.
2008 – 1 TB sorted in 209 seconds using 900 nodes
2009 – 100 TB sorted in 173 minutes using 3,400 nodes
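
As a back-of-the-envelope check (added here, not part of the slides), those sort benchmarks can be turned into aggregate and per-node throughput; the inputs below come straight from the figures above, and sorting involves multiple passes over the data, so these numbers only illustrate the scale rather than raw disk streaming rates:

```java
public class SortThroughput {
    public static void main(String[] args) {
        // 2008 benchmark: 1 TB sorted in 209 seconds on 900 nodes
        double bytes2008 = 1e12;
        double aggregate2008 = bytes2008 / 209;      // bytes per second, whole cluster
        double perNode2008 = aggregate2008 / 900;    // bytes per second, per node

        // 2009 benchmark: 100 TB sorted in 173 minutes on 3,400 nodes
        double bytes2009 = 100e12;
        double aggregate2009 = bytes2009 / (173 * 60);
        double perNode2009 = aggregate2009 / 3400;

        System.out.printf("2008: %.1f GB/s aggregate, %.1f MB/s per node%n",
                aggregate2008 / 1e9, perNode2008 / 1e6);
        System.out.printf("2009: %.1f GB/s aggregate, %.1f MB/s per node%n",
                aggregate2009 / 1e9, perNode2009 / 1e6);
    }
}
```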

26 Multicluster Architectures
Common Hadoop cluster networking scheme:
Latency is higher between racks than within a rack.
Data is therefore stored locally whenever possible.

27 Multicluster Architectures
Hadoop Clusters for Big Data:
Fault tolerance: the large number of parts increases the likelihood of hardware failure somewhere in the system.
Hardware redundancy: data and task outputs are replicated; three copies are made.
Error detection: the large quantities of data transferred increase the likelihood of data corruption in the system.
CRC-32 (cyclic redundancy check) checksums are used to detect corruption (see the example below).
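
For illustration (added here, not from the slides), Java's standard library ships a CRC-32 implementation; the sketch below computes a checksum before a simulated transfer and verifies it afterwards — the data and the simulated corruption are made up:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumDemo {
    static long crc32Of(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);          // feed the whole buffer into the checksum
        return crc.getValue();     // 32-bit CRC returned as an unsigned long
    }

    public static void main(String[] args) {
        byte[] original = "block of data to replicate".getBytes(StandardCharsets.UTF_8);
        long expected = crc32Of(original);   // computed by the writer / sender

        byte[] received = original.clone();
        received[3] ^= 0x01;                 // simulate a single corrupted bit in transit

        // The reader recomputes the CRC and compares it with the expected value.
        System.out.println("intact copy ok:    " + (crc32Of(original) == expected)); // true
        System.out.println("corrupted copy ok: " + (crc32Of(received) == expected)); // false
    }
}
```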

28 Sources
Xie, Maoyuan, Yun, Zhifeng, Lei, Zhou, & Allen, Gabrielle. (2007). Cluster Abstraction: Towards Uniform Resource Description and Access in Multicluster Grid. IMSCCS 2007.
Raicu, I. (2011). Introduction to Distributed Systems [slides]. Illinois Institute of Technology.
White, T. (2012). Hadoop: The Definitive Guide, 3rd ed.
Null, L., & Lobur, J. (2015). The Essentials of Computer Organization and Architecture, 4th ed.
Simultaneous Multithreading Project (Information Repository):

