Multiclustered and Multithreaded Architecture

Multithreading The ability of a CPU to run multiple processes/threads at the same time, with proper support from the computer’s operating system. Multithreading is a major way of increasing a system’s throughput. It differs from multiprocessing (another throughput-increasing technique) in that all threads share the same set of processor resources. The two are often used together: multithreading maximizes utilization of a single core, while multiprocessing runs multiple cores in concert with each other.
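The "threads share the same set of resources" point can be sketched at the software level (a minimal Python illustration, not from the slides): several threads all see the same memory, which is exactly why access to shared data must be synchronized.

```python
import threading

counter = 0                 # one variable, visible to every thread
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        with lock:          # shared resources require coordination
            counter += 1

threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: four threads, one shared counter
```

In a multiprocessing setup, by contrast, each process would get its own copy of `counter` and the results would have to be combined explicitly.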

Advantages If one thread stalls, other threads can continue to utilize the unused resources Maximizes usage of CPU resources that would otherwise have sat idle If multiple threads are using the same data, sharing the same cache can lead to better cache utilization as well as easier data synchronization

Disadvantages Threads can potentially interfere with each other when sharing hardware resources Performance gains vary from system to system Hand-crafted assembly programs can actually see performance degradation Requires software support at both the operating-system and application level to work properly

Types Temporal multithreading (two main sub-categories that differ in their granularity): Coarse-Grained Fine-Grained (Interleaving) Simultaneous Multithreading The distinction between temporal and simultaneous is how many threads can occupy a given pipeline stage in one cycle: Temporal: allows only one thread per execution cycle Simultaneous: allows more than one thread per execution cycle

Coarse-Grained Architecture When a thread is stalled by some event (such as a cache miss), the CPU switches to a different hardware context. The CPU switches to a different thread only every few cycles, not every cycle.

Fine-Grained Architecture Also called cycle-by-cycle interleaving. One core with separate sets of registers to manage multiple threads The core can make a context switch from one thread to another on every cycle During a long stall, such as a cache miss that leaves the current thread idle, another thread can still run Tolerates control- and data-dependency latencies by overlapping the latency with useful work from other threads
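Cycle-by-cycle interleaving can be sketched with a toy scheduler (illustrative Python, not a real pipeline model): each thread is a generator, each `yield` models issuing one instruction in one cycle, and the core rotates contexts every cycle.

```python
from collections import deque

def thread(name, n_instrs):
    # each yield models one instruction issued in one cycle
    for i in range(n_instrs):
        yield f"{name}:{i}"

def fine_grained_schedule(threads):
    """Round-robin, one context switch per cycle (cycle-by-cycle interleaving)."""
    ready = deque(threads)
    trace = []
    while ready:
        t = ready.popleft()          # select the next hardware context
        try:
            trace.append(next(t))    # issue one instruction this cycle
            ready.append(t)          # rotate: a different thread runs next cycle
        except StopIteration:
            pass                     # thread finished; drop its context
    return trace

trace = fine_grained_schedule([thread("A", 3), thread("B", 3)])
print(trace)  # ['A:0', 'B:0', 'A:1', 'B:1', 'A:2', 'B:2']
```

A stalled thread would simply be left out of the `ready` queue until its miss resolves, while the remaining threads keep the core busy.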

Fine-Grained Architecture

Simultaneous Multithreading (SMT) Used to increase the efficiency of superscalar CPUs Initially explored in an IBM supercomputer project during the 1960s Allows multiple threads to issue instructions in the same CPU cycle Enabled without major changes to a processor’s architecture: The ability to accept instructions from multiple threads A larger-than-normal register file to accommodate the state of the extra threads

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (Cont.) Advantages: Increased processor performance (varies, see below) Increased power efficiency Hides much of memory latency by keeping the pipeline busy with other threads Disadvantages: Can actually decrease performance on some processor architectures if there are resource bottlenecks Makes software development more difficult: applications must be tested to determine whether they benefit or suffer from the feature, with logic to turn it off if necessary Potential security issues with shared resources

Multithreading architecture summary

How do we increase computing power? Increasing Performance: A farmer seeks to increase performance of his ox and plow Should the farmer try to breed a stronger ox?

How do we increase computing power? Increasing Performance:

How do we increase computing power? Increasing Performance: Or should the farmer use more oxen yoked together?

How do we increase computing power? Increasing Performance: Processors have become faster, smaller, and denser in transistors, but these advances are diminishing while production costs rise rapidly Limitations on increasing single-processor performance: Transistor density is limited by heat dissipation and electromagnetic interference The cost of each additional performance increment grows, compared to simply adding more processors

Cluster Computing What is a cluster? Commodity computers running customized operating systems, connected by network interconnects, and managed as one system by an application

Cluster Computing What is cluster computing used for? Distributed computing: a network of computers that communicate with each other to achieve a common goal A job to be processed is split into tasks, and the tasks are processed by individual computers, or nodes Amdahl’s Law: every algorithm has a section that must be executed serially; this serial fraction limits the speedup that can be achieved through distributed computing
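Amdahl's Law can be written as speedup = 1 / ((1 - p) + p/n), where p is the parallelizable fraction of the work and n the number of nodes; a quick sketch of how the serial fraction caps the gain:

```python
def amdahl_speedup(parallel_fraction, n):
    """Speedup = 1 / ((1 - p) + p / n) for parallel fraction p on n nodes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

# Even with 95% of the work parallelizable, the speedup is capped at
# 1 / (1 - p) = 20x, no matter how many nodes are added:
print(round(amdahl_speedup(0.95, 100), 1))     # 16.8 with 100 nodes
print(round(amdahl_speedup(0.95, 10_000), 1))  # ~20.0: at the ceiling
```

This is why adding nodes to a cluster yields diminishing returns once the serial portion dominates.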

Multicluster Architectures Grid Computing: Loosely coupled, geographically dispersed clusters Generally used for scientific research by institutions Utilize thousands to hundreds of thousands of processor cores spread across many institutions Connected via a Storage Area Network (SAN)

Multicluster Architectures Grid Computing: Tommy Minyard, TACC

Multicluster Architectures Grid Computing Limitations: Suitable for computationally intensive jobs, but ill-equipped for handling and transferring large amounts of data The SAN becomes a bottleneck when large amounts of data must be transferred to multiple clusters

Multicluster Architectures Supercomputers and High Performance Computing (HPC): Highly tuned computer clusters using commodity processors, with customized network interconnects and operating systems

Multicluster Architectures Supercomputers and High Performance Computing (HPC): FLOPS: floating-point operations per second Currently the fastest supercomputers operate at peta-scale: quadrillions of FLOPS, or 1,000,000,000,000,000 (10^15)

Multicluster Architectures China’s Sunway TaihuLight supercomputer: 93 petaFLOPS (2016) = 93,000,000,000,000,000 FLOPS

Multicluster Architectures Hadoop Clusters for Big Data: Data locality: data is stored locally on the nodes themselves, which is very fast Unlike grid architectures, there is no data-transfer bottleneck over a SAN Unlike an RDBMS, Hadoop clusters stream through data at the disk transfer rate, rather than issuing point queries at the much slower disk seek rate 2008: 1 TB sorted in 209 seconds using 900 nodes 2009: 100 TB sorted in 173 minutes using 3,400 nodes
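A back-of-the-envelope check of the 2008 terabyte-sort figures (my arithmetic, not from the deck) shows why locality matters: the aggregate rate is enormous, but each node only needs modest, streaming-level disk throughput.

```python
# Throughput implied by the 2008 result: 1 TB sorted in 209 s on 900 nodes
total_bytes = 1 * 10**12
seconds, nodes = 209, 900

aggregate = total_bytes / seconds   # bytes/s across the whole cluster
per_node = aggregate / nodes        # bytes/s each node must sustain

print(f"{aggregate / 1e9:.1f} GB/s aggregate")  # 4.8 GB/s
print(f"{per_node / 1e6:.1f} MB/s per node")    # 5.3 MB/s
```

A few MB/s per node is well within a commodity disk's sequential transfer rate, which is exactly the streaming-over-seeking argument above.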

Multicluster Architectures Common Hadoop cluster networking scheme: latency is higher between racks than within a rack, so data is stored locally on the nodes that process it

Multicluster Architectures Hadoop Clusters for Big Data: Fault tolerance A large number of parts increases the likelihood of hardware failure somewhere in the system Hardware redundancy: data and task outputs are replicated; three copies are made Error detection: the large quantity of data transferred increases the likelihood of data corruption in the system, detected with CRC-32 (cyclic redundancy check)
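The CRC-32 check mentioned above can be illustrated with Python's standard `zlib.crc32` (an illustrative sketch of the idea, not Hadoop's actual implementation): a checksum is stored alongside each block and recomputed on read.

```python
import zlib

block = b"hadoop data block payload"
checksum = zlib.crc32(block)        # 32-bit checksum stored with the data

# On read, recompute and compare to detect corruption in transit or on disk:
assert zlib.crc32(block) == checksum

corrupted = b"hadoop data block paylaod"   # two adjacent bytes swapped
print(zlib.crc32(corrupted) != checksum)   # True: corruption detected
```

CRC-32 is cheap to compute and guarantees detection of any error burst of 32 bits or fewer, which makes it a good fit for checking large streamed blocks.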

Sources Xie, M., Yun, Z., Lei, Z., & Allen, G. (2007). Cluster Abstraction: Towards Uniform Resource Description and Access in Multicluster Grid. pp. 220-227. doi:10.1109/IMSCCS.2007.79. Raicu, I. (2011). Introduction to Distributed Systems [slides]. Illinois Institute of Technology. White, T. (2012). Hadoop: The Definitive Guide, 3rd ed. Null, L., & Lobur, J. (2015). The Essentials of Computer Organization and Architecture, 4th ed. Simultaneous Multithreading Project (information repository): https://dada.cs.washington.edu/smt/