Multiclustered and Multithreaded Architecture

Multithreading The ability of a CPU to run multiple processes/threads at the same time, with proper support from the computer’s operating system. Multithreading is a major way of increasing a system’s throughput. It differs from multiprocessing (another throughput-increasing technique) in that all threads share the same set of processor resources. The two are often used together: multithreading maximizes utilization of a single core, while multiprocessing runs multiple cores in concert with each other.
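The "threads share the same set of resources" point can be sketched at the software level (a minimal Python illustration, not from the slides): several threads all see the same memory, which is exactly why access to shared data must be synchronized.

```python
import threading

counter = 0                 # one variable, visible to every thread
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        with lock:          # shared resources require coordination
            counter += 1

threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: four threads, one shared counter
```

In a multiprocessing setup, by contrast, each process would get its own copy of `counter` and the results would have to be combined explicitly.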

Advantages If one thread stalls, other threads can continue to utilize the unused resources Maximizes usage of CPU resources that would otherwise have sat idle If multiple threads are using the same data, sharing the same cache can lead to better cache utilization as well as easier data synchronization

Disadvantages Threads can potentially interfere with each other when sharing hardware resources Performance gains vary from system to system Hand-crafted assembly programs can actually see performance degradation Requires software support at both the operating-system and application level to work properly

Types Temporal multithreading (two main sub-categories that differ in their granularity): Coarse-Grained Fine-Grained (Interleaving) Simultaneous Multithreading The distinction between temporal and simultaneous is how many threads can occupy a given pipeline stage in one cycle: Temporal: allows only one thread per execution cycle Simultaneous: allows more than one thread per execution cycle

Coarse-Grained Architecture When a thread is stalled by some event (such as a cache miss), the CPU switches to a different hardware context. The CPU switches to a different thread only every few cycles, not every cycle.

Fine-Grained Architecture Also called cycle-by-cycle interleaving. One core with separate sets of registers to manage multiple threads The core can make a context switch from one thread to another on every cycle During a long stall, such as a cache miss that leaves the current thread idle, another thread can still run Tolerates control- and data-dependency latencies by overlapping the latency with useful work from other threads
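Cycle-by-cycle interleaving can be sketched with a toy scheduler (illustrative Python, not a real pipeline model): each thread is a generator, each `yield` models issuing one instruction in one cycle, and the core rotates contexts every cycle.

```python
from collections import deque

def thread(name, n_instrs):
    # each yield models one instruction issued in one cycle
    for i in range(n_instrs):
        yield f"{name}:{i}"

def fine_grained_schedule(threads):
    """Round-robin, one context switch per cycle (cycle-by-cycle interleaving)."""
    ready = deque(threads)
    trace = []
    while ready:
        t = ready.popleft()          # select the next hardware context
        try:
            trace.append(next(t))    # issue one instruction this cycle
            ready.append(t)          # rotate: a different thread runs next cycle
        except StopIteration:
            pass                     # thread finished; drop its context
    return trace

trace = fine_grained_schedule([thread("A", 3), thread("B", 3)])
print(trace)  # ['A:0', 'B:0', 'A:1', 'B:1', 'A:2', 'B:2']
```

A stalled thread would simply be left out of the `ready` queue until its miss resolves, while the remaining threads keep the core busy.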

Fine-Grained Architecture

Simultaneous Multithreading (SMT) Used to increase the efficiency of superscalar CPUs Initially explored in an IBM supercomputer project during the 1960s Allows multiple threads to issue instructions in the same CPU cycle Enabled without major changes to a processor’s architecture: The ability to accept instructions from multiple threads A larger-than-normal register file to accommodate the state of the extra threads

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (Cont.) Advantages: Increased processor performance (varies, see below) Increased power efficiency Hides much of memory latency by keeping the pipeline busy with other threads Disadvantages: Can actually decrease performance on some processor architectures if there are resource bottlenecks Makes software development more difficult: applications must be tested to determine whether they benefit or suffer from the feature, with logic to turn it off if necessary Potential security issues with shared resources

Multithreading architecture summary

How do we increase computing power? Increasing Performance: A farmer seeks to increase performance of his ox and plow Should the farmer try to breed a stronger ox?

How do we increase computing power? Increasing Performance:

How do we increase computing power? Increasing Performance: Or should the farmer use more oxen yoked together?

How do we increase computing power? Increasing Performance: Processors have become faster, smaller, and denser in transistors, but these advances are diminishing while production costs rise rapidly Limitations on increasing single-processor performance: Transistor density is limited by heat dissipation and electromagnetic interference The cost of each additional performance increment grows, compared to simply adding more processors

Cluster Computing What is a cluster? Commodity computers running customized operating systems, connected by network interconnects, and managed as one system by an application

Cluster Computing What is cluster computing used for? Distributed computing: a network of computers that communicate with each other to achieve a common goal A job to be processed is split into tasks, and the tasks are processed by individual computers, or nodes Amdahl’s Law: every algorithm has a section that must be executed serially; this serial fraction limits the speedup that can be achieved through distributed computing
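Amdahl's Law can be written as speedup = 1 / ((1 - p) + p/n), where p is the parallelizable fraction of the work and n the number of nodes; a quick sketch of how the serial fraction caps the gain:

```python
def amdahl_speedup(parallel_fraction, n):
    """Speedup = 1 / ((1 - p) + p / n) for parallel fraction p on n nodes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

# Even with 95% of the work parallelizable, the speedup is capped at
# 1 / (1 - p) = 20x, no matter how many nodes are added:
print(round(amdahl_speedup(0.95, 100), 1))     # 16.8 with 100 nodes
print(round(amdahl_speedup(0.95, 10_000), 1))  # ~20.0: at the ceiling
```

This is why adding nodes to a cluster yields diminishing returns once the serial portion dominates.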

Multicluster Architectures Grid Computing: Loosely coupled, geographically dispersed clusters Generally used for scientific research by institutions Utilize thousands to hundreds of thousands of processor cores spread across many institutions Connected via a Storage Area Network (SAN)

Multicluster Architectures Grid Computing: Tommy Minyard, TACC

Multicluster Architectures Grid Computing Limitations: Suitable for computationally intensive jobs, but ill-equipped for handling and transferring large amounts of data The SAN becomes a bottleneck when large amounts of data must be transferred to multiple clusters

Multicluster Architectures Supercomputers and High Performance Computing (HPC): Highly tuned computer clusters using commodity processors, with customized network interconnects and operating systems

Multicluster Architectures Supercomputers and High Performance Computing (HPC): FLOPS: floating-point operations per second Currently the fastest supercomputers operate at peta-scale: quadrillions of FLOPS, or 1,000,000,000,000,000 (10^15)

Multicluster Architectures China’s Sunway TaihuLight supercomputer: 93 petaFLOPS (2016) = 93,000,000,000,000,000 FLOPS

Multicluster Architectures Hadoop Clusters for Big Data: Data locality: data is stored locally on the nodes themselves, which is very fast Unlike grid architectures, there is no data-transfer bottleneck over a SAN Unlike an RDBMS, Hadoop clusters stream through data at the disk transfer rate, rather than issuing point queries at the much slower disk seek rate 2008: 1 TB sorted in 209 seconds using 900 nodes 2009: 100 TB sorted in 173 minutes using 3,400 nodes
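A back-of-the-envelope check of the 2008 terabyte-sort figures (my arithmetic, not from the deck) shows why locality matters: the aggregate rate is enormous, but each node only needs modest, streaming-level disk throughput.

```python
# Throughput implied by the 2008 result: 1 TB sorted in 209 s on 900 nodes
total_bytes = 1 * 10**12
seconds, nodes = 209, 900

aggregate = total_bytes / seconds   # bytes/s across the whole cluster
per_node = aggregate / nodes        # bytes/s each node must sustain

print(f"{aggregate / 1e9:.1f} GB/s aggregate")  # 4.8 GB/s
print(f"{per_node / 1e6:.1f} MB/s per node")    # 5.3 MB/s
```

A few MB/s per node is well within a commodity disk's sequential transfer rate, which is exactly the streaming-over-seeking argument above.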

Multicluster Architectures Common Hadoop cluster networking scheme: latency is higher between racks than within a rack, so data is stored locally on the nodes that process it

Multicluster Architectures Hadoop Clusters for Big Data: Fault tolerance A large number of parts increases the likelihood of hardware failure somewhere in the system Hardware redundancy: data and task outputs are replicated; three copies are made Error detection: the large quantity of data transferred increases the likelihood of data corruption in the system, detected with CRC-32 (cyclic redundancy check)
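The CRC-32 check mentioned above can be illustrated with Python's standard `zlib.crc32` (an illustrative sketch of the idea, not Hadoop's actual implementation): a checksum is stored alongside each block and recomputed on read.

```python
import zlib

block = b"hadoop data block payload"
checksum = zlib.crc32(block)        # 32-bit checksum stored with the data

# On read, recompute and compare to detect corruption in transit or on disk:
assert zlib.crc32(block) == checksum

corrupted = b"hadoop data block paylaod"   # two adjacent bytes swapped
print(zlib.crc32(corrupted) != checksum)   # True: corruption detected
```

CRC-32 is cheap to compute and guarantees detection of any error burst of 32 bits or fewer, which makes it a good fit for checking large streamed blocks.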

Sources Xie, M., Yun, Z., Lei, Z., & Allen, G. (2007). Cluster Abstraction: Towards Uniform Resource Description and Access in Multicluster Grid. pp. 220-227. doi:10.1109/IMSCCS.2007.79. Raicu, I. (2011). Introduction to Distributed Systems [slides]. Illinois Institute of Technology. White, T. (2012). Hadoop: The Definitive Guide, 3rd ed. Null, L., & Lobur, J. (2015). The Essentials of Computer Organization and Architecture, 4th ed. Simultaneous Multithreading Project (information repository): https://dada.cs.washington.edu/smt/