CS4021/4521 Advanced Computer Architecture II


Dr Jeremy Jones
web: https://www.scss.tcd.ie/Jeremy.Jones/CS4021/CS4021.htm
some changes from previous years
M. Herlihy and J. E. B. Moss, "Transactional memory: Architectural support for lock-free data structures," in Proc. 20th Annu. Int. Symp. on Computer Architecture, pp. 289–300, May 1993

parallel algorithms on shared memory multiprocessors
maximise performance
DNA analysis (bioinformatics, string searching)
multiple threads, SIMD instructions (SSEx, AVXx, ...)
cost of sharing data, memory bandwidth, the problem with locks, ...
lock implementations
CAS based lockless algorithms (for lists, trees, hash tables, ...)
lockless algorithms using hardware transactional memory

Remember, algorithms underpin most of what Computer Scientists and Engineers do
"algorithms are the poetry of the 21st century"

human genome approx. 3.2 billion bases (length n)
  ACGTACGTGGTAACCCGGTTA.....
DNA sequencing techniques generate millions of short reads of 100 or so bases (length m)
  CGTGGTAA...
have to find the reads in the genome (maybe with inexact matching)
using the Burrows-Wheeler Transform (BWT), can find each read in O(m) rather than O(n) once the BWT index of the genome has been built
can parallelise BWT generation and searching for the reads
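The O(m) search can be illustrated with a minimal sketch of exact matching by backward search over a BWT. This is not the course's implementation: the class name `FmIndex`, the naive suffix sort, and the full occurrence table are assumptions chosen for brevity, and a real genome index would use a far more compact and faster construction. Only the search loop, which takes one step per base of the read, reflects the O(m) claim.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical illustration class: a naive BWT index over the alphabet $ACGT.
class FmIndex {
public:
    explicit FmIndex(const std::string& genome) {
        const std::string text = genome + '$';   // '$' sorts before A, C, G, T
        const std::size_t n = text.size();

        // Naive suffix array: sort suffix start positions lexicographically.
        std::vector<std::size_t> sa(n);
        std::iota(sa.begin(), sa.end(), std::size_t{0});
        std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
            return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
        });

        // occ_[i][k] = number of symbols with code k in bwt[0..i).
        occ_.assign(n + 1, Counts{});
        Counts totals{};
        for (std::size_t i = 0; i < n; ++i) {
            occ_[i] = totals;
            const char bwt_char = text[(sa[i] + n - 1) % n];  // symbol before suffix i
            const int k = code(bwt_char);
            assert(k >= 0 && "genome must contain only A, C, G, T");
            ++totals[static_cast<std::size_t>(k)];
        }
        occ_[n] = totals;

        // first_[k] = number of symbols in the text with a smaller code than k.
        std::size_t sum = 0;
        for (std::size_t k = 0; k < totals.size(); ++k) {
            first_[k] = sum;
            sum += totals[k];
        }
    }

    // Number of exact occurrences of read; one loop iteration per base, so O(m).
    std::size_t count(const std::string& read) const {
        std::size_t lo = 0, hi = occ_.size() - 1;   // half-open range of sorted suffixes
        for (auto it = read.rbegin(); it != read.rend(); ++it) {
            const int c = code(*it);
            if (c <= 0) return 0;                   // '$' or a symbol outside ACGT
            const auto k = static_cast<std::size_t>(c);
            lo = first_[k] + occ_[lo][k];
            hi = first_[k] + occ_[hi][k];
            if (lo >= hi) return 0;                 // range became empty: no match
        }
        return hi - lo;
    }

private:
    using Counts = std::array<std::size_t, 5>;

    static int code(char ch) {
        switch (ch) {
            case '$': return 0;
            case 'A': return 1;
            case 'C': return 2;
            case 'G': return 3;
            case 'T': return 4;
            default:  return -1;
        }
    }

    std::vector<Counts> occ_;
    Counts first_{};
};
```

Because each read is searched independently against a read-only index, calls to `count` for different reads could be distributed across threads without locking.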

Consider a Binary Search Tree
add(50)
add(45)
remove(45) – NO children
remove(25) – ONE child
remove(20) – TWO children
  find node (20)
  find smallest key in its right sub tree (22)
  overwrite key 20 with 22
  remove old node 22 (will have zero or one child)
[figure: example binary search tree with keys 5, 10, 20, 22, 25, 30, 40, 45, 50]
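The three removal cases can be sketched in sequential (single-threaded) code as follows. The names `bst_add`, `bst_remove`, and `inorder`, and the use of `std::unique_ptr` for ownership, are assumptions made for this illustration and are not taken from the slides.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Insert key; returns false if it is already present.
bool bst_add(std::unique_ptr<Node>& root, int key) {
    std::unique_ptr<Node>* link = &root;
    while (*link) {
        if (key == (*link)->key) return false;
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    *link = std::make_unique<Node>(key);
    return true;
}

// Remove key; returns false if it is not present.
bool bst_remove(std::unique_ptr<Node>& root, int key) {
    std::unique_ptr<Node>* link = &root;
    while (*link && (*link)->key != key)
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    if (!*link) return false;

    if ((*link)->left && (*link)->right) {
        // TWO children: find the smallest key in the right subtree,
        // overwrite the removed key with it, then remove that old node instead.
        std::unique_ptr<Node>* smallest = &(*link)->right;
        while ((*smallest)->left) smallest = &(*smallest)->left;
        (*link)->key = (*smallest)->key;
        link = smallest;
    }

    // NO children or ONE child: replace the node with its only child (or null).
    assert(!(*link)->left || !(*link)->right);
    std::unique_ptr<Node> child =
        std::move((*link)->left ? (*link)->left : (*link)->right);
    *link = std::move(child);   // the old node is destroyed here
    return true;
}

// Append the keys in ascending order to out.
void inorder(const Node* n, std::vector<int>& out) {
    if (!n) return;
    inorder(n->left.get(), out);
    out.push_back(n->key);
    inorder(n->right.get(), out);
}
```

Inserting 30, 20, 40, 10, 25, 5, 22 in that order (an assumed order, since the figure's shape is not recoverable) gives a tree in which the slide's sequence add(50), add(45), remove(45), remove(25), remove(20) exercises the zero, one, and two children cases in turn.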

Concurrent add operations
concurrently add(27) and add(50)
  OK if adding to different leaf nodes
concurrently add(23) and add(24)
  problem as adding to same leaf node
  result depends on how the steps of the operations are interleaved
  could work correctly, BUT... if there's a conflict ONLY one node may be added
concurrent removes also possible if there is no conflict
[figure: example binary search tree with keys 5, 10, 20, 22, 23, 24, 25, 27, 30, 40, 50]
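The conflict can be reproduced deterministically in a single thread by splitting an unsynchronised add into its two steps and interleaving them by hand. This is only a sketch of the failure mode: `find_slot`, `publish`, and `contains` are hypothetical helper names, and the tree shape is an assumption.

```cpp
#include <cassert>
#include <memory>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Step 1 of add: walk down the tree and return the empty child link where
// key belongs, or nullptr if the key is already present.
std::unique_ptr<Node>* find_slot(std::unique_ptr<Node>& root, int key) {
    std::unique_ptr<Node>* link = &root;
    while (*link) {
        if (key == (*link)->key) return nullptr;
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    return link;
}

// Step 2 of add: store the new node WITHOUT re-checking that the link is
// still empty. This is the unsafe part when another add ran in between.
void publish(std::unique_ptr<Node>* slot, int key) {
    assert(slot != nullptr);
    *slot = std::make_unique<Node>(key);
}

bool contains(const Node* n, int key) {
    while (n && n->key != key)
        n = key < n->key ? n->left.get() : n->right.get();
    return n != nullptr;
}
```

If add(23) and add(24) both run step 1 before either runs step 2, they obtain the same empty link under node 22. Whichever `publish` runs second overwrites the first, so only one of the two keys ends up in the tree.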

Concurrent Updates
binary search tree normally protected by a single lock so concurrent updates by multiple threads are NOT possible
plenty of scope for concurrent updates provided they do NOT conflict with one another (with a large tree conflicts will be rare)
hence lockless algorithms

lock based algorithms are pessimistic
  assume something will always go wrong
lockless algorithms are optimistic
  assume nothing will go wrong, deal with conflicts when detected
  exploit parallelism
  higher throughput / performance if done correctly
lockless algorithms implemented using CAS instructions or hardware transactional memory (Intel TSX instruction set)

Looking Forward to Seeing You in September