Algorithm Engineering: Parallel Search. Stefan Edelkamp.



Overview: Motivation; PRAM; Termination; Depth-Slicing; Hash-based Partitioning & Transposition Table Scheduling; Stack Splitting & Parallel Window Search; Parallel Search with Treaps

Parallel Shared Memory Graph Search (figure labels: Single-core CPU, Multi-core CPU): Parallelization is important for multi-core CPUs, but parallelizing graph-search algorithms such as breadth-first search, Dijkstra's algorithm, and A* is challenging. Issues: load balancing, locking, …

Parallel Shared Memory Graph Search (figure labels: Single-core CPU, Multi-core GPU): Parallelization is even more important for GPUs, but parallelizing graph-search algorithms such as breadth-first search, Dijkstra's algorithm, and A* is challenging. Issues: kernel function design, load balancing, locking, …

Parallel External Memory Graph Search (figure labels: Single-core CPU+HDD, Multi-core C/GPU+HDD) …

Motivation: Parallel and External Memory Graph Search. Synergies: Both need partitioned access to large sets of data, and each partition needs to be processed individually. Information transfer between two partitions is limited. Streaming in external-memory programs corresponds to communication queues in distributed programs (as communication is often realized via files). Good external implementations often lead to good parallel implementations.

Experiments

Further Experiments

Parallel Random Access Machine: Concurrent Read/Exclusive Write (CREW PRAM)

Parallel Addition

In Pseudocode
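The pseudocode of this slide did not survive extraction. As a minimal sketch, assuming the usual tree-shaped summation on a PRAM (the slide's own formulation may differ), the following Python code simulates the synchronous rounds sequentially. The name `pram_sum` is hypothetical.

```python
def pram_sum(values):
    """Simulate tree-shaped parallel addition round by round.

    Within one round all additions touch disjoint cells, so on a PRAM
    each could be carried out by its own processor in a single step.
    Returns (sum, number_of_rounds).
    """
    a = list(values)
    n = len(a)
    if n == 0:
        return 0, 0
    rounds = 0
    stride = 1
    while stride < n:
        # One synchronous step: cell i absorbs cell i + stride.
        for i in range(0, n - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
        rounds += 1
    return a[0], rounds
```

For n input values the loop runs for about log2(n) rounds, which is the parallel running time if enough processors are available.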

Definitions: problem size; parallel running time; work; sequential time; efficiency; speedup (in the example); linear speedup; efficient parallelization (in the example)
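The formulas of this slide were lost in extraction. As a hedged reconstruction, the standard textbook definitions are given here. The symbols $n$ (problem size), $p$ (number of processors), $T_p$, $T_{\mathrm{seq}}$, $W$, $S$ and $E$ are assumptions and may differ from the slide's notation. The example values assume tree-shaped summation of $n$ numbers with $p = n/2$ processors. The slide's definition of an efficient parallelization is not reconstructed.

```latex
\begin{align*}
W(n) &= p \cdot T_p(n) && \text{(work)} \\
S(n) &= \frac{T_{\mathrm{seq}}(n)}{T_p(n)} && \text{(speedup; linear if } S(n) = \Theta(p)\text{)} \\
E(n) &= \frac{S(n)}{p} = \frac{T_{\mathrm{seq}}(n)}{W(n)} && \text{(efficiency)}
\end{align*}
% Assumed example: summing n numbers with p = n/2 processors
\[
T_{\mathrm{seq}}(n) = \Theta(n), \quad
T_p(n) = \Theta(\log n), \quad
S(n) = \Theta\!\left(\frac{n}{\log n}\right), \quad
E(n) = \Theta\!\left(\frac{1}{\log n}\right)
\]
```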

Prefix Sum
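The slide body is missing. As an illustration only, the sketch below simulates a well-known parallel prefix-sum scheme with doubling strides (often attributed to Hillis and Steele). Whether the slide used this scheme or a work-efficient variant is not known. The name `prefix_sums` is hypothetical.

```python
def prefix_sums(values):
    """Inclusive prefix sums computed in about log2(n) synchronous rounds.

    In every round each cell i >= stride adds the value stored `stride`
    positions to its left. All cells read the old array and write a new
    one, which matches concurrent reads and exclusive writes.
    """
    a = list(values)
    n = len(a)
    stride = 1
    while stride < n:
        a = [a[i] + a[i - stride] if i >= stride else a[i] for i in range(n)]
        stride *= 2
    return a
```

This variant performs on the order of n log n additions in total, so it trades extra work for a short parallel running time.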

Termination

Depth-Slicing

In Source Code

Hash-based Partitioning
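A minimal sketch of the idea, assuming that every state is owned by the worker selected by a hash of the state, and that each worker keeps its own duplicate table. The workers are simulated sequentially and level by level. The names `owner` and `partitioned_bfs` are hypothetical and not taken from the slides.

```python
import zlib


def owner(state, num_workers):
    """Map a state to its owning worker with a deterministic hash."""
    return zlib.crc32(repr(state).encode()) % num_workers


def partitioned_bfs(start, successors, num_workers):
    """Level-synchronous BFS in which each worker owns one hash partition.

    Returns a dict mapping every reachable state to its BFS depth.
    """
    visited = [set() for _ in range(num_workers)]  # local duplicate tables
    inbox = [[] for _ in range(num_workers)]
    inbox[owner(start, num_workers)].append(start)
    depth = {}
    level = 0
    while any(inbox):
        outbox = [[] for _ in range(num_workers)]
        # Each iteration of this loop could run on its own processor.
        for w in range(num_workers):
            for s in inbox[w]:
                if s in visited[w]:
                    continue  # duplicate, detected locally by the owner
                visited[w].add(s)
                depth[s] = level
                for t in successors(s):
                    # Route every successor to the worker that owns it.
                    outbox[owner(t, num_workers)].append(t)
        inbox = outbox
        level += 1
    return depth
```

Because a state always hashes to the same worker, no two workers ever need to lock the same duplicate table; the price is the communication of successors between workers.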

Transposition Driven Scheduling

In Source Code

Parallel Depth-First Search (Parallel Branch-and-Bound); figure labels: master, slave

In Source Code

Load-Balancing via Stack Splitting
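As a hedged illustration, the following round-robin simulation shows one possible splitting policy: an idle worker receives every second entry of the largest stack. The policy and the names `split_stack` and `parallel_dfs_count` are assumptions; the slides do not specify them.

```python
def split_stack(stack):
    """Split a DFS stack: keep every second entry, donate the rest."""
    return stack[0::2], stack[1::2]


def parallel_dfs_count(root, children, num_workers):
    """Simulate DFS with load balancing via stack splitting.

    Workers take turns; a worker with an empty stack asks the most
    loaded worker for part of its stack. Returns the number of
    expanded nodes.
    """
    stacks = [[] for _ in range(num_workers)]
    stacks[0].append(root)
    expanded = 0
    while any(stacks):
        for w in range(num_workers):
            if not stacks[w]:
                # Idle worker: request work from the most loaded worker.
                donor = max(range(num_workers), key=lambda v: len(stacks[v]))
                if len(stacks[donor]) >= 2:
                    stacks[donor], stacks[w] = split_stack(stacks[donor])
                continue
            node = stacks[w].pop()
            expanded += 1
            stacks[w].extend(children(node))
    return expanded
```

Taking alternating entries gives the receiver nodes from all depths of the donor's stack, including entries near the bottom that tend to root larger subtrees.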

Parallel Window Search (Iterative-Deepening Search)
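A minimal sketch, assuming unit edge costs and no heuristic, so that the search threshold is simply a depth bound. Each worker runs a bounded depth-first search with a different bound, and the smallest successful bound is reported. The names `bounded_dfs` and `parallel_window_search` are hypothetical. Python threads are used only to show the structure and will not yield a real speedup for CPU-bound search.

```python
from concurrent.futures import ThreadPoolExecutor


def bounded_dfs(state, goal, successors, bound, depth=0):
    """Depth-first search limited to `bound` steps; True if goal is found."""
    if state == goal:
        return True
    if depth == bound:
        return False
    return any(bounded_dfs(t, goal, successors, bound, depth + 1)
               for t in successors(state))


def parallel_window_search(start, goal, successors, num_workers, max_bound):
    """Search windows of `num_workers` consecutive bounds concurrently.

    Returns the smallest bound for which the goal was found, or None.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for low in range(0, max_bound + 1, num_workers):
            bounds = list(range(low, min(low + num_workers, max_bound + 1)))
            results = pool.map(
                lambda b: bounded_dfs(start, goal, successors, b), bounds)
            for bound, found in zip(bounds, results):
                if found:
                    return bound  # all smaller bounds have already failed
    return None
```

Since all smaller bounds are known to have failed before a bound is returned, the reported bound equals the optimal solution length under the unit-cost assumption.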

Treaps: A Mix of Heaps and Search Trees

Application: Using a treap, the need for exclusive locks can be alleviated to some extent. Each operation on the treap manipulates the data structure in the same top-down direction and can be decomposed into successive elementary operations. Tree partial locking protocol: Every process holds exclusive access to a sliding window of nodes in the tree. It can move this window down a path in the tree, which allows other processes to access different, non-overlapping windows at the same time. Parallel search using a treap with partial locking has been tested on the Fifteen-Puzzle on different architectures, with a speedup between 2 and 5 on 8 processors.
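To make the data structure concrete, here is a minimal sequential treap sketch: keys obey the search-tree order and priorities obey the min-heap order. It uses the simple recursive insertion with rotations and does not implement the top-down operations or the partial locking protocol described above. The names `Node` and `insert` are hypothetical.

```python
import random


class Node:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None


def insert(root, key, priority=None):
    """Insert `key` and return the new root of the (sub)treap.

    A random priority is drawn if none is given. Duplicate keys are
    ignored.
    """
    if priority is None:
        priority = random.random()
    if root is None:
        return Node(key, priority)
    if key < root.key:
        root.left = insert(root.left, key, priority)
        if root.left.priority < root.priority:
            # Heap order violated: rotate right.
            child = root.left
            root.left, child.right = child.right, root
            return child
    elif key > root.key:
        root.right = insert(root.right, key, priority)
        if root.right.priority < root.priority:
            # Heap order violated: rotate left.
            child = root.right
            root.right, child.left = child.left, root
            return child
    return root
```

With explicit priorities the node of minimum priority always ends up at the root, which is what lets a treap serve as a priority queue and a dictionary at the same time.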

Self-Adjusting Trees via the Splay Operation: see extra slides

Parallel External-Memory Graph Search: Motivation; Shared and Distributed Environments; Parallel Delayed Duplicate Detection; Parallel Expansion; Distributed Sorting; Parallel Structured Duplicate Detection; Finding Disjoint Duplicate Detection Scopes; Locking

Distributed Search over the Network: A distributed setting provides more space. Experiments show that internal computation time dominates I/O time.

Exploiting Independence: Since the states in a bucket are independent of one another, they can be expanded in parallel. Duplicate removal can be distributed over different processors. Bulk (streamed) transfers are much better than single ones.

Parallel Breadth-First Frontier Search (Enumerating the 15-Puzzle): A hash function partitions both layers into files. Once a layer is done, child files are renamed into parent files. For parallel processing, a work queue contains parent files waiting to be expanded and child files waiting to be merged.

Distributed Queue for Parallel Best-First Search (figure labels: P0, P1, P2, TOP). Beware of the mutual exclusion problem!
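One simple way to address the mutual exclusion problem is to guard the shared priority queue with a single lock. The sketch below shows this in a shared-memory setting with Python threads. It is an illustration, not the scheme of the slides, and the class name `SharedOpenList` is hypothetical.

```python
import heapq
import threading


class SharedOpenList:
    """Priority queue shared by several search threads."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def push(self, f, state):
        # Only one thread at a time may modify the heap.
        # States must be comparable, since ties on f compare the states.
        with self._lock:
            heapq.heappush(self._heap, (f, state))

    def pop_min(self):
        """Remove and return the (f, state) pair with minimal f, or None."""
        with self._lock:
            return heapq.heappop(self._heap) if self._heap else None
```

A single lock is easy to reason about but serializes all queue accesses, which is the contention that finer-grained schemes try to reduce.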

Distributed Delayed Duplicate Detection: Each state can appear several times in a bucket. A bucket has to be searched completely for the duplicates. Problem: concurrent writes! (Figure labels: P0, P1, P2, P3, GOAL, sorted buffers, single files.)

Multiple Processors, Multiple Disks Variant: Each processor (P1, P2, P3, P4) sorts its buffers w.r.t. the hash value and writes sorted files. The files are divided w.r.t. the hash ranges h_0 … h_{k-1}, h_k … h_{l-1}, and the sorted buffers from every processor for one range are combined into one sorted file.
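The scheme can be sketched in memory as follows, assuming that states are represented by their hash values and that in-memory lists stand in for files. All function names and the `boundaries` parameter are hypothetical.

```python
import heapq
from bisect import bisect_right


def split_by_ranges(sorted_buffer, boundaries):
    """Cut one processor's sorted buffer into one piece per hash range.

    boundaries[i] is the first hash value that belongs to range i + 1.
    """
    pieces = [[] for _ in range(len(boundaries) + 1)]
    for h in sorted_buffer:
        pieces[bisect_right(boundaries, h)].append(h)
    return pieces


def merge_range(pieces):
    """Merge the sorted pieces of one hash range and drop duplicates."""
    merged = []
    for h in heapq.merge(*pieces):
        if not merged or merged[-1] != h:
            merged.append(h)
    return merged


def distributed_ddd(buffers, boundaries):
    """Return one sorted, duplicate-free list per hash range."""
    # Step 1: every processor sorts its buffer and splits it by range.
    split = [split_by_ranges(sorted(b), boundaries) for b in buffers]
    # Step 2: each range is merged independently of all other ranges,
    # so no two processors write to the same output.
    return [merge_range([s[r] for s in split])
            for r in range(len(boundaries) + 1)]
```

Because the hash ranges are disjoint, each range can be merged by a different processor, which avoids concurrent writes to a single output file.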