MotivationFundamental ProblemsProblems on Graphs Parallel processors are becoming common place. Each core of a multi-core processor consists of a CPU and.

Slides:



Advertisements
Similar presentations
Multi-core Computing Lecture 3 MADALGO Summer School 2012 Algorithms for Modern Parallel and Distributed Models Phillip B. Gibbons Intel Labs Pittsburgh.
Advertisements

I/O-Algorithms Lars Arge Spring 2012 April 17, 2012.
DIMACS Workshop on Parallelism: A 2020 Vision Alejandro Salinger University of Waterloo March 16, 2011.
Communication-Avoiding Algorithms Jim Demmel EECS & Math Departments UC Berkeley.
PRAM Algorithms Sathish Vadhiyar. PRAM Model - Introduction Parallel Random Access Machine Allows parallel-algorithm designers to treat processing power.
O(N 1.5 ) divide-and-conquer technique for Minimum Spanning Tree problem Step 1: Divide the graph into  N sub-graph by clustering. Step 2: Solve each.
Optimal PRAM algorithms: Efficiency of concurrent writing “Computer science is no more about computers than astronomy is about telescopes.” Edsger Dijkstra.
Lecture 3: Parallel Algorithm Design
I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
Lars Arge 1/43 Big Terrain Data Analysis Algorithms in the Field Workshop SoCG June 19, 2012 Lars Arge.
1 Parallel Parentheses Matching Plus Some Applications.
James Edwards and Uzi Vishkin University of Maryland 1.
Query Processing in Databases Dr. M. Gavrilova.  Introduction  I/O algorithms for large databases  Complex geometric operations in graphical querying.
Advanced Topics in Algorithms and Data Structures Lecture pg 1 Recursion.
Advanced Topics in Algorithms and Data Structures Lecture 6.1 – pg 1 An overview of lecture 6 A parallel search algorithm A parallel merging algorithm.
Revisiting a slide from the syllabus: CS 525 will cover Parallel and distributed computing architectures – Shared memory processors – Distributed memory.
Reference: Message Passing Fundamentals.
1 Distributed Computing Algorithms CSCI Distributed Computing: everything not centralized many processors.
Overview Efficient Parallel Algorithms COMP308. COMP 308 Exam Time allowed : 2.5 hours Answer four questions (out of six). If you attempt to answer more.
1 Friday, September 29, 2006 If all you have is a hammer, then everything looks like a nail. -Anonymous.
Data Parallel Algorithms Presented By: M.Mohsin Butt
An Introduction to Parallel Computing Dr. David Cronk Innovative Computing Lab University of Tennessee Distribution A: Approved for public release; distribution.
I/O-Algorithms Lars Arge Aarhus University April 16, 2008.
I/O-Algorithms Lars Arge University of Aarhus March 7, 2005.
On the Task Assignment Problem : Two New Efficient Heuristic Algorithms.
The Design and Analysis of Algorithms
Basic PRAM algorithms Problem 1. Min of n numbers Problem 2. Computing a position of the first one in the sequence of 0’s and 1’s.
Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip Alexandru Andrei, Petru Eles, Zebo Peng, Jakob Rosen Presented By:
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
1 Interconnects Shared address space and message passing computers can be constructed by connecting processors and memory unit using a variety of interconnection.
Problems and MotivationsOur ResultsTechnical Contributions Membership: Maintain a set S in the universe U with |S| ≤ n. Given an x in U, answer whether.
Parallel and Distributed Systems Instructor: Xin Yuan Department of Computer Science Florida State University.
Path Computation in External Memory In this work we focus on undirected, unweighted graphs with small diameter. This fits well for real world graph data.
Chapter 11 Broadcasting with Selective Reduction -BSR- Serpil Tokdemir GSU, Department of Computer Science.
Operating Systems Lecture 02: Computer System Overview Anda Iamnitchi
DLS on Star (Single-level tree) Networks Background: A simple network model for DLS is the star network with a master-worker platform. It consists of a.
RESOURCES, TRADE-OFFS, AND LIMITATIONS Group 5 8/27/2014.
Duality between Reading and Writing with Applications to Sorting Jeff Vitter Department of Computer Science Center for Geometric & Biological Computing.
An Optimal Cache-Oblivious Priority Queue and Its Applications in Graph Algorithms By Arge, Bender, Demaine, Holland-Minkley, Munro Presented by Adam Sheffer.
Mehdi Mohammadi March Western Michigan University Department of Computer Science CS Advanced Data Structure.
LATA: A Latency and Throughput- Aware Packet Processing System Author: Jilong Kuang and Laxmi Bhuyan Publisher: DAC 2010 Presenter: Chun-Sheng Hsueh Date:
Multi-core Computing Lecture 2 MADALGO Summer School 2012 Algorithms for Modern Parallel and Distributed Models Phillip B. Gibbons Intel Labs Pittsburgh.
The Fundamentals: Algorithms, the Integers & Matrices.
Memory Allocation of Multi programming using Permutation Graph By Bhavani Duggineni.
Data Structures and Algorithms in Parallel Computing Lecture 2.
Equivalence Between Priority Queues and Sorting in External Memory
Martin Kruliš by Martin Kruliš (v1.1)1.
Link Building and Communities in Large Networks Martin Olsen University of Aarhus Link Building Link Building is NP-Hard The dashed links show the set.
Parallel Processing Presented by: Wanki Ho CS147, Section 1.
What Dynamic Programming (DP) is a fundamental problem solving technique that has been widely used for solving a broad range of search and optimization.
What is a metric embedding?Embedding ultrametrics into R d An embedding of an input metric space into a host metric space is a mapping that sends each.
Problem Definition I/O-efficient Rectangular Segment Search Gautam K. Das and Bradford G. Nickerson Faculty of Computer science, University of New Brunswick,
IntroductionReduction to Arrangements of Hyperplanes We keep the canonical triangulation (bottom, left corner triangulation) of each convex cell of the.
COMP 5704 Project Presentation Parallel Buffer Trees and Searching Cory Fraser School of Computer Science Carleton University, Ottawa, Canada
ProblemResultsConcurrent Range Reporting Input: n points in d dimensional space. Support reporting the t points in a query box. The best previous result:
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
Chapter 1 — Computer Abstractions and Technology — 1 Uniprocessor Performance Constrained by power, instruction-level parallelism, memory latency.
Multiprocessor  Use large number of processor design for workstation or PC market  Has an efficient medium for communication among the processor memory.
Internal Memory Pointer MachineRandom Access MachineStatic Setting Data resides in records (nodes) that can be accessed via pointers (links). The priority.
3/12/2013Computer Engg, IIT(BHU)1 PRAM ALGORITHMS-1.
GPU ProgrammingOther Relevant ObservationsExperiments GPU kernels run on C blocks (CTAs) of W warps. Each warp is a group of w=32 threads, which are executed.
PRAM and Parallel Computing
The Design and Analysis of Algorithms
Multi-core Computing Lecture 2 MADALGO Summer School 2012 Algorithms for Modern Parallel and Distributed Models Phillip B. Gibbons Intel Labs Pittsburgh.
Real-Time Ray Tracing Stefan Popov.
Query Processing in Databases Dr. M. Gavrilova
Unit-4: Dynamic Programming
Lecture 2 The Art of Concurrency
Low Depth Cache-Oblivious Algorithms
Algorithm Course Algorithms Lecture 3 Sorting Algorithm-1
Presentation transcript:

MotivationFundamental ProblemsProblems on Graphs Parallel processors are becoming common place. Each core of a multi-core processor consists of a CPU and a private cache. Inter-processor communication is performed by writing and reading to/from shared memory (higher level cache or the main memory). We need new models of computation which model parallelism while taking into account the latencies of memory hierarchies. Prefix sums: Given a sequence A = {a 1, …,a n } compute B = {b 1, …,b n }, s.t. b i = a 1 +…+a i. Prefix sums is a basic building block for solutions to many problems in parallel models. Our PEM prefix sums solution is optimal in all three complexity metrics. Sorting is a fundamental problems in computer science and a building block for solutions to many problems. We develop an optimal sorting algorithm in all three complexity metrics in the PEM model. Using list ranking and solutions on trees, we can solve problems on graphs efficiently: PEM ModelProblems on TreesOrthogonal Geometric Problems The existing parallel random access (PRAM) model does not have a notion of caches and, therefore, does not account for spatial locality. On the other hand, the existing external memory (EM) model, while explicitly modeling cache access, is not a parallel model. We combine the two models to obtain the parallel external memory (PEM) model – a parallel model that explicitly counts cache accesses. There are three complexity metrics in the PEM model:  Space – amount of memory used  Parallel time – max. time spent by a CPU for computing  Parallel I/O – max. number of transfers between the main memory and the cache of a CPU List ranking: Given a linked list of N elements assign each element its rank, the distance from the head of the list to the element. List ranking is a linchpin to solving problems on trees: We parallelize the distribution sweeping technique. Using distribution sweeping, we obtain efficient solutions to orthogonal geometric problems: References [1] L. Arge. M. T. Goodrich, M. Nelson, N. Sitchinava, Fundamental algorithms for private-cache chip multiprocessors, SPAA, [2] L. Arge, M.T. Goodrich, N. Sitchinava, Parallel external memory graph algorithms, IPDPS, [3] D. Ajwani, N. Sitchinava, N. Zeh, Geometric algorithms for private- cache chip multiprocessors. Manuscript. Parallel External Memory Model for Private-cache Chip Multiprocessors MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation Nodari Sitchinava Aarhus University Shared Memory PRAM model EM model PEM model Solution: We solve the list ranking problem by identifying a maximal independent set via deterministic coin tossing, bridging the set out and recursively solving the problem on the remaining list. All steps are completed in parallel and I/O efficiently, resulting in optimal sorting complexity in all three complexity metrics in the PEM model. A: {2, 4, 1, 9, 3, 2, 7, 1, 1, 8} B: {2, 6, 7, 16, 19, 21, 28, 29, 30, 38} Intel Quad-core processorAMD 6-core processor Line segment intersections Rectangle intersections Range query Minimum spanning tree Bi-connected components Ear decomposition {3, 5, 1, 7, 2, 8, 4, 6, 9, 10} {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}