Summary Inf-2202 Concurrent and Data-Intensive Programming Fall 2016

Summary Inf-2202 Concurrent and Data-Intensive Programming Fall 2016 Lars Ailo Bongo (larsab@cs.uit.no)

Goals of parallelization process

Step | Architecture dependent? | Major performance goals
Decomposition | Mostly no | Expose enough concurrency but not too much
Assignment | Mostly no | Balance workload; reduce communication volume
Orchestration | Yes | Reduce noninherent communication via data locality; reduce communication and synchronization cost as seen by the processor; reduce serialization to shared resources; schedule tasks to satisfy dependencies early
Mapping | Yes | Put related threads on the same core if necessary; exploit locality in chip and network topology
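The first three steps can be illustrated with a small sketch. This is only an illustration under stated assumptions, not code from the course: the function names (`decompose`, `work`, `parallel_sum_of_squares`) and the sum-of-squares workload are hypothetical. Decomposition creates more tasks than workers, assignment is left to the thread pool, and orchestration is a single reduction with no shared state. Mapping is left to the operating system scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def decompose(n_items: int, n_tasks: int) -> List[range]:
    """Decomposition: split the index space into n_tasks contiguous ranges
    whose sizes differ by at most one item."""
    base, extra = divmod(n_items, n_tasks)
    ranges, start = [], 0
    for i in range(n_tasks):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def work(indices: range) -> int:
    """One task: sum of squares over its own range, touching no shared state."""
    return sum(i * i for i in indices)


def parallel_sum_of_squares(n_items: int, n_workers: int = 4) -> int:
    # More tasks than workers, so idle workers can pick up remaining tasks
    # (enough concurrency, but not one task per item).
    tasks = decompose(n_items, n_tasks=4 * n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Assignment: the pool hands queued tasks to whichever worker is free.
        partials = pool.map(work, tasks)
        # Orchestration: the only synchronization is this final reduction.
        return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```

In CPython the global interpreter lock keeps threads from speeding up CPU-bound work like this, so the sketch shows the structure of the steps rather than a performance gain.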

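A performance model

\[
\mathrm{Speedup}_{\mathrm{problem}}(p) \le \frac{\mathrm{Busy}(1) + \mathrm{Data}_{\mathrm{local}}(1)}{\mathrm{Busy}_{\mathrm{useful}}(p) + \mathrm{Data}_{\mathrm{local}}(p) + \mathrm{Synch}(p) + \mathrm{Data}_{\mathrm{remote}}(p) + \mathrm{Busy}_{\mathrm{overhead}}(p)}
\]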
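A worked example may help when reading the model. The sketch below evaluates the bound for one processor's time breakdown. The class name `TimeBreakdown`, the function `speedup_bound`, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    """Execution-time components of one processor, in seconds."""
    busy_useful: float
    data_local: float
    synch: float = 0.0
    data_remote: float = 0.0
    busy_overhead: float = 0.0

    def total(self) -> float:
        return (self.busy_useful + self.data_local + self.synch
                + self.data_remote + self.busy_overhead)


def speedup_bound(sequential: TimeBreakdown, parallel: TimeBreakdown) -> float:
    """Upper bound on speedup: (Busy(1) + Data_local(1)) divided by the
    sum of all time components on p processors."""
    return (sequential.busy_useful + sequential.data_local) / parallel.total()


# Hypothetical measurements: 100 s sequential, run on 8 processors.
seq = TimeBreakdown(busy_useful=80.0, data_local=20.0)
par = TimeBreakdown(busy_useful=10.0, data_local=2.5, synch=1.5,
                    data_remote=4.0, busy_overhead=2.0)
bound = speedup_bound(seq, par)
print(f"speedup bound: {bound:.2f}")
```

With these made-up numbers the useful work divides perfectly by 8, yet synchronization, remote data access, and overhead add 7.5 s to the 12.5 s of useful time, so the bound drops from 8 to 5.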

Step 4 – System and workload parameters
  Problem size
  Communication-computation ratio
  Execution time breakdown in different parts
  Load balance
  Temporal locality
  Spatial locality
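Two of these parameters are easy to compute from simple measurements. The sketch below is a minimal example and not taken from the slides: it assumes load balance is reported as the slowest worker's busy time divided by the mean busy time, and the communication-computation ratio as bytes communicated per operation. Both definitions and the function names are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence


def load_imbalance(busy_seconds: Sequence[float]) -> float:
    """Slowest worker's busy time divided by the mean busy time.
    1.0 means perfect balance; larger values mean more idle waiting."""
    return max(busy_seconds) / mean(busy_seconds)


def comm_comp_ratio(bytes_communicated: float, operations: float) -> float:
    """Bytes communicated per operation performed (assumed definition)."""
    return bytes_communicated / operations


# Hypothetical per-worker busy times in seconds.
busy = [9.0, 10.0, 11.0, 10.0]
print(f"load imbalance: {load_imbalance(busy):.2f}")
print(f"bytes per operation: {comm_comp_ratio(4e6, 2e9):.4f}")
```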

Selection of technique

Criterion | Modeling | Simulation | Measurement
1. Stage | Any | Any | Post-prototype
2. Time required | Small | Medium | Varies
3. Tools | Analyst | Computer languages/simulator | Instrumentation
4. Accuracy | Low | Moderate | Varies
5. Trade-off evaluation | Easy | Moderate | Difficult
6. Cost | Small | Medium | High
7. Scalability | | |

Slightly modified Table 3.1 from The Art of Computer Systems Performance Analysis. Raj Jain. Wiley. 1991.

Commodity Component Distributed System
SATA 6Gbit/s … …
1TB on 100 nodes => 14s
On 1000 nodes => 1.4s
1PB on 100 nodes => 4h
On 1000 nodes => 23min
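The scan times on this slide can be approximated with a back-of-the-envelope calculation. The sketch below assumes one disk per node, data spread evenly over the nodes, decimal units (1TB = 10^12 bytes), and disks streaming at the raw 6 Gbit/s line rate. These assumptions are not stated in the slides, and the function name is hypothetical, so the results only roughly track the slide's figures.

```python
def scan_time_seconds(data_bytes: float, nodes: int,
                      disk_bytes_per_s: float) -> float:
    """Time for all nodes to read their equal share of the data in parallel."""
    per_node_bytes = data_bytes / nodes
    return per_node_bytes / disk_bytes_per_s


# Assumption: raw SATA line rate, ignoring encoding and protocol overhead.
SATA_BYTES_PER_S = 6e9 / 8
TB = 1e12
PB = 1e15

for label, size in (("1TB", TB), ("1PB", PB)):
    for nodes in (100, 1000):
        t = scan_time_seconds(size, nodes, SATA_BYTES_PER_S)
        print(f"{label} on {nodes} nodes: {t:.1f} s = {t / 60:.1f} min")
```

Under these assumptions the results come out slightly below the slide's numbers (about 13 s rather than 14 s for 1TB on 100 nodes), which suggests the slide rounds up or assumes a somewhat lower effective bandwidth. The point is the same either way: scan time shrinks linearly with the number of nodes.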

Stallo

Berkeley AMPlab https://amplab.cs.berkeley.edu/software/

Mandatory assignments
  B-tree
  Deduplication engine
  Spark PageRank on AWS

Exercises and readings