UPC Status Report - 10/12/04
Adam Leko
UPC Project, HCS Lab
University of Florida
Oct 12, 2004

2  NSA bench9
● Simple code
  – Given stream A
  – Two parameters
    ● L – number of elements in A
    ● N – number of bits for each element in A
  – Compute B_i = A_i "right justified"
    ● e.g., 1000 -> 0001
    ● e.g., 1010 -> 0101
    ● e.g., 1011 -> 1011
    ● Removes factors of 2 from the list
  – Compute C such that B_i * C_i = 1 mod 2^N
● Parameters for experiments
  – N = 48 (recommended: N = 30 or N = 46)
  – L = 5*10^6 (recommended: L = 5*10^7)
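The "right justify" step is just a shift loop. Here is a minimal C sketch of it (our own illustration, not code from the benchmark; the function name right_justify is hypothetical):

    /* Remove all factors of 2 from a by shifting right until the
       low bit is set. Assumes a != 0 (the benchmark fills A with
       values >= 1). */
    unsigned long right_justify(unsigned long a)
    {
        while ((a & 1UL) == 0)
            a >>= 1;
        return a;
    }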

3  Program flow
● Computation section (embarrassingly parallel)
  – Fill up A with [rand() & (2^N - 1)] + 1
  – Compute B & C
    ● B: directly by right shifts (>> 1)
    ● C: iterative algorithm
      – x_n: value with n correct bits computed
      – x_2j = x_j * (2 - B_i * x_j) mod 2^(2j)
      – Example: 12 bits, B_i = 127
        ● x_3 = 7 mod 8
        ● x_6 = 7 * (2 - 127*7) = 63 mod 64
        ● x_12 = 63 * (2 - 127*63) = 3967 mod 4096
● Check section (gather)
  – First node checks all values to verify B_i * C_i = 1 mod 2^N
  – Fits along with the benchmark requirement to "output selected values from A, B, and C"
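The iterative algorithm above is a Newton-style iteration that doubles the number of correct bits of the inverse at each step. A minimal C sketch under stated assumptions (N <= 64, b odd; the name newton_inverse is ours, not the benchmark's):

    #include <stdint.h>

    /* Compute c with b*c == 1 (mod 2^n) for odd b. For any odd b,
       b*b == 1 (mod 8), so x = b is already correct to 3 bits;
       each step x = x*(2 - b*x) doubles the number of correct bits. */
    uint64_t newton_inverse(uint64_t b, int n)
    {
        uint64_t x = b;
        int bits;
        for (bits = 3; bits < n; bits *= 2)
            x = x * (2 - b * x);          /* wraps mod 2^64, safe for n <= 64 */
        if (n < 64)
            x &= ((uint64_t)1 << n) - 1;  /* reduce mod 2^n */
        return x;
    }

For b = 127 and n = 12 this returns 3967, matching the worked example on the slide.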

6  Analysis for factors (user's perspective)
● Big question: where is time being spent? Which statements in the source code use the most cycles?
  – Which statements incur remote accesses?
    ● Factors: network characteristics, communication patterns
  – Which threads are sitting idle?
    ● Factors: CPU utilization, parallel efficiency, synchronization overhead
  – How close am I to peak GFLOPS?
    ● Factors: all of the above, especially lower-level cache and network/memory behavior
  – How expensive is synchronization, and how much of it occurs?
    ● Factors: synchronization algorithms, network/memory latency
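Several of these questions reduce to simple ratios of measured quantities. As a purely illustrative sketch (hypothetical helpers, not the output of any existing tool):

    /* Fraction of ideal speedup achieved on 'threads' threads
       (1.0 = perfect scaling). */
    double parallel_efficiency(double t_serial, double t_parallel, int threads)
    {
        return t_serial / (threads * t_parallel);
    }

    /* Fraction of the machine's peak floating-point rate achieved. */
    double fraction_of_peak(double flops_done, double seconds, double peak_gflops)
    {
        return (flops_done / seconds) / (peak_gflops * 1e9);
    }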

7  Analysis strategy
● Come up with a list of questions we want our performance tool to answer
● Think about possible factors in terms of which questions they answer or help answer
  – Split some questions up into combinations of factors
  – Try to cover as many as possible
  – Preliminary list from brainstorming?
    ● Based on the important questions above
  – Perform a sensitivity study (one sample microbenchmark is sketched below)
    ● Assemble a microbenchmark suite to isolate factors
    ● Vary parameters artificially
  – Also run through the list of questions and catalog the answers
    ● Can we record this factor? etc.
● Combine results from the sensitivity study with the survey and tool study to get a preliminary list of factors
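As one example of what a suite entry might look like, here is a minimal UPC sketch that tries to isolate a single factor (remote read latency). Everything here is an assumption for illustration: the array size, the timing method, and the idea of pinning all reads to thread 1.

    #include <upc_relaxed.h>
    #include <stdio.h>
    #include <sys/time.h>

    #define N 100000
    shared long A[N*THREADS];  /* cyclic layout: A[i] has affinity to thread i%THREADS */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec * 1e-6;
    }

    int main(void)
    {
        long sum = 0;
        int i;
        upc_barrier;
        if (MYTHREAD == 0 && THREADS > 1) {
            double t0 = now();
            for (i = 0; i < N; i++)
                sum += A[i*THREADS + 1];  /* every index is owned by thread 1: remote */
            double t1 = now();
            printf("avg remote read: %g us (sum=%ld)\n", (t1 - t0) / N * 1e6, sum);
        }
        upc_barrier;
        return 0;
    }

Varying the owning thread, the access pattern, and the message size would then expose how sensitive runtime is to each network factor in isolation.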

8  Individual part of project
● Contacting developers
  – Sent out to all developers on the contact list
  – Purpose:
    ● Understand "compiler weirdness"
    ● Get ideas for factors
    ● Get access to a Cray machine?
● Look at benchmarks
  – Get ideas for factors
● Start on next coding project: convolution
● Model-driven factor development

9  Model-driven factor development
● Develop one or more performance models that take the major performance factors into account
● Tune those models to Marvel, lambda+IBA, and kappa+SCI
● General idea:
  – If a performance model can reach 90%+ accuracy, then we can use it to determine which factors are important for which architectures
  – And thus what to concentrate on and what to show the user
  – Gives us a good understanding of "what's going on"
  – Can also be used to validate the factors we have chosen
● Issues
  – Existing models?
  – Simulation or equations? (an equation-based sketch follows below)
  – Corner cases?
  – Too hard?
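On the "simulation or equations" question, the simplest equation-based option is a latency/bandwidth model in the spirit of LogP/LogGP. A hedged C sketch (parameter values would have to come from the tuning step above; nothing here is measured):

    /* Simple analytic model: T = flops/F + messages*(L + bytes/B),
       where F = flop rate, L = per-message latency, B = bandwidth. */
    typedef struct {
        double flop_rate;   /* flops per second */
        double latency;     /* seconds per message */
        double bandwidth;   /* bytes per second */
    } machine_model;

    double predict_time(machine_model m, double flops,
                        double messages, double bytes_per_msg)
    {
        return flops / m.flop_rate
             + messages * (m.latency + bytes_per_msg / m.bandwidth);
    }

Comparing predicted against measured times per program phase would then indicate which term, and hence which factor, dominates on each architecture.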