Community Grids Laboratory


http://grids.ucs.indiana.edu/ptliupages/presentations/PC2007/ contains a set of 4 parallel computing lectures given by Fox at Microsoft Research, February 26 to March 1, 2007.
http://www.connotea.org/user/crmc is a tagged collection of multicore links.
We analyzed Microsoft CCR, which supports the exchange of messages between threads using named ports, and DSS, a service architecture built on CCR for robotics applications.
On multicore systems, DSS gives a latency of 40 microseconds for 2-way messages, and CCR gives a latency of a few microseconds for MPI-style communication.
This package is attractive for a broad class of applications.
We are looking at machine learning and computer chess applications.
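The idea of threads exchanging messages through named ports, and of timing a 2-way message, can be illustrated with a minimal sketch. This is not CCR or DSS: it uses Python's standard `queue.Queue` as a stand-in for a port, and the names `request_port`, `reply_port` and `measure_round_trip` are hypothetical. Absolute numbers from it are not comparable to the latencies quoted above.

```python
import queue
import threading
import time


def measure_round_trip(n_messages=1000):
    """Return the mean 2-way message latency, in microseconds, between two threads."""
    # Two queues stand in for named ports (an assumption; CCR ports are a different API).
    request_port = queue.Queue()
    reply_port = queue.Queue()

    def responder():
        # Echo every request back on the reply port.
        for _ in range(n_messages):
            reply_port.put(request_port.get())

    worker = threading.Thread(target=responder)
    worker.start()

    start = time.perf_counter()
    for i in range(n_messages):
        request_port.put(i)        # send
        echoed = reply_port.get()  # block until the reply arrives
        assert echoed == i
    elapsed = time.perf_counter() - start

    worker.join()
    return elapsed / n_messages * 1e6


if __name__ == "__main__":
    print(f"mean 2-way latency: {measure_round_trip():.1f} microseconds")
```

Averaging over many messages matters here, because a single round trip is shorter than the timer and scheduler noise on most systems.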

Summary of CCR Stage Overheads for Intel 4-core 2-processor Machine
These are stage switching overheads in microseconds for a set of runs with different levels of parallelism and different message patterns; each stage takes about 30 microseconds.
Overheads for the 2-core 2-processor Xeon are given in parentheses.
We also benchmarked a 2-core 2-processor AMD machine.
These measurements are equivalent to MPI latencies.
"Match" uses a number of threads equal to the number of parallel computations.
"Default" uses 8 threads.
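One way to read "stage switching overhead" is as the time per stage left over after subtracting the roughly 30 microseconds of computation. The sketch below measures that quantity under stated assumptions: it uses Python threads and `threading.Barrier` rather than CCR, with one thread per parallel computation as in the "Match" setting, and the names `spin` and `stage_overhead_us` are hypothetical. CPython threads do not execute Python code in parallel, so the result shows the measurement method only and is not comparable to the CCR or MPI figures.

```python
import threading
import time

STAGE_COMPUTE_US = 30.0  # per-stage computation time, taken from the text above


def spin(microseconds):
    """Busy-wait for the given time to imitate a short compute stage."""
    end = time.perf_counter() + microseconds / 1e6
    while time.perf_counter() < end:
        pass


def stage_overhead_us(n_threads, n_stages=200):
    """Return the mean per-stage overhead in microseconds for n_threads workers."""
    barrier = threading.Barrier(n_threads)

    def run():
        for _ in range(n_stages):
            spin(STAGE_COMPUTE_US)  # the computation part of the stage
            barrier.wait()          # synchronize before the next stage

    threads = [threading.Thread(target=run) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed_us = (time.perf_counter() - start) * 1e6

    # Whatever is not computation is counted as overhead (includes thread startup).
    return elapsed_us / n_stages - STAGE_COMPUTE_US


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} threads: {stage_overhead_us(n):.1f} microseconds overhead per stage")
```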