Circuit Placement w/ Multi-core Processors
May 10-02
Mike Drob, Grant Furgiuele, Ben Winters
Advisor: Dr. Chris Chu
Client: IBM
Design Presentation


Project Overview
- The circuit placement problem is a bottleneck of physical design
- Currently single-core only, with no threads
- Will attempt to parallelize some functions of the FastPlace algorithm using the Linux pthreads library
- Will implement the RQL idea in FastPlace

Design Considerations
- Parallelize certain CPU- and time-heavy functions
    - Profiled the existing algorithm using gprof on Linux
- Analyze the ease of parallelizing parts of the algorithm
    - Spring potential energy calculation
    - Global placement (matrix problem)
    - Local refinement optimizations
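One of the candidates above, the spring potential energy calculation, is a sum over nets, so it can be split into independent slices. The following is a minimal sketch only, not FastPlace code: the `Net` layout, the energy form (weight times squared length of a two-pin spring), and the names `spring_energy` and `MAX_THREADS` are all assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* hypothetical upper bound for this sketch */

/* Hypothetical two-pin net: a spring of weight w between modules a and b. */
typedef struct { int a, b; double w; } Net;

typedef struct {
    const Net *nets;
    const double *x, *y;  /* module coordinates */
    size_t begin, end;    /* half-open slice of the net array */
    double partial;       /* this slice's share of the energy */
} Slice;

/* Each worker sums its own slice into its own field, so no lock is needed. */
static void *energy_worker(void *arg) {
    Slice *s = arg;
    double sum = 0.0;
    for (size_t i = s->begin; i < s->end; i++) {
        double dx = s->x[s->nets[i].a] - s->x[s->nets[i].b];
        double dy = s->y[s->nets[i].a] - s->y[s->nets[i].b];
        sum += s->nets[i].w * (dx * dx + dy * dy);
    }
    s->partial = sum;
    return NULL;
}

/* Assumed energy form: sum over nets of w * (dx^2 + dy^2). */
double spring_energy(const Net *nets, size_t n_nets,
                     const double *x, const double *y, int n_threads) {
    assert(n_threads >= 1 && n_threads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    Slice slice[MAX_THREADS];
    int started[MAX_THREADS];

    for (int t = 0; t < n_threads; t++) {
        slice[t] = (Slice){ nets, x, y,
                            n_nets * (size_t)t / (size_t)n_threads,
                            n_nets * (size_t)(t + 1) / (size_t)n_threads,
                            0.0 };
        started[t] =
            pthread_create(&tid[t], NULL, energy_worker, &slice[t]) == 0;
        if (!started[t])
            energy_worker(&slice[t]);  /* fall back to running inline */
    }

    double total = 0.0;
    for (int t = 0; t < n_threads; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += slice[t].partial;  /* combine partial sums after the join */
    }
    return total;
}
```

Compile with `-pthread`. Giving each thread a private partial sum and combining only after the joins avoids any shared mutable state inside the loop.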

Design Considerations (cont.)
- Cores vs. Threads
    - Speedup only continues with more cores if there are more threads
    - Specify number of cores at compilation time or at run time?
- RQL concept
    - Nullify the spreading forces on a small portion of the modules with highest force
    - Leaves these modules at their quadratically optimal location
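The RQL idea described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the published RQL procedure: the `Force` struct, the `fraction` parameter, and the function name `nullify_largest_forces` are hypothetical, and ranking by force magnitude is one plausible reading of "highest force".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-module spreading force. */
typedef struct { double fx, fy; } Force;

/* Sort helper: descending order of doubles. */
static int cmp_desc(const void *p, const void *q) {
    double a = *(const double *)p, b = *(const double *)q;
    return (a < b) - (a > b);
}

/* Zero the spreading force on the `fraction` of modules whose force is
 * largest. Squared magnitude is used for ranking, which gives the same
 * order as magnitude. Returns the number of modules nullified, or -1 if
 * memory allocation fails. */
int nullify_largest_forces(Force *f, size_t n, double fraction) {
    assert(fraction >= 0.0 && fraction <= 1.0);
    size_t k = (size_t)(fraction * (double)n);
    if (k == 0)
        return 0;

    double *mag = malloc(n * sizeof *mag);
    if (!mag)
        return -1;
    for (size_t i = 0; i < n; i++)
        mag[i] = f[i].fx * f[i].fx + f[i].fy * f[i].fy;
    qsort(mag, n, sizeof *mag, cmp_desc);
    double threshold = mag[k - 1];  /* k-th largest squared magnitude */
    free(mag);

    int zeroed = 0;
    for (size_t i = 0; i < n && (size_t)zeroed < k; i++) {
        if (f[i].fx * f[i].fx + f[i].fy * f[i].fy >= threshold) {
            f[i].fx = 0.0;  /* module keeps its quadratic-optimal position */
            f[i].fy = 0.0;
            zeroed++;
        }
    }
    return zeroed;
}
```

With the force zeroed, the next quadratic solve has nothing pushing those modules away from where the wirelength objective placed them, which is the effect the slide describes.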

Parallelization Priorities
Using the ISPD2005 Benchmarks

Global Placement
Function Name                              % Time
move_8pt_clustering_withMap                 50.59
update_autil_
density_move_8pt_clustering_withMap          8.87
move_8pt_withMap                             9.17
density_move_8pt_withMap                     3.79
wirelen                                      3.32
shiftBlocks                                  2.15
density_update_autil                        11.72
move_8pt_PP                                  1.47
mapcoreRegion                                1.31

Detailed Placement
Function Name                              % Time
swap_move_FM                                40.67
v_swap_FM                                   24.43
local_order3_FM                             12.41
new_compact
findSegmentList                              3.9
distributeCells                              1.76
find_optimal_region                          1.4
flipOneSeg                                   0.58
ilr_legalizer                                0.56
wirelen                                      0.44

Desired Performance Gains
[Chart: Unit Time Taken by Top 10 Most Time Consuming Functions vs. All Other Functions]
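As a rough bound on what such gains could look like, Amdahl's law can be applied to the profile figures. The worked example below is an illustration, not a result from the project: it assumes that only `move_8pt_clustering_withMap` (listed at 50.59% of global placement time) is parallelized, that it parallelizes perfectly, and that thread overhead is zero.

```latex
% Amdahl's law: p = parallelizable fraction, n = number of cores
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}

% Assumed p = 0.5059
S(2) = \frac{1}{0.4941 + 0.25295} \approx 1.34
\qquad
S(4) = \frac{1}{0.4941 + 0.126475} \approx 1.61
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{0.4941} \approx 2.02
```

Under these assumptions the global placement phase could at best roughly double in speed from that one function alone, which is why the share of time covered by the parallelized functions matters.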

Prototyping
- Program which looped 10,000,000 times
    - Took ~6.5 seconds on a single core
    - With two threads on a dual core, took ~3.5 seconds
    - With four threads on a quad core, took ~2 seconds
- Run times were not quite halved or quartered, due to overhead
    - Thread creation overhead
    - System overhead
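The prototype timings can be restated as speedup and parallel efficiency, which makes the overhead visible. The arithmetic below uses only the approximate times quoted above, so the results are approximate as well.

```latex
% Speedup S_n = T_1 / T_n, efficiency E_n = S_n / n
S_2 = \frac{6.5}{3.5} \approx 1.86, \qquad E_2 = \frac{S_2}{2} \approx 0.93

S_4 = \frac{6.5}{2.0} = 3.25, \qquad E_4 = \frac{S_4}{4} \approx 0.81
```

Efficiency falls from about 93% on two cores to about 81% on four, consistent with the overhead growing as threads are added.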

Testing Considerations
- Frequent testing
    - On a per-method basis
    - Will use Valgrind to profile performance
        - gprof doesn't work with threaded programs
    - Testing done on a variety of systems
        - Test for consistent performance levels in comparable systems
        - Test for increased performance in more capable systems
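For per-method comparisons across systems, a wall-clock timer can complement a profiler. The sketch below is an assumption about how such a check might look, not part of the project's test plan; `wall_seconds` and `busy_loop` are hypothetical names. Wall-clock time is used because process CPU time (as returned by `clock()`) adds up the time of all threads and so does not show a speedup.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Run fn(arg) once and return the elapsed wall-clock time in seconds. */
double wall_seconds(void (*fn)(void *), void *arg) {
    assert(fn != NULL);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Hypothetical stand-in workload: loops *arg times. */
void busy_loop(void *arg) {
    long n = *(long *)arg;
    volatile long sink = 0;  /* volatile keeps the loop from being removed */
    for (long i = 0; i < n; i++)
        sink += i;
}
```

A call such as `wall_seconds(busy_loop, &n)` with `long n = 10000000;` times one run; repeating the measurement for the single-threaded and threaded versions of a method on each test system gives directly comparable numbers.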

Schedule

Personnel Effort Estimates

Cost Estimates

Questions?