Dijkstra’s Algorithm Keep Going!


Dijkstra’s Algorithm Keep Going!

Pre-Computing Shortest Paths

How many paths to pre-compute? Recall:
Using single-source to single-dest find_path:
–Need travel time from any delivery location to any other: N * (N-1) → 2450 calls for N = 50
–Plus any depot to any delivery location: M * N → 500 calls for N = 50, M = 10
–Plus any delivery location to any depot: N * M → 500 calls
–Total: 3450 calls to your find_path

Pre-Computing Travel Time Paths

Using single-source to all destinations:
–Need any delivery location to any other: N calls → 50
–Plus any depot to any delivery location: M calls → 10
–Plus any delivery location to any depot: 0 calls
–Total: 60 calls

Is this the minimum?
–No: with a small change, 51 calls suffice (get the rest from earlier calls)
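A minimal sketch of this pre-computation, assuming a hypothetical multiDijkstra(src, dests) helper (one Dijkstra expansion that keeps going until every destination in dests is reached; a full sketch of such an expansion follows two slides down) and integer intersection IDs. None of these names come from the slides:

#include <vector>

// Hypothetical stand-in (not from the slides): one Dijkstra expansion from
// src that returns the travel time from src to each destination in dests.
std::vector<double> multiDijkstra(int src, const std::vector<int>& dests);

// Pre-compute every leg the courier might need: N + M = 60 calls total
// for N = 50 deliveries and M = 10 depots.
std::vector<std::vector<double>> precomputeTravelTimes(
        const std::vector<int>& deliveries, const std::vector<int>& depots) {
    std::vector<std::vector<double>> legTimes;

    std::vector<int> deliveriesAndDepots = deliveries;
    deliveriesAndDepots.insert(deliveriesAndDepots.end(),
                               depots.begin(), depots.end());

    // One call per delivery covers delivery->delivery AND delivery->depot,
    // which is why the delivery-to-depot legs cost 0 extra calls.
    for (int src : deliveries)
        legTimes.push_back(multiDijkstra(src, deliveriesAndDepots));

    // One call per depot covers depot->delivery.
    for (int src : depots)
        legTimes.push_back(multiDijkstra(src, deliveries));

    return legTimes;
}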

Is This Fast Enough?

Recall:
–Dijkstra’s algorithm can search the whole graph, especially with multiple destinations
–O(N) items to put in the wavefront
–Using a heap / priority_queue: O(log N) to add / remove 1 item from the wavefront

Total:
–O(N log N)
–Can execute in well under a second
–OK!
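For concreteness, here is a minimal heap-based wavefront in C++, written against a plain adjacency list rather than the course’s street data structures (so the graph type here is an assumption):

#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// graph[u] = list of (neighbour, travel time) edges out of intersection u.
// Returns the shortest travel time from src to every intersection: one
// single-source-to-all-destinations call.
std::vector<double> dijkstraAll(
        const std::vector<std::vector<std::pair<int, double>>>& graph, int src) {
    const double kInf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(graph.size(), kInf);

    using Entry = std::pair<double, int>;  // (travel time, intersection)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> wavefront;

    dist[src] = 0.0;
    wavefront.push({0.0, src});
    while (!wavefront.empty()) {              // O(N) pops overall...
        auto [time, u] = wavefront.top();     // ...each O(log N)
        wavefront.pop();
        if (time > dist[u]) continue;         // stale entry; u already finalized
        for (auto [v, edgeTime] : graph[u]) {
            if (dist[u] + edgeTime < dist[v]) {
                dist[v] = dist[u] + edgeTime;
                wavefront.push({dist[v], v}); // O(log N) insert
            }
        }
    }
    return dist;
}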

Escaping Local Minima Revisited

Say We’re In This State

Local perturbation to improve?

deliveryOrder = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Swap Order of Two Deliveries?

deliveryOrder = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
deliveryOrder = {0, 1, 3, 2, 4, 5, 6, 7, 8, 9}

No swap of two deliveries can improve!
Stuck in a local minimum
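A sketch of this swap-based search, assuming a computeTravelTime(order) helper (my name, not the course’s) that just sums the pre-computed leg times along the order:

#include <algorithm>
#include <vector>

double computeTravelTime(const std::vector<int>& order);  // assumed available

// Try every pairwise swap; keep only swaps that reduce travel time.
// Returns false once no swap helps: we are stuck in a local minimum.
bool improveBySwaps(std::vector<int>& order) {
    bool improved = false;
    double best = computeTravelTime(order);
    for (size_t i = 0; i < order.size(); ++i) {
        for (size_t j = i + 1; j < order.size(); ++j) {
            std::swap(order[i], order[j]);
            double t = computeTravelTime(order);
            if (t < best) {
                best = t;
                improved = true;                // keep the improving swap
            } else {
                std::swap(order[i], order[j]);  // undo: no better
            }
        }
    }
    return improved;
}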

2-Opt? Path cut into 3 pieces

deliveryOrder = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

2-Opt? Reconnected: worse!

Before: deliveryOrder = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
After: deliveryOrder = {0, 1, 2, 6, 5, 4, 3, 7, 8, 9}

2-Opt? Reconnected differently: now better!

Before: deliveryOrder = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
After: deliveryOrder = {0, 1, 2, 7, 8, 9, 6, 5, 4, 3}
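To make the mechanics concrete, here is how those two reconnections can be produced with std::reverse and a rebuild; the cut points (after index 2 and after index 6) are hard-coded to match the example:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> order = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    // Cut into 3 pieces: A = {0,1,2}, B = {3,4,5,6}, C = {7,8,9}.

    // Reconnection 1 (worse in the example): reverse piece B in place.
    std::vector<int> r1 = order;
    std::reverse(r1.begin() + 3, r1.begin() + 7);
    // r1 == {0, 1, 2, 6, 5, 4, 3, 7, 8, 9}

    // Reconnection 2 (better in the example): A, then C, then B reversed.
    std::vector<int> r2(order.begin(), order.begin() + 3);   // A
    r2.insert(r2.end(), order.begin() + 7, order.end());     // C
    r2.insert(r2.end(), r1.begin() + 3, r1.begin() + 7);     // B reversed
    // r2 == {0, 1, 2, 7, 8, 9, 6, 5, 4, 3}
}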

Perturbations & Local Minima

Explore lots of local perturbations
–Compute the travel time for each, to see what’s better
Escape local minima with more powerful perturbations
And/or use hill climbing (accept some worse solutions)
Powerful unifying technique: simulated annealing
–Lots of hill climbing early
–Little later (the metal has cooled)
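A standard formulation of the simulated-annealing acceptance rule (a generic sketch, not code from the course): improvements are always accepted, while worse solutions are accepted with probability e^(-deltaCost / T), so a high early temperature gives lots of hill climbing and a cooled temperature gives almost none:

#include <cmath>
#include <cstdlib>

// Accept or reject a perturbation that changes travel time by deltaCost.
bool acceptPerturbation(double deltaCost, double temperature) {
    if (deltaCost < 0)
        return true;                     // better: always accept
    if (temperature <= 0)
        return false;                    // fully cooled: pure greedy
    double p = std::exp(-deltaCost / temperature);
    return (double)std::rand() / RAND_MAX < p;  // occasional hill climb
}

// Typical schedule: start the temperature high, then shrink it (e.g.
// temperature *= 0.99) after each round of perturbations, so late
// hill climbs become rare.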

How Do I Finish by the Time Limit?

#include <ctime>

#define TIME_LIMIT 30  // m4: 30 second time limit

int main() {
    clock_t startTime = clock();  // Clock “ticks”
    float timeSecs = 0;
    do {
        myOptimizer();
        clock_t currentTime = clock();
        timeSecs = ((float)(currentTime - startTime)) / CLOCKS_PER_SEC;
        // Keep optimizing until within 10% of the time limit
    } while (timeSecs < 0.9 * TIME_LIMIT);
    ...
}
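One caveat worth noting (my addition, not from the slides): clock() measures CPU time, which can advance faster than wall-clock time once the program uses the multiple threads discussed next. A wall-clock sketch of the same loop using std::chrono:

#include <chrono>

#define TIME_LIMIT 30  // same 30 second limit as above

void myOptimizer();    // the optimizer from the slide above

int main() {
    auto startTime = std::chrono::steady_clock::now();
    bool timedOut = false;
    while (!timedOut) {
        myOptimizer();
        std::chrono::duration<double> wallClock =
            std::chrono::steady_clock::now() - startTime;
        timedOut = wallClock.count() > 0.9 * TIME_LIMIT;  // same 10% margin
    }
}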

Algorithm Challenge

Algorithms: Challenge Question

Frank likes what he calls “cool” numbers. For cool numbers, there are integers x and y such that:
–Cool number = x^2 = y^3
–For example, 1 is cool (= 1^2 = 1^3) and 64 is cool (= 8^2 = 4^3)
–25 is not cool (= 5^2, but no integer cubed = 25)

1. Write a program to print all cool numbers between 1 and N
2. Calculate the computational complexity of your program
3. Mail me your program & complexity: first 5 of lowest complexity → chocolate bar in class Fri.

Source: ACM Programming Competition

Multithreading Why & How

Intel 8086

First PC microprocessor (1978)
–29,000 transistors
–5 MHz
–~10 clocks / instruction
–~500,000 instructions / s

Intel Core i7

–1.4 billion transistors
–3.5 GHz
–~15 clocks / instruction, but ~30 instructions in flight at once → average about 2 instructions completed / clock
–Can execute ~7 billion instructions / s

1978 to Today

~50,000x more transistors
~14,000x more instructions / s

The future:
–Still getting 2x the transistors every 2 years
–But transistors not getting much faster → clock speed saturating
–~30 instructions in flight: complexity & power to go beyond this climb rapidly → slow growth in instructions / cycle
–Impact: CPU speed not increasing as rapidly
Using multiple processors (cores) is now important
Multithreading: one program using multiple cores at once

A Single-Threaded Program

[Diagram: memory holds the instructions (code), global variables, heap variables (new), and one stack (local variables); a single CPU / core holds one program counter and one stack pointer]

A Multi-Threaded Program

[Diagram: same memory layout, but with two stacks, Stack1 and Stack2 (local variables); Core1 and Core2 each hold their own program counter and stack pointer. Thread 1 and thread 2 share the code, global variables, and heap; each thread gets its own local variables on its own stack]

Thread Basics

Each thread has its own program counter
–Can be executing a different function
–Is (almost always) executing a different instruction from other threads
Each thread has its own stack
–Has its own copy of local variables (all different)
Each thread sees the same global variables
Dynamically allocated memory
–Shared by all threads
–Any thread with a pointer to it can access it
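A minimal C++11 illustration of this memory picture (an aside, not from the slides): the local variable lives on each thread’s own stack, so each thread gets a separate copy, while the global is the same variable in both threads:

#include <iostream>
#include <thread>

int sharedGlobal = 42;  // one copy, visible to every thread

void work(int id) {
    int local = id * 100;  // on this thread's own stack: separate per thread
    std::cout << "thread " << id << ": local=" << local
              << " sharedGlobal=" << sharedGlobal << "\n";
}

int main() {
    std::thread t1(work, 1);  // each thread: own program counter + own stack
    std::thread t2(work, 2);  // may run on a different core
    t1.join();                // wait for both threads to finish
    t2.join();
}

The two output lines can interleave arbitrarily, a first hint of the conflicts discussed next.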

Implications

Threads can communicate through memory
–Global variables
–Dynamically allocated memory
–Fast communication!
Must be careful threads don’t conflict in reads/writes to the same memory
–What if two threads update the same global variable at the same time?
–Not clear which update wins!
Can have more threads than CPUs
–Time-share the CPUs
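For example (a sketch using std::mutex, which the slides have not introduced): ++total is a read-modify-write, so without synchronization two threads can both read the old value and one increment is lost; with the lock, the final count is always 2000000:

#include <iostream>
#include <mutex>
#include <thread>

int total = 0;           // shared global: both threads update it
std::mutex totalMutex;   // serializes access to total

void addMillion() {
    for (int i = 0; i < 1000000; ++i) {
        std::lock_guard<std::mutex> lock(totalMutex);  // without this line,
        ++total;                                       // updates can be lost
    }
}

int main() {
    std::thread t1(addMillion);
    std::thread t2(addMillion);
    t1.join();
    t2.join();
    std::cout << total << "\n";  // 2000000 with the lock; usually less without
}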