1 Components of the Virtual Memory System
[Figure: arrows trace what happens on a lw. A virtual address (virtual page number + page offset) is translated through the TLB and page table into a physical address (PPN), which is then split into tag, index, and block offset to access the cache, memory, and disk.]
2 Hard drives
The textbook shows the ugly guts of a hard disk:
- Data is stored on double-sided magnetic disks called platters
- Each platter is arranged like a record, with many concentric tracks
- Tracks are further divided into individual sectors, which are the basic unit of data transfer
- Each surface has a read/write head like the arm on a record player, but all the heads are connected and move together
A 75GB IBM Deskstar has roughly:
- 5 platters (10 surfaces)
- 27,000 tracks per surface
- 512 bytes per sector
- ~512 sectors per track... but this number increases going from the center to the rim
3 I/O Performance
There are two fundamental performance metrics for I/O systems:
1. Latency: time to initiate a data transfer (units = sec)
2. Bandwidth: rate of data transfer once initiated (units = bytes/sec)

Time = latency + transfer_size / bandwidth
        (sec)     (bytes / (bytes/sec))

The latency term dominates for small transfers; the bandwidth term dominates for large transfers.
4 Accessing data on a hard disk
Factors affecting latency:
- Seek time measures the delay for the disk head to reach the right track
- A rotational delay accounts for the time to get to the right sector
Factors affecting bandwidth:
- Usually the disk can read/write as fast as it can spin
- Bandwidth is determined by the rotational speed, which also determines the rotational delay
We can compute average seek time, rotational delay, etc., but careful placement of data on the disk can speed things up considerably:
- the head is already on or near the desired track
- data is in contiguous sectors
- in other words, locality!
Even so, loading a page from the hard disk can take tens of milliseconds
5 Parallel I/O
Many hardware systems use parallelism for increased speed.
A redundant array of inexpensive disks, or RAID, system allows access to several hard drives at once, for increased bandwidth:
- similar to interleaved memories from last week
6 CS232 roadmap
Here is what we have covered so far:
1. Understanding the relationship between HLL and assembly code
2. Processor design, pipelining, and performance
3. Memory systems, caches, virtual memory, I/O
The next major topic is performance tuning:
- How can I, as a programmer, make my programs run fast?
- First step: where/why is my program slow? (program profiling)
- How does one go about optimizing a program?
  - Use better algorithms/data structures
  - Exploit the processor better (after Midterm 3)
7 Performance Optimization Flowchart
"We should forget about small efficiencies, say about 97% of the time." -- Sir Tony Hoare
8 Collecting data
The process is called "instrumenting the code". One option is to do this by hand:
- record entry and exit times for suspected "hot" blocks of code
- but this is tedious and error prone
Fortunately, there are tools to do this instrumenting for us:
- gprof: the GNU profiler (run g++ with the -pg flag)
  - g++ keeps track of the source-code-to-object-code correspondence
  - it also links in a profiling signal handler:
    - the program requests that the OS periodically send it signals
    - the signal handler records the instruction that was executing (in gmon.out)
- Display the results with gprof a.out
  - shows how much time is being spent in each function
  - shows the path of function calls to the hot spot
9 Collecting data (cont.)
gprof doesn't say where inside the function the time is spent:
- in a big function, this is still a needle in a haystack
gcov: the GNU coverage tool
1. Compile/link as g++ mp6.cxx -fprofile-arcs -ftest-coverage
   - records the execution frequency of each line (like -prof_file in spim)
2. Run as normal (creates mp6.gcda and mp6.gcno)
3. Run gcov mp6.cxx and then open mp6.cxx.gcov
10 Performance Optimization, cont.
How do we fix performance problems?
11 Original code (135 vertices)
valgrind --tool=cachegrind a.out

I refs:        46,049,383
I1 misses:          1,147
L2i misses:         1,128
I1 miss rate:        0.00%
L2i miss rate:       0.00%

D refs:        27,788,106  (24,723,408 rd + 3,064,698 wr)
D1 misses:        869,707  (   862,893 rd +     6,814 wr)
L2d misses:        11,175  (     4,804 rd +     6,371 wr)
D1 miss rate:         3.1% (       3.4%   +       0.2%  )
L2d miss rate:        0.0% (       0.0%   +       0.2%  )

L2 refs:          870,854  (   864,040 rd +     6,814 wr)
L2 misses:         12,303  (     5,932 rd +     6,371 wr)
L2 miss rate:         0.0% (       0.0%   +       0.2%  )

19.267u 0.023s 0:19.45 99.1%
12 Original code (-O2 compiler flag)
I refs:        17,354,974
I1 misses:          1,138
L2i misses:         1,119
I1 miss rate:        0.00%
L2i miss rate:       0.00%

D refs:         7,376,517  ( 6,835,268 rd +   541,249 wr)
D1 misses:        868,703  (   861,894 rd +     6,809 wr)
L2d misses:        11,176  (     4,805 rd +     6,371 wr)
D1 miss rate:        11.7% (      12.6%   +       1.2%  )
L2d miss rate:        0.1% (       0.0%   +       1.1%  )

L2 refs:          869,841  (   863,032 rd +     6,809 wr)
L2 misses:         12,295  (     5,924 rd +     6,371 wr)
L2 miss rate:         0.0% (       0.0%   +       1.1%  )

18.937u 0.027s 0:19.15 98.9%
13 Better cache usage
I refs:        19,827,127
I1 misses:          1,148
L2i misses:         1,130
I1 miss rate:        0.00%
L2i miss rate:       0.00%

D refs:         6,281,750  ( 5,710,369 rd +   571,381 wr)
D1 misses:        199,355  (   191,241 rd +     8,114 wr)
L2d misses:        12,480  (     4,805 rd +     7,675 wr)
D1 miss rate:         3.1% (       3.3%   +       1.4%  )
L2d miss rate:        0.1% (       0.0%   +       1.3%  )

L2 refs:          200,503  (   192,389 rd +     8,114 wr)
L2 misses:         13,610  (     5,935 rd +     7,675 wr)
L2 miss rate:         0.0% (       0.0%   +       1.3%  )

1.146u 0.027s 0:01.20 96.6%
14 Manual optimizations
I refs:        16,347,210
I1 misses:          1,148
L2i misses:         1,130
I1 miss rate:        0.00%
L2i miss rate:       0.00%

D refs:         5,105,818  ( 4,559,900 rd +   545,918 wr)
D1 misses:        198,821  (   190,840 rd +     7,981 wr)
L2d misses:        12,479  (     4,804 rd +     7,675 wr)
D1 miss rate:         3.8% (       4.1%   +       1.4%  )
L2d miss rate:        0.2% (       0.1%   +       1.4%  )

L2 refs:          199,969  (   191,988 rd +     7,981 wr)
L2 misses:         13,609  (     5,934 rd +     7,675 wr)
L2 miss rate:         0.0% (       0.0%   +       1.4%  )

0.457u 0.025s 0:00.49 95.9%