1
Department of Information Technologies
Topic No. 2. CISC and RISC type microprocessors. (Processor architecture. Processors with CISC architecture. Processors with RISC architecture. Architecture of multicore processors.)
COMPUTER ORGANIZATION AND TECHNOLOGIES (Computer Organization & Technologies)
www.berkut.ws/comporgtech.html
Azər Fərhad oğlu Həsənov
www.berkut.ws/teaching.html
education@berkut.ws
office phone: 497-26-00, 24-20
2
Topic No. 2. CISC and RISC type microprocessors.
Processor architecture.
Processors with CISC architecture (Complex Instruction Set Computers).
Processors with RISC architecture (Reduced Instruction Set Computers).
Main features of the RISC architecture.
Registers in RISC processors.
Advantages and disadvantages of the RISC architecture.
Architecture of multicore processors.
3
Processor architecture
The architecture of a processor is characterized by:
the functions and sizes of the processor's registers (the register file);
the methods of and restrictions on memory access for reading and writing;
the number of operations performed by a single instruction;
the instruction length (variable or fixed);
the number of data types.
4
Complex Instruction Set Computers (CISC)
Characteristic features:
a large number (hundreds) of different machine instructions, each executed over several CPU clock cycles;
a control unit with programmable (microprogrammed) logic;
a comparatively small number of general-purpose registers (GPRs);
instructions of various formats and various lengths;
the presence of two-address addressing;
well-developed operand addressing mechanisms, including various indirect addressing modes.
5
Reduced Instruction Set Computers (RISC)
Characteristic features:
execution of all instructions (or at least 75% of them) in a single cycle;
instructions of a standard word length, equal to the width of the data bus, which permits unified pipelined processing of all instructions;
a small number of instructions (no more than 128);
a small number of instruction formats (no more than 4);
a small number of addressing modes (no more than 4);
memory access only through "load" and "store" instructions;
all instructions other than "load" and "store" use only register-to-register transfers inside the processor;
a hardwired control unit;
a large general-purpose register file (at least 32 registers; in modern RISC microprocessors this can exceed 500).
6
Registers in RISC processors
9
Architecture of multicore processors
10
Figure 2.5. Structures of multicore processors from different manufacturers: a – IBM Power 6; b – Intel Pentium D;
11
Architecture of multicore processors
Figure 2.5. Structures of multicore processors from different manufacturers: c – Intel Core 2 Quad; d – Intel Nehalem;
12
Architecture of multicore processors
Figure 2.5. Structures of multicore processors from different manufacturers: e – Intel Itanium 3 Tukwila; f – AMD Phenom X4; g – Sun UltraSPARC T2;
13
Architecture of multicore processors
Figure 2.5. Structures of multicore processors from different manufacturers: h – Intel Core i7-990X.
14
+ William Stallings Computer Organization and Architecture 9th Edition
15
+ Chapter 15 Reduced Instruction Set Computers (RISC)
16
Table 15.1 Characteristics of Some CISCs, RISCs, and Superscalar Processors
17
Instruction Execution Characteristics
High-level languages (HLLs):
Allow the programmer to express algorithms more concisely
Allow the compiler to take care of details that are not important in the programmer's expression of algorithms
Often naturally support structured programming and/or object-oriented design
Semantic gap: the difference between the operations provided in HLLs and those provided in the computer architecture (see the example below)
Operations performed: determine the functions to be performed by the processor and its interaction with memory
Operands used: the types of operands and the frequency of their use determine the memory organization for storing them and the addressing modes for accessing them
Execution sequencing: determines the control and pipeline organization
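To make the semantic gap concrete, the short C fragment below (not from the book; the function and variable names are illustrative) pairs one HLL statement with the machine-level work it implies.

/* Illustrative only: one HLL assignment and, in the comments, the kind of
   machine-level operations it implies. */
void scale(int *v, int i, int k) {
    v[i] = v[i] * k + 1;
    /* Implied operations: compute the address v + i, load v[i], multiply it
       by k, add the constant 1, and store the result back to v[i]. The mix
       of operations and operand types in real programs is what Tables
       15.2-15.4 measure dynamically. */
}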
18
Table 15.2 Weighted Relative Dynamic Frequency of HLL Operations [PATT82a]
19
Table 15.3 Dynamic Percentage of Operands
20
Table 15.4 Procedure Arguments and Local Scalar Variables
21
+ Implications
HLLs can best be supported by optimizing the performance of the most time-consuming features of typical HLL programs.
Three elements characterize RISC architectures:
Use a large number of registers, or use a compiler to optimize register usage
Careful attention needs to be paid to the design of instruction pipelines
Instructions should have predictable costs and be consistent with a high-performance implementation
22
+ The Use of a Large Register File
Software solution:
Requires the compiler to allocate registers
Allocates based on the most used variables in a given time
Requires sophisticated program analysis
Hardware solution:
Provide more registers
Thus more variables will be held in registers
23
+ Overlapping Register Windows
24
Circular Buffer Organization of Overlapped Windows
25
+ Global Variables
Variables declared as global in an HLL can be assigned memory locations by the compiler, and all machine instructions that reference these variables will use memory-reference operands.
However, for frequently accessed global variables this scheme is inefficient.
An alternative is to incorporate a set of global registers in the processor (see the sketch below):
These registers would be fixed in number and available to all procedures
A unified numbering scheme can be used to simplify the instruction format
There is an increased hardware burden to accommodate the split in register addressing
In addition, the linker must decide which global variables should be assigned to registers
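One compiler-specific way to see the "global registers" idea from the software side is GCC's non-standard global register variable extension. The sketch below assumes an x86-64 target where r12 is callee-saved and deliberately reserved; the variable name is hypothetical, and this illustrates the concept rather than recommending the technique.

/* Sketch: pin a frequently accessed global to a register for the whole
   program (GCC extension, x86-64 assumed; not portable C). */
register long hot_counter asm("r12");

void count_event(void) {
    hot_counter++;   /* updates the register directly, no memory reference */
}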
26
Table 15.5 Characteristics of Large-Register-File and Cache Organizations
27
+ Referencing a Scalar
28
Graph Coloring Approach
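The graph coloring figure shows how a compiler maps symbolic registers onto a limited set of physical registers. As a rough, self-contained sketch of the idea (the interference graph, register counts, and greedy strategy below are illustrative, not the book's algorithm): nodes are symbolic registers, edges join registers whose live ranges overlap, and each node receives a "color" (physical register) not used by its neighbours, spilling to memory when no color is free.

/* Greedy graph-coloring register allocation sketch. Real allocators
   (e.g. Chaitin-style) use a simplify/spill stack; this only shows the idea. */
#include <stdio.h>

#define NVARS 6   /* symbolic (virtual) registers produced by the compiler */
#define NREGS 3   /* physical registers available to the allocator         */

int interferes[NVARS][NVARS] = {   /* 1 = the two live ranges overlap */
    {0,1,1,0,0,0},
    {1,0,1,1,0,0},
    {1,1,0,1,0,0},
    {0,1,1,0,1,0},
    {0,0,0,1,0,1},
    {0,0,0,0,1,0},
};

int main(void) {
    int color[NVARS];
    for (int v = 0; v < NVARS; v++) {
        int used[NREGS] = {0};
        for (int u = 0; u < v; u++)              /* colors held by already-colored neighbours */
            if (interferes[v][u] && color[u] >= 0)
                used[color[u]] = 1;
        color[v] = -1;                            /* -1 means "spill to memory" */
        for (int r = 0; r < NREGS; r++)
            if (!used[r]) { color[v] = r; break; }
        if (color[v] >= 0)
            printf("v%d -> R%d\n", v, color[v]);
        else
            printf("v%d -> spilled (no free register)\n", v);
    }
    return 0;
}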
29
+ Why CISC? (Complex Instruction Set Computer)
There is a trend toward richer instruction sets, which include a larger number of more complex instructions.
Two principal reasons for this trend:
A desire to simplify compilers
A desire to improve performance
There are two advantages to smaller programs:
The program takes up less memory
Smaller programs should improve performance:
Fewer instructions mean fewer instruction bytes to be fetched
In a paging environment, smaller programs occupy fewer pages, reducing page faults
More instructions fit in cache(s)
30
Table 15.6 Code Size Relative to RISC I
31
Characteristics of Reduced Instruction Set Architectures
One machine instruction per machine cycle
Machine cycle: the time it takes to fetch two operands from registers, perform an ALU operation, and store the result in a register
Register-to-register operations (see the sketch below)
Only simple LOAD and STORE operations access memory
This simplifies the instruction set and therefore the control unit
Simple addressing modes
Simplifies the instruction set and the control unit
Simple instruction formats
Generally only one or a few formats are used
Instruction length is fixed and aligned on word boundaries
Opcode decoding and register operand accessing can occur simultaneously
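A source-level sketch of the register-to-register restriction (the mnemonics in the comments are generic placeholders, not a real instruction set):

/* Illustrative only: how a load/store (RISC-style) machine evaluates a = b + c. */
int a, b, c;

void add_example(void) {
    a = b + c;
    /* On a register-to-register architecture this becomes roughly:
         LOAD  r1, b        ; fetch operand b from memory
         LOAD  r2, c        ; fetch operand c from memory
         ADD   r3, r1, r2   ; register-to-register ALU operation
         STORE r3, a        ; write the result back to memory
       A memory-to-memory CISC machine could instead encode this as a single
       three-operand memory ADD. */
}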
32
Comparison of Register-to-Register and Memory-to-Memory Approaches
33
Table 15.7 Characteristics of Some Processors
34
The Effects of Pipelining
35
+ Optimization of Pipelining
Delayed branch:
Does not take effect until after execution of the following instruction
This following instruction is the delay slot
Delayed load:
The register that is the target of the load is locked by the processor
Execution of the instruction stream continues until that register is required
The processor idles until the load is complete
Re-arranging instructions can allow useful work to be done while the load completes (see the sketch below)
Loop unrolling:
Replicate the body of the loop a number of times
Iterate the loop fewer times
Reduces loop overhead
Increases instruction parallelism
Improves register, data cache, or TLB locality
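A source-level analogy of the delayed-load idea (the reordering is normally done by the compiler's instruction scheduler; the names below are illustrative):

/* Sketch: fill the latency of a load with independent work instead of stalling. */
int reorder_example(const int *p, int a, int b) {
    int x = *p;        /* load issued; the target register is locked        */
    int z = a + b;     /* independent work scheduled into the load latency  */
    int y = x + 1;     /* first use of x; by now the load has completed     */
    return y + z;
}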
36
Table 15.8 Normal and Delayed Branch
37
+ Use of the Delayed Branch
38
Loop Unrolling Twice Example
do i=2, n-1
a[i] = a[i] + a[i-1] * a[i+1]
end do
Becomes
do i=2, n-2, 2
a[i] = a[i] + a[i-1] * a[i+1]
a[i+1] = a[i+1] + a[i] * a[i+2]
end do
if (mod(n-2,2) = 1) then
a[n-1] = a[n-1] + a[n-2] * a[n]
end if
39
+ RISC versus CISC Controversy
Quantitative: compare program sizes and execution speeds of programs on RISC and CISC machines that use comparable technology
Qualitative: examine issues of high-level language support and use of VLSI real estate
Problems with such comparisons:
There is no pair of RISC and CISC machines that are comparable in life-cycle cost, level of technology, gate complexity, sophistication of compiler, operating system support, etc.
No definitive set of test programs exists
It is difficult to separate hardware effects from compiler effects
Most comparisons have been done on "toy" machines rather than commercial products
Most commercial devices advertised as RISC possess a mixture of RISC and CISC characteristics
40
+ Summary
Chapter 15 Reduced Instruction Set Computers (RISC)
Instruction execution characteristics: operations, operands, procedure calls, implications
The use of a large register file: register windows, global variables, large register file versus cache
Compiler-based register optimization
Reduced instruction set architecture: characteristics of RISC, CISC versus RISC characteristics
RISC pipelining: pipelining with regular instructions, optimization of pipelining
MIPS R4000: instruction set, instruction pipeline
SPARC: SPARC register set, instruction set, instruction format
RISC versus CISC controversy
41
+ Chapter 18 Multicore Computers
42
+ Alternative Chip Organization
43
+ Intel Hardware Trends
44
Processor Trends
45
+ Power and Memory Considerations
46
+ Power Consumption
By 2015 we can expect to see microprocessor chips with about 100 billion transistors on a 300 mm² die.
Assuming that about 50-60% of the chip area is devoted to memory, the chip will support cache memory of about 100 MB and leave over 1 billion transistors available for logic.
How to use all those logic transistors is a key design issue.
Pollack's Rule: states that the performance increase is roughly proportional to the square root of the increase in complexity (a numerical sketch follows below)
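A small numerical sketch of what Pollack's Rule implies when a doubled transistor budget is spent on one larger core versus two baseline cores (the figures are illustrative, not measurements):

/* Pollack's Rule sketch: performance ~ sqrt(complexity). Link with -lm. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double budget = 2.0;                    /* 2x the logic transistors              */
    double one_big_core = sqrt(budget);     /* about 1.41x from a single core        */
    double two_small_cores = 2.0;           /* up to 2x if the workload parallelizes */
    printf("One core of 2x complexity: ~%.2fx performance\n", one_big_core);
    printf("Two baseline cores:        up to %.2fx performance\n", two_small_cores);
    return 0;
}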
47
+ Performance Effect of Multiple Cores
48
Scaling of Database Workloads on Multiple-Processor Hardware
49
+ Effective Applications for Multicore Processors
Multi-threaded native applications (a minimal threading sketch follows below):
Characterized by having a small number of highly threaded processes
Examples: Lotus Domino, Siebel CRM (Customer Relationship Manager)
Multi-process applications:
Characterized by the presence of many single-threaded processes
Examples: Oracle, SAP, PeopleSoft
Java applications:
The Java Virtual Machine is a multi-threaded process that provides scheduling and memory management for Java applications
Examples: Sun's Java Application Server, BEA's WebLogic, IBM WebSphere, Tomcat
Multi-instance applications:
One application running multiple times
If multiple application instances require some degree of isolation, virtualization technology can be used to provide each of them with its own separate and secure environment
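A minimal sketch of the multi-threaded style these applications rely on, using POSIX threads (the worker function and thread count are illustrative; error handling is omitted):

/* Minimal POSIX-threads example: one process, several threads that the OS
   can schedule onto different cores. Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static void *worker(void *arg) {
    long id = (long)arg;
    printf("worker %ld running, possibly on its own core\n", id);
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}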
50
+ Hybrid Threading for Rendering Module
51
Multicore Organization Alternatives
52
+ Intel Core Duo Block Diagram
53
+ Intel x86 Multicore Organization: Core Duo
Advanced Programmable Interrupt Controller (APIC):
Provides inter-processor interrupts, which allow any processor to interrupt any other processor or set of processors
Accepts I/O interrupts and routes them to the appropriate core
Includes a timer that can be set by the OS to generate an interrupt to the local core
Power management logic:
Responsible for reducing power consumption when possible, thus increasing battery life for mobile platforms
Monitors thermal conditions and CPU activity and adjusts voltage levels and power consumption appropriately
Includes an advanced power-gating capability that allows ultra fine-grained logic control, turning on individual processor logic subsystems only if and when they are needed
Continued...
54
+ Intel x86 Multicore Organization: Core Duo
2 MB shared L2 cache:
The cache logic allows a dynamic allocation of cache space based on current core needs
MESI support for the L1 caches (a simplified MESI sketch follows below):
Extended to support multiple Core Duo chips in an SMP configuration
The L2 cache controller allows the system to distinguish between a situation in which data are shared by the two local cores and a situation in which the data are shared by one or more caches on the die as well as by an agent on the external bus
Bus interface:
Connects to the external bus, known as the Front Side Bus, which connects to main memory, I/O controllers, and other processor chips
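A highly simplified sketch of the four MESI states and the transition a cache line takes on a local read (this models the textbook protocol only, not Intel's actual cache controller; the function name is illustrative):

/* MESI sketch: the four line states and the effect of a local read. Real
   controllers also handle writes, snooped bus transactions, and write-backs. */
typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_state;

mesi_state on_local_read(mesi_state s, int another_cache_has_copy) {
    switch (s) {
    case INVALID:
        /* Read miss: fetch the line; it becomes SHARED if another cache
           holds a copy, EXCLUSIVE otherwise. */
        return another_cache_has_copy ? SHARED : EXCLUSIVE;
    case MODIFIED:
    case EXCLUSIVE:
    case SHARED:
        return s;       /* read hit: the state is unchanged */
    }
    return INVALID;     /* unreachable; silences compiler warnings */
}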
55
Intel Core i7-990X Block Diagram
56
+ Table 18.1 Cache Latency
57
+ Summary
Chapter 18 Multicore Computers
Hardware performance issues: increase in parallelism and complexity, power consumption
Software performance issues: software on multicore, Valve game software example
Multicore organization
Intel x86 multicore organization: Intel Core Duo, Intel Core i7-990X
ARM11 MPCore: interrupt handling, cache coherency
IBM zEnterprise mainframe
58
Topic of the next lecture
Topic No. 3. Microarchitecture and microprogramming. (Microarchitecture examples. The data path. Microinstructions. The Mic-1 microarchitecture – control of microinstructions. IJVM – an example instruction set architecture. The microarchitecture of microcontrollers of the 8051 family. Microprogramming.)
www.berkut.ws/comporgtech.html