Multiprocessor Initialization An introduction to the use of Interprocessor Interrupts.

Presentation transcript:

Multiprocessor Initialization An introduction to the use of Interprocessor Interrupts

A traditional MP system
[Diagram: CPU 0 and CPU 1 connect to main memory over the system bus]

Dual-Core Technology: Core 2 Duo processor
[Diagram: CPU 0 and CPU 1 share a level-2 cache, which connects to main memory over the system bus]

Multi-Core Technology: Core 2 Quad processor
[Diagram: CPU 0 and CPU 1 share one level-2 cache, CPU 2 and CPU 3 share another; both caches connect to main memory over the system bus]

Each CPU has its own Local-APIC
Each CPU contains:
– the processor’s application registers: EAX, EBX, …, EIP, EFLAGS
– the processor’s system registers: CR0, CR2, CR3, …, IDTR, GDTR, TR
– the processor’s Local-APIC registers: Local-ID, IRR, ISR, EOI, LVT0, LVT1, …, ICR, TCFG
– the processor’s Execution Engine

The Local-APIC ID register
Layout: an 8-bit APIC ID field; the remaining bits are reserved
Memory-Mapped Register-Address: 0xFEE00020
This register is initially zero, but its APIC ID field (8 bits) is programmed by the BIOS during system startup with a unique processor identification number, which is subsequently used when specifying the processor as a recipient of inter-processor interrupts.

The Local-APIC EOI register
Layout: bits 31..0 form a single write-only register
Memory-Mapped Register-Address: 0xFEE000B0
This write-only register is used by Interrupt Service Routines to issue an ‘End-Of-Interrupt’ command to the Local-APIC. Any value written to this register will be interpreted by the Local-APIC as an EOI command. The value stored in this register is initially zero (and it will remain unchanged).

The Spurious Interrupt register
Layout: bits 7..0 = spurious vector; bit 8 = EN (Local-APIC is Enabled: 1=yes, 0=no); the remaining bits are shown as reserved
Memory-Mapped Register-Address: 0xFEE000F0
This register is used to Enable/Disable the functioning of the Local-APIC and, when it is enabled, to specify the interrupt-vector number to be delivered to the processor in case the Local-APIC generates a ‘spurious’ interrupt. (In some processor-models, the vector’s lowest 4 bits are hardwired 1s.)
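As a small illustration of the layout described on this slide, the value to store at 0xFEE000F0 can be assembled from the EN bit and the vector. This is only a sketch in C (the slides themselves use assembly); the function name `svr_value` is made up for the example, and the code merely computes the number without touching any hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 8 of the Spurious Interrupt register is the EN (enable) bit. */
#define SVR_APIC_ENABLE (1u << 8)

/* Hypothetical helper: build the 32-bit value for the register at
   0xFEE000F0 from a spurious-interrupt vector and an enable flag. */
uint32_t svr_value(unsigned vector, int enable)
{
    assert(vector <= 0xFF);          /* the vector field is 8 bits wide */
    uint32_t value = vector;
    if (enable)
        value |= SVR_APIC_ENABLE;
    return value;
}
```

For example, `svr_value(0xFF, 1)` yields 0x1FF: the Local-APIC is enabled and spurious interrupts arrive on vector 0xFF, whose lowest 4 bits are already 1s.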

Interrupt Command Register
Each processor’s Local-APIC unit has a 64-bit Interrupt Command Register (ICR).
It can be programmed by system software to transmit messages to one, or to several, of the other processors in the system.
Each processor has a unique identification number in its APIC Local-ID Register that can be used for directing messages to it.

ICR (upper 32-bits)
Layout: an 8-bit Destination field; the remaining bits are reserved
Memory-Mapped Register-Address: 0xFEE00310
The Destination Field (8 bits) can be used to specify which processor (or group of processors) will receive the message.
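A brief sketch of how a destination might be placed into this register, written in C rather than the slides' assembly. It assumes the Destination field occupies the top byte (bits 31..24) of the upper doubleword, which matches the usual Local-APIC layout, although the bit labels did not survive in this transcript; `icr_high` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value for the ICR's upper 32 bits (0xFEE00310)
   that addresses the processor whose APIC ID is 'apic_id'.
   Assumption: the 8-bit Destination field sits in bits 31..24. */
uint32_t icr_high(unsigned apic_id)
{
    assert(apic_id <= 0xFF);         /* the field is 8 bits wide */
    return (uint32_t)apic_id << 24;
}
```

When a destination shorthand such as 'all excluding self' is used in the lower doubleword, this field is not needed, which is why the broadcast examples in these slides write only to 0xFEE00300.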

ICR (lower 32-bits)
Memory-Mapped Register-Address: 0xFEE00300
Vector field (bits 7..0)
Delivery Mode (bits 10..8): 000 = Fixed, 001 = Lowest Priority, 010 = SMI, 011 = (reserved), 100 = NMI, 101 = INIT, 110 = Start Up, 111 = (reserved)
Destination Mode (bit 11): 0 = Physical, 1 = Logical
Delivery Status (bit 12, read-only): 0 = Idle, 1 = Pending
Level (bit 14): 0 = De-assert, 1 = Assert
Trigger Mode (bit 15): 0 = Edge, 1 = Level
Destination Shorthand (bits 19..18): 00 = no shorthand, 01 = only to self, 10 = all including self, 11 = all excluding self
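To see how these fields combine into the constants used later in the slides, here is a minimal C sketch that packs a command word. The names (`icr_low`, the `ICR_*` constants) are invented for the illustration, and the sketch fixes Destination Mode at Physical and Trigger Mode at Edge for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Delivery Mode encodings from the slide (bits 10..8). */
enum { ICR_FIXED = 0, ICR_NMI = 4, ICR_INIT = 5, ICR_STARTUP = 6 };

/* Destination Shorthand encodings from the slide (bits 19..18). */
enum { ICR_NO_SHORTHAND = 0, ICR_SELF = 1,
       ICR_ALL_INCLUDING_SELF = 2, ICR_ALL_EXCLUDING_SELF = 3 };

/* Hypothetical helper: pack the lower 32 bits of the ICR.
   Destination Mode (bit 11) and Trigger Mode (bit 15) are left at 0. */
uint32_t icr_low(unsigned shorthand, unsigned level,
                 unsigned delivery_mode, unsigned vector)
{
    assert(shorthand <= 3 && level <= 1);
    assert(delivery_mode <= 7 && vector <= 0xFF);
    return ((uint32_t)shorthand << 18)
         | ((uint32_t)level << 14)
         | ((uint32_t)delivery_mode << 8)
         | (uint32_t)vector;
}
```

With these encodings, an asserted INIT to all-excluding-self packs to 0x000C4500, and a Start Up with vector 0x11 packs to 0x000C4611, the two constants that the slides load into EAX.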

MP initialization protocol
Set a shared processor-counter equal to 1
Step 1: issue an ‘INIT’ IPI to all-except-self
Delay for 10 milliseconds
Step 2: issue a ‘Startup’ IPI to all-except-self
Delay for 200 microseconds
Step 3: issue a ‘Startup’ IPI to all-except-self
Delay for 200 microseconds
Check the value of the processor-counter
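The sequence above can be sketched as a small table walked by a loop. This C version is only a model: the `fake_send_ipi` and `fake_delay_us` stand-ins record what would be done instead of writing to the Local-APIC or programming a timer, and every name in it is hypothetical. The ICR constants are the ones used in the slides' assembly.

```c
#include <assert.h>
#include <stdint.h>

struct mp_step {
    const char *name;      /* which IPI is sent                      */
    uint32_t    icr_low;   /* value written to the ICR at 0xFEE00300 */
    uint32_t    delay_us;  /* delay that follows the IPI             */
};

/* INIT, then two Startup IPIs, each to all-except-self. */
static const struct mp_step mp_wakeup[] = {
    { "INIT",    0x000C4500u, 10000u },
    { "Startup", 0x000C4611u,   200u },
    { "Startup", 0x000C4611u,   200u },
};

/* Stand-ins for hardware access: they only record what was requested. */
static uint32_t sent_log[8];
static unsigned sent_count;
static uint32_t waited_us;

void fake_send_ipi(uint32_t icr_low)
{
    assert(sent_count < 8);
    sent_log[sent_count++] = icr_low;
}

void fake_delay_us(uint32_t microseconds)
{
    waited_us += microseconds;
}

/* Walk the table, sending each IPI and then waiting. */
void run_mp_wakeup(void (*send)(uint32_t), void (*delay)(uint32_t))
{
    unsigned count = sizeof mp_wakeup / sizeof mp_wakeup[0];
    for (unsigned i = 0; i < count; i++) {
        send(mp_wakeup[i].icr_low);
        delay(mp_wakeup[i].delay_us);
    }
}
```

Calling `run_mp_wakeup(fake_send_ipi, fake_delay_us)` records three IPIs and a total wait of 10,400 microseconds; in a real boot loader the two callbacks would be replaced by the ICR write and the timer-based delay.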

Issue an ‘INIT’ IPI
        # address Local-APIC via register FS
        mov     $sel_fs, %ax
        mov     %ax, %fs
        # broadcast ‘INIT’ IPI to ‘all-except-self’
        mov     $0x000C4500, %eax
        mov     %eax, %fs:(0xFEE00300)
        # wait until the Delivery Status bit (bit 12) is no longer Pending
.B0:    btl     $12, %fs:(0xFEE00300)
        jc      .B0

Issue a ‘Startup’ IPI
        # broadcast ‘Startup’ IPI to all-except-self
        # using vector 0x11 to specify entry-point
        # at real memory-address 0x11000 (vector 0x11 times 0x1000)
        mov     $0x000C4611, %eax
        mov     %eax, %fs:(0xFEE00300)
        # wait until the Delivery Status bit (bit 12) is no longer Pending
.B1:    btl     $12, %fs:(0xFEE00300)
        jc      .B1
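The relation between the Startup vector and the address where an AP begins executing can be sketched in a few lines of C. The helper names are hypothetical; the rule they encode is that the 8-bit vector selects a 4K page in the first megabyte, so the AP starts in real mode with CS equal to the vector times 0x100 and IP equal to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: physical address where an AP starts executing
   after a Startup IPI carrying 'vector' (a 4K page number). */
uint32_t sipi_entry_address(unsigned vector)
{
    assert(vector <= 0xFF);
    return (uint32_t)vector << 12;
}

/* Hypothetical helper: the real-mode code-segment value for that
   entry point (the instruction pointer starts at 0). */
uint16_t sipi_code_segment(unsigned vector)
{
    assert(vector <= 0xFF);
    return (uint16_t)(vector << 8);
}
```

For the vector 0x11 used in the slide, the entry address is 0x11000 and the code segment is 0x1100.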

Timing delays
Intel’s MP Initialization Protocol specifies the use of some timing-delays:
– 10 milliseconds ( = 10,000 microseconds)
– 200 microseconds
We can use the 8254 Timer’s Channel 2 for implementing these timed delays, by programming it for ‘one-shot’ countdown mode, then polling bit #5 at i/o port 0x61.

Mathematical examples
EXAMPLE 1: Delaying for 10 milliseconds means delaying for 1/100-th of a second (because 100 times 10 milliseconds = one-thousand milliseconds).
EXAMPLE 2: Delaying for 200 microseconds means delaying for 1/5000-th of a second (because 5000 times 200 microseconds = one-million microseconds).
GENERAL PRINCIPLE: Delaying for x microseconds means delaying for 1/(1000000/x)-th of a second (because 1000000/x times x microseconds = one-million microseconds).

Mathematical theory
RECALL: Clock-Frequency = 1193182 Hertz (pulses-per-second)
ALSO: One second equals one-million microseconds
PROBLEM: Given the desired delay-time in microseconds, express the desired delay-time in clock-frequency pulses and program that number into the PIT’s Latch-Register
APPLYING DIMENSIONAL ANALYSIS:
Delay-in-Clock-Pulses = Delay-in-Microseconds * Pulses-Per-Microsecond
Pulses-Per-Microsecond = Pulses-Per-Second / Microseconds-Per-Second
CONCLUSION: For a desired time-delay of x microseconds, the number of clock-pulses may be computed as x * (1193182 / 1000000) = (1193182 * x) / 1000000. Equivalently, this is 1193182 / (1000000 / x), as dividing by a fraction amounts to multiplying by that fraction’s reciprocal.

Delaying for EAX microseconds
        # We compute the value for the 8254 Timer’s Channel-2 Latch-register
        # Delaying for EAX microseconds means that Latch-register’s value is
        # a certain fraction of one full second’s worth of input-pulses:
        # fraction = (EAX microseconds)/(one-million microseconds-per-second)
        #
        # Thus the latch-value should be: fraction*(1193182 pulses-per-second)
        # which we can compute by doing a multiplication followed by a division
        #
        mov     %eax, %ecx              # copy the delay to ECX
        mov     $1193182, %eax          # setup input-frequency in EAX
        mul     %ecx                    # multiplied by microseconds
        mov     $1000000, %ecx          # setup one-million as a divisor
        div     %ecx                    # so quotient will be Latch-value
        # Quotient in register AX should be written to the timer’s Latch Register
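The same multiply-then-divide computation can be checked with a short C sketch. It assumes the standard 8254 input frequency of 1,193,182 Hz; the function name `pit_latch_for_us` is invented for the example. A 64-bit intermediate plays the role of the EDX:EAX product in the assembly, since 1193182 times 10000 does not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_HZ          1193182u   /* 8254 input pulses per second */
#define US_PER_SECOND   1000000u   /* microseconds per second      */

/* Hypothetical helper: latch value for a delay of 'microseconds'.
   The quotient is truncated, exactly as the 'div' instruction does. */
uint16_t pit_latch_for_us(uint32_t microseconds)
{
    uint64_t pulses = (uint64_t)PIT_HZ * microseconds / US_PER_SECOND;
    assert(pulses >= 1 && pulses <= 0xFFFF);   /* must fit the 16-bit latch */
    return (uint16_t)pulses;
}
```

For the two delays in the MP protocol this gives 11931 pulses for 10,000 microseconds and 238 pulses for 200 microseconds. The assertion also shows the practical limit of the method: the 16-bit latch cannot hold a delay much longer than about 54 milliseconds.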

Intel’s MP terminology
When an MP system starts up, one of the CPUs will be selected to handle the ‘boot’ procedures, while the other CPUs ‘sleep’.
The BSP is this BootStrap Processor, and every other processor is known as an AP (i.e., a so-called ‘Application Processor’).
[Diagram: one CPU labeled BSP, another labeled AP]

‘parallel computing’ principles
When it’s awakened, each processor will need its own private stack-area, so it can handle any interrupts or procedure-calls without modifying an area in memory which another processor is also using.
And whenever two or more processors do share ‘write-access’ to any memory area, then those accesses must be ‘serialized’.

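‘atomic’ memory-access
Shared variables must not be modified by more than one processor at a time (‘atomic’ access).
The x86 cpu’s ‘lock’ prefix helps enforce this.
Example: every processor adds 1 to a counter
        lock
        incl    (counter)
Some instructions (such as ‘xchg’ with a memory operand) have ‘atomic’ access built in; others, such as ‘xadd’, still need the ‘lock’ prefix when several processors may execute them at once.
Example: all processors need private stacks
        mov     $0x1000, %ax            # size of one stack-segment step
        lock
        xadd    %ax, (new_SS)           # AX = old value, new_SS += 0x1000
        mov     %ax, %ss                # use the old value as this cpu's SS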
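Both examples on this slide are 'fetch-and-add' patterns, which can be modeled with C11 atomics. This is a sketch, not the demo's code: the starting segment value 0x2000 and the function names are assumptions made for the illustration, and the counter starts at 1 because the protocol slide sets the processor-counter to 1 before waking the other processors.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_uint   counter = 1;       /* the BSP has already counted itself */
static atomic_ushort new_SS  = 0x2000;  /* hypothetical first free segment    */

/* Models 'lock incl (counter)': an indivisible increment. */
void announce_processor(void)
{
    atomic_fetch_add(&counter, 1);
}

/* Models 'lock xadd %ax, (new_SS)': returns the old segment value and
   advances the shared variable by 0x1000 in one indivisible step. */
unsigned short claim_stack_segment(void)
{
    unsigned short old = atomic_fetch_add(&new_SS, 0x1000);
    assert(old <= 0xEFFF);              /* the next claim must not wrap */
    return old;
}
```

Because each caller receives the value that was in `new_SS` before its own addition, no two processors can be handed the same stack segment, even if they make their calls at the same moment.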

ROM-BIOS isn’t ‘reentrant’
The video service-functions in ROM-BIOS that are often used to display a message-string at the current cursor-location (and afterward advance the cursor) modify global storage locations (as well as i/o ports), and hence must be called by only one processor at a time.
A shared memory-variable (called ‘mutex’) is used to enforce this mutual exclusion.

Implementing a ‘spinlock’
        # Here is a ‘global’ variable, which all of the processors can modify
mutex:  .word   1                       # initial value for variable is 1

        # Here is a ‘prologue’ and ‘epilog’ for using this variable to enforce
        # ‘mutually exclusive access’ to a section of ‘non-reentrant’ code
spin:   btw     $0, mutex               # test bit #0 to see if mutex is free
        jnc     spin                    # spin if the mutex is not available
        lock                            # else request exclusive bus-access
        btrw    $0, mutex               # and try to grab mutex ownership
        jnc     spin                    # unsuccessful? then try again

        # (the ‘non-reentrant’ code goes here)

        btsw    $0, mutex               # release the mutex when finished
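The same test-then-grab protocol can be sketched with C11 atomics, which makes its logic easy to exercise in user space. The function names are hypothetical; as in the slide, bit 0 of `mutex` is 1 while the lock is free and 0 while some processor holds it.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_ushort mutex = 1;         /* bit 0 set means 'free' */

/* One pass through the prologue: returns 1 if the lock was taken. */
int mutex_try_acquire(void)
{
    /* 'btw $0, mutex' + 'jnc spin': peek first, without locking the bus */
    if ((atomic_load(&mutex) & 1u) == 0)
        return 0;
    /* 'lock btrw $0, mutex': clear bit 0 and learn its previous value */
    unsigned short old = atomic_fetch_and(&mutex, (unsigned short)~1u);
    return (old & 1u) != 0;             /* 0 means another cpu won the race */
}

/* The full prologue: spin until the lock is ours. */
void mutex_acquire(void)
{
    while (!mutex_try_acquire())
        ;                               /* busy-wait */
}

/* The epilog, 'btsw $0, mutex': mark the lock free again. */
void mutex_release(void)
{
    assert((atomic_load(&mutex) & 1u) == 0);   /* must currently be held */
    atomic_fetch_or(&mutex, 1);
}
```

The plain read before the locked operation is a design choice worth noticing: waiting processors spin on an ordinary load and only attempt the more expensive locked operation when the lock looks free.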

Demo: ‘mphello.s’
Each CPU needs to access its Local-APIC.
The BSP (“Boot-Strap Processor”) wakes up other processors by broadcasting the ‘INIT-SIPI-SIPI’ message-sequence.
Each AP (“Application Processor”) starts executing at a 4K page-boundary, and needs its own private stack-area.
Shared variables require ‘atomic’ access.

Demo’s organization
MAIN:   # the BSP will execute these calls
        call    allow_4GB_access
        call    display_APIC_LocalID
        call    broadcast_AP_starup
        call    delay_until_APs_halt

initAP: # each AP will execute these calls
        call    allow_4GB_access
        call    display_APIC_LocalID

In-class exercise #1
Add a call to this procedure by each of the processors, but do it without using a ‘lock’ prefix (and outside mutex-protected code).
Then let the BSP print the value of ‘total’.
total:  .word   0                       # include this ‘shared’ global-variable

add_one_thousand:                       # let each processor call this subroutine
        mov     $1000, %cx
nxadd:  addw    $1, total
        loop    nxadd
        ret

Binary-to-Decimal
Recall the algorithm for converting numbers to decimal digit-strings (for console display)
num2dec: # converts value in register AX to a decimal string at DS:DI
        mov     $10, %bx                # setup the number-base in BX
        xor     %cx, %cx                # setup remainder-count in CX
nxdiv:  xor     %dx, %dx                # extend AX to a doubleword
        div     %bx                     # divide the doubleword by ten
        push    %dx                     # save remainder on the stack
        inc     %cx                     # and count this remainder
        or      %ax, %ax                # was the quotient zero yet?
        jnz     nxdiv                   # no, generate another digit
nxdgt:  pop     %dx                     # recover saved remainder
        add     $'0', %dl               # convert remainder to ASCII
        mov     %dl, (%di)              # store numeral in output-buffer
        inc     %di                     # and advance buffer-pointer
        loop    nxdgt                   # again for other remainders
        ret                             # return to the caller
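For comparison, here is the same divide-and-stack algorithm sketched in C. A small local array stands in for the processor stack, and the function (whose name simply mirrors the assembly label) also appends a terminating NUL, which the assembly routine does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a 16-bit value to decimal text in 'out' (at least 6 bytes).
   Returns the number of digits written, not counting the NUL. */
size_t num2dec(uint16_t value, char *out)
{
    char   saved[5];                    /* 65535 has at most 5 digits */
    size_t count = 0;

    assert(out != NULL);
    do {                                /* the 'nxdiv' loop           */
        saved[count++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    size_t digits = count;
    while (count > 0)                   /* the 'nxdgt' loop           */
        *out++ = saved[--count];
    *out = '\0';
    return digits;
}
```

Remainders come out least-significant digit first, so saving them and reading them back in reverse order is what puts the digits in display order; the value 4000, for instance, produces the four characters "4000".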

In-class exercise #2
Using a Core-2 Quad processor, we might expect that the value of ‘total’ would be 4000.
But see if that’s what actually happens!
Without the ‘lock’ prefix, the four CPUs may all try to increment ‘total’ at once, resulting in a logically incorrect total.
So fix this problem (by using a ‘lock’ prefix ahead of the ‘addw $1, total’ instruction).
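Why the unlocked total can come out too small is easiest to see by splitting one 'addw $1, total' into its separate read and write steps. The following C sketch is a deterministic, single-threaded model of two CPUs whose steps interleave; it does not run anything in parallel, and all of its names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t total;                  /* the 'shared' variable */

/* The two memory accesses hidden inside an unlocked 'addw $1, total'. */
uint16_t cpu_load(void)          { return total; }
void     cpu_store(uint16_t v)   { total = v; }

/* Two CPUs both read before either one writes: one update is lost. */
uint16_t interleaved_increments(void)
{
    total = 0;
    uint16_t a = cpu_load();            /* CPU 0 reads 0               */
    uint16_t b = cpu_load();            /* CPU 1 reads 0 as well       */
    assert(a == b);                     /* both hold the same old value */
    cpu_store((uint16_t)(a + 1));       /* CPU 0 writes 1              */
    cpu_store((uint16_t)(b + 1));       /* CPU 1 overwrites it with 1  */
    return total;
}

/* With 'lock', each read-modify-write completes before the next starts. */
uint16_t serialized_increments(void)
{
    total = 0;
    uint16_t a = cpu_load();
    cpu_store((uint16_t)(a + 1));
    uint16_t b = cpu_load();
    cpu_store((uint16_t)(b + 1));
    return total;
}
```

The interleaved version ends with a total of 1 after two increments, while the serialized version ends with 2. On real hardware the amount lost varies from run to run, which is why the exercise asks you to observe the result rather than predict it.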

Do you need a ‘barrier’?
You can use a software construct, known as a ‘barrier’, to stop CPUs from entering a block of code until a prescribed number of them are all ready to enter it together (i.e., simultaneously).
This may be helpful with the in-class exercises.
arrived: .word  0                       # allocate a shared global variable

barrier: lock                           # acquire exclusive bus-access
        incw    arrived                 # each cpu adds 1 to the variable
await:  cmpw    $4, arrived             # are four cpus ready to proceed?
        jb      await                   # no, wait for others to arrive here
        call    add_one_thousand        # then proceed together
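The barrier's counting logic can likewise be modeled with C11 atomics. This sketch uses hypothetical function names and a fixed count of four, matching the slide; it is meant for reasoning about the logic, and the single-threaded usage described below simply simulates the arrivals one after another.

```c
#include <assert.h>
#include <stdatomic.h>

#define CPU_COUNT 4                     /* cpus that must arrive */

static atomic_ushort arrived;           /* starts at zero */

/* 'lock incw arrived': register this cpu; returns how many have arrived. */
unsigned short barrier_arrive(void)
{
    unsigned short before = atomic_fetch_add(&arrived, 1);
    assert(before < CPU_COUNT);         /* no more arrivals than cpus */
    return (unsigned short)(before + 1);
}

/* 'cmpw $4, arrived': have all of the cpus arrived yet? */
int barrier_open(void)
{
    return atomic_load(&arrived) >= CPU_COUNT;
}

/* The whole construct: arrive, then spin until everyone is here. */
void barrier_wait(void)
{
    barrier_arrive();
    while (!barrier_open())
        ;                               /* the 'jb await' loop */
}
```

After three calls to `barrier_arrive()` the barrier is still closed; the fourth arrival, for example through `barrier_wait()`, opens it and the call returns at once. Note that this barrier is single-use: `arrived` is never reset, so it cannot be reused for a second rendezvous without extra logic.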