© 2004 Wayne Wolf, Overheads for Computers as Components 2e

Overview
- Why multiprocessors?
- The structure of multiprocessors.
- Elements of multiprocessors:
  - Processing elements.
  - Memory.
  - Interconnect.
Why multiprocessing?
- True parallelism:
  - Task level.
  - Data level.
- May be necessary to meet real-time requirements.
Multiprocessing and real time
- Faster-rate processes are isolated on their own processors.
  - Specialized memory system as well.
- Slower-rate processes share a processor (or processor pool).
[Figure: two CPU+memory nodes; one dedicated to the print engine, the other shared by file reading, rendering, etc.]
Heterogeneous multiprocessors
- Will often have a heterogeneous structure.
  - Different types of PEs.
  - Specialized memory structure.
  - Specialized interconnect.
Multiprocessor system-on-chip
- Multiple processors.
  - CPUs, DSPs, etc.
  - Hardwired blocks.
  - Mixed-signal.
- Custom memory system.
- Lots of software.
System-on-chip applications
- Sophisticated markets:
  - High volume.
  - Demanding performance and power requirements.
  - Strict price restrictions.
- Often standards-driven.
- Examples:
  - Communications.
  - Multimedia.
  - Networking.
Terminology
- PE: processing element.
- Interconnection network: may require more than one clock cycle to transfer data.
- Message: address + data packet.
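As a concrete illustration of the last item, a message can be modeled as a small C struct pairing an address with a payload. This layout is a hypothetical sketch; the field names and widths are assumptions, not a format from the text.

    #include <stdint.h>

    /* Hypothetical message layout: destination address plus data payload.
       Field widths are illustrative only. */
    struct message {
        uint32_t addr;     /* address in the recipient's space */
        uint32_t data[4];  /* data or parameters */
    };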
Generic multiprocessor
- Shared memory: PEs and memory blocks connected through an interconnect network.
- Message passing: PE/memory nodes exchange packets through an interconnect network.
[Figure: both organizations shown as PEs and memory blocks attached to an interconnect network.]
Shared memory vs. message passing
- Shared memory and message passing are functionally equivalent.
- Different programming models:
  - Shared memory is more like a uniprocessor.
  - Message passing is good for streaming.
- May have different implementation costs:
  - Interconnection network.
Shared memory implementation
- Memory blocks are in the address space.
- The memory interface sends messages through the network to the addressed memory block.
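A minimal sketch of what this looks like to software, assuming a shared block mapped at a hypothetical base address: an ordinary store suffices, and the hardware turns it into a network message.

    #include <stdint.h>

    /* Hypothetical base address of a shared memory block; in practice it
       comes from the platform's memory map. */
    #define SHARED_BASE 0x20000000u

    volatile uint32_t * const shared = (volatile uint32_t *)SHARED_BASE;

    void post_result(uint32_t value)
    {
        /* Plain store: the memory interface routes it through the network
           to the addressed memory block. */
        shared[0] = value;
    }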
Message passing implementation
- The program provides the processor address and data/parameters.
  - Usually through an API.
- The packet interface appears as an I/O device.
  - Packets are routed through the network to the interface.
- The recipient must decode the parameters to determine how to handle the message.
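A sketch of the API view, with hypothetical msg_send/msg_recv calls standing in for whatever primitives the platform actually provides over its packet interface:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical primitives; a real platform supplies equivalents over
       its packet interface (an I/O device). */
    int msg_send(int dest_pe, const void *buf, size_t len);
    int msg_recv(int *src_pe, void *buf, size_t maxlen);

    void request_render(void)
    {
        uint32_t params[2] = { 1 /* opcode */, 42 /* page */ };
        msg_send(3 /* destination PE */, params, sizeof(params));
    }

    void serve(void)
    {
        int src;
        uint32_t params[2];
        msg_recv(&src, params, sizeof(params));
        /* The recipient decodes params[0], params[1], ... to decide how
           to handle the message. */
    }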
Processing element selection
- What tasks run on what PEs?
  - Some tasks may be duplicated (e.g., HDTV motion estimation).
  - Some processors may run different tasks.
- How does the load change?
  - Static vs. dynamic task allocation.
Matching PEs to tasks
- Factors:
  - Word size.
  - Operand types.
  - Performance.
  - Energy/power consumption.
- Hardwired function units:
  - Performance.
  - Interface.
Task allocation
- Tasks may be created at:
  - Design time (video encoder).
  - Run time (user interface).
- Tasks may be assigned to processing elements at:
  - Design time (predictable load).
  - Run time (varying load).
Memory system design
- Uniform vs. heterogeneous memory system.
  - Power consumption.
  - Cost.
  - Programming difficulty.
- Caches:
  - Memory consistency.
Parallel memory systems
- True concurrency: several memory blocks can operate simultaneously.
[Figure: multiple PEs and memory blocks attached to an interconnect network.]
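One common way to exploit several memory blocks is to interleave addresses across banks so that consecutive accesses go to different blocks and can proceed in parallel. A minimal sketch, with an assumed bank count and names of my own choosing:

    #include <stdint.h>

    #define NBANKS 4u  /* assumed number of parallel memory banks */

    /* Word-interleaved mapping: consecutive words land in different banks,
       so sequential accesses can proceed concurrently. */
    static inline uint32_t bank_of(uint32_t word_addr)   { return word_addr % NBANKS; }
    static inline uint32_t offset_in(uint32_t word_addr) { return word_addr / NBANKS; }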
Cache consistency
- Problem: caches hide memory updates.
- Solution: have caches snoop changes.
[Figure: a PE's cache snooping traffic between another PE and memory on the network.]
Cache consistency and tasks
- Traditional scientific computing maps a single task onto multiple PEs.
- Embedded computing maps different tasks onto multiple PEs.
  - May be producer/consumer.
  - Not all of the memory may need to be consistent.
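To see why consistency matters for producer/consumer tasks, consider this sketch, assuming no snooping hardware; the variable names are illustrative, and the fix (snooping, or an explicit flush/invalidate of a non-coherent region) depends on the platform.

    #include <stdint.h>

    /* Shared between a producer PE and a consumer PE. */
    volatile uint32_t buffer;  /* data written by the producer */
    volatile int      ready;   /* set when buffer is valid */

    void producer(void)
    {
        buffer = 0xdeadbeefu;
        ready  = 1;   /* without snooping (or an explicit flush), the
                         consumer's cached copies may stay stale */
    }

    uint32_t consumer(void)
    {
        while (!ready)
            ;              /* spin on the flag */
        return buffer;     /* stale unless the caches are kept consistent */
    }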
Network topologies
- Major choices:
  - Bus.
  - Crossbar.
  - Buffered crossbar.
  - Mesh.
  - Application-specific.
Bus network
- Advantages:
  - Well-understood.
  - Easy to program.
  - Many standards.
- Disadvantages:
  - Contention.
  - Significant capacitive load.
Crossbar
- Advantages:
  - No contention.
  - Simple design.
- Disadvantages:
  - Not feasible for large numbers of ports.
Buffered crossbar
- Advantages:
  - Smaller than a crossbar.
  - Can achieve high utilization.
- Disadvantages:
  - Requires scheduling.
[Figure: crossbar (Xbar) with buffers on its inputs.]
Mesh
- Advantages:
  - Well-understood.
  - Regular architecture.
- Disadvantages:
  - Poor utilization.
Application-specific networks
- Advantages:
  - Higher utilization.
  - Lower power.
- Disadvantages:
  - Must be designed.
  - Must carefully allocate data.
TI OMAP
- Targets communications and multimedia.
- Multiprocessor with a DSP and a RISC CPU.
[Figure: OMAP 5910 block diagram: C55x DSP and ARM9 with MMU, memory controller, MPU interface, system DMA, control bridge, I/O.]
RTOS for multiprocessors
- Issues:
  - Multiprocessor communication primitives.
  - Scheduling policies.
- Task scheduling is considerably harder with true concurrency.
Distributed system performance
- Longest-path algorithms don’t work under preemption.
- Several algorithms unroll the schedule to the length of the least common multiple of the periods (the hyperperiod; see the sketch below):
  - produces a very long schedule;
  - doesn’t work for non-fixed periods.
- Schedules based on upper bounds may give inaccurate results.
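The hyperperiod itself is easy to compute; a minimal C sketch (the function names are mine) shows why the unrolled schedule can get very long. For example, periods of 70 and 110 time units give a hyperperiod of 770.

    #include <stdint.h>

    static uint64_t gcd(uint64_t a, uint64_t b)
    {
        while (b != 0) {
            uint64_t t = b;
            b = a % b;
            a = t;
        }
        return a;
    }

    /* Least common multiple of the task periods: the interval an unrolled
       schedule must cover. */
    uint64_t hyperperiod(const uint64_t *period, int n)
    {
        uint64_t l = 1;
        for (int i = 0; i < n; i++)
            l = (l / gcd(l, period[i])) * period[i];
        return l;
    }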
Data dependencies help
- P3 cannot preempt both P1 and P2.
- P1 cannot preempt P2.
[Figure: task graph with P1 feeding P2; P3 in a separate task.]
Preemptive execution hurts
- Worst combination of events for P5’s response time:
  - P2 has higher priority;
  - P2 is initiated before P4;
  - causes P5 to wait for P2 and P3.
- Independent tasks can interfere, so longest-path algorithms can’t be used.
[Figure: task graph with processes P1 through P5 and messages M1 through M3.]
Period shifting example
- P2 is delayed on CPU 1; the data dependency then delays P3; priority delays P4.
- Worst-case delay for τ3 is 80, not 50.

  task   period
  τ2     70

  process   CPU time
  P1        30
  P2        10
  P3        30
  P4        20

[Figure: schedule of P1, P2 on CPU 1 and P3, P4 on CPU 2 across iterations of τ1, τ2, τ3.]