Multiprocessor Highlights

MESI Cache Coherence Protocol, Memory Consistency, ILP and MC
Zhao Zhang, 2003

MESI Protocol
From the local processor's viewpoint, each cache block is in one of four states:
- Modified: Only I have a copy, and the copy has been modified; I must respond to any read/write request.
- Exclusive-clean: Only I have a copy, and the copy is clean; no need to inform others about my changes.
- Shared: Someone else may have a copy; I have to inform others about my changes.
- Invalid: The block has been invalidated (possibly at the request of someone else).
Action highlights:
- Read miss on a block: send a read request onto the bus.
- Write miss on a block: send a write request onto the bus.
- Receive a bus read request: transition the block to the Shared state.
- Receive a bus write request: transition the block to the Invalid state.
- Must write back data when transitioning out of the Modified state.
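The states and actions above can be sketched as a small per-block state machine. This is a minimal illustration, not a full protocol: the names `State`, `on_local_access`, `on_bus_request`, and the `others_have_copy` flag (standing in for a shared-line signal on the bus) are assumptions introduced here. It also assumes that a write to a Shared block is announced with the same bus write request as a write miss.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

def on_local_access(state, is_write, others_have_copy=False):
    """Return (next_state, bus_request) for a local read or write.

    bus_request is "BusRead", "BusWrite", or None (names are hypothetical).
    """
    if is_write:
        if state in (State.MODIFIED, State.EXCLUSIVE):
            # Sole owner: no need to inform others about the change.
            return State.MODIFIED, None
        # Shared or Invalid: others must be informed before writing.
        return State.MODIFIED, "BusWrite"
    if state is State.INVALID:
        # Read miss: fetch the block; Exclusive only if nobody else holds it.
        nxt = State.SHARED if others_have_copy else State.EXCLUSIVE
        return nxt, "BusRead"
    return state, None  # read hit, no state change

def on_bus_request(state, request):
    """Return (next_state, must_write_back) when snooping a remote request."""
    if state is State.INVALID:
        return State.INVALID, False
    # Data must be written back whenever the block leaves Modified.
    write_back = state is State.MODIFIED
    if request == "BusRead":
        return State.SHARED, write_back
    if request == "BusWrite":
        return State.INVALID, write_back
    raise ValueError(f"unknown bus request: {request}")

# A read miss with no other sharers, followed by a silent upgrade on write.
state, req = on_local_access(State.INVALID, is_write=False)
print(state, req)   # State.EXCLUSIVE BusRead
state, req = on_local_access(state, is_write=True)
print(state, req)   # State.MODIFIED None
# A remote read forces a write-back and a downgrade to Shared.
print(on_bus_request(state, "BusRead"))  # (<State.SHARED: 'S'>, True)
```

The Exclusive-to-Modified step issues no bus request, which is the advantage the Exclusive-clean state adds over a three-state protocol.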

Memory Consistency Model
Defines memory correctness for parallel execution:
- Execution appears to be that of some correct execution on a theoretical parallel computer with n sequential processors.
- In particular, remote writes must appear at a local processor in some correct sequence.
Typical memory consistency models:
- Sequential consistency (SC)
  - Memory reads and writes are globally serialized; assume that in every cycle only one processor can proceed by one step, and the result of a write appears on the other processors immediately.
  - Processors do not reorder local reads and writes.
  - Note: the number of possible sequences is an exponential function of the number of instructions.
- Total store ordering (TSO)
  - Only writes are globally serialized; assume that in every cycle at most one write can proceed, and the write result appears immediately.
  - Processors may reorder local reads and writes that have no RAW dependence.
- Processor consistency (PC)
  - Writes from one processor appear in the same order on all other processors.
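To make the difference between these models concrete, the sketch below enumerates every global order of a two-processor "store buffering" test: P0 writes x then reads y, and P1 writes y then reads x. The test program and the helper names (`interleavings`, `run`, `outcomes`) are assumptions chosen for illustration. The relaxed case is modeled in a deliberately simplified way, by letting each processor issue its read before its own write (the two touch different addresses, so no RAW dependence is broken); it does not model a real store buffer.

```python
from itertools import combinations, product

# Each operation is (kind, variable). Stores write 1; loads fill the
# issuing processor's register. Memory starts with x = y = 0.
P0 = [("store", "x"), ("load", "y")]
P1 = [("store", "y"), ("load", "x")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [(0, next(ia)) if i in slots else (1, next(ib)) for i in range(n)]

def run(schedule):
    """Execute one global order against a single memory; return (r0, r1)."""
    memory = {"x": 0, "y": 0}
    regs = [None, None]
    for tid, (kind, var) in schedule:
        if kind == "store":
            memory[var] = 1
        else:
            regs[tid] = memory[var]
    return tuple(regs)

def outcomes(p0, p1):
    return {run(s) for s in interleavings(p0, p1)}

# Sequential consistency: program order is preserved on both processors.
sc = outcomes(P0, P1)

# Relaxed: each processor may also perform its load before its own store.
relaxed = set()
for p0, p1 in product([P0, P0[::-1]], [P1, P1[::-1]]):
    relaxed |= outcomes(p0, p1)

print(sorted(sc))       # [(0, 1), (1, 0), (1, 1)]
print(sorted(relaxed))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Under sequential consistency at least one processor must observe the other's write, so (0, 0) never appears. Allowing a read to pass an earlier independent write makes (0, 0) reachable. Even this tiny test has 6 legal global orders; two threads of n instructions each have C(2n, n) of them, which grows exponentially with n.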

Memory Consistency and ILP
- Sequential consistency, TSO, and PC are all regarded as strong consistency models (although TSO and PC are relaxed relative to SC).
- Why use weak consistency models (e.g., release consistency)?
  - Otherwise, without speculative execution and recovery, every write to shared data may take a full memory access latency (can a 2 GHz, 4-way issue processor afford 100 ns for every such write?).
  - Under SC, reads cannot bypass any previous write (even without a RAW dependence).
- Strong consistency may work efficiently with speculative execution in ILP (PC and TSO in practice; SC can be supported with a speculative cache).
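The cost behind the 100 ns question can be worked out directly. The short calculation below uses the figures from the slide; the function name `stalled_issue_slots` is an assumption, and it takes the worst case in which the processor issues nothing at all while the write is outstanding.

```python
def stalled_issue_slots(latency_ns, clock_ghz, issue_width):
    """Issue slots lost if the pipeline stalls for the whole write latency.

    A clock of f GHz delivers f cycles per nanosecond, so the stall lasts
    latency_ns * clock_ghz cycles, each offering issue_width slots.
    """
    stall_cycles = latency_ns * clock_ghz
    return stall_cycles * issue_width

# Figures from the slide: 100 ns per write, 2 GHz clock, 4-way issue.
print(stalled_issue_slots(100, 2, 4))  # 800
```

Each fully exposed write would therefore cost 200 cycles, or up to 800 instruction issue slots, which is why the reordering and speculation options discussed above matter.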
