Published by Amie Hancock. Modified over 8 years ago.
1
Computer Structure – VM 1
Computer Structure: X86 Virtual Memory and TLB
Franck Sala
Updated by Tomer Gurevich
Slides from Lihu and Adi’s Lecture
2
Computer Structure – VM 2 X86 paging
For memory with 32-bit addresses and a page size of 4KB, the required page table size is 4MB.
Most processes in the system use only a small amount of memory.
A 4MB overhead for every process is usually expensive and unnecessary. For most processes, such a translation table would be most of the memory the process consumes.
In X86 there are several levels of translation tables, arranged as a tree.
Translation tables are allocated dynamically, only when a table is actually needed.
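The overhead argument above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming 4-byte table entries (the usual 32-bit X86 entry size, not stated on the slide) and a two-level tree; the function names are illustrative only.

```python
PAGE_SIZE = 4 * 1024          # 4KB pages, as on the slide
ADDRESS_BITS = 32
ENTRY_SIZE = 4                # assumption: 4-byte entries
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 1024 entries per 4KB table


def flat_table_bytes():
    """Size of a single-level table that covers the whole address space."""
    pages = 2 ** ADDRESS_BITS // PAGE_SIZE
    return pages * ENTRY_SIZE


def two_level_bytes(mapped_pages):
    """Size of a two-level tree that maps only the given virtual page numbers."""
    # One page table is needed per distinct group of 1024 consecutive pages.
    page_tables = {vpn // ENTRIES_PER_TABLE for vpn in mapped_pages}
    # One page directory plus the page tables that are actually needed.
    return PAGE_SIZE * (1 + len(page_tables))


print(flat_table_bytes() // 2 ** 20, "MB for the flat table")
print(two_level_bytes({0, 1, 2}) // 1024, "KB for a process using 3 adjacent pages")
```

A process that touches only a few adjacent pages pays for one directory and one page table instead of the full flat table.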
3
Computer Structure – VM 3 32bit Mode: 4KB / 4MB Page Mapping
2-level hierarchical mapping: Page Directories and Page Tables
- 4KB aligned
PDE
- Present (0 = page fault)
- Page size (4KB or 4MB)
CR4.PSE=1: both 4MB & 4KB pages supported
Separate TLBs
[Figure: 4KB page. Linear address = DIR [31:22] (10 bits), TABLE [21:12] (10 bits), OFFSET [11:0] (12 bits). CR3 (PDBR) points to the 1K-entry Page Directory (20+12=32, 4K aligned); the PDE holds a 20-bit pointer to a 1K-entry Page Table; the PTE holds a 20-bit pointer to the 4K page.]
[Figure: 4MB page. Linear address = DIR [31:22] (10 bits), OFFSET [21:0] (22 bits). CR3 (PDBR) points to the Page Directory (20+12=32, 4K aligned); the PDE holds a 10-bit pointer to the 4MByte page.]
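The two address splits on this slide (10/10/12 bits for 4KB pages, 10/22 bits for 4MB pages) can be sketched as follows. The function name and return shape are illustrative, not part of the original slides.

```python
def split_linear_address(addr, large_page=False):
    """Split a 32-bit linear address into (dir_index, table_index, offset).

    table_index is None for a 4MB page, because the PDE points directly
    at the page and the low 22 bits are the offset.
    """
    dir_index = (addr >> 22) & 0x3FF          # bits [31:22]
    if large_page:
        return dir_index, None, addr & 0x3FFFFF   # bits [21:0]
    table_index = (addr >> 12) & 0x3FF        # bits [21:12]
    return dir_index, table_index, addr & 0xFFF   # bits [11:0]


print(split_linear_address(0x12345678))
print(split_linear_address(0x12345678, large_page=True))
```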
4
Computer Structure – VM 4 32bit Mode: PDE and PTE Format
20 bit pointer to a 4K aligned address
Virtual memory
- Present
- Accessed
- Dirty (in PTE only)
- Page size (in PDE only)
- Global
Protection
- Writable (R#/W)
- User / Supervisor#
- 2 levels/type only
Caching
- Page WT
- Page Cache Disabled
- PAT: PT Attribute Index
3 bits available for OS usage
Page Directory Entry (4KB page table), bits from 31 down to 0:
31:12 Page Frame Address | 11:9 AVAIL (Available for OS Use) | 8 G (Global) | 7 Page Size (0: 4 Kbyte) | 6 0 | 5 A (Accessed) | 4 PCD (Cache Disable) | 3 PWT (Write-Through) | 2 U (User / Supervisor) | 1 W (Writable) | 0 P (Present)
Page Table Entry, bits from 31 down to 0:
31:12 Page Frame Address | 11:9 AVAIL (Available for OS Use) | 8 G (Global) | 7 PAT | 6 D (Dirty) | 5 A (Accessed) | 4 PCD (Cache Disable) | 3 PWT (Write-Through) | 2 U (User / Supervisor) | 1 W (Writable) | 0 P (Present)
5
Computer Structure – VM 5 4KB Page Mapping in 64 bit Mode
2003: AMD Opteron…
256 TB of virtual memory (2^48)
1 TB of physical memory (2^40)
PML4: Page Map Level 4
PDP: Page Directory Pointer
[Figure: Linear address (4K page) = sign ext. [63:48], PML4 [47:39], PDP [38:30], DIR [29:21], TABLE [20:12], OFFSET [11:0]. CR3 (PDPTR) holds a 40-bit, 4KB-aligned pointer to the 512-entry PML4 Table. Each 9-bit field indexes a 512-entry table (PML4 Table, Page Directory Pointer Table, Page Directory, Page Table); the PML4 entry, PDP entry, PDE and PTE each hold an (M-12)-bit pointer, and the PTE points to the 4KByte page.]
6
Computer Structure – VM 6 2MB Page Mapping in 64 bit Mode
[Figure: Linear address (2M page) = sign ext. [63:48], PML4 [47:39], PDP [38:30], DIR [29:21], OFFSET [20:0] (21 bits). CR3 (PDPTR) holds a 40-bit, 4KB-aligned pointer to the 512-entry PML4 Table; the PML4 entry and PDP entry each hold an (M-12)-bit pointer to the next 512-entry table; the PDE holds an (M-21)-bit pointer to the 2MByte page.]
7
Computer Structure – VM 7 1GB Page Mapping in 64 bit Mode
[Figure: Linear address (1G page) = sign ext. [63:48], PML4 [47:39], PDP [38:30], OFFSET [29:0] (30 bits). CR3 (PDPTR) holds a 40-bit, 4KB-aligned pointer to the 512-entry PML4 Table; the PML4 entry holds an (M-12)-bit pointer to the 512-entry Page Directory Pointer Table; the PDP entry holds an (M-30)-bit pointer to the 1GByte page.]
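The three 64-bit mappings above (4KB, 2MB and 1GB pages) differ only in how many 9-bit index fields are consumed before the offset starts. A small sketch, with a hypothetical helper name and dictionary layout chosen for illustration:

```python
FIELDS = ["PML4", "PDP", "DIR", "TABLE"]   # 9 bits each, from bit 47 downward
OFFSET_BITS = {"4K": 12, "2M": 21, "1G": 30}


def split_va(va, page_size="4K"):
    """Split a 48-bit virtual address into table indices and a page offset."""
    offset_bits = OFFSET_BITS[page_size]
    levels = (48 - offset_bits) // 9       # 4, 3 or 2 tables are walked
    fields = {}
    for i, name in enumerate(FIELDS[:levels]):
        shift = 39 - 9 * i                 # PML4 starts at bit 39, PDP at 30, ...
        fields[name] = (va >> shift) & 0x1FF
    fields["OFFSET"] = va & ((1 << offset_bits) - 1)
    return fields


va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_va(va, "4K"))
print(split_va(va, "2M"))
print(split_va(va, "1G"))
```

With a larger page, the index bits of the skipped levels simply become part of the offset.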
8
Computer Structure – VM 8 Question 1
We have a core similar to X86
- 64 bit mode
- Supports Small Pages (PTE) and Large Pages (DIR)
- Page table size in each hierarchy is the size of a small page
- Entry size in the Page Table is 16 bytes, in all the hierarchies
Linear address: sign ext. [63:N4+1], PML4 [N4:N3+1], PDP [N3:N2+1], DIR [N2:N1+1], TABLE [N1:12], OFFSET [11:0]
What is the size of a small page?
12 bits in the offset field: 2^12 B = 4KB
How many entries are in each Page Table?
Page Table size = Page Size = 4KB
PTE = 16B
4KB / 16B = 2^12 / 2^4 = 2^8 = 256 entries in each Page Table
9
Computer Structure – VM 9 Question 1
Linear address: sign ext. [63:N4+1], PML4 [N4:N3+1], PDP [N3:N2+1], DIR [N2:N1+1], TABLE [N1:12], OFFSET [11:0]
- 64 bit (large & small)
- PT size = Page size = 4KB
- PTE = 16B
- Page Table: 256 entries
What are the values of N1, N2, N3 and N4?
Since we have 256 entries in each table, we need 8 bits to address them
- TABLE [19:12]: N1 = 19
- DIR [27:20]: N2 = 27
- PDP [35:28]: N3 = 35
- PML4 [43:36]: N4 = 43
What is the size of a large page?
Large pages are pointed by DIR, so the large page offset is 20 bits [19:0]
Large page size: 2^20 B = 1MB
We can also say: DIR can point to 256 pages of 4KB = 1MB
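The derivation of N1 to N4 and of the large page size can be reproduced mechanically. A minimal sketch, assuming (as the question states) that every table is exactly one small page in size; the function name is illustrative.

```python
def layout(page_bytes, entry_bytes, levels=("TABLE", "DIR", "PDP", "PML4")):
    """Return ({field: (high_bit, low_bit)}, large_page_bytes)."""
    offset_bits = page_bytes.bit_length() - 1            # 4KB -> 12 bits
    entries = page_bytes // entry_bytes                  # table size == page size
    index_bits = entries.bit_length() - 1                # 256 entries -> 8 bits
    fields = {}
    low = offset_bits
    for name in levels:                                  # lowest level first
        fields[name] = (low + index_bits - 1, low)
        low += index_bits
    # A large page is pointed to by DIR, so every bit below DIR is offset.
    large_page_bytes = 1 << fields["DIR"][1]
    return fields, large_page_bytes


fields, large = layout(page_bytes=4096, entry_bytes=16)
print(fields)
print(large // 2 ** 20, "MB large page")
```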
10
Computer Structure – VM 10 Question 1
Linear address: sign ext. [63:44], PML4 [43:36], PDP [35:28], DIR [27:20], TABLE [19:12], OFFSET [11:0]
- 64 bit (large & small)
- PT size = Page size = 4KB
- Large page size = 1 MB
- PTE = 16B
- Page Table: 256 entries
We access a sequence of virtual addresses. For each address, what is the minimal number of tables that were added in all the hierarchies?
See next foil in presentation mode…
11
Computer Structure – VM 11 Question 1: sequence of allocations
8 bits per index instead of 9, for example purposes only
[Figure: CR3 points to the PML4 Table. The fields PML4 [43:36], PDP [35:28], DIR [27:20] and TABLE [19:12] (8 bits each) index the PML4 Table, PDP Table, Page Dir and Page Table; OFFSET [11:0] (12 bits) selects the byte inside the 4KB page.]
Address (sign ext. PML4 PDP DIR TABLE OFFSET) | Added
00000 D3 82 FF E2 B25 | 3 new tables + 1 page
00000 D3 82 FF E2 349 | 0 new tables
00000 D3 82 FF 68 937 | 0 new tables + 1 page
00000 D3 82 28 49 C00 | 1 new table + 1 page
00000 71 46 B5 54 622 | 3 new tables + 1 page
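The allocation counts for this sequence can be reproduced with a short simulation. It is a sketch under two assumptions that are not spelled out on the slide: the PML4 table already exists (CR3 points to it), and a table is identified by the address prefix that selects it.

```python
def count_allocations(addresses):
    """Return a list of (new_tables, new_pages), one pair per address."""
    tables = set()    # (low_bit, prefix) pairs for PDP tables, page dirs, page tables
    pages = set()     # virtual page numbers already backed by a 4KB page
    results = []
    for va in addresses:
        new_tables = 0
        # VA[43:36] selects a PDP table, VA[43:28] a page directory,
        # VA[43:20] a page table.
        for low_bit in (36, 28, 20):
            key = (low_bit, va >> low_bit)
            if key not in tables:
                tables.add(key)
                new_tables += 1
        vpn = va >> 12
        new_pages = 0 if vpn in pages else 1
        pages.add(vpn)
        results.append((new_tables, new_pages))
    return results


sequence = [
    0x00000D382FFE2B25,
    0x00000D382FFE2349,
    0x00000D382FF68937,
    0x00000D3822849C00,
    0x000007146B554622,
]
for va, (t, p) in zip(sequence, count_allocations(sequence)):
    print(f"{va:016X}: {t} new tables, {p} new pages")
```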
12
Computer Structure – VM 12 Translation Lookaside Buffer (TLB)
Page table resides in memory: each translation requires an extra memory access
TLB caches recently used PTEs
- speeds up translation
- typically 128 to 256 entries, 4 to 8 way associative
TLB Indexing: the virtual page number is split into Tag and Set; the Offset is not used for the lookup
On a TLB miss
- Page Miss Handler (HW PMH) gets the PTE from memory
[Figure: flow chart. Virtual Address -> TLB Access -> TLB Hit? Yes: Physical Address. No: Access Page Table in memory.]
13
Computer Structure – VM 13 Virtual Memory And Cache
TLB access is serial with cache access
Page table entries are cached in L1 D$, L2$ and L3$ as data
[Figure: flow chart. Virtual Address -> Access TLB -> TLB Hit? No: STLB Hit? No: Page Walk, get PTE from memory hierarchy. Yes: Physical Address -> Access Cache -> L1 Cache Hit? No: L2 Cache Hit? No: Access Memory. Yes: Data.]
14
Computer Structure – VM 14 TLBs The processor saves most recently used PDEs and PTEs in TLBs –Separate TLB for data and instruction caches –Separate TLBs for 4KB and 2/4MB page sizes
15
Computer Structure – VM 15
When memory is accessed, the following process takes place:
First, we access the appropriate TLB with the full VPN.
There are a dTLB and an iTLB, which hold translations for data and instructions respectively.
The TLBs are also split between small pages and large pages.
On a TLB HIT, we use the translation we found.
On a TLB MISS, we turn to the PMH.
16
Computer Structure – VM 16
The PMH contains the STLB. After a TLB MISS we access the STLB.
The STLB is another "level" of TLB. It also holds PTEs, and more of them.
The STLB holds translations for both instructions and data. It also holds translations of large pages.
As with the TLB, we use the VPN to search the STLB.
On an STLB HIT, we use the translation we found.
On an STLB MISS, the PMH performs a Page Walk.
17
Computer Structure – VM 17
At this stage the PMH must perform a Page Walk: it walks the page table hierarchy starting from the root (PML4).
To shorten the process, the PMH keeps a cache for the upper levels of the translation: PML4 cache, PDP cache, DIR cache.
Why is there no need to keep a Table cache?
Because the TLB holds the full translation of an address, there is no need for a Table cache. Caches are needed only for the higher levels, which hold partial translations.
18
Computer Structure – VM 18
Cache | Accessed with virtual address bits | If hits, returns
DIR cache | [47:21] | PDE
PDP cache | [47:30] | PDP entry
PML4 cache | [47:39] | PML4 entry
Linear address: SIGN EXT. [63:48], PML4 [47:39], PDP [38:30], DIR [29:21], TABLE [20:12], OFFSET [11:0]
19
Computer Structure – VM 19
Each cache is accessed with the bits of its own level together with the bits used to access the higher levels.
If there is a hit at some level of the cache, we save (at least) all the accesses to the current level and to the higher levels.
For example, on a PDP cache hit we find the matching PDPE and so save the accesses to the PML4 and the PDP.
We still need to access memory for the lower levels.
20
Computer Structure – VM 20
The PMH accesses all the caches in parallel and picks the lowest level that had a HIT.
The rest of the translation is done as usual, by accessing the tables stored in memory.
Note that, at the very least, we must access the PAGE TABLE.
The relevant page table is looked up in the memory hierarchy as usual: we access the L1 cache first, then the L2 cache, then the L3 cache, and only then the memory.
21
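Computer Structure – VM 21 Caches and Translation Structures
[Figure: block diagram with an On-die / Platform boundary. Blocks shown: Core, Inst. TLB, Data TLB, L1 Inst. cache, L1 data cache, L2, L3, Memory, and the PMH. The PMH contains the STLB (accessed with VA[47:12], returns a PTE), the PDE cache (VA[47:21], returns a PDE entry), the PDP cache (VA[47:30], returns a PDP entry), the PML4 cache (VA[47:39], returns a PML4 entry) and the Page Walk Logic, which loads entries. Arrows are labelled translation, instruction bytes and PTE.]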
26
Computer Structure – VM 26 Question 2
Processor similar to X86, 64 bits
Pages of 4KB
The processor has a TLB
- TLB Hit: we get the translation with no need to access the translation tables
- TLB Miss: the processor has to do a Page Walk
The hardware that does the Page Walk (PMH) contains a cache for each of the translation tables
All Caches and TLB are empty on Reset
For the sequence of memory accesses below, how many memory accesses are needed for the translations?
Linear address: sign ext. [63:44], PML4 [43:36], PDP [35:28], DIR [27:20], TABLE [19:12], OFFSET [11:0]
Address | Memory accesses | Explanation
0000022334455666H | 4 | We need to access the memory for each of the 4 translation tables
0000022334455777H | 0 | Same page as above: TLB hit, no memory access
0000022884455777H | 3 | We hit in the PML4 cache, then we miss in the PDP cache, so we need to access the memory 3 times: PDP, DIR, PTE
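The three answers can be checked with a small model of the TLB and the PMH caches. This is a sketch, not the slide's method: it assumes the TLB and PMH caches never evict (the question gives no sizes), and that each PMH cache is tagged with all address bits from its own level upward.

```python
def translation_accesses(addresses):
    """Return the number of page-table memory accesses for each address."""
    tlb = set()                                # full VPNs, VA[43:12]
    pmh = {36: set(), 28: set(), 20: set()}    # PML4, PDP, DIR caches by low tag bit
    counts = []
    for va in addresses:
        if va >> 12 in tlb:
            counts.append(0)                   # TLB hit: no table access at all
            continue
        accesses = 4                           # nothing cached: PML4, PDP, DIR, PTE
        # A hit at a deeper level overrides a hit at a shallower one.
        for remaining, low in ((3, 36), (2, 28), (1, 20)):
            if va >> low in pmh[low]:
                accesses = remaining
        for low in pmh:                        # the walk fills every PMH cache
            pmh[low].add(va >> low)
        tlb.add(va >> 12)
        counts.append(accesses)
    return counts


print(translation_accesses([0x0000022334455666,
                            0x0000022334455777,
                            0x0000022884455777]))
```

The third address shares only the PML4 field with the first, so only the PML4 cache hits and three tables remain to be read.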
27
Computer Structure – VM 27 Question 2
L1 data Cache: 32KB, 2 ways of 64B each
How can we access this cache before we get the physical address?
- 64B line: 6 offset bits [5:0]
- 32KB = 2^15 B, so 2^15 / (2 ways * 2^6 bytes) = 2^8 = 256 sets: set bits [13:6]
- 12 bits are not translated: [11:0], so we lack 2 bits [13:12] to get the set address
- So we do the lookup using 2 un-translated (virtual) bits for the set address
- Those bits can differ from the matching bits of the PFN obtained after translation, therefore we need to compare the whole PFN to the tag stored in the Cache Tag array
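The set-index arithmetic above generalizes to any cache geometry. A minimal sketch; the function name and the returned dictionary keys are illustrative, and the "candidate sets" figure is simply 2 to the power of the number of index bits that lie above the page offset.

```python
def cache_index_geometry(cache_bytes, ways, line_bytes, page_bytes):
    """Work out which set-index bits fall above the untranslated page offset."""
    offset_bits = line_bytes.bit_length() - 1          # 64B -> 6 bits
    sets = cache_bytes // (ways * line_bytes)          # 32KB / (2 * 64B) = 256
    set_bits = sets.bit_length() - 1                   # 256 sets -> 8 bits
    top_set_bit = offset_bits + set_bits - 1           # set field is [13:6]
    page_offset_bits = page_bytes.bit_length() - 1     # 4KB -> bits [11:0]
    virtual_index_bits = max(0, top_set_bit + 1 - page_offset_bits)
    return {
        "sets": sets,
        "set_field": (top_set_bit, offset_bits),
        "virtual_index_bits": virtual_index_bits,
        # Sets in which one physical line could be found, depending on
        # the virtual page used to access it.
        "candidate_sets": 1 << virtual_index_bits,
    }


print(cache_index_geometry(32 * 1024, ways=2, line_bytes=64, page_bytes=4096))
```

For the cache in this question the result is 2 virtual index bits, which is why a physical line can sit in any of 4 sets.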
28
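Computer Structure – VM 28 Question 2: Read Access
[Figure: read access. The virtual address splits into VPN [47:12] (translated) and bits [11:0] (not translated). The cache uses Offset [5:0] and Set [13:6], so set bits [13:12] come from the VPN. The translated PFN [40:12] is compared (==) with Tag [40:12] of Way 0 and of Way 1 to produce Tag Match.]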
29
Computer Structure – VM 29 Question 2: example
Address fields: Offset [5:0], Set [13:6], bits [11:0] not translated, VPN [47:12] translated to PFN
Write Virtual Address A
- VA: 0000 … 0000, PA: 0000 … 0000
- Set: [13:6] = 0
- Tag: 0
Read Virtual Address B
- VB has bits [13:6] = 0; PB has PFN = … 0011
- Set: [13:6] = 0
- Tag: 3 (0 if we don’t take [13:12])
30
Computer Structure – VM 30 Question 2: Virtual Alias
L1 data Cache: 32KB, 2 ways of 64B each
What will happen when we access virtual page A with a given offset, and after this there is an access with the same offset in virtual page B, which is mapped by the OS to the same physical page as A?
- 2 virtual pages map to the same frame: VPN xxxx01 and VPN yyyy00 both map to PFN zzzz
- The two accesses are xxxx01.set.offset and yyyy00.set.offset, where 01 and 00 are bits [13:12]
- Physically addressed cache: the set index is Set[11:6], not translated (max: 64 sets); the data exist only once
- Virtually addressed cache: the set index is 01.set[11:6] for one access and 00.set[11:6] for the other; the data may exist twice. AVOID THIS!!!
[Figure: address fields. Offset [5:0], Set [13:6], bits [11:0] not translated, VPN [47:12] translated to PFN [40:12].]
31
Computer Structure – VM 31 Question 2: Virtual Alias
L1 data Cache: 32KB, 2 ways of 64B each
Virtually addressed cache: the data may exist twice (at 01.set[11:6] and at 00.set[11:6]). AVOID THIS!!!
Avoid having the same data twice in the cache (xxxx01.set.offset and yyyy00.set.offset):
- Check 4 sets when we allocate a new entry and see if the same tag appears
- If yes, evict the second occurrence of the data (the alias)
32
Computer Structure – VM 32 Question 2: Snoop
L1 data Cache: 32KB, 2 ways of 64B each
What happens in case of a snoop in the cache?
- The cache is snooped with a physical address [40:0]
- Since the 2 MSBs of the set address are virtual, a given physical address can map to 4 different sets in the cache (depending on the virtual page that is mapped to it)
- So we must snoop 4 sets * 2 ways in the cache
33
Computer Structure – VM 33 Question 3
Core similar to X86 in 64 bit mode
Supports small pages (pointed by PTE) and large pages (pointed by DIR)
Size of an entry in all the different page tables is 8 Bytes
PMH Caches at all the levels
- 4 entries, direct mapped
- Access time on hit: 2 cycles
- Miss known after 1 cycle
PMH caches are accessed at all the levels in parallel
In each level, when there is a HIT, the PMH cache provides the relevant page table entry of that level
In each level, when there is a miss, the core accesses the relevant page table in the main memory
Access time to the main memory is 100 cycles, not including the time needed to detect the PMH cache miss
Linear address: sign ext. [63:56], PML4 [55:48], PDP [47:36], DIR [35:24], TABLE [23:12], OFFSET [11:0]
34
Computer Structure – VM 34 Question 3
Linear address: sign ext. [63:56], PML4 [55:48], PDP [47:36], DIR [35:24], TABLE [23:12], OFFSET [11:0]
What is the size of the large pages?
The large page is pointed by DIR, therefore all the bits below DIR are offset inside the large page: 2^24 B = 16 MB
How many entries are in each Page Table?
- PTE: 2^12
- DIR: 2^12
- PDP: 2^12
- PML4: 2^8
40
Computer Structure – VM 40 Question 3
Linear address: sign ext. [63:56], PML4 [55:48], PDP [47:36], DIR [35:24], TABLE [23:12], OFFSET [11:0]
TLB: 4 entries, direct mapped. TLB hit: 2 cycles. TLB miss: 1 cycle. Memory access: 100 cycles.
Virtual Addr. | Cycles | Comment
FF81 2345 6789 ABCD | 401 | Miss and memory access at each level: 1 + 4*100 = 401 cycles
FF81 2340 6789 ABCD | 202 | (PML4, PDP, DIR, sTLB) = (H,H,M,M). The misses are known after 1 cycle and the PDP cache read takes 1 more cycle; DIR: miss, 100 cycles; PTE: miss, 100 cycles. 2 + 2*100 = 202 cycles
FF80 2340 6789 ABCD | 401 | PMH cache miss in all the levels: 1 + 4*100 = 401 cycles
FF81 2340 6709 ABCD | 302 | (PML4, PDP, DIR, sTLB) = (H,M,M,M). PDP: the entry 234 that hit in the second access was replaced by the entry filled in access 3, as it is in the same set, so it misses. 2 + 3*100 = 302 cycles
FF81 2340 6709 A0CD | 2 | Hit in TLB, no need to go to PMH: 2 cycles
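The cycle counts in this table can be reproduced with a short simulation. It is a sketch under assumptions the slides leave implicit: every structure is tagged with VA[55:low] for its level, a direct-mapped structure is indexed by the low 2 bits of that tag, and the PMH caches and TLB are filled only after a TLB miss.

```python
MEM_CYCLES = 100    # main memory access, from the question
HIT_CYCLES = 2      # PMH cache or TLB hit
MISS_CYCLES = 1     # time until a miss is known
ENTRIES = 4         # direct mapped, 4 entries each

# (name, lowest VA bit of the tag); order is from the root down to the TLB.
STRUCTURES = [("PML4", 48), ("PDP", 36), ("DIR", 24), ("TLB", 12)]


def simulate(addresses):
    """Return the translation latency in cycles for each address."""
    slots = {name: [None] * ENTRIES for name, _ in STRUCTURES}
    latencies = []
    for va in addresses:
        va &= (1 << 56) - 1                     # drop sign extension [63:56]
        keys = {name: va >> low for name, low in STRUCTURES}
        hits = [slots[name][keys[name] % ENTRIES] == keys[name]
                for name, _ in STRUCTURES]
        if hits[3]:                             # TLB hit: done in 2 cycles
            latencies.append(HIT_CYCLES)
            continue
        # Deepest PMH cache that hit: 0 = PML4, 1 = PDP, 2 = DIR, -1 = none.
        deepest = max((i for i in range(3) if hits[i]), default=-1)
        tables_to_read = 3 - deepest            # tables still fetched from memory
        first = HIT_CYCLES if deepest >= 0 else MISS_CYCLES
        latencies.append(first + tables_to_read * MEM_CYCLES)
        for name, _ in STRUCTURES:              # the walk fills every structure
            slots[name][keys[name] % ENTRIES] = keys[name]
    return latencies


print(simulate([0xFF81_2345_6789_ABCD,
                0xFF81_2340_6789_ABCD,
                0xFF80_2340_6789_ABCD,
                0xFF81_2340_6709_ABCD,
                0xFF81_2340_6709_A0CD]))
```

The fourth access shows the direct-mapped conflict: the PDP tag for PML4 = 80 lands in the same slot as the one for PML4 = 81 and evicts it.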
41
Computer Structure – VM 41 Backup Slides
42
Computer Structure – VM 42
[Figure: cache read access with ways Way 0 to Way 3. Virtual Address [47:0]: VPN translated to PFN, low bits not translated; fields Offset [5:0] and Set. The PFN is compared (==) with the Tag field of each way to produce Tag Match.]