Tutorial 8: x86 Virtual Memory and TLB


1 Tutorial 8: x86 Virtual Memory and TLB
Digital Computer Structure (מבנה מחשבים ספרתיים) — Tutorial 8: x86 Virtual Memory and TLB

2 x86 paging
For a 4GB virtual address space (32-bit addresses) and 4KB pages, the required page table size is 4MB: we need 1M translations (PTEs), each 4 bytes long.
Most processes in the system use only a small amount of memory, so a 4MB overhead per process is usually expensive and unnecessary; for most processes such a translation table would be most of the memory the process consumes.
Therefore, in x86 there are several levels of translation tables, arranged as a tree, and translation tables are allocated dynamically, only when a table is actually needed.
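A quick back-of-the-envelope check of the 4MB figure; a minimal sketch using only the numbers quoted above:

```c
#include <stdio.h>

/* Sketch: size of a flat (single-level) page table for 32-bit x86. */
int main(void) {
    unsigned long long va_space   = 1ULL << 32;  /* 4 GB virtual address space */
    unsigned long long page_size  = 1ULL << 12;  /* 4 KB pages                 */
    unsigned long long pte_size   = 4;           /* each PTE is 4 bytes        */

    unsigned long long num_ptes   = va_space / page_size;   /* 2^20 = 1M PTEs  */
    unsigned long long table_size = num_ptes * pte_size;    /* 4 MB            */

    printf("PTEs: %llu, flat page table size: %llu MB\n",
           num_ptes, table_size >> 20);
    return 0;
}
```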

3 32bit Mode: 4KB / 4MB Page Mapping
- 2-level hierarchical mapping: Page Directories and Page Tables.
- The PDE holds a 4KB-aligned pointer to a Page Table.
- Present bit (0 → page fault).
- Page Size bit selects a 4KB or 4MB page.
- CR4.PSE=1 → both 4MB and 4KB pages are supported.
- Separate TLBs for the two page sizes.
[Diagram: 4KB pages — linear address = DIR [31:22] | TABLE [21:12] | OFFSET [11:0]; CR3 (PDBR) → 1K-entry Page Directory → PDE (20-bit, 4KB-aligned pointer) → 1K-entry Page Table → PTE (20-bit PFN) → 4KB page. 4MB pages — linear address = DIR [31:22] | OFFSET [21:0]; CR3 (PDBR) → Page Directory → PDE → 4MB page.]
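A minimal sketch of the address split shown in the diagram above (field widths 10/10/12 for 4KB pages, 10/22 for 4MB pages); the helper names are illustrative, not a real API:

```c
#include <stdint.h>

/* Split a 32-bit linear address into the 4KB-page fields shown above:
 * DIR = bits [31:22], TABLE = bits [21:12], OFFSET = bits [11:0].      */
static inline uint32_t dir_index(uint32_t la)   { return (la >> 22) & 0x3FF; }
static inline uint32_t table_index(uint32_t la) { return (la >> 12) & 0x3FF; }
static inline uint32_t page_offset(uint32_t la) { return la & 0xFFF; }

/* For a 4MB page, bits [21:0] are all offset. */
static inline uint32_t large_page_offset(uint32_t la) { return la & 0x3FFFFF; }
```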

4 32bit Mode: PDE and PTE Format
Each entry holds a 20-bit pointer to a 4KB-aligned address (Page Frame Address, bits 31:12) plus flag bits:
- Present (P, bit 0)
- Writable (R/W, bit 1)
- User / Supervisor (U/S, bit 2) — only 2 privilege levels per page
- Page Write-Through (PWT, bit 3)
- Page Cache Disabled (PCD, bit 4)
- Accessed (A, bit 5)
- Dirty (D, bit 6, in the PTE only)
- Page Size (PS, bit 7, in the PDE only; 0 = 4KB) / PAT – Page Attribute Table index (bit 7, in the PTE)
- Global (G, bit 8)
- 3 bits available for OS use (AVAIL, bits 11:9)
Introduced in 1985 with the 80386.
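The flag positions listed above can be captured as bit masks; a hedged sketch (bit numbers follow the slide, macro names are illustrative):

```c
#include <stdint.h>

/* PDE/PTE flag bits for 32-bit paging, per the layout above. */
#define PG_PRESENT   (1u << 0)   /* P   - page/table is present          */
#define PG_WRITABLE  (1u << 1)   /* R/W - writable                       */
#define PG_USER      (1u << 2)   /* U/S - user (1) / supervisor (0)      */
#define PG_PWT       (1u << 3)   /* page write-through                   */
#define PG_PCD       (1u << 4)   /* page cache disabled                  */
#define PG_ACCESSED  (1u << 5)   /* A                                    */
#define PG_DIRTY     (1u << 6)   /* D   - PTE only                       */
#define PG_PS_PAT    (1u << 7)   /* PS in the PDE, PAT in the PTE        */
#define PG_GLOBAL    (1u << 8)   /* G                                    */
/* bits 11:9 available for OS use; bits 31:12 hold the 4KB-aligned frame */
#define PG_FRAME_MASK 0xFFFFF000u

static inline uint32_t pte_frame(uint32_t pte) { return pte & PG_FRAME_MASK; }
```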

5 4KB Page Mapping in 64 bit Mode
[Diagram: linear address = sign ext. [63:48] | PML4 [47:39] | PDP [38:30] | DIR [29:21] | TABLE [20:12] | OFFSET [11:0]. CR3 (4KB-aligned, M-12 bits) → 512-entry PML4 Table → PML4 entry → 512-entry Page Directory Pointer Table → PDP entry → 512-entry Page Directory → PDE → 512-entry Page Table → PTE → 4KB page.]
PML4: Page Map Level 4. PDP: Page Directory Pointer.
Introduced in 2003 with the AMD Opteron. Virtual memory: 2^48 B = 256 TB. Physical memory: 2^M B.
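A sketch of the 48-bit split used above (9-bit indices per level, 12-bit offset); the helper names are illustrative:

```c
#include <stdint.h>

/* 4-level, 4KB-page split of a canonical 48-bit virtual address:
 * PML4 [47:39], PDP [38:30], DIR [29:21], TABLE [20:12], OFFSET [11:0]. */
static inline unsigned pml4_index(uint64_t va)  { return (va >> 39) & 0x1FF; }
static inline unsigned pdp_index(uint64_t va)   { return (va >> 30) & 0x1FF; }
static inline unsigned dir_index64(uint64_t va) { return (va >> 21) & 0x1FF; }
static inline unsigned tbl_index(uint64_t va)   { return (va >> 12) & 0x1FF; }
static inline unsigned off_4k(uint64_t va)      { return va & 0xFFF; }
```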

6 2MB Page Mapping in 64 bit Mode
[Diagram: linear address = sign ext. [63:48] | PML4 [47:39] | PDP [38:30] | DIR [29:21] | OFFSET [20:0]. CR3 (4KB-aligned, M-12 bits) → 512-entry PML4 Table → PML4 entry → 512-entry Page Directory Pointer Table → PDP entry → 512-entry Page Directory → PDE (M-21 bits) → 2MB page.]

7 1GB Page Mapping in 64 bit Mode
[Diagram: linear address = sign ext. [63:48] | PML4 [47:39] | PDP [38:30] | OFFSET [29:0]. CR3 (4KB-aligned, M-12 bits) → 512-entry PML4 Table → PML4 entry → 512-entry Page Directory Pointer Table → PDP entry (M-30 bits) → 1GB page.]

8 Question 1
We have a core similar to x86-64:
- 64 bit mode
- Supports small pages (mapped by a PTE) and large pages (mapped by a PDE)
- The Page Table size at each level of the hierarchy is the size of a small page
- The entry size in the Page Tables is 16 bytes, at all levels
Linear address layout: sign ext. [63:N4+1] | PML4 [N4:N3+1] | PDP [N3:N2+1] | DIR [N2:N1+1] | TABLE [N1:12] | OFFSET [11:0].
What is the size of a small page?
12 bits in the offset field → 2^12 B = 4KB.
How many entries are in each Page Table?
Page Table size = page size = 4KB; entry size = 16B → 4KB / 16B = 2^12 / 2^4 = 2^8 = 256 entries in each Page Table.
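The arithmetic in the two answers above, as a small sketch:

```c
#include <stdio.h>

int main(void) {
    unsigned page_size  = 1u << 12;               /* 12 offset bits -> 4 KB page */
    unsigned entry_size = 16;                     /* given: 16-byte entries      */
    unsigned entries    = page_size / entry_size; /* 2^12 / 2^4 = 2^8 = 256      */

    printf("small page: %u B, entries per table: %u\n", page_size, entries);
    return 0;
}
```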

9 Question 1
(64 bit; small and large pages; PT size = page size = 4KB; entry size = 16B; 256 entries per Page Table.)
What are the values of N1, N2, N3 and N4?
Since we have 256 entries in each table, we need 8 bits to index each of them:
- TABLE = bits [19:12] → N1 = 19
- DIR = bits [27:20] → N2 = 27
- PDP = bits [35:28] → N3 = 35
- PML4 = bits [43:36] → N4 = 43
What is the size of a large page?
Large pages are pointed to by the DIR level (PDE), so the large-page offset is the 20 bits [19:0] → large page size: 2^20 B = 1MB.
Equivalently: a PDE can point to a Page Table of 256 PTEs, each mapping a 4KB page → 256 × 4KB = 1MB.
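The same field boundaries and the large-page size, recomputed from the 256-entry tables (8 index bits per level); a sketch:

```c
#include <stdio.h>

int main(void) {
    unsigned offset_bits = 12, index_bits = 8;       /* 256 entries per table */

    unsigned n1 = offset_bits + index_bits - 1;      /* TABLE ends at bit 19  */
    unsigned n2 = n1 + index_bits;                   /* DIR   ends at bit 27  */
    unsigned n3 = n2 + index_bits;                   /* PDP   ends at bit 35  */
    unsigned n4 = n3 + index_bits;                   /* PML4  ends at bit 43  */

    /* A large page is mapped by a PDE, so its offset covers TABLE + OFFSET. */
    unsigned long long large_page = 1ULL << (offset_bits + index_bits); /* 1 MB */

    printf("N1=%u N2=%u N3=%u N4=%u, large page = %llu KB\n",
           n1, n2, n3, n4, large_page >> 10);
    return 0;
}
```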

10 Question 1
(64 bit; small and large pages; PT size = page size = 4KB; large page size = 1MB; entry size = 16B; 256 entries per Page Table; TABLE [19:12], DIR [27:20], PDP [35:28], PML4 [43:36].)
We allocate a sequence of virtual addresses. For each address, what is the minimal number of tables that must be added, over all levels of the hierarchy? Assume allocation of small pages. See the next slide.

11 Question 1: sequence of allocations
(For illustration the example uses 8-bit index fields: sign ext. | PML4 [43:36] | PDP [35:28] | DIR [27:20] | TABLE [19:12] | OFFSET [11:0]. Initially only the PML4 Table, pointed to by CR3, exists.)
- FFFFF | D3 | 82 | FF | E2 | B25: 3 new tables + 1 page — a new PDP Table, Page Directory and Page Table must be allocated.
- FFFFF | D3 | 82 | FF | E2 | 349: 0 new tables — same page as the previous address (only the offset differs).
- The remaining addresses on the slide change lower- or higher-level indices and require, respectively: 0 new tables + 1 page (a new page under an existing Page Table), 1 new table + 1 page (a new Page Table under an existing Page Directory), and 3 new tables + 1 page (a new PML4 index, so a new PDP Table, Page Directory and Page Table).

12 Translation Lookaside Buffer (TLB)
The page table resides in memory → each translation would require extra memory accesses.
The TLB caches recently used PTEs to speed up translation; typically 128 to 256 entries, 4 to 8 way associative.
TLB indexing: the virtual page number is split into a tag and a set index (the page offset is not translated).
[Flow: Virtual Address → TLB access → TLB hit? Yes → Physical Address; No → access the Page Table in memory — the Page Miss Handler (PMH) gets the PTE from memory.]
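A minimal sketch of a set-associative TLB lookup in the spirit of the description above; the geometry (64 sets × 4 ways, i.e. 256 entries) and the structure names are illustrative, not those of a specific processor:

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_SETS 64   /* e.g. 256 entries / 4 ways */
#define TLB_WAYS 4

typedef struct {
    bool     valid;
    uint64_t vpn_tag;   /* upper VPN bits (the part not used as set index) */
    uint64_t pfn;       /* cached translation                              */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_SETS][TLB_WAYS];

/* Look up a virtual address; on a hit, return true and the physical address. */
bool tlb_lookup(uint64_t va, uint64_t *pa) {
    uint64_t vpn = va >> 12;            /* drop the 4KB page offset          */
    unsigned set = vpn % TLB_SETS;      /* low VPN bits index the set        */
    uint64_t tag = vpn / TLB_SETS;      /* remaining VPN bits are the tag    */

    for (int w = 0; w < TLB_WAYS; w++) {
        if (tlb[set][w].valid && tlb[set][w].vpn_tag == tag) {
            *pa = (tlb[set][w].pfn << 12) | (va & 0xFFF);
            return true;                /* TLB hit */
        }
    }
    return false;                       /* TLB miss -> PMH / page walk */
}
```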

13 Virtual Memory And Cache
[Flow: Virtual Address → TLB hit? No → STLB hit? No → page walk: get the PTE from the memory hierarchy (L1 cache hit? → L2 cache hit? → access memory) → Physical Address → Data.]
Page table entries are cached in the L1 D$, L2$ and L3$ as data.

14 TLBs
The processor saves the most recently used PDEs and PTEs in TLBs.
- Separate TLBs for data and instruction accesses.
- Separate TLBs for 4KB and 2/4MB page sizes.

15 PMH - Page Miss Handler
When a memory access occurs, the following happens:
First, we access the appropriate TLB with the full VPN. There are a dTLB and an iTLB, which hold translations for data and instruction accesses, respectively. The TLBs are also split by page size (small vs. large pages).
On a TLB hit, we use the translation we found. On a TLB miss, we turn to the PMH.

16 PMH - continued: STLB – secondary TLB
The PMH contains the STLB. After a TLB miss we access the STLB.
The STLB is another "level" of TLB: it holds more PTEs. It holds mappings for both instruction and data addresses, and it also holds translations of large pages.
As with the TLB, we use the VPN to look up the STLB. On an STLB hit, we use the translation we found. On an STLB miss, the PMH performs a page walk.

17 PMH - continued: Page Walk
At this stage the PMH must perform a page walk: it walks the page-table hierarchy starting from the root (PML4).
To shorten the process, the PMH keeps caches for the upper levels of the translation: a PML4 cache, a PDP cache and a DIR cache.
Why is there no need for a Table cache? We already have one – the TLB. The TLB serves as a cache of PTEs, i.e., it holds the final translation of the address. Therefore there is no need for a Table cache, only for caches of the higher levels, which are intermediate steps of the translation.

18 PMH - continued: Page Walk
The paging-structure caches are accessed with virtual address bits (address layout: sign ext. | PML4 [47:39] | PDP [38:30] | DIR [29:21] | TABLE [20:12] | OFFSET [11:0]). On a hit they return:
- DIR cache, tagged with VA[47:21] → PDE
- PDP cache, tagged with VA[47:30] → PDP entry
- PML4 cache, tagged with VA[47:39] → PML4 entry
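A sketch of how these paging-structure caches could be tagged with the upper virtual-address bits listed above; the structure and names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

/* Tags for the PMH's paging-structure caches, per the bit ranges above. */
static inline uint64_t pml4_tag(uint64_t va) { return va >> 39; } /* VA[47:39] */
static inline uint64_t pdp_tag(uint64_t va)  { return va >> 30; } /* VA[47:30] */
static inline uint64_t pde_tag(uint64_t va)  { return va >> 21; } /* VA[47:21] */

typedef struct {
    bool     valid;
    uint64_t tag;     /* the upper VA bits listed above */
    uint64_t entry;   /* cached PML4E / PDPE / PDE      */
} ps_cache_entry_t;
```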

19 PMH - continued: Page Walk
Each cache is accessed with all the bits that are also used to access the levels above it. If some cache level hits, we save (at least) all the accesses to the current level and to the higher levels. For example, on a PDP-cache hit we find the matching PDPE and thus save the accesses to the PML4 and PDP tables. We may still need to access memory for the lower levels.

20 PMH - continued: Page Walk
The PMH accesses all of these caches in parallel and picks the lowest level that hit. The rest of the translation is performed as usual, by accessing the tables stored in memory.
Note that if we started a page walk (i.e., we did not get the PTE from the TLB or the STLB), we must at the very least access the Page Table.
The required page table is looked up in the memory hierarchy as usual: first the L1 data cache, then the L2 cache, then the L3 cache, and only then memory. The logic is sketched below.
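A sketch of picking the lowest paging-structure cache that hit, as described above; the types and counts are illustrative:

```c
#include <stdbool.h>

/* Illustrative outcome of the parallel PMH-cache lookup described above. */
typedef struct {
    bool pml4_hit, pdp_hit, pde_hit;
} pmh_lookup_t;

/* Return how many page-table levels must still be read from the memory
 * hierarchy, starting from the lowest paging-structure cache that hit.
 * A PDE-cache hit leaves only the PTE; a PDP hit leaves PDE + PTE; etc.
 * With no hit at all, the full 4-level walk (PML4E, PDPE, PDE, PTE) runs. */
static int remaining_memory_reads(pmh_lookup_t r) {
    if (r.pde_hit)  return 1;   /* only the Page Table (PTE)               */
    if (r.pdp_hit)  return 2;   /* Page Directory + Page Table             */
    if (r.pml4_hit) return 3;   /* PDP Table + Page Directory + Page Table */
    return 4;                   /* full walk from CR3                      */
}
```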

21 PMH - summary
[Flowchart: Virtual Address → TLB hit? Yes → PTE. No → STLB hit? Yes → PTE. No → look up the PML4, PDP and PDE caches:
- PDE cache hit → inject a load to get the PTE.
- else PDP cache hit → inject a load to get the PDE, then the PTE.
- else PML4 cache hit → inject loads to get the PDP entry, the PDE, then the PTE.
- else → inject loads to get the PML4 entry, the PDP entry, the PDE, then the PTE.
Each injected load is looked up in the L1$, then the L2$, then the L3$, and finally memory.]

22 Caches and Translation Structures
[Diagram: the core fetches instruction bytes through the L1 instruction cache and data through the L1 data cache, backed by the on-die L2 and L3 and by platform memory. Translation uses an instruction TLB and a data TLB, backed by the STLB and the PMH. The PMH's page-walk logic holds the PDE cache (tagged with VA[47:21]), the PDP cache (VA[47:30]) and the PML4 cache (VA[47:39]), and injects loads for PTEs into the cache hierarchy.]

23 Question 2
A processor similar to x86, 64 bits; pages of 4KB. The processor has a TLB:
- TLB hit: we get the translation with no need to access the translation tables.
- TLB miss: the processor has to do a page walk.
The hardware that does the page walk (the PMH) contains a cache for each of the translation tables. All caches and the TLB are empty on reset.
Address layout: sign ext. [63:44] | PML4 [43:36] | PDP [35:28] | DIR [27:20] | TABLE [19:12] | OFFSET [11:0].
For the sequence of three memory accesses shown on the slide, how many memory accesses are needed for each translation? (The answers are filled in on the following slides.)

24 Question 2 (continued)
[Diagram of the full walk: CR3 → PML4 Table → PDP Table → Page Directory → Page Table → page, indexed by the PML4, PDP, DIR, TABLE and OFFSET fields respectively.]

25 Question 2 (continued)
First address: 4 memory accesses — we need to access memory for each of the 4 translation tables (PML4, PDP, Page Directory, Page Table).

26 Question 2 (continued)
Second address: same pages as above → TLB hit → no memory access.

27 Question 2 (continued)
Third address: 3 memory accesses — we hit in the PML4 cache but miss in the PDP cache, so we need to access memory 3 times: PDP, DIR, PTE.

28 Question 3
L1 data cache: 32KB, 2 ways, 64B per block. Page size: 4KB.
How can we access this cache before we have the physical address?
- Block offset: 64B per block → 6 bits, bits [5:0].
- Set index: 32KB / (2 ways × 64B) = 2^15 / (2 × 2^6) = 2^8 = 256 sets → 8 bits, bits [13:6].
- A page is 2^12 B → 12 bits are not translated: [11:0]. We are missing 2 bits, [13:12], to form the set index.
- So we do the lookup using 2 untranslated (virtual) bits [13:12] for the set. Those bits can differ from the matching bits of the PFN obtained after translation, so we must compare the whole PFN against the tag stored in the cache tag array.
(Address layout: VPN (translated) [47:12] | not translated [11:0]; cache view: tag | set [13:6] | offset [5:0].)
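The index arithmetic above, as a sketch; the cache geometry is the one given, and the example address is arbitrary:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    unsigned cache_size = 32 * 1024, ways = 2, block = 64;

    unsigned sets        = cache_size / (ways * block);   /* 256          */
    unsigned offset_bits = 6;                              /* 64 B blocks  */
    unsigned set_bits    = 8;                              /* 256 sets     */

    /* Bits [11:0] are untranslated (4KB pages); bits [13:12] of the set
     * index therefore come from the *virtual* address.                   */
    uint64_t va  = 0x0000123456789ABCULL;                  /* arbitrary example */
    unsigned set = (unsigned)((va >> offset_bits) & ((1u << set_bits) - 1));

    printf("sets=%u, set index of example VA = %u (virtual bits [13:12] = %llu)\n",
           sets, set, (unsigned long long)((va >> 12) & 0x3));
    return 0;
}
```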

29 Question 3: Read Access
[Diagram: the set index is taken from address bits [13:6] (bits [13:12] virtual, [11:6] not translated); both ways of the selected set are read, and the stored tag [40:12] is compared against the translated PFN [40:12] to detect a tag match.]

30 Question 3: example
[Worked example: Write to virtual address A: VA = 0…0000 0000…0000 (PA likewise) → set [13:6] = 0, tag = 0. Read from virtual address B: VB = 0…0011 0011…0000 → set [13:6] = 0, tag = 3 (it would be 0 if bits [13:12] were not taken into account).]

31 Question 3: Virtual Alias
L1 data cache: 32KB, 2 ways, 64B per block.
What will happen if we access virtual page A with a given offset, and afterwards access virtual page B with the same offset, when both virtual pages are mapped by the OS to the same physical page?
Example: VPN xxxx01 and VPN yyyy00 (the trailing 01 / 00 are bits [13:12]) both map to PFN zzzz — two virtual pages map to the same frame.
- Physically-addressed cache (set = untranslated bits [11:6] only, at most 64 sets): block i of frame zzzz exists only once.
- Virtually-addressed cache (sets 01.set[11:6] vs. 00.set[11:6]): block i of frame zzzz may exist twice — AVOID THIS!

32 Question 3: Virtual Alias
To avoid having the same data twice in the cache (for virtual addresses xxxx01.set.offset and yyyy00.set.offset):
When we allocate a new entry, check the 4 candidate sets × 2 ways and see whether the same tag already appears. If so, evict the second occurrence of the data (the alias).
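A sketch of the alias check described above: on allocation, probe the 4 possible sets (the 2 virtual bits [13:12] give 4 combinations) across both ways and evict any line whose physical tag matches. The structure and names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

#define SETS 256
#define WAYS 2

typedef struct {
    bool     valid;
    uint64_t tag;      /* full PFN, physical bits [40:12] */
} line_t;

static line_t cache[SETS][WAYS];

/* Before filling a new line for physical frame 'pfn' at set 'new_set',
 * evict any alias: the same frame cached under a different virtual [13:12]. */
void evict_aliases(unsigned new_set, uint64_t pfn) {
    unsigned low = new_set & 0x3F;               /* physical set bits [11:6]    */
    for (unsigned v = 0; v < 4; v++) {           /* 4 candidate virtual [13:12] */
        unsigned set = (v << 6) | low;
        if (set == new_set) continue;
        for (unsigned w = 0; w < WAYS; w++) {
            if (cache[set][w].valid && cache[set][w].tag == pfn)
                cache[set][w].valid = false;     /* drop the aliased copy */
        }
    }
}
```

A physical-address snoop has to probe the same 4 sets × 2 ways, as the next slide notes.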

33 Question 3: Snoop
L1 data cache: 32KB, 2 ways, 64B per block. What happens on a snoop of the cache?
The cache is snooped with a physical address [40:0]. Since the 2 MSBs of the set index are virtual, a given physical address can map to 4 different sets in the cache (depending on the virtual page mapped to it). So we must snoop 4 sets × 2 ways in the cache.

34 Question 4
A core similar to x86 in 64 bit mode:
- Supports small pages (pointed to by a PTE) and large pages (pointed to by a DIR entry / PDE).
- The entry size in all the page tables is 8 bytes.
- Address layout: sign ext. [63:56] | PML4 [55:48] | PDP [47:36] | DIR [35:24] | TABLE [23:12] | OFFSET [11:0].
- The TLB and the PMH caches for all levels are 4-entry, direct mapped; access time on a hit: 2 cycles; a miss is known after 1 cycle.
- The PMH caches are accessed in parallel at all levels. At each level, on a hit the PMH cache provides the relevant entry of the page table at that level; on a miss, the core accesses the relevant page table in main memory.
- Access time to main memory is 100 cycles, not including the time needed to detect the PMH cache miss.

35 Question 4
What is the size of the large pages?
A large page is pointed to by a DIR entry, so all the bits below DIR are offset inside the large page: 12 + 12 = 24 bits → 2^24 B = 16 MB.
How many entries are in each page table?
Page Table: 2^12, Page Directory: 2^12, PDP Table: 2^12, PML4 Table: 2^8.
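The sizes computed above, as a sketch:

```c
#include <stdio.h>

int main(void) {
    unsigned entry_size = 8;                       /* given                */
    unsigned offset_bits = 12;
    unsigned table_bits = 12, dir_bits = 12, pdp_bits = 12, pml4_bits = 8;

    /* A large page is mapped by a PDE, so its offset = TABLE + OFFSET bits. */
    unsigned long long large_page = 1ULL << (offset_bits + table_bits); /* 16 MB */

    printf("large page: %llu MB\n", large_page >> 20);
    printf("entries: PT=%u, DIR=%u, PDP=%u, PML4=%u\n",
           1u << table_bits, 1u << dir_bits, 1u << pdp_bits, 1u << pml4_bits);
    printf("page-table sizes: PT=%u KB, PML4=%u KB\n",
           ((1u << table_bits) * entry_size) >> 10,
           ((1u << pml4_bits) * entry_size) >> 10);
    return 0;
}
```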

36 Question 4 (continued)
(4-entry direct-mapped TLB and PMH caches; hit: 2 cycles; miss known after 1 cycle; memory access: 100 cycles.)
How many cycles are required to obtain the translation to a physical address for each virtual address?
Assumptions: the TLB and the PMH caches are empty before the access sequence; all accesses are to small pages; all relevant pages are in memory – no page faults.
Virtual address sequence: FF ABCD, FF ABCD, FF ABCD, FF ABCD, FF A0CD (cycles and explanations on the following slides).

37 Question 4 (continued)
Access 1 (FF ABCD): 402 cycles. TLB miss, then a miss and a memory access at each level in the PMH: 1 + 1 + 4×100 = 402 cycles.

38 Question 4 (continued)
Access 2 (FF ABCD): 203 cycles. TLB miss: 1 cycle; the PMH caches give (PML4, PDP, DIR, STLB) = (H, H, M, M); the PDP-cache read takes 1 more cycle (2 cycles for the hit); the DIR-cache miss costs a 100-cycle memory access to read the PDE, and the PTE costs another 100 cycles: 1 + 2 + 100 + 100 = 203 cycles.

39 Question 4 (continued)
Access 3 (FF ABCD): 402 cycles. TLB miss and a PMH-cache miss at all levels: 1 + 1 + 4×100 = 402 cycles.

40 Question 4 (continued)
Access 4 (FF ABCD): 303 cycles. TLB miss, (PML4, PDP, DIR, STLB) = (H, M*, M**, M): 1 + 2 + 3×100 = 303 cycles.
*PDP: the entry 234 that was filled for access 2 was replaced by the entry filled for access 3, since it maps to the same set → miss.
**DIR: similarly.

41 Question 4 (continued)
Access 5 (FF A0CD): 2 cycles. Hit in the TLB – no need to go to the PMH: 2 cycles.
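A sketch of the latency bookkeeping used in the table above, under the parameters given on slide 34 (2-cycle TLB/PMH-cache hit, miss known after 1 cycle, 100-cycle memory access); the function name and the level encoding are illustrative:

```c
#include <stdio.h>
#include <stdbool.h>

/* Latency model for one translation.
 * 'lowest_hit' = number of walk levels skipped thanks to the lowest PMH
 * cache that hit (0 = all missed, 1 = PML4 hit, 2 = PDP hit, 3 = DIR hit). */
static int translation_cycles(bool tlb_hit, int lowest_hit) {
    if (tlb_hit)
        return 2;                                 /* TLB hit               */
    int cycles = 1;                               /* TLB miss              */
    cycles += (lowest_hit == 0) ? 1 : 2;          /* parallel PMH-cache lookup */
    cycles += (4 - lowest_hit) * 100;             /* remaining table reads */
    return cycles;
}

int main(void) {
    printf("%d\n", translation_cycles(false, 0)); /* 402: all PMH caches miss */
    printf("%d\n", translation_cycles(false, 2)); /* 203: PDP cache hit       */
    printf("%d\n", translation_cycles(false, 1)); /* 303: PML4 cache hit      */
    printf("%d\n", translation_cycles(true,  0)); /*   2: TLB hit             */
    return 0;
}
```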

