Address Translation 101

Objectives

The purpose of this lab is to simulate x64 page walking and convert a Virtual Address(VA) to its corresponding Physical Address(PA).

Intro to x64/Long Mode Paging

Put simply, paging is the implementation of Virtual Memory which is mapped to Physical Memory by the Memory Management Unit(MMU).
Remember that PA is the actual address of a memory cell in the physical RAM chip.
The MMU is responsible for walking the page tables and translating a VA(used by the CPU) to its mapped PA.
The physical page size in x64 architecture can be any one of the following: 4 kB(Normal Page), 2 MB(Large Page) or 1 GB(Huge Page).

Virtual Address Space

Virtual Address Space(VAS) is simply a software's view of memory. It is divided into 2 categories:
1. User Virtual Address Space(User VAS) - This VAS is private and available per-process ergo all addresses in User VAS are relative to the process.
2. Kernel Virtual Address Space(Kernel VAS) - This VAS is shared between all processes(except PTE and Session Space) albeit it is not accessible from Ring-3/User-Mode.
In theory, x64 systems should have a total VAS of 2 ^ 64 = 16 EB but on current hardware with 4-level paging, Virtual Addresses are limited to 48 bits. It is for this reason that a Canonical Address had to be adopted, where only the lower 48 bits of a VA are used for translation and the remaining bits(48 - 63) must be copies of bit 47 i.e. sign-extended(more on this later).
Thus, at present, total VAS = 2 ^ 48 = 256 TB which is divided equally between User VAS and Kernel VAS i.e. 128 TB each.
Note that this limit applies to the VAS and not to Physical Memory/RAM: the width of a PA is a separate, processor-dependent limit(architecturally up to 52 bits, which is why the PA masks used later in this lab cover bits 12 to 51).
The following diagram might make it clearer:
x64 Virtual Address Space

4-Level Paging and PxE

It is important to briefly mention that x64 paging uses 4-level paging. The four paging structures(from highest to lowest respectively) are:
1. Page Map Level 4(PML4)
2. Page Directory Pointer Table(PDPT)
3. Page Directory Table(PDT)
4. Page Table(PT)
Each paging table contains 512 entries(PxE), each 8 bytes in size, and they are called: PML4E/PXE(in WinDbg), PDPE/PPE(in WinDbg), PDE and PTE respectively.
In x64 Windows, each page table entry is represented by a structure known as: nt!_MMPTE_HARDWARE.
Here is the complete structure:
// Ref: https://www.vergiliusproject.com/kernels/x64/Windows%2010%20|%202016/2009%2020H2%20(October%202020%20Update)/_MMPTE_HARDWARE
//0x8 bytes (sizeof)
struct _MMPTE_HARDWARE {
    ULONGLONG Valid:1; //0x0
    ULONGLONG Dirty1:1; //0x0
    ULONGLONG Owner:1; //0x0
    ULONGLONG WriteThrough:1; //0x0
    ULONGLONG CacheDisable:1; //0x0
    ULONGLONG Accessed:1; //0x0
    ULONGLONG Dirty:1; //0x0
    ULONGLONG LargePage:1; //0x0
    ULONGLONG Global:1; //0x0
    ULONGLONG CopyOnWrite:1; //0x0
    ULONGLONG Unused:1; //0x0
    ULONGLONG Write:1; //0x0
    ULONGLONG PageFrameNumber:36; //0x0
    ULONGLONG ReservedForHardware:4; //0x0
    ULONGLONG ReservedForSoftware:4; //0x0
    ULONGLONG WsleAge:4; //0x0
    ULONGLONG WsleProtection:3; //0x0
    ULONGLONG NoExecute:1; //0x0
};
We will discuss some of the more relevant members of this structure(or control bits in the page table entry) here:
1. Valid or P bit - Must be set to 1 for the page table entry to be considered valid(entry may be used for address translation/page is present in RAM)
2. Owner or U/S bit - If set to 1 or U, it is a User-Mode page and if set to 0 or S, it is a Supervisor/Kernel-Mode page
3. LargePage or PS bit - If set to 1 in a PDE, the entry maps a Large Page(2 MB) directly instead of pointing to a PT(set in a PDPE, it maps a 1 GB Huge Page)
4. Write or R/W bit - If set to 1 or W, writing to page is enabled and if set to 0 or R, it is a read-only page(note that in this structure the hardware R/W position, bit 1, is the member named Dirty1 while Write sits in a software-available bit)
5. NoExecute or NX bit - If set to 1, code cannot be executed from the page
There is yet another important member of the PxE structure which we haven't discussed yet i.e. Page Frame Number(PFN). PFN is the physical page number of the next paging structure(or, at the last level, of the final page) so PFN * 0x1000 gives the PA of its base. I assure you that we will come back to this later when we do the page walk.

Anatomy Of A Virtual Address

In order to understand address translation, we must first understand the anatomy of a VA.
Here is a diagram breaking down a VA into various parts:
x64 VA Anatomy - I
As visible from the diagram, the parts of a VA(from high to low) are:
1. Sign Extend - Bits 48 to 63 = 16 bits for sign extension
2. PML4 Offset - Bits 39 to 47 = 9 bits index into the PML4 paging structure
3. PDPT Offset - Bits 30 to 38 = 9 bits index into the PDPT paging structure
4. PDT Offset - Bits 21 to 29 = 9 bits index into the PDT paging structure
5. PT Offset - Bits 12 to 20 = 9 bits index into the PT paging structure
6. Physical Page Offset - Bits 0 to 11 = 12 bits index into the physical page
Sign Extend bits are used to represent a Canonical Address which implies for User VAS, Virtual Addresses are sign-extended with 0 while for Kernel VAS, Virtual Addresses are sign-extended with 1.
Next, the different offsets simply select an appropriate entry from the corresponding 4-level paging structures.
Finally, the Physical Page Offset denotes the particular byte within the physical page determined by the PTE.
Here is another break down of a VA:
x64 VA Anatomy - II
Remember that the Physical Page Offset(lower 12 bits) remains the same on both VA and its corresponding PA.
A VA is actually a fusion of a Segment Number and the Linear Address as already discussed. However, Segmentation is not relevant for address translation ergo we will not discuss it here.

VA -> PA Translation

Now that some basic theory has been covered, let's look at the logic behind address translation.
x64 Address Translation(4 kB)
Let's break it down and see what's happening.
Every process has its own set of paging structures and a DirectoryTableBase(DTB)/DirBase(in WinDbg) in the nt!_KPROCESS(Kernel process object) structure that contains the PA of the highest-level paging structure i.e. PML4 in x64 paging. This value is moved into the CR3 register every time the processor switches process context, and it is from this privileged control register that the address translation begins.
After the MMU retrieves the PA of the PML4 table base, it then selects a particular entry from the PML4 table based on the index given by the VA.
This page table entry holds several control bits and a PFN(already discussed before) which gives the starting PA of the next paging structure i.e. PDPT. The same step is repeated for the remaining paging structures until we reach the PTE, whose PFN gives the base address of a page in physical memory. The Physical Page Offset is then added to this base to get the actual PA mapped by the VA.
Keep in mind that a PTE maps a single 4 kB page of Physical Memory, a PDE can address 512 * 4 kB = 2 MB, a PDPE can address 512 * 2 MB = 1 GB and finally a PML4E can address up to 512 * 1 GB = 512 GB of Physical Memory.
There is also a slight variation of the above where the PDE maps a Large Page/2 MB of contiguous physical memory instead of pointing to a PT(more on this later).

Importance of Address Translation

Address translation is usually of prime importance when productizing an exploit against a vulnerable device driver.
Let me explain.
So we've got an arbitrary physical memory read/write primitive, but to actually make it usable we need to convert VAs to PAs. That gives us an arbitrary Ring 0/Kernel-Mode virtual memory read/write primitive, which is what we want since we deal exclusively with VAs and not PAs.

Converting VA to PA

To convert a VA to its mapped PA using WinDbg we have two methods. One of them is automatic and the other is semi-automatic. We will look at both of them in this section and lastly we'll also look at a completely manual address translation which will come in handy later when we discuss the code.
Feel free to use your choice of debugging mode for this example i.e. either kd or lkd. Both of them should work equally well for this purpose.

WinDbg(Auto Mode)

Here is the command for translating VA to PA:
lkd> !vtop <DTB> <VA>
where <DTB> is the Directory Table Base/DirBase and <VA> denotes the Virtual Address we are interested in translating.
Address Translation using vtop
Note the page table entries at each level.
Remember that the VA must be properly formatted(without ` symbol) for this command to work successfully. Also, this command does not need a process context switch.

WinDbg(Semi-Auto Mode)

Here is the alternative WinDbg command for walking page tables:
lkd> !pte <VA>
lkd> ? <PFN of PTE> * 0x1000 + <Physical Page Offset>
where we must first retrieve the Page Frame Number of the PTE, multiply it by 0x1000 and finally add the Physical Page Offset to get the PA.
Address Translation using pte
Apart from the page table entries themselves, also note the VA of the entries which are available using this command.
Remember that !pte uses the currently selected process's DirBase as the base PA of the PML4 table and walks the page tables from there.

WinDbg(Manual Mode)

Up until now, we have relied on some WinDbg commands to walk the page tables for us but once again we might not have that luxury in a real-world scenario. This means that we must rely on our knowledge of address translation and manually walk the page tables. With that in mind, let's get initiated.
  1. To begin, we must first retrieve the Kernel DTB or the PA of the PML4 table base for Kernel since we are interested in the Kernel VAS(there's already a session where this is discussed in detail):
        lkd> !process 4 0
  2. Choose a VA to translate. We will be using nt!KeInsertQueueApc:
        lkd> x nt!KeInsertQueueApc
  3. Decompose the VA to get the PML4 offset:
        lkd> ? (VA >> (0n12 + 0n9 + 0n9 + 0n9)) & 0x1FF
  4. Get the PML4E/PXE:
        lkd> !dq (PML4 Base Address + (PML4 offset * 8)) L1
    To get the page table entry at each level, we are going to calculate the PA of the page table entry and then dump physical memory using !dq.
  5. Decompose the VA to get the PDPT offset:
        lkd> ? (VA >> (0n12 + 0n9 + 0n9)) & 0x1FF
  6. Get the PDPE/PPE:
        lkd> !dq ((PML4E & 0xFFFFFFFFFF000) + (PDPT offset * 8)) L1
    where 0xFFFFFFFFFF000 is the PA mask for 4 kB page.
  7. Decompose the VA to get the PDT offset:
        lkd> ? (VA >> (0n12 + 0n9)) & 0x1FF
  8. Get the PDE:
        lkd> !dq ((PDPE & 0xFFFFFFFFFF000) + (PDT offset * 8)) L1
    It is here that we check if LargePage bit is set in PDE using:
        lkd> ? ((PDE & 0x80) != 0)
    If true, we calculate PA like so:
        lkd> ? (PDE & 0xfffffffe00000) + (VA & 0x1fffff)
    where 0xfffffffe00000 is the PA mask for 2 MB page and 0x1fffff is the Physical Page Offset mask for 2 MB page.
  9. Else, decompose the VA to get the PT offset:
        lkd> ? (VA >> 0n12) & 0x1FF
  10. Get the PTE:
        lkd> !dq ((PDE & 0xFFFFFFFFFF000) + (PT offset * 8)) L1
  11. Decompose the VA to get the Physical Page Offset:
        lkd> ? VA & 0xFFF
    where 0xFFF is the Physical Page Offset mask for 4 kB page.
  12. Finally, get the PA like so:
        lkd> ? (PTE & 0xFFFFFFFFFF000) + Physical Page offset
We may now verify the results of the conversion by examining the memory contents at PA and VA dumped by !db and db commands respectively.
Here is the complete WinDbg output dump:
lkd> !process 4 0
Searching for Process with Cid == 4
PROCESS ffffd30cedc85080
    SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
    DirBase: 001aa000 ObjectTable: ffffb60bad229c40 HandleCount: 2111.
    Image: System

lkd> x nt!KeInsertQueueApc
fffff803`3822b520 nt!KeInsertQueueApc (void)
lkd> ? (fffff803`3822b520 >> (0n12 + 0n9 + 0n9 + 0n9)) & 0x1FF
Evaluate expression: 496 = 00000000`000001f0
lkd> !dq (0x1aa000 + (0x1f0 * 8)) L1
# 1aaf80 00000000`01189063
lkd> ? (fffff803`3822b520 >> (0n12 + 0n9 + 0n9)) & 0x1FF
Evaluate expression: 12 = 00000000`0000000c
lkd> !dq ((00000000`01189063 & 0xFFFFFFFFFF000) + (0xc * 8)) L1
# 1189060 00000000`0118a063
lkd> ? (fffff803`3822b520 >> (0n12 + 0n9)) & 0x1FF
Evaluate expression: 449 = 00000000`000001c1
lkd> !dq ((00000000`0118a063 & 0xFFFFFFFFFF000) + (0x1c1 * 8)) L1
# 118ae08 00000000`01196063
lkd> ? (fffff803`3822b520 >> 0n12) & 0x1FF
Evaluate expression: 43 = 00000000`0000002b
lkd> !dq ((00000000`01196063 & 0xFFFFFFFFFF000) + (0x2b * 8)) L1
# 1196158 09000000`02a10121
lkd> ? fffff803`3822b520 & 0xFFF
Evaluate expression: 1312 = 00000000`00000520
lkd> ? (09000000`02a10121 & 0xFFFFFFFFFF000) + 0x520
Evaluate expression: 44107040 = 00000000`02a10520
lkd> !db 0x2a10520
# 2a10520 48 89 5c 24 10 44 89 4c-24 20 55 56 57 41 54 41 H.\$.D.L$ UVWATA
# 2a10530 55 41 56 41 57 48 83 ec-60 4d 8b e0 4c 8b ea 48 UAVAWH..`M..L..H
# 2a10540 8b f1 33 d2 48 8b 0d 2d-94 a0 00 41 b8 00 30 00 ..3.H..-...A..0.
# 2a10550 00 e8 2a 04 00 00 44 8a-56 51 44 8a d8 48 8b 46 ..*...D.VQD..H.F
# 2a10560 38 45 84 d2 48 89 44 24-48 48 8b 46 30 0f 95 84 8E..H.D$HH.F0...
# 2a10570 24 a0 00 00 00 48 89 44-24 50 48 8d 05 7f 85 6a $....H.D$PH....j
# 2a10580 00 48 39 46 20 0f 84 ff-a2 20 00 32 c9 48 8b 7e .H9F .... .2.H.~
# 2a10590 08 65 48 8b 14 25 88 01-00 00 48 8b 87 20 02 00 .eH..%....H.. ..
lkd> db fffff803`3822b520
fffff803`3822b520 48 89 5c 24 10 44 89 4c-24 20 55 56 57 41 54 41 H.\$.D.L$ UVWATA
fffff803`3822b530 55 41 56 41 57 48 83 ec-60 4d 8b e0 4c 8b ea 48 UAVAWH..`M..L..H
fffff803`3822b540 8b f1 33 d2 48 8b 0d 2d-94 a0 00 41 b8 00 30 00 ..3.H..-...A..0.
fffff803`3822b550 00 e8 2a 04 00 00 44 8a-56 51 44 8a d8 48 8b 46 ..*...D.VQD..H.F
fffff803`3822b560 38 45 84 d2 48 89 44 24-48 48 8b 46 30 0f 95 84 8E..H.D$HH.F0...
fffff803`3822b570 24 a0 00 00 00 48 89 44-24 50 48 8d 05 7f 85 6a $....H.D$PH....j
fffff803`3822b580 00 48 39 46 20 0f 84 ff-a2 20 00 32 c9 48 8b 7e .H9F .... .2.H.~
fffff803`3822b590 08 65 48 8b 14 25 88 01-00 00 48 8b 87 20 02 00 .eH..%....H.. ..
And here is an example of Large Page translation:
Large Page Address Translation

Code

Now that we know the logic behind manual address translation, let's code it.
The below piece of code assumes that we already have a physical read primitive and the Kernel DTB.
Here is the function to translate a VA to its corresponding PA:
Utils.h
// Translate from virtual address to physical address using
// arbitrary physical memory read primitive
// PHY_ADDRESS_MASK is expected to be defined elsewhere as 0xFFFFFFFFFF000
// ------------------------------------------------------------------------

LPVOID convert_virtual_to_physical(HANDLE deviceHandle, DWORD_PTR pml4Base, DWORD_PTR virtualAddress) {
    // Init some important stuff
    WORD pml4Offset;
    DWORD_PTR PML4E;
    WORD pdptOffset;
    DWORD_PTR PDPE;
    WORD pdtOffset;
    DWORD_PTR PDE;
    WORD ptOffset;
    DWORD_PTR PTE;
    WORD phyPageOffset;
    DWORD_PTR physicalAddress;

    printf("Virtual Address: 0x%p\n", (LPVOID)virtualAddress); // [DBG]

    // Get PML4 offset from virtual address
    pml4Offset = (virtualAddress >> (12 + 9 + 9 + 9)) & 0x1FF;
    printf("PML4 offset: 0x%X\n", pml4Offset); // [DBG]

    // Get PML4E/PXE
    if (!read_physical_memory(deviceHandle, (pml4Base + (pml4Offset * 8)), &PML4E, sizeof(PML4E))) {
        printf("[-] Unable to read physical memory to get PML4E/PXE!\n"); // [DBG]
        return NULL;
    }
    printf("PML4E: 0x%p\n", (LPVOID)PML4E); // [DBG]

    // An empty entry means there is nothing mapped at this level
    if (PML4E == 0)
        return NULL;

    // Get PDPT offset from virtual address
    pdptOffset = (virtualAddress >> (12 + 9 + 9)) & 0x1FF;
    printf("PDPT offset: 0x%X\n", pdptOffset); // [DBG]

    // Get PDPE/PPE
    if (!read_physical_memory(deviceHandle, ((PML4E & PHY_ADDRESS_MASK) + (pdptOffset * 8)), &PDPE, sizeof(PDPE))) {
        printf("[-] Unable to read physical memory to get PDPE/PPE!\n"); // [DBG]
        return NULL;
    }
    printf("PDPE: 0x%p\n", (LPVOID)PDPE); // [DBG]

    if (PDPE == 0)
        return NULL;

    // Get PDT offset from virtual address
    pdtOffset = (virtualAddress >> (12 + 9)) & 0x1FF;
    printf("PDT offset: 0x%X\n", pdtOffset); // [DBG]

    // Get PDE
    if (!read_physical_memory(deviceHandle, ((PDPE & PHY_ADDRESS_MASK) + (pdtOffset * 8)), &PDE, sizeof(PDE))) {
        printf("[-] Unable to read physical memory to get PDE!\n"); // [DBG]
        return NULL;
    }
    printf("PDE: 0x%p\n", (LPVOID)PDE); // [DBG]

    if (PDE == 0)
        return NULL;

    // Check for 2MB pages - LargePage(L) bit set in PDE
    // Entry page size bit = 0x80
    // Ref: https://www.cnblogs.com/bianchengnan/p/6231597.html
    if ((PDE & 0x80) != 0) {
        // Physical address mask for 2 MB pages = 0xFFFFFFFE00000
        // Physical page offset for 2 MB pages = 0x1FFFFF
        physicalAddress = ((PDE & 0xFFFFFFFE00000) + (virtualAddress & 0x1FFFFF));
        printf("Physical Address: 0x%p\n", (LPVOID)physicalAddress); // [DBG]
        return (LPVOID)physicalAddress;
    }

    // Get PT offset from virtual address
    ptOffset = (virtualAddress >> 12) & 0x1FF;
    printf("PT offset: 0x%X\n", ptOffset); // [DBG]

    // Get PTE
    if (!read_physical_memory(deviceHandle, ((PDE & PHY_ADDRESS_MASK) + (ptOffset * 8)), &PTE, sizeof(PTE))) {
        printf("[-] Unable to read physical memory to get PTE!\n"); // [DBG]
        return NULL;
    }
    printf("PTE: 0x%p\n", (LPVOID)PTE); // [DBG]

    if (PTE == 0)
        return NULL;

    // Get physical page offset from virtual address
    // Physical page offset for 4 kB pages = 0xFFF
    phyPageOffset = virtualAddress & 0xFFF;
    printf("Physical page offset: 0x%X\n", phyPageOffset); // [DBG]

    // Calculate physical address
    // Physical address mask for 4 kB pages = 0xFFFFFFFFFF000
    physicalAddress = (PTE & PHY_ADDRESS_MASK) + phyPageOffset;
    printf("Physical Address: 0x%p\n", (LPVOID)physicalAddress); // [DBG]

    return (LPVOID)physicalAddress;
}
Once again, there's not much to explain in this short code snippet since every step is commented to prevent any confusion.
And here is the output(+ some WinDbg-fu to verify our results):
Virtual To Physical Address Translation
Note that we used the same VA as the one used in the manual address translation using WinDbg so that we can verify the results.
Shoutout to Ruben Boonen a.k.a. FuzzySec and Connor McGarr a.k.a. 33y0re whose works greatly inspired this session.

Links

Credits