Leaking Kernel DTB

Objectives

The purpose of this lab is to find the physical address (PA) of the Page-Map-Level-4 (PML4) table base for the Kernel/System Virtual Address Space.

Kernel DTB

Put simply, the Directory Table Base (DTB, or DirBase in WinDbg) is the PA of the base of the top-level paging structure (the PML4 in Long Mode/x64 paging). The CR3 register holds this value.
When a new process is created, a new PML4 table is also allocated. The PA of the PML4 table is stored in the DirectoryTableBase member of the nt!_KPROCESS (Kernel process object) structure. When the processor performs a process context switch, this PA is moved into the CR3 register. The Memory Management Unit (MMU) then uses it, along with the VA offsets into the various paging structures, to walk the page tables and translate virtual addresses to the appropriate physical addresses.
In other words, every process, which is basically a containment object, has a unique randomized PA for its root page table base, and this PA is moved into CR3 when the process is selected.
VA -> PA translation itself is not the goal of this lab. It shall be discussed in another session.
But what about the Kernel DTB that maps the Kernel Virtual Address Space? After all, that is what we are concerned with in this lab.
It turns out that the base of the PML4 table for the Kernel/System process is mostly at an unrandomized PA (more on this later).
Note that the DTB is also the PA held by the self-referencing PML4E, in other words, the entry in the PML4 table which points to the table's own base PA.
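As a minimal sketch of the self-referencing entry described above, the following assumes the 512 entries of a PML4 page have already been copied into a local buffer through the physical read primitive. The function name `find_self_ref_index` and the mask constants are illustrative assumptions and are not part of the original lab code.

```c
#include <stdint.h>

#define PML4_ENTRIES 512
#define PTE_PRESENT  0x1ULL
#define PTE_PFN_MASK 0x000FFFFFFFFFF000ULL /* bits 12..51: physical address of the next table */

/* Return the index of the PML4 entry that points back to the PML4 page
 * itself, or -1 if no such entry exists.
 *   pml4    : local copy of the 512 entries of the PML4 page
 *   pml4_pa : physical address of that page (the DTB; low flag bits are ignored) */
int find_self_ref_index(const uint64_t pml4[PML4_ENTRIES], uint64_t pml4_pa)
{
    for (int i = 0; i < PML4_ENTRIES; i++) {
        if ((pml4[i] & PTE_PRESENT) &&
            (pml4[i] & PTE_PFN_MASK) == (pml4_pa & PTE_PFN_MASK)) {
            return i;
        }
    }
    return -1;
}
```

A hit confirms that the page really is a PML4 table for that DTB, which can serve as a sanity check on a leaked value.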

KVA Shadow

Remember that the System Space is shared and mapped into every UM process (unless KVAS is enabled) and is therefore independent of process context. It is, however, only accessible in Supervisor/Kernel mode, which is controlled via the U/S bit in the Page Table Entry (PTE).
Things are slightly different when Kernel Virtual Address Shadow (KVAS) is enabled to mitigate Meltdown. KVAS separates the User-mode and Kernel-mode page tables for each process, which means that System Space is not mapped in the CPL-3 page tables. When a process context switch is made, nt!_KPROCESS.DirectoryTableBase is moved into nt!_KPRCB.KernelDirectoryTableBase and the CPL-3 page table base is stored into nt!_KPROCESS.UserDirectoryTableBase. However, this is not important for our discussion and we won't be concerned with it as such.
KVAS is the software mitigation for Meltdown (CVE-2017-5754) introduced in Windows 10 RS4/1803 for some old(er) CPUs that are vulnerable. With newer processors (or processors that were not affected by it), KVAS will not be enabled at boot time.
When KVAS is enabled, there are some extra flags in the DirBase. For example, 0x1aa000 would become 0x1aa002. However, in this lab, we won't be too concerned with them since they are used for distinguishing User-mode and Kernel-mode page tables and are masked off before the value is moved into CR3.
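The masking mentioned above can be illustrated with a small helper. This is a sketch under the assumption that only bits 12..51 of a DirBase value carry the PML4 physical address. The name `dtb_to_pml4_pa` is hypothetical.

```c
#include <stdint.h>

/* Bits 12..51 of a DirBase/CR3 value hold the physical address of the
 * PML4 page; the low 12 bits are flags, not address bits. */
#define DTB_PA_MASK 0x000FFFFFFFFFF000ULL

/* Strip flag bits from a DirBase value to obtain the PML4 base PA. */
uint64_t dtb_to_pml4_pa(uint64_t dir_base)
{
    return dir_base & DTB_PA_MASK;
}
```

With the example from the text, `dtb_to_pml4_pa(0x1aa002)` yields 0x1aa000.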
KVAS can be forcefully disabled by running the following commands from an elevated command prompt and rebooting:
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 3 /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f
Remember this is strictly for testing purposes and should only be done in controlled environments (air-gapped testing VMs).

Kernel DTB for some platforms

The physical PML4 base address of the Kernel has been recorded for the following platforms:

| Platform | PML4 Base |
| --- | --- |
| Bare Metal (BIOS) | 0x1ba000 |
| Bare Metal (UEFI) | 0x1ad000 |
| Hyper-V Gen-1 (BIOS) | 0x1aa000 |
| Hyper-V Gen-2 (UEFI) | 0x6d4000 |
| VirtualBox (BIOS) | 0x1aa000 |
| VirtualBox (UEFI) | 0x1ad000 |

Much of this data is taken from research conducted by hugsy and hugeh0ge.
According to their research, the PA of the PML4 table for the Kernel/System process does not appear to be intentionally randomized between boots in any way, unlike the VA of the PML4 (and of the other page tables and their entries).

Importance of Kernel DTB

At this point, readers may ask why we need the Kernel DTB and what the significance of this address is.
Answer: Simple. We've got our sweet sweet arbitrary physical memory R/W primitives, but to productize them, we need to build a function that converts a VA to a PA and so simulates the MMU. To do that, we need the base address of the PML4 paging structure so that we can traverse the page tables and reach the appropriate physical page, thus also giving us Ring-0 arbitrary virtual memory R/W.
If you've been following the theory till now then you had probably guessed the answer, but in case you didn't, now you know :)
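To make the "simulate the MMU" idea concrete, here is a minimal sketch of a 4-level x64 page walk driven by a physical read callback. It is not the lab's implementation. The callback type `read_phys_u64_fn`, the `phys_image` reader (which serves reads from a local copy of physical memory, for experimenting offline) and the function `va_to_pa` are all hypothetical names, and the sketch only handles the present bit plus 1 GB and 2 MB large pages.

```c
#include <stdint.h>
#include <stdbool.h>

#define PTE_PRESENT   0x1ULL
#define PTE_LARGE     0x80ULL               /* PS bit: 1 GB (PDPTE) or 2 MB (PDE) page */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL /* bits 12..51 */

/* Hypothetical wrapper around the physical read primitive:
 * reads one 8-byte entry at physical address pa into *out. */
typedef bool (*read_phys_u64_fn)(void *ctx, uint64_t pa, uint64_t *out);

/* Example reader backed by a local copy of physical memory starting at PA 0. */
struct phys_image { const uint64_t *qwords; uint64_t size_bytes; };

bool image_read_u64(void *ctx, uint64_t pa, uint64_t *out)
{
    const struct phys_image *img = ctx;
    if (img->size_bytes < 8 || (pa & 7) || pa > img->size_bytes - 8) return false;
    *out = img->qwords[pa / 8];
    return true;
}

/* Translate va by walking the page tables rooted at pml4_pa.
 * Returns false if a read fails or an entry is not present. */
bool va_to_pa(read_phys_u64_fn read_u64, void *ctx,
              uint64_t pml4_pa, uint64_t va, uint64_t *pa_out)
{
    /* Bit positions of the PML4, PDPT, PD and PT indices (9 bits each). */
    static const int shifts[4] = { 39, 30, 21, 12 };
    uint64_t table = pml4_pa & PTE_ADDR_MASK;

    for (int level = 0; level < 4; level++) {
        uint64_t index = (va >> shifts[level]) & 0x1FF;
        uint64_t entry;

        if (!read_u64(ctx, table + index * 8, &entry)) return false;
        if (!(entry & PTE_PRESENT)) return false;

        /* A large page at the PDPT or PD level ends the walk early. */
        if ((level == 1 || level == 2) && (entry & PTE_LARGE)) {
            uint64_t page_mask = (1ULL << shifts[level]) - 1;
            *pa_out = (entry & PTE_ADDR_MASK & ~page_mask) | (va & page_mask);
            return true;
        }
        table = entry & PTE_ADDR_MASK;
    }

    /* After the PT level, table holds the base of the 4 KB page. */
    *pa_out = table | (va & 0xFFF);
    return true;
}
```

In real tooling the callback would wrap the driver-specific physical read, and `pml4_pa` would be the Kernel DTB this lab sets out to find.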

Finding the Kernel DTB

Remember how we said that the PML4 base for the Kernel is at an unrandomized PA? So can we just hard-code it in our tools and be done with it? Surely, with our extensive reconnaissance, we can find out our target's platform and use the appropriate address.
Sure, but in our business there is no room for error, and being able to find the PML4 base dynamically reduces the maintenance cost of our tooling while making it more portable and universal.
So with that in mind, let's see how we can find the Kernel DTB.

WinDbg

As usual, we'll start by finding the PML4 base manually using our favourite kernel debugger.
Here is the command for doing so:
kd> r cr3
Figure: Viewing contents of CR3 register
This simply displays the contents of the CR3 register, which holds the Kernel DTB when breaking into an idle system.
Alternatively, if you are working with a local kernel debugger (lkd), here is the command for doing so:
lkd> !process 4 0
Figure: Checking DirBase of System space
Note that 4 denotes the PID of the System process (a minimal process with no UM state, used only for Kernel-mode threads) and 0 denotes the minimum verbosity level. The DirBase associated with the process object is the Kernel DTB. This, of course, matches the value of the CR3 register, but we are unable to view register contents in local debugging.

Low Stub/HalpLowStub

Unfortunately, we might not have the luxury of installing and setting up kernel debugging on the target machine and we need to come up with a way to find the Kernel DTB dynamically on those systems.
Enter one of the most elusive and undocumented structures of all time - the LowStub.
According to a presentation at REcon Brussels 2017 by Alex Ionescu, the Low Stub or HalpLowStub is actually the undocumented PROCESSOR_START_BLOCK structure.
// Ref: https://github.com/mic101/windows/blob/6c4cf038dbb2969b1851511271e2c9d137f211a9/WRK-v1.2/base/ntos/inc/amd64.h#L3334
typedef struct _PROCESSOR_START_BLOCK {
    FAR_JMP_16 Jmp;
    ULONG CompletionFlag;
    PSEUDO_DESCRIPTOR_32 Gdt32;
    PSEUDO_DESCRIPTOR_32 Idt32;
    KGDTENTRY64 Gdt[PSB_GDT32_MAX + 1];
    ULONG64 TiledCr3;
    FAR_TARGET_32 PmTarget;
    FAR_TARGET_32 LmIdentityTarget;
    PVOID LmTarget;
    PPROCESSOR_START_BLOCK SelfMap;
    ULONG64 MsrPat;
    ULONG64 MsrEFER;
    KPROCESSOR_STATE ProcessorState;
} PROCESSOR_START_BLOCK;
We are concerned with a sub-structure within it known as nt!_KPROCESSOR_STATE.
// Ref: https://www.vergiliusproject.com/kernels/x64/Windows%2010%20|%202016/2009%2020H2%20(October%202020%20Update)/_KPROCESSOR_STATE
//0x5c0 bytes (sizeof)
struct _KPROCESSOR_STATE {
    struct _KSPECIAL_REGISTERS SpecialRegisters;    //0x0
    struct _CONTEXT ContextFrame;                   //0xf0
};
Why is all of this important, you ask?
Because we can directly fetch the contents of the CR3 register from this structure via nt!_KSPECIAL_REGISTERS.Cr3, a field in a sub-structure of nt!_KPROCESSOR_STATE. Reading CR3 itself would not be possible from CPL-3, as it is a privileged register.
// Ref: https://www.vergiliusproject.com/kernels/x64/Windows%2010%20|%202016/2009%2020H2%20(October%202020%20Update)/_KSPECIAL_REGISTERS
//0xf0 bytes (sizeof)
struct _KSPECIAL_REGISTERS {
    ULONGLONG Cr0;                      //0x0
    ULONGLONG Cr2;                      //0x8
    ULONGLONG Cr3;                      //0x10
    ULONGLONG Cr4;                      //0x18
    ULONGLONG KernelDr0;                //0x20
    ULONGLONG KernelDr1;                //0x28
    ULONGLONG KernelDr2;                //0x30
    ULONGLONG KernelDr3;                //0x38
    ULONGLONG KernelDr6;                //0x40
    ULONGLONG KernelDr7;                //0x48
    struct _KDESCRIPTOR Gdtr;           //0x50
    struct _KDESCRIPTOR Idtr;           //0x60
    USHORT Tr;                          //0x70
    USHORT Ldtr;                        //0x72
    ULONG MxCsr;                        //0x74
    ULONGLONG DebugControl;             //0x78
    ULONGLONG LastBranchToRip;          //0x80
    ULONGLONG LastBranchFromRip;        //0x88
    ULONGLONG LastExceptionToRip;       //0x90
    ULONGLONG LastExceptionFromRip;     //0x98
    ULONGLONG Cr8;                      //0xa0
    ULONGLONG MsrGsBase;                //0xa8
    ULONGLONG MsrGsSwap;                //0xb0
    ULONGLONG MsrStar;                  //0xb8
    ULONGLONG MsrLStar;                 //0xc0
    ULONGLONG MsrCStar;                 //0xc8
    ULONGLONG MsrSyscallMask;           //0xd0
    ULONGLONG Xcr0;                     //0xd8
    ULONGLONG MsrFsBase;                //0xe0
    ULONGLONG SpecialPadding0;          //0xe8
};
Great! So we can get the Kernel DTB from HalpLowStub, but this presents a recurring problem: how do we get the HalpLowStub address when all we possess so far is a physical read primitive?
Well, thankfully, that's all we need. According to Alex Ionescu's research, on x64 Advanced Programmable Interrupt Controller (APIC) systems running on bare metal, the Low Stub is almost always located at a fixed PA of 0x1000. The exception is when Discard Low Memory (enabled by default) is disabled, in which case the Low Stub may not be at 0x1000, but it is still always located at a PA below 0x100000 (1 MB).
Discard Low Memory may be disabled using:
bcdedit /set firstmegabytepolicy useall
This will change the Low Stub's location from the fixed 0x1000.
Here's the WinDbg command to get the Kernel DTB:
lkd> dt /p nt!_KPROCESSOR_STATE 0x1090 SpecialRegisters.Cr3
Figure: _KSPECIAL_REGISTERS.Cr3
The address 0x1090 is the Low Stub PA (0x1000) plus 0x90, the fixed offset of _KPROCESSOR_STATE within the _PROCESSOR_START_BLOCK structure, and the /p flag is needed since it is a PA. Note that in our test the PA was 0x11090 instead: testing was not done on real hardware, so the Low Stub was not at 0x1000.
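The offset arithmetic above can be written down as a small helper. This is only a sketch: the macro names and `low_stub_cr3_pa` are hypothetical, while the offsets 0x90 and 0x10 come from the structures shown earlier.

```c
#include <stdint.h>

#define PSB_PROCESSOR_STATE_OFFSET 0x90 /* ProcessorState within _PROCESSOR_START_BLOCK */
#define KSPECIAL_REGISTERS_CR3     0x10 /* Cr3 within _KSPECIAL_REGISTERS, which sits at
                                           offset 0x0 of _KPROCESSOR_STATE */

/* Physical address of the saved CR3 value for a Low Stub located at low_stub_pa. */
uint64_t low_stub_cr3_pa(uint64_t low_stub_pa)
{
    return low_stub_pa + PSB_PROCESSOR_STATE_OFFSET + KSPECIAL_REGISTERS_CR3;
}
```

For a Low Stub at 0x1000 this gives 0x10a0, and for one at 0x11000 it gives 0x110a0.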
One important thing to point out here is that, once again, "almost always" just won't cut it for mission-critical tasks. Ergo, we also employ a heuristic/signature scanning technique that searches for the HalpLowStub across the PA range 0x1000 - 0x100000, as it is practically guaranteed to be somewhere in that range.
Figure: Heuristic Search Signature
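The signature test can be isolated into a pure function that is easy to reason about. The sketch below assumes a little-endian host and that the first 1 MB of physical memory has already been copied into a local buffer. It uses the same signature and mask values as the lab code, but the names `looks_like_low_stub` and `find_low_stub` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define LOW_STUB_SIGNATURE 0x00000001000600E9ULL
#define LOW_STUB_SIG_MASK  0xFFFFFFFFFFFF00FFULL /* ignore the second byte */
#define LOW_STUB_PAGE_SIZE 0x1000u

/* Non-zero when the first 8 bytes of a page match the byte pattern
 * E9 ?? 06 00 01 00 00 00, where ?? is ignored. */
int looks_like_low_stub(const uint8_t *page)
{
    uint64_t first_qword;
    memcpy(&first_qword, page, sizeof(first_qword)); /* safe for unaligned buffers */
    return (first_qword & LOW_STUB_SIG_MASK) == LOW_STUB_SIGNATURE;
}

/* Scan a local copy of physical memory [0, size) one page at a time,
 * starting at 0x1000. Returns the offset of the first match, or 0 if none. */
uint32_t find_low_stub(const uint8_t *low_mem, size_t size)
{
    for (size_t off = LOW_STUB_PAGE_SIZE;
         off + sizeof(uint64_t) <= size;
         off += LOW_STUB_PAGE_SIZE) {
        if (looks_like_low_stub(low_mem + off)) {
            return (uint32_t)off;
        }
    }
    return 0;
}
```

Returning 0 for "not found" is unambiguous here because the scan never starts at page 0.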
This is not the only thing we can leak from HalpLowStub. The HAL Heap VA (now randomized) can also be leaked from HalpLowStub's approximately fixed physical address, which provides a great avenue for KASLR bypass in REMOTE exploits.

Code

Now that we have a basic understanding of the process of finding Kernel DTB, let's implement it.
The below piece of code assumes that we have already established an arbitrary physical read by exploiting a vulnerability.
Here is a function to find Kernel DTB:
Utils.h
#include <windows.h>
#include <stdio.h>

// Leak PML4 base address of Kernel from Low Stub using
// arbitrary physical memory read primitive
// (read_physical_memory() is the physical read primitive established earlier)
// ------------------------------------------------------------------------

LPVOID get_kernel_dtb(HANDLE deviceHandle) {
    // Init some important stuff
    DWORD offset = 0x1000;                  // _PROCESSOR_START_BLOCK PA on real hardware
    DWORD limit = 0x100000;                 // _PROCESSOR_START_BLOCK PA maximum limit = 1 MB
    DWORD64 buffer = 0;
    DWORD64 entry = 0;
    DWORD64 jmpSignature = 0x1000600E9;     // First 8 bytes of the Low Stub (starts with JMP opcode 0xE9)
    WORD cr3Offset = 0xa0;                  // 0x90 (ProcessorState in _PROCESSOR_START_BLOCK) + 0x10 (Cr3 in _KSPECIAL_REGISTERS)
    DWORD64 pml4Base = 0;

    // Loop until we find _PROCESSOR_START_BLOCK PA by heuristic scanning and get CR3 value
    while (offset < limit) {
        // Read 8 bytes at PA
        if (!read_physical_memory(deviceHandle, offset, &buffer, sizeof(buffer))) {
            printf("[-] Unable to read physical memory to get _PROCESSOR_START_BLOCK PA!\n");  // [DBG]
            return NULL;
        }

        // Mask out the second byte, which is not part of the signature
        entry = (buffer & 0xffffffffffff00ff);

        // We found _PROCESSOR_START_BLOCK PA!
        if (entry == jmpSignature) {
            printf("[+] _PROCESSOR_START_BLOCK found at PA: 0x%lX\n", offset);  // [DBG]

            // Read nt!_KSPECIAL_REGISTERS.Cr3 to get PML4 base for Kernel
            if (!read_physical_memory(deviceHandle, (offset + cr3Offset), &pml4Base, sizeof(pml4Base))) {
                printf("[-] Unable to read physical memory to get _KSPECIAL_REGISTERS.Cr3!\n");  // [DBG]
                return NULL;
            }
            printf("[+] PML4 base for Kernel: 0x%llX\n", pml4Base);  // [DBG]

            // Break out of loop since we already found Kernel DTB
            break;
        }

        // If not found, move on to the next page
        offset = offset + 0x1000;
    }

    // Still 0 (NULL) here if no Low Stub was found below 1 MB
    return (LPVOID)(DWORD_PTR)pml4Base;
}
There's not much to explain in this short code snippet since I have taken the effort to comment every step so as to prevent any confusion.
Shoutout to Ulf Frisk for pioneering this technique, although ours is slightly different from the original one.
And here is the output (plus some WinDbg-fu to verify our results):
Figure: Retrieving Kernel DTB
Note that this test was conducted on VirtualBox (BIOS) and our results coincide with the value listed for that platform in the table above.

Links

Credits