#04

Tracing a Stack Crash with Real Data: GPF vs. Page Fault

May 2026

The Premise

Returning a pointer to a local variable from a C/C++ function is a classic undefined-behavior bug. When the function returns, its stack frame is released, and the next function call is free to reuse and overwrite that memory. The pointer value itself is unchanged, but the data it points to is not, so when the caller dereferences it the program may read garbage or crash. But what exactly do the silicon and the kernel do in this scenario? We set out to find out using real kernel probes (kprobes).

The Bug: Automatic Variable Escape

The culprit resides in lastN:

char *lastN(const char *str, int n) {
    ...
    char result[n + 1];
    strcpy(result, str + (len - n));
    char * volatile ptr = result;
    return ptr;
}
    

Here, result is a local variable-length array allocated in lastN's stack frame. When the function returns, that frame is released: the memory is still mapped, but any later call may reuse it. Nevertheless, the address of result is returned to the caller, main. The volatile qualifier and the intermediate variable ptr are decoys that suppress compiler warnings about returning the address of a local variable; they do not change the underlying undefined behavior.
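
One common way to avoid the escape is to let the caller own the storage. The sketch below is not the post's code: last_n_into, its parameters, and the choice to clamp n when the string is shorter than n are all assumptions made for illustration, since the original body of lastN is elided.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for lastN: the caller supplies the buffer,
 * so no pointer to a dead stack frame is ever returned.
 * Copies the last n characters of str into out (capacity out_size,
 * terminator included). Returns out on success, or NULL if an argument
 * is NULL or the buffer is too small. */
char *last_n_into(const char *str, size_t n, char *out, size_t out_size) {
    if (str == NULL || out == NULL)
        return NULL;
    size_t len = strlen(str);
    if (n > len)
        n = len;                 /* assumption: clamp instead of failing */
    if (out_size < n + 1)
        return NULL;             /* no room for n characters plus '\0' */
    memcpy(out, str + (len - n), n);
    out[n] = '\0';
    return out;
}
```

A caller would declare something like char buf[16] in its own frame and pass buf and sizeof buf; the result then stays valid for as long as the caller's frame does, regardless of what other functions are called in between.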

The Proof using GDB

We ran the program inside GDB, setting breakpoints before and after the stack-corrupting garbage_stack() function call to trace the exact state of the memory:

(gdb) # Breakpoint 2, main() before garbage_stack()
(gdb) print res
$1 = 0x7fffffffde10 "efgh"
(gdb) x/12sb res
0x7fffffffde10:	"efgh"
0x7fffffffde15:	""

(gdb) # Breakpoint 3, main() after garbage_stack()
(gdb) print res
$2 = 0x7fffffffde10 'U' <repeats 80 times>, "p\336\377\377\377\177"
    

The Real Culprit: The source pointer str + (len - n) into argv[1] was perfectly valid! The problem is the returned address of result: once garbage_stack() reused that region of the stack and filled it with 0x55 ('U'), the memory at 0x7fffffffde10 held nothing but 'U's, with no null terminator anywhere in the original buffer.

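When printf("%s", res) was called, it did not stop at the end of the original buffer because no null terminator remained there. Instead, it kept reading toward higher addresses, past the buffer and into whatever stack data followed, looking for a \0 byte. In the GDB output above, the string runs through 80 'U' bytes and then six more bytes (p\336\377\377\377\177) that decode, little-endian, to 0x7fffffffde70, which is itself a stack address. If the scan instead reaches the end of the mapped stack before finding a \0, it walks into unmapped memory and a segfault is triggered.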
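
The unbounded scan can be contrasted with a bounded one. The following is a minimal sketch, assuming a buffer the caller still legitimately owns and whose capacity is known; bounded_len and print_bounded are hypothetical helper names. It does not make the dangling pointer from lastN safe to use, since reading that memory at all is still undefined behavior.

```c
#include <stdio.h>
#include <string.h>

/* Returns the number of bytes before the first '\0' in buf, or cap if
 * no terminator appears within the first cap bytes. Never reads past
 * buf + cap. */
size_t bounded_len(const char *buf, size_t cap) {
    const char *nul = memchr(buf, '\0', cap);
    return nul ? (size_t)(nul - buf) : cap;
}

/* Prints at most cap bytes of buf followed by a newline. The precision
 * in "%.*s" caps how many bytes printf may read, so a missing
 * terminator cannot send it past the buffer.
 * Assumes cap fits in an int. */
int print_bounded(const char *buf, size_t cap) {
    return printf("%.*s\n", (int)bounded_len(buf, cap), buf);
}
```

With an 8-byte buffer filled entirely with 'U', print_bounded prints exactly those 8 bytes and stops, where a plain "%s" would keep going into adjacent memory.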

Silicon Mechanics: GPF vs. Page Fault

We traced two different modes of pointer corruption crashes to see how the CPU silicon reacted.

Case A: Non-Canonical Address Corruption (GPF)

If the pointer value itself gets overwritten with a non-canonical address such as 0x5555555555555555, the CPU rejects the access before any page-table walk, because the address violates the virtual-address sign-extension rule, and raises a General Protection Fault (#GP, vector 13). The kernel logs the trap:

[ 3250.873323] traps: buggy[238340] general protection fault ip:760bd5d8b7dd sp:7ffc3ffad408 error:0 in libc.so.6
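
The sign-extension rule can be written down as a small check. This is a sketch rather than anything taken from the post's sources: is_canonical is a hypothetical helper, and the virtual-address width (48 bits with 4-level paging, 57 bits with 5-level paging) is passed in as an assumption about the machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is canonical for a virtual address width of va_bits:
 * bits 63 down to (va_bits - 1) must all be equal, i.e. the upper bits
 * are a sign extension of the top implemented bit.
 * Precondition: 1 <= va_bits <= 64. */
bool is_canonical(uint64_t addr, unsigned va_bits) {
    uint64_t upper = addr >> (va_bits - 1);          /* bits 63..va_bits-1 */
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1); /* same field, all 1s */
    return upper == 0 || upper == all_ones;
}
```

Under the 48-bit assumption, is_canonical(0x5555555555555555, 48) is false, while the stack address 0x7fffffffde10 from the GDB session and the unmapped address 0x123456789000 used in Case B are both canonical, which is why the latter reaches the page-table walk instead of faulting with #GP.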
    

Case B: Canonical Unmapped Address Corruption (Page Fault)

If the pointer holds a canonical but unmapped address (such as 0x123456789000), the MMU performs a page-table walk, finds no present entry, and raises a Page Fault (#PF, vector 14). The kernel's page fault handler finds no mapping for the address and calls force_sig_fault to deliver SIGSEGV, which our kprobe driver captured:

[ 3278.126297] buggy[238451]: segfault at 123456789000 ip 00007229c558b7dd sp 00007ffc38c6a4f8 error 4 in libc.so.6
[ 3278.126331] KPROBE CRASH: force_sig_fault triggered by 'buggy' (pid: 238451)!
[ 3278.126336]   Signal: 11, Code: 1 (SEGV_MAPERR), Faulting Address: 0x123456789000
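
The "error 4" field in the segfault line is the x86 page-fault error code, a bit field pushed by the CPU. As a rough illustration, the low bits can be decoded as below; struct pf_error and decode_pf_error are hypothetical names, and only the five lowest bits are covered.

```c
#include <stdbool.h>

/* Decoded view of the low bits of an x86 page-fault error code. */
struct pf_error {
    bool present;           /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;             /* bit 1: 1 = write access, 0 = read access */
    bool user;              /* bit 2: 1 = fault occurred in user mode */
    bool reserved;          /* bit 3: 1 = reserved bit set in a paging entry */
    bool instruction_fetch; /* bit 4: 1 = fault on an instruction fetch */
};

struct pf_error decode_pf_error(unsigned long code) {
    struct pf_error e;
    e.present           = (code & 0x01) != 0;
    e.write             = (code & 0x02) != 0;
    e.user              = (code & 0x04) != 0;
    e.reserved          = (code & 0x08) != 0;
    e.instruction_fetch = (code & 0x10) != 0;
    return e;
}
```

Decoding 4 yields a user-mode read of a page that is not present, which agrees with the SEGV_MAPERR code reported by the kprobe.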
    
Key Discovery: The root cause of the segmentation fault is the address of a local automatic variable escaping its function. Once the function returns, subsequent function calls overwrite that memory. printf's search for a null byte then runs past the end of the dead buffer, either leaking adjacent stack contents or, if it crosses into an unmapped page, raising a segmentation fault.

GitHub Sources & Verification Logs

To inspect the source code, the kprobe driver, and the trace results, follow the GitHub Pages links below:

Source Code
Proof Logs