================================================================================
STAGE 2: KERNEL SPACE - TRACING getname()
================================================================================
PREREQUISITE: You completed Stage 1 (user space to syscall boundary).
Now we cross into KERNEL SPACE.
SOURCE CODE:
kernel/drivers/arg2_filename/trace_filename.c
MACHINE CONFIGURATION
---------------------
OS: Ubuntu 24.04.3 LTS
Kernel: 6.14.0-37-generic
Source: /usr/src/linux-source-6.8.0 (all file:line references below are to this tree; the running 6.14 kernel may differ)
================================================================================
AXIOM 0: THE CALL TRANSFORMATION
================================================================================
User Space:
open("somefile", O_RDWR)
glibc turns this into an openat system call (glibc/sysdeps/unix/sysv/linux/open.c):
return SYSCALL_CANCEL (openat, AT_FDCWD, file, oflag, mode);
AT_FDCWD = -100 (include/uapi/linux/fcntl.h:96)
Kernel receives:
openat(dfd=-100, filename=0x7ffe..., flags=0x2, mode=0)
       ARG1      ARG2                ARG3       ARG4
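SKETCH (user space, not taken from the traced source):
The two entry points below should end up at the same openat syscall with
dfd = AT_FDCWD. This assumes Linux with glibc. The helper names
open_via_libc, open_via_syscall and both_paths_agree are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* open, AT_FDCWD, O_RDWR */
#include <sys/syscall.h>  /* SYS_openat */
#include <unistd.h>       /* syscall, close */

/* Path A: the ordinary libc wrapper, as the user program calls it. */
int open_via_libc(const char *path, int flags)
{
    return open(path, flags);
}

/* Path B: the call the kernel receives, openat with dfd = AT_FDCWD. */
int open_via_syscall(const char *path, int flags)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path, flags, 0);
}

/* Returns 1 if both paths succeed together or fail together. */
int both_paths_agree(const char *path, int flags)
{
    int a = open_via_libc(path, flags);
    int b = open_via_syscall(path, flags);
    int agree = (a >= 0) == (b >= 0);

    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return agree;
}
```

Calling both_paths_agree("somefile", O_RDWR) exercises both routes with the
arguments used in this stage.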
================================================================================
AXIOM 1: WHY TRACE getname() FIRST?
================================================================================
SOURCE: fs/open.c:1388-1414
static long do_sys_openat2(int dfd, const char __user *filename,
                           struct open_how *how)
{
    struct open_flags op;
    int fd = build_open_flags(how, &op);        // Line 1392
    struct filename *tmp;

    if (fd)
        return fd;

    tmp = getname(filename);                    // Line 1398 FIRST
    if (IS_ERR(tmp))                            // Line 1399 CHECK
        return PTR_ERR(tmp);                    // Line 1400 GATE

    fd = get_unused_fd_flags(how->flags);       // Line 1402 SECOND
    if (fd >= 0) {
        struct file *f = do_filp_open(...);     // Line 1404 THIRD
        ...
        fd_install(fd, f);                      // Line 1409 FOURTH
    }
    putname(tmp);                               // Line 1412 FREE
    return fd;
}
DERIVATION:
Line 1398 < Line 1402 < Line 1404 < Line 1409
getname() is the FIRST function to use the filename argument
If IS_ERR(tmp) is true at Line 1399 -> return at Line 1400, before any fd is allocated
getname() is the GATE function
================================================================================
AXIOM 2: WHAT DOES getname() DO?
================================================================================
ONE LINE:
getname(user_ptr) -> copies string from user space to kernel buffer,
returns struct filename *
SOURCE: fs/namei.c:216-220
struct filename *
getname(const char __user * filename)
{
    return getname_flags(filename, 0, NULL);
}
AXIOM 2.1: WHY getname_flags()?
getname() is a wrapper. It passes 0 for flags.
SOURCE: fs/namei.c:196 (inside the branch taken only when the copied string has length 0)
if (!(flags & LOOKUP_EMPTY)) {
    return ERR_PTR(-ENOENT);
}
Logic:
If flags is 0: Empty filename ("") -> returns -ENOENT (Error).
If flags has LOOKUP_EMPTY: Empty filename allowed.
open() forbids empty filenames.
================================================================================
AXIOM 3: STRUCT FILENAME LAYOUT
================================================================================
SOURCE: include/linux/fs.h:2554-2560
struct filename {
    const char *name;           // +0x00 (8 bytes) Pointer to pathname
    const char __user *uptr;    // +0x08 (8 bytes) Original user pointer
    atomic_t refcnt;            // +0x10 (4 bytes) Reference count
                                // +0x14 (4 bytes) Padding so aname is 8-byte aligned
    struct audit_names *aname;  // +0x18 (8 bytes) Audit
    const char iname[];         // +0x20 (flexible) Embedded pathname
};
sizeof(struct filename) = 32 bytes = 0x20
offsetof(struct filename, iname) = 0x20 = 32
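SKETCH (user space): the offsets above can be checked at compile time with a
mirror of the struct. This assumes an LP64 target such as x86_64. The names
atomic_t_mirror and filename_mirror are hypothetical stand-ins; the kernel's
__user annotation is dropped because it only matters to sparse.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */

/* Stand-in for the kernel's atomic_t: a struct wrapping one int. */
typedef struct { int counter; } atomic_t_mirror;

/* Field-for-field mirror of struct filename as listed above. */
struct filename_mirror {
    const char *name;        /* pointer to pathname */
    const char *uptr;        /* original user pointer */
    atomic_t_mirror refcnt;  /* reference count, followed by 4 bytes of padding */
    void *aname;             /* struct audit_names * in the kernel */
    const char iname[];      /* embedded pathname */
};

/* Compilation fails if any offset differs from the table above. */
static_assert(offsetof(struct filename_mirror, name)   == 0x00, "name offset");
static_assert(offsetof(struct filename_mirror, uptr)   == 0x08, "uptr offset");
static_assert(offsetof(struct filename_mirror, refcnt) == 0x10, "refcnt offset");
static_assert(offsetof(struct filename_mirror, aname)  == 0x18, "aname offset");
static_assert(offsetof(struct filename_mirror, iname)  == 0x20, "iname offset");
static_assert(sizeof(struct filename_mirror)           == 0x20, "struct size");
```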
AXIOM 3.1: WHAT ARE uptr AND aname?
uptr (User Pointer):
Stores the ORIGINAL user-space address (0x7ffe...).
Purpose: Audit logs need to know WHERE the string came from.
aname (Audit Name):
Starts as NULL (fs/namei.c, Line 203).
Purpose: Link to audit record if auditing is enabled.
================================================================================
AXIOM 4: SLAB ALLOCATION
================================================================================
SOURCE: include/linux/fs.h:2574
#define __getname() kmem_cache_alloc(names_cachep, GFP_KERNEL)
SOURCE: fs/dcache.c:3372-3373
names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0, ...);
SOURCE: include/uapi/linux/limits.h:13
#define PATH_MAX 4096
PROOF:
$ sudo grep names_cache /proc/slabinfo
names_cache 160 160 4096 8 8
Columns: name, active_objs, num_objs, objsize, objperslab, pagesperslab
objsize = 4096 bytes per object = PATH_MAX
================================================================================
AXIOM 5: TWO-PHASE ALLOCATION
================================================================================
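Threshold: EMBEDDED_NAME_MAX = PATH_MAX - offsetof(struct filename, iname)
                             = 4096 - 32 = 4064
Phase 1 (Short paths < 4064 bytes):
  String embedded in iname[] at offset 0x20
  name pointer points to iname[]
  Single 4096-byte allocation
Phase 2 (Long paths >= 4064 bytes):
  Old 4096-byte allocation reused for the string only
  New 33-byte struct (offsetof(struct filename, iname[1])) allocated via kzalloc
  name pointer points to the old buffer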
MEMORY LAYOUT (Phase 1 - our case):
+-------------------------------------------------------+
| 0xffff898345cef000: struct filename (32 bytes)        |
| +0x00 name   = 0xffff898345cef020 --------------------+--+
| +0x08 uptr   = 0x0000604075144004                     |  |
| +0x10 refcnt = 1                                      |  |
| +0x18 aname  = NULL                                   |  |
| +0x20 iname  = "somefile\0" <-------------------------+--+
+-------------------------------------------------------+
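SKETCH (user space): the phase boundary is plain arithmetic on PATH_MAX and
the iname offset. The helper names embedded_name_max and allocation_phase are
hypothetical. The empty path (length 0, rejected separately with -ENOENT) is
not modeled here.

```c
#include <linux/limits.h>  /* PATH_MAX (4096) */
#include <stddef.h>        /* size_t */

/* offsetof(struct filename, iname), taken from the layout in AXIOM 3. */
#define INAME_OFFSET 0x20

/* Number of bytes available for the string inside the 4096-byte object. */
size_t embedded_name_max(void)
{
    return (size_t)PATH_MAX - INAME_OFFSET;
}

/*
 * Classify a path by its length in bytes, not counting the trailing NUL.
 * Returns 1 for Phase 1, 2 for Phase 2, and 0 when the path is too long
 * for either phase (the -ENAMETOOLONG case).
 */
int allocation_phase(size_t len)
{
    if (len >= (size_t)PATH_MAX)
        return 0;
    return len < embedded_name_max() ? 1 : 2;
}
```

"somefile" has length 8, so allocation_phase(8) lands in Phase 1, matching
the layout above.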
================================================================================
AXIOM 6: ERROR PATHS (6 TOTAL)
================================================================================
Line numbers refer to getname_flags() in fs/namei.c.
ERROR 1: Line 142: return ERR_PTR(-ENOMEM)        // slab alloc failed
ERROR 2: Line 154: return ERR_PTR(-EFAULT)        // bad user pointer (phase 1)
ERROR 3: Line 175: return ERR_PTR(-ENOMEM)        // kzalloc failed (phase 2)
ERROR 4: Line 182: return ERR_PTR(-EFAULT)        // bad user pointer (phase 2)
ERROR 5: Line 187: return ERR_PTR(-ENAMETOOLONG)  // path >= 4096 bytes, not counting the NUL
ERROR 6: Line 198: return ERR_PTR(-ENOENT)        // empty path
ERROR CODE SUMMARY:
 -2  ENOENT        empty path
-12  ENOMEM        no memory
-14  EFAULT        bad user pointer
-36  ENAMETOOLONG  path >= 4096 bytes
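SKETCH (user space): three of the error paths can be provoked without a
kernel module by handing openat a bad argument. The two -ENOMEM paths are
left out because they cannot be triggered deterministically. The helper
names are hypothetical, and address 1 is assumed to be unmapped (true
whenever the first page is not mapped).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD, O_RDONLY */
#include <stdint.h>       /* uintptr_t */
#include <stdlib.h>       /* malloc, free */
#include <string.h>       /* memset */
#include <sys/syscall.h>  /* SYS_openat */
#include <unistd.h>       /* syscall, close */

/* Issue openat directly. Returns 0 on success, else the errno reported. */
int raw_open_errno(const char *path)
{
    long fd = syscall(SYS_openat, AT_FDCWD, path, O_RDONLY, 0);

    if (fd >= 0) {
        close((int)fd);
        return 0;
    }
    return errno;
}

/* ERROR 6: empty string, no LOOKUP_EMPTY. */
int errno_for_empty_path(void)
{
    return raw_open_errno("");
}

/* ERROR 2: pointer into the (assumed unmapped) first page. */
int errno_for_bad_pointer(void)
{
    return raw_open_errno((const char *)(uintptr_t)1);
}

/* ERROR 5: 4096 non-NUL bytes, so no terminator within PATH_MAX. */
int errno_for_long_path(void)
{
    size_t n = 4096;
    char *p = malloc(n + 1);
    int e;

    if (!p)
        return ENOMEM;
    memset(p, 'a', n);
    p[n] = '\0';
    e = raw_open_errno(p);
    free(p);
    return e;
}
```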
================================================================================
AXIOM 7: REGISTER MAPPING (x86_64)
================================================================================
Entry to getname():
RDI = user pointer (arg1)
Exit from getname():
RAX = struct filename * (success)
RAX = error pointer (failure, check with IS_ERR)
IS_ERR(x) = true if x >= 0xfffffffffffff001 (= -MAX_ERRNO, MAX_ERRNO = 4095; the top 4095 addresses)
PTR_ERR(x) = (long)x = negative error code
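SKETCH (user space): a re-implementation of the error-pointer encoding,
modeled on include/linux/err.h but not copied from it. It assumes 64-bit
pointers and a 64-bit long, as on x86_64.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer in the top 4095 addresses. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

/* Recover the negative errno from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

/* True when ptr is in [-MAX_ERRNO, -1] viewed as an unsigned address. */
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

The pointer returned in the captured trace, 0xffff898345cef000, is far below
the threshold, so IS_ERR() is false for it.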
================================================================================
CAPTURED TRACE DATA
================================================================================
From dmesg after running trace_filename.ko + minimal_open:
[25759.220072] #2. getname. SUCCESS.
[25759.220077] user_ptr (RDI saved) = 0x0000604075144004
[25759.220083] fn (RAX) = 0xffff898345cef000
[25759.220088] fn->name (+0x00) = 0xffff898345cef020
[25759.220092] fn->uptr (+0x08) = 0x0000604075144004
[25759.220096] string = "somefile"
[25759.220100] [OK] AXIOM: fn->uptr == saved RDI
================================================================================
VERIFICATION MATH
================================================================================
fn = 0xffff898345cef000
fn + 0x20 = 0xffff898345cef020
fn->name = 0xffff898345cef020
0xffff898345cef020 == 0xffff898345cef000 + 0x20
-> name points to iname[] (embedded buffer)
-> This is Phase 1 (short path)
fn->uptr = 0x0000604075144004
saved RDI = 0x0000604075144004
fn->uptr == saved RDI
-> Original user pointer preserved correctly
User pointer: 0x0000604075144004
Bits 63-48 = 0x0000 (user canonical)
< 0x0000800000000000 (user space limit)
-> Valid user address
Kernel pointer: 0xffff898345cef000
Bits 63-48 = 0xffff (kernel canonical)
>= 0xffff800000000000 (kernel space start)
-> Valid kernel address
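SKETCH (user space): the same checks written as code, so the captured values
can be re-verified mechanically. The limits are the 4-level-paging values
used above. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define INAME_OFFSET       0x20ULL
#define USER_SPACE_LIMIT   0x0000800000000000ULL
#define KERNEL_SPACE_START 0xffff800000000000ULL

/* Phase 1 test: name points at the buffer embedded in the struct. */
bool name_is_embedded(uint64_t fn, uint64_t name)
{
    return name == fn + INAME_OFFSET;
}

/* Bits 63-48 of an address. */
uint16_t top16(uint64_t addr)
{
    return (uint16_t)(addr >> 48);
}

/* Lower canonical half. */
bool is_user_address(uint64_t addr)
{
    return addr < USER_SPACE_LIMIT;
}

/* Upper canonical half. */
bool is_kernel_address(uint64_t addr)
{
    return addr >= KERNEL_SPACE_START;
}
```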
================================================================================
CALL CHAIN
================================================================================
User: open("somefile", O_RDWR)
|
Libc: syscall(SYS_openat, AT_FDCWD, "somefile", O_RDWR, 0) RAX=257
|
Kernel: entry_SYSCALL_64 (arch/x86/entry/entry_64.S:91)
|
Kernel: __x64_sys_openat
|
Kernel: do_sys_open (fs/open.c:1416)
|
Kernel: do_sys_openat2 (fs/open.c:1388)
|
Kernel: getname (fs/namei.c:217) <-- WE TRACED THIS
|
Kernel: getname_flags (fs/namei.c:130)
|
+-- __getname() (slab allocation)
|
+-- strncpy_from_user() (copy user string)
|
Return: RAX = 0xffff898345cef000
================================================================================
REFCNT LIFECYCLE
================================================================================
getname(): atomic_set(&result->refcnt, 1); // refcnt = 1
putname(): atomic_dec_and_test() // refcnt-- -> if 0, free
In do_sys_openat2():
Line 1398: tmp = getname() -> refcnt = 1
Line 1404: do_filp_open(tmp) -> refcnt = 1 (not incremented)
Line 1412: putname(tmp) -> refcnt = 0 -> FREE
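SKETCH (user space): a model of the lifecycle using C11 atomics instead of
the kernel's atomic_t. The names name_model, model_getname and model_putname
are hypothetical; only the count-to-zero-then-free behavior is modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct name_model {
    atomic_int refcnt;
};

/* Like getname(): allocate and start with one reference. */
struct name_model *model_getname(void)
{
    struct name_model *n = malloc(sizeof *n);

    if (n)
        atomic_init(&n->refcnt, 1);
    return n;
}

/*
 * Like putname(): drop one reference and free on the last one.
 * Returns true if this call freed the object.
 */
bool model_putname(struct name_model *n)
{
    /* atomic_fetch_sub returns the value held before the decrement. */
    if (atomic_fetch_sub(&n->refcnt, 1) != 1)
        return false;
    free(n);
    return true;
}
```

With a single reference, as in do_sys_openat2(), the first model_putname()
call is the one that frees.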
================================================================================
CONTEXT REQUIREMENTS
================================================================================
strncpy_from_user() may sleep (page fault on the user buffer)
__getname() and kzalloc() both allocate with GFP_KERNEL, which may sleep
-> getname() MUST be called in process context
-> It must not be called from hardirq, softirq, or any other atomic context
================================================================================