rdmabasics

HOW KERNEL RECEIVES PACKETS — AXIOMATIC DERIVATION

YOUR CONFUSION (STATED CLEARLY)

QUESTION: WiFi chip receives packets from air.
          WiFi chip knows NOTHING about processes.
          My wget process is SLEEPING.
          How does kernel know which process gets which packet?
          There are 100,000 packets on the chip.
          How does it match them?

AXIOM 0: PRIMATE KNOWS

- Byte = 8 bits
- RAM = array of bytes
- Process = running program with PID
- Kernel = program that controls hardware

STEP 1: WHAT DOES THE WIFI CHIP ACTUALLY RECEIVE?

WiFi chip receives RADIO WAVES (electromagnetic signals in 2.4GHz or 5GHz band).

Radio wave → Antenna → Chip decodes to BYTES

These bytes are structured as a PACKET (a sequence of bytes with specific layout).

PACKET = [HEADERS] + [PAYLOAD]

Example: 100 bytes received
┌─────────────────────────────────────────────────────────────────┐
│ Byte 0-13:  Ethernet header (14 bytes)                          │
│ Byte 14-33: IP header (20 bytes)                                │
│ Byte 34-41: UDP header (8 bytes)                                │
│ Byte 42-99: Payload data (58 bytes) ← your actual message       │
└─────────────────────────────────────────────────────────────────┘

STEP 2: WHAT IS IN THE HEADERS?

The headers contain ADDRESSES that identify sender and receiver.

ETHERNET HEADER (bytes 0-13):

Bytes 0-5:   Destination MAC address (6 bytes)
Bytes 6-11:  Source MAC address (6 bytes)
Bytes 12-13: EtherType (2 bytes, 0x0800 = IPv4)

MAC = Media Access Control = 48-bit hardware address burned into NIC. YOUR WIFI CHIP HAS A MAC: e.g., 70:66:55:b1:9a:8d

IP HEADER (bytes 14-33):

Bytes 14:     Version + header length
...
Bytes 26-29:  Source IP address (4 bytes)
Bytes 30-33:  Destination IP address (4 bytes)

IP = Internet Protocol = 32-bit address identifying machine. Example: 192.168.1.100

UDP HEADER (bytes 34-41):

Bytes 34-35:  Source port (2 bytes)
Bytes 36-37:  Destination port (2 bytes) ← THIS IS THE KEY
Bytes 38-39:  Length (2 bytes)
Bytes 40-41:  Checksum (2 bytes)

PORT = 16-bit number (0-65535) identifying WHICH PROGRAM on the machine.


STEP 3: THE KEY INSIGHT — PORT IDENTIFIES PROCESS

WIFI CHIP: "I received bytes. Destination IP = 192.168.1.100, Destination Port = 8080"

KERNEL: "Let me check: which process is listening on port 8080?"
        → Looks up in hash table
        → Finds: PID 12345 (wget) has a socket bound to port 8080
        → Queues packet to that socket

THE WIFI CHIP DOES NOT KNOW ABOUT PROCESSES. THE KERNEL KNOWS. KERNEL MAINTAINS THE MAPPING.


STEP 4: HOW DOES KERNEL GET THE PACKET FROM CHIP?

TWO MECHANISMS:

MECHANISM A: INTERRUPT + PIO (old, slow)

1. Packet arrives at NIC
2. NIC raises INTERRUPT (electrical signal to CPU)
3. CPU stops current task, jumps to INTERRUPT HANDLER
4. Interrupt handler reads bytes from NIC register one-by-one
5. Copies bytes to RAM

MECHANISM B: DMA (modern, fast)

1. Kernel preallocates RAM buffer (RX ring buffer)
2. Kernel tells NIC: "Here is physical address 0x12340000, write packets here"
3. Packet arrives at NIC
4. NIC writes packet directly to RAM[0x12340000] using DMA
5. NIC raises interrupt: "I wrote something"
6. Kernel reads from RAM[0x12340000], NOT from NIC

DMA = Direct Memory Access = NIC writes to RAM without CPU involvement.

YOUR MACHINE USES DMA. That’s why NIC is fast.


STEP 5: THE RX RING BUFFER

Kernel allocates a RING of buffers (circular array):

RX Ring (example: 256 entries):
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ [0] │ [1] │ [2] │ [3] │ [4] │ ... │[254]│[255]│ [0] │ ← wraps around
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘

Each entry = { physical_address, length, status }

Entry[0].physical_address = 0x12340000 (points to 2KB buffer in RAM)
Entry[1].physical_address = 0x12340800
Entry[2].physical_address = 0x12341000
...

NIC writes to Entry[next_index].physical_address, increments next_index. Kernel reads from entries, processes packets, recycles buffers.


STEP 6: WHAT HAPPENS WHEN PACKET ARRIVES (TIMELINE)

Time T+0:    Packet radio wave hits antenna
Time T+1:    NIC decodes to bytes → 100 bytes
Time T+2:    NIC DMAs bytes to RAM[0x12340000]
Time T+3:    NIC sets Entry[0].status = PACKET_READY
Time T+4:    NIC raises interrupt (IRQ)
Time T+5:    CPU stops whatever it was doing
Time T+6:    CPU jumps to NIC interrupt handler (kernel code)
Time T+7:    Handler reads RAM[0x12340000], parses headers
Time T+8:    Handler sees: dest_ip=192.168.1.100, dest_port=8080
Time T+9:    Handler calls socket lookup: find_socket(port=8080)
Time T+10:   Kernel finds: socket_fd=5, owned by PID 12345
Time T+11:   Kernel allocates sk_buff, copies data from RX buffer
Time T+12:   Kernel queues sk_buff to socket's receive queue
Time T+13:   If process is sleeping in recv(), kernel WAKES it up
Time T+14:   Interrupt handler returns, CPU resumes previous task

STEP 7: THE SOCKET LOOKUP TABLE

Kernel maintains a hash table:

Hash table key = (destination IP, destination port, source IP, source port)
Hash table value = pointer to struct sock (socket structure)

sock_hash_table[hash(127.0.0.1, 9999, 0, 0)] = socket_ptr
                      ↑           ↑
                    dest IP    dest port

This is O(1) lookup. Even with 100,000 sockets, finding the right one is fast.

FUNCTION: __udp4_lib_lookup() in net/ipv4/udp.c


STEP 8: YOUR wget EXAMPLE — STEP BY STEP

SCENARIO:
1. You run: wget http://example.com/file.txt
2. wget creates TCP socket, connects to example.com:80
3. wget sends HTTP request
4. wget calls recv() → BLOCKS (process SLEEPS)
5. CPU does other things (you browse, etc.)
6. Server responds (packet travels through internet)
7. Packet arrives at your WiFi chip

WHAT HAPPENS AT STEP 7:

(A) Packet in the air → NIC antenna
(B) NIC decodes: dest_ip=192.168.1.100, dest_port=54321
    (54321 is wget's ephemeral port, assigned by kernel when wget called connect())
(C) NIC DMAs to RX ring buffer
(D) NIC raises interrupt
(E) Kernel interrupt handler runs
(F) Kernel parses: "This packet is for port 54321"
(G) Kernel looks up: sock_hash_table[hash(192.168.1.100, 54321, ...)]
    → Finds wget's socket
(H) Kernel queues sk_buff to wget's socket receive queue
(I) Kernel sees: wget is sleeping in recv()
(J) Kernel calls wake_up(wget_task)
(K) wget process becomes RUNNABLE
(L) Scheduler eventually switches to wget
(M) wget's recv() returns with data

STEP 9: WHERE IS CR3 IN ALL THIS?

CR3 = Control Register 3 = holds physical address of page table root.

Each process has its own page table (VA→PA mapping).
CR3 is changed when CPU switches between processes.

DURING INTERRUPT HANDLING:

WHEN COPYING TO USER BUFFER:


STEP 10: HOW KERNEL MATCHES PACKET TO PROCESS — SUMMARY

STEP  ACTION                                WHO DOES IT        USES CR3?
────  ─────────────────────────────────────  ─────────────────  ─────────
1     Radio → bytes                          NIC hardware       No
2     DMA to RAM                             NIC hardware       No (uses PA directly)
3     Parse headers                          Kernel IRQ         No (kernel space)
4     Lookup socket                          Kernel (hash)      No (kernel space)
5     Queue to socket                        Kernel             No (kernel space)
6     Wake up process                        Kernel scheduler   No
7     Context switch to process              Scheduler          YES (loads CR3)
8     Copy data to user buffer               Kernel             YES (process VA→PA)

KEY INSIGHT:


STEP 11: WHAT IF 100,000 PACKETS ARRIVE?

1. NIC writes 256 packets to RX ring (ring full)
2. NIC raises interrupt after batch (NAPI: interrupt + polling)
3. Kernel runs softirq/NAPI poll
4. Kernel processes 256 packets in one batch:
   - Parse header
   - Socket lookup
   - Queue to appropriate sockets
5. Kernel recycles RX ring buffers
6. NIC can continue writing

Modern NICs have multiple RX queues (RSS: Receive Side Scaling). Each CPU core processes its own queue in parallel.


STEP 12: WHY DOESN’T PACKET GO DIRECTLY TO USER BUFFER? (RDMA DOES THIS)

NORMAL NETWORKING:

NIC → RX Ring → sk_buff → Socket Queue → User Buffer
                   ↑                        ↑
                COPY #3                   COPY #4

RDMA:

NIC → User Buffer (directly, via DMA)
       ↑
    NO COPY

RDMA requires:

  1. User pins their buffer in RAM (no page-out)
  2. Kernel tells NIC the physical address
  3. NIC DMAs directly to user buffer
  4. No kernel involvement during data transfer

This is why RDMA is faster: zero CPU copies.


NEW THINGS INTRODUCED WITHOUT DERIVATION: NONE

Every term traced:


AXIOM BLOCK 26: KERNEL SOURCE PROOF — SOCKET LOOKUP (2026-01-06)

CLAIM: kernel matches dest_port to process via hash table lookup in O(1) time using jhash_1word() on (IP, port) tuple, kernel source file: /home/r/Desktop/learn_kernel/source/net/ipv4/udp.c line 528-589, hash function file: /home/r/Desktop/learn_kernel/source/include/linux/jhash.h line 171-174, proof follows:

STEP 1: open file /home/r/Desktop/learn_kernel/source/net/ipv4/udp.c in editor → go to line 528 → read function signature struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr, __be16 sport, __be32 daddr, __be16 dport, int dif, int sdif, struct udp_table *udptable, struct sk_buff *skb) → count arguments: 9 → input: net=kernel namespace, saddr=source IP (4 bytes), sport=source port (2 bytes), daddr=dest IP (4 bytes), dport=dest port (2 bytes), dif=device index, sdif=slave device, udptable=hash table, skb=packet → output: struct sock * = pointer to socket structure or NULL

STEP 2: go to line 532 → read unsigned short hnum = ntohs(dport) → ntohs = network-to-host-short = swap bytes if needed → dport is in network order (big endian) → hnum is in host order (little endian on x86) → example: dport = 0x0F27 (network order bytes 0x0F, 0x27) → ntohs(0x0F27) = 0x270F = 9999 decimal → calculation: 0x27 × 256 + 0x0F = 39 × 256 + 15 = 9984 + 15 = 9999 ✓ → hnum = 9999

STEP 3: go to line 537 → read hash2 = ipv4_portaddr_hash(net, daddr, hnum) → open file /home/r/Desktop/learn_kernel/source/include/net/ip.h line 696-701 → read function body: return jhash_1word((__force u32)saddr, net_hash_mix(net)) ^ port → inputs: saddr = daddr = destination IP = 0x7F000001 (127.0.0.1 in hex), port = hnum = 9999 → jhash_1word hashes the IP with a per-namespace random seed → ^ port XORs result with port number → hash2 = some 32-bit value

STEP 4: go to line 538 → read slot2 = hash2 & udptable->mask → mask = table_size - 1 → if table_size = 256, mask = 0xFF → slot2 = hash2 AND 0xFF → example: hash2 = 0x12345678, mask = 0xFF → slot2 = 0x78 = 120 decimal → slot2 is index into hash table array

STEP 5: go to line 539 → read hslot2 = &udptable->hash2[slot2] → hslot2 = pointer to hash bucket at index slot2 → hash bucket = linked list of sockets that hash to same slot → this is where collision handling happens

STEP 6: go to line 542-544 → read result = udp4_lib_lookup2(net, saddr, sport, daddr, hnum, dif, sdif, hslot2, skb) → this searches the bucket for matching socket → match criteria: socket’s bound address matches daddr AND socket’s bound port matches hnum → if found: return pointer to struct sock → if not found: return NULL

STEP 7: open file /home/r/Desktop/learn_kernel/source/include/linux/jhash.h line 171-174 → read function: static inline u32 jhash_1word(u32 a, u32 initval) { return __jhash_nwords(a, 0, 0, initval + JHASH_INITVAL + (1 << 2)); } → JHASH_INITVAL = 0xdeadbeef (line 58) → calculation: initval + 0xdeadbeef + 4 → __jhash_nwords performs mixing using __jhash_final macro (lines 46-55) → output: 32-bit hash value with good distribution

STEP 8: numerical example: receiver binds to port 9999 on 127.0.0.1 → when receiver calls bind(fd, {AF_INET, 9999, 127.0.0.1}) → kernel calls __udp_lib_get_port() → kernel inserts socket into udptable at slot = ipv4_portaddr_hash(net, 127.0.0.1, 9999) & mask → later packet arrives with dest_port=9999 → kernel calls __udp4_lib_lookup() → computes same hash → finds socket in same slot → returns pointer → socket contains: sk->sk_socket->file->f_owner->pid = receiver PID

STEP 9: verify O(1): hash computation = constant time (fixed number of XOR, rotate, add) → array access = constant time → bucket search = O(k) where k = collision count → with good hash, k ≈ 1 on average → ∴ lookup = O(1) amortized

STEP 10: struct sock to PID trace: struct sock at /home/r/Desktop/learn_kernel/source/include/net/sock.h → sock->sk_socket (type struct socket *) → socket->file (type struct file *) → file->f_owner.pid (type struct pid *) → kernel can get PID value → but kernel doesn’t need PID to queue data → kernel queues sk_buff to sock->sk_receive_queue directly → process sleeping in recv() is woken via wake_up_interruptible(sk_sleep(sk))

CALCULATION PROOF — HASH FOR PORT 9999 ON 127.0.0.1:

IP = 127.0.0.1 = 0x7F000001 (hex)
port = 9999 = 0x270F (hex)
net_hash_mix(net) = random per-namespace seed, assume = 0x12345678

jhash_1word(0x7F000001, 0x12345678):
  initval = 0x12345678
  a = 0x7F000001
  b = 0
  c = 0
  initval' = initval + 0xdeadbeef + 4 = 0x12345678 + 0xdeadbeef + 4 = 0xF1234571 (ignoring overflow)
  a += initval' → a = 0x7F000001 + 0xF1234571 = 0x70234572
  b += initval' → b = 0xF1234571
  c += initval' → c = 0xF1234571
  __jhash_final(a, b, c) → many XOR/rotate/subtract → final c = some 32-bit value
  → assume c = 0xABCD1234

hash2 = 0xABCD1234 ^ 9999 = 0xABCD1234 ^ 0x270F = 0xABCD353B

If table_size = 256:
  mask = 255 = 0xFF
  slot2 = 0xABCD353B & 0xFF = 0x3B = 59 decimal

∴ socket for port 9999 on 127.0.0.1 is in bucket 59 of hash table

CODE TO VERIFY ON YOUR MACHINE:

// file: hash_test.c — compile with: gcc -o hash_test hash_test.c
#include <stdio.h>
#include <stdint.h>

#define JHASH_INITVAL 0xdeadbeef
#define rol32(x, n) (((x) << (n)) | ((x) >> (32-(n))))

#define __jhash_final(a, b, c) \
{ \
    c ^= b; c -= rol32(b, 14); \
    a ^= c; a -= rol32(c, 11); \
    b ^= a; b -= rol32(a, 25); \
    c ^= b; c -= rol32(b, 16); \
    a ^= c; a -= rol32(c, 4);  \
    b ^= a; b -= rol32(a, 14); \
    c ^= b; c -= rol32(b, 24); \
}

uint32_t jhash_1word(uint32_t a, uint32_t initval) {
    uint32_t b = 0, c = 0;
    a += initval + JHASH_INITVAL + 4;
    b += initval + JHASH_INITVAL + 4;
    c += initval + JHASH_INITVAL + 4;
    __jhash_final(a, b, c);
    return c;
}

int main() {
    uint32_t ip = 0x7F000001;  // 127.0.0.1
    uint16_t port = 9999;
    uint32_t net_seed = 0;  // assume 0 for init_net
    
    uint32_t hash = jhash_1word(ip, net_seed) ^ port;
    uint32_t slot = hash & 0xFF;  // assume 256-entry table
    
    printf("IP:       0x%08X (127.0.0.1)\n", ip);
    printf("Port:     %u (0x%04X)\n", port, port);
    printf("Hash:     0x%08X\n", hash);
    printf("Slot:     %u (table index)\n", slot);
    return 0;
}

RUN: gcc -o hash_test hash_test.c && ./hash_test → see actual slot number for port 9999


VIOLATIONS CHECK: