QUESTION: WiFi chip receives packets from air.
WiFi chip knows NOTHING about processes.
My wget process is SLEEPING.
How does kernel know which process gets which packet?
There are 100,000 packets on the chip.
How does it match them?
- Byte = 8 bits
- RAM = array of bytes
- Process = running program with PID
- Kernel = program that controls hardware
WiFi chip receives RADIO WAVES (electromagnetic signals in 2.4GHz or 5GHz band).
Radio wave → Antenna → Chip decodes to BYTES
These bytes are structured as a PACKET (a sequence of bytes with specific layout).
PACKET = [HEADERS] + [PAYLOAD]
Example: 100 bytes received
┌─────────────────────────────────────────────────────────────────┐
│ Byte 0-13: Ethernet header (14 bytes) │
│ Byte 14-33: IP header (20 bytes) │
│ Byte 34-41: UDP header (8 bytes) │
│ Byte 42-99: Payload data (58 bytes) ← your actual message │
└─────────────────────────────────────────────────────────────────┘
The headers contain ADDRESSES that identify sender and receiver.
Bytes 0-5: Destination MAC address (6 bytes)
Bytes 6-11: Source MAC address (6 bytes)
Bytes 12-13: EtherType (2 bytes, 0x0800 = IPv4)
MAC = Media Access Control = 48-bit hardware address burned into NIC. YOUR WIFI CHIP HAS A MAC: e.g., 70:66:55:b1:9a:8d
Bytes 14: Version + header length
...
Bytes 26-29: Source IP address (4 bytes)
Bytes 30-33: Destination IP address (4 bytes)
IP = Internet Protocol = 32-bit address identifying machine. Example: 192.168.1.100
Bytes 34-35: Source port (2 bytes)
Bytes 36-37: Destination port (2 bytes) ← THIS IS THE KEY
Bytes 38-39: Length (2 bytes)
Bytes 40-41: Checksum (2 bytes)
PORT = 16-bit number (0-65535) identifying WHICH PROGRAM on the machine.
WIFI CHIP: "I received bytes. Destination IP = 192.168.1.100, Destination Port = 8080"
KERNEL: "Let me check: which process is listening on port 8080?"
→ Looks up in hash table
→ Finds: PID 12345 (wget) has a socket bound to port 8080
→ Queues packet to that socket
THE WIFI CHIP DOES NOT KNOW ABOUT PROCESSES. THE KERNEL KNOWS. KERNEL MAINTAINS THE MAPPING.
TWO MECHANISMS:
1. Packet arrives at NIC
2. NIC raises INTERRUPT (electrical signal to CPU)
3. CPU stops current task, jumps to INTERRUPT HANDLER
4. Interrupt handler reads bytes from NIC register one-by-one
5. Copies bytes to RAM
1. Kernel preallocates RAM buffer (RX ring buffer)
2. Kernel tells NIC: "Here is physical address 0x12340000, write packets here"
3. Packet arrives at NIC
4. NIC writes packet directly to RAM[0x12340000] using DMA
5. NIC raises interrupt: "I wrote something"
6. Kernel reads from RAM[0x12340000], NOT from NIC
DMA = Direct Memory Access = NIC writes to RAM without CPU involvement.
YOUR MACHINE USES DMA. That’s why NIC is fast.
Kernel allocates a RING of buffers (circular array):
RX Ring (example: 256 entries):
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ [0] │ [1] │ [2] │ [3] │ [4] │ ... │[254]│[255]│ [0] │ ← wraps around
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
Each entry = { physical_address, length, status }
Entry[0].physical_address = 0x12340000 (points to 2KB buffer in RAM)
Entry[1].physical_address = 0x12340800
Entry[2].physical_address = 0x12341000
...
NIC writes to Entry[next_index].physical_address, increments next_index.
Kernel reads from entries, processes packets, recycles buffers.
Time T+0: Packet radio wave hits antenna
Time T+1: NIC decodes to bytes → 100 bytes
Time T+2: NIC DMAs bytes to RAM[0x12340000]
Time T+3: NIC sets Entry[0].status = PACKET_READY
Time T+4: NIC raises interrupt (IRQ)
Time T+5: CPU stops whatever it was doing
Time T+6: CPU jumps to NIC interrupt handler (kernel code)
Time T+7: Handler reads RAM[0x12340000], parses headers
Time T+8: Handler sees: dest_ip=192.168.1.100, dest_port=8080
Time T+9: Handler calls socket lookup: find_socket(port=8080)
Time T+10: Kernel finds: socket_fd=5, owned by PID 12345
Time T+11: Kernel allocates sk_buff, copies data from RX buffer
Time T+12: Kernel queues sk_buff to socket's receive queue
Time T+13: If process is sleeping in recv(), kernel WAKES it up
Time T+14: Interrupt handler returns, CPU resumes previous task
Kernel maintains a hash table:
Hash table key = (destination IP, destination port, source IP, source port)
Hash table value = pointer to struct sock (socket structure)
sock_hash_table[hash(127.0.0.1, 9999, 0, 0)] = socket_ptr
↑ ↑
dest IP dest port
This is O(1) lookup. Even with 100,000 sockets, finding the right one is fast.
FUNCTION: __udp4_lib_lookup() in net/ipv4/udp.c
SCENARIO:
1. You run: wget http://example.com/file.txt
2. wget creates TCP socket, connects to example.com:80
3. wget sends HTTP request
4. wget calls recv() → BLOCKS (process SLEEPS)
5. CPU does other things (you browse, etc.)
6. Server responds (packet travels through internet)
7. Packet arrives at your WiFi chip
WHAT HAPPENS AT STEP 7:
(A) Packet in the air → NIC antenna
(B) NIC decodes: dest_ip=192.168.1.100, dest_port=54321
(54321 is wget's ephemeral port, assigned by kernel when wget called connect())
(C) NIC DMAs to RX ring buffer
(D) NIC raises interrupt
(E) Kernel interrupt handler runs
(F) Kernel parses: "This packet is for port 54321"
(G) Kernel looks up: sock_hash_table[hash(192.168.1.100, 54321, ...)]
→ Finds wget's socket
(H) Kernel queues sk_buff to wget's socket receive queue
(I) Kernel sees: wget is sleeping in recv()
(J) Kernel calls wake_up(wget_task)
(K) wget process becomes RUNNABLE
(L) Scheduler eventually switches to wget
(M) wget's recv() returns with data
CR3 = Control Register 3 = holds physical address of page table root.
Each process has its own page table (VA→PA mapping).
CR3 is changed when CPU switches between processes.
DURING INTERRUPT HANDLING:
WHEN COPYING TO USER BUFFER:
STEP ACTION WHO DOES IT USES CR3?
──── ───────────────────────────────────── ───────────────── ─────────
1 Radio → bytes NIC hardware No
2 DMA to RAM NIC hardware No (uses PA directly)
3 Parse headers Kernel IRQ No (kernel space)
4 Lookup socket Kernel (hash) No (kernel space)
5 Queue to socket Kernel No (kernel space)
6 Wake up process Kernel scheduler No
7 Context switch to process Scheduler YES (loads CR3)
8 Copy data to user buffer Kernel YES (process VA→PA)
KEY INSIGHT:
1. NIC writes 256 packets to RX ring (ring full)
2. NIC raises interrupt after batch (NAPI: interrupt + polling)
3. Kernel runs softirq/NAPI poll
4. Kernel processes 256 packets in one batch:
- Parse header
- Socket lookup
- Queue to appropriate sockets
5. Kernel recycles RX ring buffers
6. NIC can continue writing
Modern NICs have multiple RX queues (RSS: Receive Side Scaling). Each CPU core processes its own queue in parallel.
NORMAL NETWORKING:
NIC → RX Ring → sk_buff → Socket Queue → User Buffer
↑ ↑
COPY #3 COPY #4
RDMA:
NIC → User Buffer (directly, via DMA)
↑
NO COPY
RDMA requires:
This is why RDMA is faster: zero CPU copies.
Every term traced:
CLAIM: kernel matches dest_port to process via hash table lookup in O(1) time using jhash_1word() on (IP, port) tuple, kernel source file: /home/r/Desktop/learn_kernel/source/net/ipv4/udp.c line 528-589, hash function file: /home/r/Desktop/learn_kernel/source/include/linux/jhash.h line 171-174, proof follows:
STEP 1: open file /home/r/Desktop/learn_kernel/source/net/ipv4/udp.c in editor → go to line 528 → read function signature struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr, __be16 sport, __be32 daddr, __be16 dport, int dif, int sdif, struct udp_table *udptable, struct sk_buff *skb) → count arguments: 9 → input: net=kernel namespace, saddr=source IP (4 bytes), sport=source port (2 bytes), daddr=dest IP (4 bytes), dport=dest port (2 bytes), dif=device index, sdif=slave device, udptable=hash table, skb=packet → output: struct sock * = pointer to socket structure or NULL
STEP 2: go to line 532 → read unsigned short hnum = ntohs(dport) → ntohs = network-to-host-short = swap bytes if needed → dport is in network order (big endian) → hnum is in host order (little endian on x86) → example: dport = 0x0F27 (network order bytes 0x0F, 0x27) → ntohs(0x0F27) = 0x270F = 9999 decimal → calculation: 0x27 × 256 + 0x0F = 39 × 256 + 15 = 9984 + 15 = 9999 ✓ → hnum = 9999
STEP 3: go to line 537 → read hash2 = ipv4_portaddr_hash(net, daddr, hnum) → open file /home/r/Desktop/learn_kernel/source/include/net/ip.h line 696-701 → read function body: return jhash_1word((__force u32)saddr, net_hash_mix(net)) ^ port → inputs: saddr = daddr = destination IP = 0x7F000001 (127.0.0.1 in hex), port = hnum = 9999 → jhash_1word hashes the IP with a per-namespace random seed → ^ port XORs result with port number → hash2 = some 32-bit value
STEP 4: go to line 538 → read slot2 = hash2 & udptable->mask → mask = table_size - 1 → if table_size = 256, mask = 0xFF → slot2 = hash2 AND 0xFF → example: hash2 = 0x12345678, mask = 0xFF → slot2 = 0x78 = 120 decimal → slot2 is index into hash table array
STEP 5: go to line 539 → read hslot2 = &udptable->hash2[slot2] → hslot2 = pointer to hash bucket at index slot2 → hash bucket = linked list of sockets that hash to same slot → this is where collision handling happens
STEP 6: go to line 542-544 → read result = udp4_lib_lookup2(net, saddr, sport, daddr, hnum, dif, sdif, hslot2, skb) → this searches the bucket for matching socket → match criteria: socket’s bound address matches daddr AND socket’s bound port matches hnum → if found: return pointer to struct sock → if not found: return NULL
STEP 7: open file /home/r/Desktop/learn_kernel/source/include/linux/jhash.h line 171-174 → read function: static inline u32 jhash_1word(u32 a, u32 initval) { return __jhash_nwords(a, 0, 0, initval + JHASH_INITVAL + (1 << 2)); } → JHASH_INITVAL = 0xdeadbeef (line 58) → calculation: initval + 0xdeadbeef + 4 → __jhash_nwords performs mixing using __jhash_final macro (lines 46-55) → output: 32-bit hash value with good distribution
STEP 8: numerical example: receiver binds to port 9999 on 127.0.0.1 → when receiver calls bind(fd, {AF_INET, 9999, 127.0.0.1}) → kernel calls __udp_lib_get_port() → kernel inserts socket into udptable at slot = ipv4_portaddr_hash(net, 127.0.0.1, 9999) & mask → later packet arrives with dest_port=9999 → kernel calls __udp4_lib_lookup() → computes same hash → finds socket in same slot → returns pointer → socket contains: sk->sk_socket->file->f_owner->pid = receiver PID
STEP 9: verify O(1): hash computation = constant time (fixed number of XOR, rotate, add) → array access = constant time → bucket search = O(k) where k = collision count → with good hash, k ≈ 1 on average → ∴ lookup = O(1) amortized
STEP 10: struct sock to PID trace: struct sock at /home/r/Desktop/learn_kernel/source/include/net/sock.h → sock->sk_socket (type struct socket *) → socket->file (type struct file *) → file->f_owner.pid (type struct pid *) → kernel can get PID value → but kernel doesn’t need PID to queue data → kernel queues sk_buff to sock->sk_receive_queue directly → process sleeping in recv() is woken via wake_up_interruptible(sk_sleep(sk))
CALCULATION PROOF — HASH FOR PORT 9999 ON 127.0.0.1:
IP = 127.0.0.1 = 0x7F000001 (hex)
port = 9999 = 0x270F (hex)
net_hash_mix(net) = random per-namespace seed, assume = 0x12345678
jhash_1word(0x7F000001, 0x12345678):
initval = 0x12345678
a = 0x7F000001
b = 0
c = 0
initval' = initval + 0xdeadbeef + 4 = 0x12345678 + 0xdeadbeef + 4 = 0xF1234571 (ignoring overflow)
a += initval' → a = 0x7F000001 + 0xF1234571 = 0x70234572
b += initval' → b = 0xF1234571
c += initval' → c = 0xF1234571
__jhash_final(a, b, c) → many XOR/rotate/subtract → final c = some 32-bit value
→ assume c = 0xABCD1234
hash2 = 0xABCD1234 ^ 9999 = 0xABCD1234 ^ 0x270F = 0xABCD353B
If table_size = 256:
mask = 255 = 0xFF
slot2 = 0xABCD353B & 0xFF = 0x3B = 59 decimal
∴ socket for port 9999 on 127.0.0.1 is in bucket 59 of hash table
CODE TO VERIFY ON YOUR MACHINE:
// file: hash_test.c — compile with: gcc -o hash_test hash_test.c
#include <stdio.h>
#include <stdint.h>
#define JHASH_INITVAL 0xdeadbeef
#define rol32(x, n) (((x) << (n)) | ((x) >> (32-(n))))
#define __jhash_final(a, b, c) \
{ \
c ^= b; c -= rol32(b, 14); \
a ^= c; a -= rol32(c, 11); \
b ^= a; b -= rol32(a, 25); \
c ^= b; c -= rol32(b, 16); \
a ^= c; a -= rol32(c, 4); \
b ^= a; b -= rol32(a, 14); \
c ^= b; c -= rol32(b, 24); \
}
uint32_t jhash_1word(uint32_t a, uint32_t initval) {
uint32_t b = 0, c = 0;
a += initval + JHASH_INITVAL + 4;
b += initval + JHASH_INITVAL + 4;
c += initval + JHASH_INITVAL + 4;
__jhash_final(a, b, c);
return c;
}
int main() {
uint32_t ip = 0x7F000001; // 127.0.0.1
uint16_t port = 9999;
uint32_t net_seed = 0; // assume 0 for init_net
uint32_t hash = jhash_1word(ip, net_seed) ^ port;
uint32_t slot = hash & 0xFF; // assume 256-entry table
printf("IP: 0x%08X (127.0.0.1)\n", ip);
printf("Port: %u (0x%04X)\n", port, port);
printf("Hash: 0x%08X\n", hash);
printf("Slot: %u (table index)\n", slot);
return 0;
}
RUN: gcc -o hash_test hash_test.c && ./hash_test → see actual slot number for port 9999
VIOLATIONS CHECK: