Learn Zig Series (#59) - Assembler: Instruction Encoding

Project G: Assembler/Disassembler (1/3)
What will I learn
- How to design a minimal instruction set architecture (ISA) with MOV, ADD, SUB, MUL, CMP, JMP, JEQ, JNE, and HLT;
- How register files work and how to model 8 general-purpose registers (R0-R7) in Zig;
- Instruction encoding schemes: packing opcodes and operands into 16-bit words;
- The difference between register-register and register-immediate instruction formats;
- Building an instruction encoder that turns structured data into raw bit patterns;
- Writing a minimal virtual machine that fetches, decodes, and executes encoded instructions;
- Running a complete program: loading instructions into memory and executing until HLT;
- Testing instruction encoding with known bit patterns using Zig's built-in test framework.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Zig 0.14+ distribution (download from ziglang.org);
- The ambition to learn Zig programming.
Difficulty
- Advanced
Curriculum (of the Learn Zig Series):
- Zig Programming Tutorial - ep001 - Intro
- Learn Zig Series (#2) - Hello Zig, Variables and Types
- Learn Zig Series (#3) - Functions and Control Flow
- Learn Zig Series (#4) - Error Handling (Zig's Best Feature)
- Learn Zig Series (#5) - Arrays, Slices, and Strings
- Learn Zig Series (#6) - Structs, Enums, and Tagged Unions
- Learn Zig Series (#7) - Memory Management and Allocators
- Learn Zig Series (#8) - Pointers and Memory Layout
- Learn Zig Series (#9) - Comptime (Zig's Superpower)
- Learn Zig Series (#10) - Project Structure, Modules, and File I/O
- Learn Zig Series (#11) - Mini Project: Building a Step Sequencer
- Learn Zig Series (#12) - Testing and Test-Driven Development
- Learn Zig Series (#13) - Interfaces via Type Erasure
- Learn Zig Series (#14) - Generics with Comptime Parameters
- Learn Zig Series (#15) - The Build System (build.zig)
- Learn Zig Series (#16) - Sentinel-Terminated Types and C Strings
- Learn Zig Series (#17) - Packed Structs and Bit Manipulation
- Learn Zig Series (#18) - Async Concepts and Event Loops
- Learn Zig Series (#18b) - Addendum: Async Returns in Zig 0.16
- Learn Zig Series (#19) - SIMD with @Vector
- Learn Zig Series (#20) - Working with JSON
- Learn Zig Series (#21) - Networking and TCP Sockets
- Learn Zig Series (#22) - Hash Maps and Data Structures
- Learn Zig Series (#23) - Iterators and Lazy Evaluation
- Learn Zig Series (#24) - Logging, Formatting, and Debug Output
- Learn Zig Series (#25) - Mini Project: HTTP Status Checker
- Learn Zig Series (#26) - Writing a Custom Allocator
- Learn Zig Series (#27) - C Interop: Calling C from Zig
- Learn Zig Series (#28) - C Interop: Exposing Zig to C
- Learn Zig Series (#29) - Inline Assembly and Low-Level Control
- Learn Zig Series (#30) - Thread Safety and Atomics
- Learn Zig Series (#31) - Memory-Mapped I/O and Files
- Learn Zig Series (#32) - Compile-Time Reflection with @typeInfo
- Learn Zig Series (#33) - Building a State Machine with Tagged Unions
- Learn Zig Series (#34) - Performance Profiling and Optimization
- Learn Zig Series (#35) - Cross-Compilation and Target Triples
- Learn Zig Series (#36) - Mini Project: CLI Task Runner
- Learn Zig Series (#37) - Markdown to HTML: Tokenizer and Lexer
- Learn Zig Series (#38) - Markdown to HTML: Parser and AST
- Learn Zig Series (#39) - Markdown to HTML: Renderer and CLI
- Learn Zig Series (#40) - Key-Value Store: In-Memory Store
- Learn Zig Series (#41) - Key-Value Store: Write-Ahead Log
- Learn Zig Series (#42) - Key-Value Store: TCP Server
- Learn Zig Series (#43) - Key-Value Store: Client Library and Benchmarks
- Learn Zig Series (#44) - Image Tool: Reading and Writing PPM/BMP
- Learn Zig Series (#45) - Image Tool: Pixel Operations
- Learn Zig Series (#46) - Image Tool: CLI Pipeline
- Learn Zig Series (#47) - Build a Shell: Parsing Commands
- Learn Zig Series (#48) - Build a Shell: Process Spawning
- Learn Zig Series (#49) - Build a Shell: Built-in Commands
- Learn Zig Series (#50) - Build a Shell: Job Control and Signals
- Learn Zig Series (#51) - HTTP Server: Accept Loop and Parsing
- Learn Zig Series (#52) - HTTP Server: Router and Responses
- Learn Zig Series (#53) - HTTP Server: Static Files and MIME
- Learn Zig Series (#54) - HTTP Server: Middleware and Logging
- Learn Zig Series (#55) - ECS Game Engine: Architecture
- Learn Zig Series (#56) - ECS Game Engine: Component Storage
- Learn Zig Series (#57) - ECS Game Engine: Systems and Queries
- Learn Zig Series (#58) - ECS Game Engine: Terminal Rendering
- Learn Zig Series (#59) - Assembler: Instruction Encoding (this post)
We just finished a full ECS game engine across four episodes -- architecture, component storage, systems, and terminal rendering. Time for something completely different. This episode starts a new three-part project: we're building an assembler and disassembler from scratch in Zig. That means an actual tool that takes human-readable assembly mnemonics and encodes them into machine instructions, a VM that can execute those instructions, and a disassembler that turns binary back into readable text.
Why an assembler? Because this is where programming meets the hardware. Every high-level language you've ever used eventually compiles or interprets down to machine instructions. Understanding how those instructions are encoded -- how ADD R0, R1 becomes a specific pattern of ones and zeroes -- gives you insight into what your code actually does when it runs. And Zig is well suited for this kind of work. We've already used packed structs for bit manipulation in episode 17, inline assembly in episode 29, and we know how memory layout works from episode 8. Now we're building the layer between assembly text and raw bytes.
Today's episode covers the foundation: designing an instruction set, encoding instructions into 16-bit words, and building a minimal VM to execute them. In the next two episodes we'll add a two-pass assembler that handles labels and forward references, and then a disassembler with a binary inspector. Here we go!
Designing the instruction set
Before we can encode anything we need to decide what instructions our processor supports. Real processors (x86, ARM, RISC-V) have hundreds of instructions, but our teaching ISA needs just enough to be interesting without being overwhelming. We want arithmetic, data movement, comparisons, conditional branches, and a way to stop.
const Opcode = enum(u4) {
hlt = 0x0, // halt execution
mov = 0x1, // move: dst = src
add = 0x2, // add: dst = dst + src
sub = 0x3, // subtract: dst = dst - src
mul = 0x4, // multiply: dst = dst * src
cmp = 0x5, // compare: set flags based on dst - src
jmp = 0x6, // unconditional jump to address
jeq = 0x7, // jump if equal (zero flag set)
jne = 0x8, // jump if not equal (zero flag clear)
load = 0x9, // load from memory: dst = mem[src]
store = 0xA, // store to memory: mem[dst] = src
push = 0xB, // push register to stack
pop = 0xC, // pop stack to register
call = 0xD, // call subroutine (push PC, jump)
ret = 0xE, // return from subroutine (pop PC)
nop = 0xF, // no operation
};
The enum(u4) is critical -- we're telling Zig this enum uses exactly 4 bits of storage. That gives us 16 possible opcodes (0x0 through 0xF), and we use all of them. A real RISC architecture would use more bits for the opcode (RISC-V uses 7), but 4 bits keeps our encoding simple enough to reason about by hand.
Why these specific instructions? MOV, ADD, SUB, and MUL handle basic computation. CMP with JEQ/JNE gives us conditional logic -- you can build if/else and loops from those. JMP is for unconditional branches. LOAD/STORE let us access memory beyond registers. PUSH/POP/CALL/RET give us a stack and subroutine support -- without those you can't write structured programs. HLT stops the machine. And NOP does nothing, but it's useful as padding and for debugging.
Having said that, there's a design tradeoff here. With only 4 bits for the opcode we can't add more instructions later without redesigning the encoding. Real ISAs either use wider opcodes or use a prefix/extension scheme. For a teaching project, 16 instructions is plenty.
The register file
Our processor has 8 general-purpose registers, numbered R0 through R7. Three bits to address them (2^3 = 8). We also need a program counter (PC), a stack pointer (SP), and a flags register:
const NUM_REGISTERS = 8;
const Flags = packed struct {
zero: bool = false, // last comparison was equal
negative: bool = false, // last comparison result was negative
carry: bool = false, // unsigned overflow
_padding: u5 = 0,
};
const RegisterFile = struct {
gpr: [NUM_REGISTERS]u16, // general-purpose registers R0-R7
pc: u16, // program counter
sp: u16, // stack pointer
flags: Flags,
fn init() RegisterFile {
return .{
.gpr = [_]u16{0} ** NUM_REGISTERS,
.pc = 0,
.sp = 0xFFFF, // stack grows downward from top of memory
.flags = .{},
};
}
fn get(self: *const RegisterFile, reg: u3) u16 {
return self.gpr[reg];
}
fn set(self: *RegisterFile, reg: u3, val: u16) void {
self.gpr[reg] = val;
}
};
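The Flags struct uses packed struct -- we covered those in detail back in episode 17. The flags are set by the CMP instruction and read by JEQ/JNE. The zero flag means "the two values compared were equal" (the subtraction result was zero). The negative flag is a copy of bit 15 of the subtraction result, so it reads as "the first value was less than the second" only while the difference fits in the signed 16-bit range. The carry flag records a borrow, which is the reliable "less than" test for unsigned values. This mirrors the ZF, SF, and CF bits of the x86 FLAGS register, just with fewer flags.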
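A quick way to see what the three flags mean is to compute them for a couple of value pairs. The sketch below is not part of the VM: `CmpFlags` and `compareFlags` are hypothetical names that simply mirror what CMP does. Note the second case, where the negative bit is set even though 0xFFFF is larger than 1 as an unsigned value, while carry stays clear.

```zig
const std = @import("std");

// Hypothetical helper (not part of the article's VM): computes the three
// flags the same way the CMP instruction does, for two raw u16 values.
const CmpFlags = struct { zero: bool, negative: bool, carry: bool };

fn compareFlags(a: u16, b: u16) CmpFlags {
    const result = a -% b; // wrapping subtract, like the CPU would do
    return .{
        .zero = result == 0,
        .negative = (result & 0x8000) != 0, // bit 15 of the difference
        .carry = b > a, // a borrow was needed: a < b as unsigned values
    };
}

pub fn main() void {
    const small = compareFlags(3, 5); // 3 - 5 wraps to 0xFFFE
    const large = compareFlags(0xFFFF, 1); // 0xFFFF - 1 = 0xFFFE
    std.debug.print("3 vs 5:      N={} C={}\n", .{ small.negative, small.carry });
    std.debug.print("0xFFFF vs 1: N={} C={}\n", .{ large.negative, large.carry });
}
```

For unsigned comparisons the carry flag is the one to trust; the negative flag only becomes meaningful if you treat the register contents as signed numbers.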
The stack pointer starts at 0xFFFF and grows downward. This convention comes from historical hardware design -- the stack and the program (which starts at address 0 and grows upward) grow toward each other. If they meet, you've run out of memory. In our 16-bit address space that gives us 65,536 bytes of total memory, which is more than enough for our purposes.
Why u3 for the register index? Because 3 bits exactly addresses 8 registers. Zig won't let you pass a value larger than 7 to a u3 parameter -- the compiler enforces the valid range at the type level. No runtime bounds check needed.
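The type-level guarantee only holds once you already have a u3. When a register number arrives as a wider integer (for example, parsed from text), it has to be narrowed first. A minimal sketch, assuming a hypothetical helper `toRegIndex` and using `std.math.cast`, which returns null when the value does not fit:

```zig
const std = @import("std");

// Hypothetical helper: converts an untrusted register number (for example
// one parsed from text) into a u3, or returns null if it is out of range.
fn toRegIndex(n: u32) ?u3 {
    return std.math.cast(u3, n);
}

pub fn main() void {
    const inputs = [_]u32{ 0, 7, 8, 200 };
    for (inputs) |n| {
        if (toRegIndex(n)) |reg| {
            std.debug.print("{d} -> R{d}\n", .{ n, @as(u8, reg) });
        } else {
            std.debug.print("{d} -> not a valid register\n", .{n});
        }
    }
}
```

This keeps the range check at the boundary where data enters the program, so everything past that point can rely on the u3 type.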
Instruction encoding: 16-bit words
This is where the real design decisions happen. We need to pack an opcode plus operands into a fixed-size word. Real ISAs use various word sizes (x86 has variable-length instructions from 1 to 15 bytes, ARM uses fixed 32-bit, RISC-V uses 32-bit with compressed 16-bit extensions). We're using 16-bit words because they're small enough to visualize in binary on paper but large enough to hold meaningful operands.
Our encoding format splits the 16 bits as follows:
Register-Register format:
[opcode:4][dst:3][mode:1][src:3][unused:5]
  15..12   11..9    8     7..5     4..0

Register-Immediate format:
[opcode:4][dst:3][mode:1][imm8:8]
  15..12   11..9    8      7..0
The mode bit (bit 8) determines the format. When mode = 0, the source is a register. When mode = 1, the source is an 8-bit immediate value embedded in the instruction. This means register-register operations can use all 8 registers for both source and destination, while immediate operations can only encode values 0-255 in the instruction itself.
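To make the two layouts concrete, here is a sketch that builds the words by hand with shifts and ORs. The function names `packReg` and `packImm` are illustrative only; the bit positions follow the two formats described above.

```zig
const std = @import("std");

// Hand-rolled encoders using shifts and ORs, following the two layouts
// above. The function names are illustrative, not from the article.
fn packReg(opcode: u4, dst: u3, src: u3) u16 {
    return (@as(u16, opcode) << 12) | // bits 15..12
        (@as(u16, dst) << 9) | // bits 11..9
        (@as(u16, src) << 5); // bits 7..5; mode bit 8 stays 0
}

fn packImm(opcode: u4, dst: u3, imm: u8) u16 {
    return (@as(u16, opcode) << 12) | // bits 15..12
        (@as(u16, dst) << 9) | // bits 11..9
        (@as(u16, 1) << 8) | // mode = 1: immediate
        @as(u16, imm); // bits 7..0
}

pub fn main() void {
    // MOV R0, #42 -> opcode 0x1, dst 0, mode 1, imm 42 (0x2A)
    std.debug.print("MOV R0, #42 = 0x{X:0>4}\n", .{packImm(0x1, 0, 42)}); // 0x112A
    // ADD R3, R5  -> opcode 0x2, dst 3, mode 0, src 5
    std.debug.print("ADD R3, R5  = 0x{X:0>4}\n", .{packReg(0x2, 3, 5)}); // 0x26A0
}
```

Working one of these out on paper is a good sanity check: 0x112A splits into 0001 (MOV), 000 (R0), 1 (immediate), and 00101010 (42).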
const Instruction = packed struct {
// Fields are ordered from LSB to MSB in packed structs
payload: u8, // bits 0-7: either [src:3][unused:5] or [imm8:8]
mode: u1, // bit 8: 0 = register, 1 = immediate
dst: u3, // bits 9-11: destination register
opcode: u4, // bits 12-15: operation code
fn encodeReg(op: Opcode, dst: u3, src: u3) Instruction {
return .{
.opcode = @intFromEnum(op),
.dst = dst,
.mode = 0,
.payload = @as(u8, src) << 5,
};
}
fn encodeImm(op: Opcode, dst: u3, imm: u8) Instruction {
return .{
.opcode = @intFromEnum(op),
.dst = dst,
.mode = 1,
.payload = imm,
};
}
fn encodeNoArgs(op: Opcode) Instruction {
return .{
.opcode = @intFromEnum(op),
.dst = 0,
.mode = 0,
.payload = 0,
};
}
fn getSrc(self: Instruction) u3 {
return @truncate(self.payload >> 5);
}
fn getImm(self: Instruction) u8 {
return self.payload;
}
fn toU16(self: Instruction) u16 {
return @bitCast(self);
}
fn fromU16(val: u16) Instruction {
return @bitCast(val);
}
};
This is packed struct territory again. Because Instruction is packed, Zig guarantees the fields occupy exactly these bit positions with no padding between them. The @bitCast between Instruction and u16 is a zero-cost operation -- it's the same bits interpreted differently. No conversion, no copying, no runtime work at all.
One subtlety: in Zig's packed structs, the first field occupies the LEAST significant bits. So payload is bits 0-7, mode is bit 8, dst is bits 9-11, and opcode is bits 12-15. This is the opposite of how we write it on paper (where we usually put the most significant bits first), so watch out for that when debugging.
The encodeReg function places the source register in the top 3 bits of the payload field (bits 5-7 of the payload, which are bits 5-7 of the full instruction). In register mode the lower 5 bits are unused. In immediate mode the full 8-bit payload is the immediate value. This dual-use of the same bit field is the whole point of the mode bit -- it tells the decoder how to interpret those 8 bits.
A helper for readable instruction building
Working with raw bit fields gets tedious fast. Let's build a helper module that creates instructions from human-readable parameters:
const asm_builder = struct {
fn mov_reg(dst: u3, src: u3) u16 {
return (Instruction.encodeReg(.mov, dst, src)).toU16();
}
fn mov_imm(dst: u3, imm: u8) u16 {
return (Instruction.encodeImm(.mov, dst, imm)).toU16();
}
fn add_reg(dst: u3, src: u3) u16 {
return (Instruction.encodeReg(.add, dst, src)).toU16();
}
fn add_imm(dst: u3, imm: u8) u16 {
return (Instruction.encodeImm(.add, dst, imm)).toU16();
}
fn sub_reg(dst: u3, src: u3) u16 {
return (Instruction.encodeReg(.sub, dst, src)).toU16();
}
fn sub_imm(dst: u3, imm: u8) u16 {
return (Instruction.encodeImm(.sub, dst, imm)).toU16();
}
fn mul_reg(dst: u3, src: u3) u16 {
return (Instruction.encodeReg(.mul, dst, src)).toU16();
}
fn mul_imm(dst: u3, imm: u8) u16 {
return (Instruction.encodeImm(.mul, dst, imm)).toU16();
}
fn cmp_reg(dst: u3, src: u3) u16 {
return (Instruction.encodeReg(.cmp, dst, src)).toU16();
}
fn cmp_imm(dst: u3, imm: u8) u16 {
return (Instruction.encodeImm(.cmp, dst, imm)).toU16();
}
fn jmp(addr: u8) u16 {
return (Instruction.encodeImm(.jmp, 0, addr)).toU16();
}
fn jeq(addr: u8) u16 {
return (Instruction.encodeImm(.jeq, 0, addr)).toU16();
}
fn jne(addr: u8) u16 {
return (Instruction.encodeImm(.jne, 0, addr)).toU16();
}
fn push(reg: u3) u16 {
return (Instruction.encodeReg(.push, reg, 0)).toU16();
}
fn pop(reg: u3) u16 {
return (Instruction.encodeReg(.pop, reg, 0)).toU16();
}
fn call(addr: u8) u16 {
return (Instruction.encodeImm(.call, 0, addr)).toU16();
}
fn ret() u16 {
return (Instruction.encodeNoArgs(.ret)).toU16();
}
fn halt() u16 {
return (Instruction.encodeNoArgs(.hlt)).toU16();
}
fn nop() u16 {
return (Instruction.encodeNoArgs(.nop)).toU16();
}
};
Now writing programs becomes much more readable. Instead of manually computing bit patterns, you write asm_builder.mov_imm(0, 42) to load the value 42 into R0. The builder functions return u16 values that can be stored directly in our program memory. Each function is a one-liner that creates an Instruction, converts it to u16, and returns it -- all comptime-evaluable if the arguments are comptime known.
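The comptime point can be demonstrated with a small standalone sketch. `Word` and `encodeImm` below are trimmed-down stand-ins for the Instruction type and its encoder, assumed here only so the example compiles on its own. A top-level constant is evaluated by the compiler, and a comptime block can reject a wrong encoding before the program ever runs.

```zig
const std = @import("std");

// Trimmed-down stand-in for the article's Instruction type, just enough
// to show compile-time evaluation of an encoder.
const Word = packed struct {
    payload: u8, // bits 0-7
    mode: u1, // bit 8
    dst: u3, // bits 9-11
    opcode: u4, // bits 12-15
};

fn encodeImm(opcode: u4, dst: u3, imm: u8) u16 {
    return @bitCast(Word{ .opcode = opcode, .dst = dst, .mode = 1, .payload = imm });
}

// Evaluated entirely by the compiler; only the resulting u16 is emitted.
const add_r1_1: u16 = encodeImm(0x2, 1, 1); // ADD R1, #1

comptime {
    // A wrong bit layout becomes a compile error rather than a runtime bug.
    if (add_r1_1 != 0x2301) @compileError("unexpected encoding for ADD R1, #1");
}

pub fn main() void {
    std.debug.print("ADD R1, #1 = 0x{X:0>4}\n", .{add_r1_1});
}
```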
The virtual machine
The VM is the part that executes our encoded instructions. It's a fetch-decode-execute loop: read the instruction at the current PC, decode it, do the operation, advance PC, repeat until HLT.
const MEMORY_SIZE = 65536; // 64KB -- full 16-bit address space
const VM = struct {
regs: RegisterFile,
memory: []u8,
halted: bool,
allocator: std.mem.Allocator,
fn init(allocator: std.mem.Allocator) !VM {
const mem = try allocator.alloc(u8, MEMORY_SIZE);
@memset(mem, 0);
return .{
.regs = RegisterFile.init(),
.memory = mem,
.halted = false,
.allocator = allocator,
};
}
fn deinit(self: *VM) void {
self.allocator.free(self.memory);
}
fn loadProgram(self: *VM, program: []const u16) void {
for (program, 0..) |word, i| {
const addr = i * 2;
if (addr + 1 >= MEMORY_SIZE) break;
// Store little-endian
self.memory[addr] = @truncate(word);
self.memory[addr + 1] = @truncate(word >> 8);
}
}
fn fetch(self: *VM) u16 {
const addr = @as(usize, self.regs.pc);
if (addr + 1 >= MEMORY_SIZE) {
self.halted = true;
return 0;
}
const lo: u16 = self.memory[addr];
const hi: u16 = self.memory[addr + 1];
self.regs.pc +%= 2; // each instruction is 2 bytes; wraps at 0xFFFE instead of trapping
return lo | (hi << 8);
}
fn getOperandValue(self: *VM, instr: Instruction) u16 {
if (instr.mode == 1) {
return @as(u16, instr.getImm());
} else {
return self.regs.get(instr.getSrc());
}
}
fn execute(self: *VM, instr: Instruction) void {
const op: Opcode = @enumFromInt(instr.opcode);
switch (op) {
.hlt => {
self.halted = true;
},
.nop => {},
.mov => {
const val = self.getOperandValue(instr);
self.regs.set(instr.dst, val);
},
.add => {
const a = self.regs.get(instr.dst);
const b = self.getOperandValue(instr);
const result = a +% b; // wrapping add
self.regs.set(instr.dst, result);
},
.sub => {
const a = self.regs.get(instr.dst);
const b = self.getOperandValue(instr);
const result = a -% b; // wrapping sub
self.regs.set(instr.dst, result);
},
.mul => {
const a = self.regs.get(instr.dst);
const b = self.getOperandValue(instr);
const result = a *% b; // wrapping mul
self.regs.set(instr.dst, result);
},
.cmp => {
const a = self.regs.get(instr.dst);
const b = self.getOperandValue(instr);
const result = a -% b;
self.regs.flags.zero = (result == 0);
self.regs.flags.negative = (result & 0x8000) != 0;
self.regs.flags.carry = b > a;
},
.jmp => {
const addr = self.getOperandValue(instr);
self.regs.pc = addr;
},
.jeq => {
if (self.regs.flags.zero) {
const addr = self.getOperandValue(instr);
self.regs.pc = addr;
}
},
.jne => {
if (!self.regs.flags.zero) {
const addr = self.getOperandValue(instr);
self.regs.pc = addr;
}
},
.load => {
const addr = self.getOperandValue(instr);
const uaddr = @as(usize, addr);
if (uaddr + 1 < MEMORY_SIZE) {
const lo: u16 = self.memory[uaddr];
const hi: u16 = self.memory[uaddr + 1];
self.regs.set(instr.dst, lo | (hi << 8));
}
},
.store => {
const val = self.regs.get(instr.dst);
const addr = self.getOperandValue(instr);
const uaddr = @as(usize, addr);
if (uaddr + 1 < MEMORY_SIZE) {
self.memory[uaddr] = @truncate(val);
self.memory[uaddr + 1] = @truncate(val >> 8);
}
},
.push => {
const val = self.regs.get(instr.dst);
self.regs.sp -%= 2;
const sp = @as(usize, self.regs.sp);
if (sp + 1 < MEMORY_SIZE) {
self.memory[sp] = @truncate(val);
self.memory[sp + 1] = @truncate(val >> 8);
}
},
.pop => {
const sp = @as(usize, self.regs.sp);
if (sp + 1 < MEMORY_SIZE) {
const lo: u16 = self.memory[sp];
const hi: u16 = self.memory[sp + 1];
self.regs.set(instr.dst, lo | (hi << 8));
}
self.regs.sp +%= 2;
},
.call => {
// push return address
self.regs.sp -%= 2;
const sp = @as(usize, self.regs.sp);
if (sp + 1 < MEMORY_SIZE) {
self.memory[sp] = @truncate(self.regs.pc);
self.memory[sp + 1] = @truncate(self.regs.pc >> 8);
}
const addr = self.getOperandValue(instr);
self.regs.pc = addr;
},
.ret => {
const sp = @as(usize, self.regs.sp);
if (sp + 1 < MEMORY_SIZE) {
const lo: u16 = self.memory[sp];
const hi: u16 = self.memory[sp + 1];
self.regs.pc = lo | (hi << 8);
}
self.regs.sp +%= 2;
},
}
}
fn step(self: *VM) void {
if (self.halted) return;
const word = self.fetch();
const instr = Instruction.fromU16(word);
self.execute(instr);
}
fn run(self: *VM) void {
while (!self.halted) {
self.step();
}
}
};
Let me walk through the key design decisions in this VM.
Wrapping arithmetic. We use +%, -%, and *% (Zig's wrapping operators) for arithmetic. In a real CPU, arithmetic just wraps on overflow -- adding 1 to 0xFFFF gives you 0x0000 without any error. Zig's default arithmetic operators treat overflow as illegal behavior and panic in safe build modes (which is great for application code), but a CPU emulator needs wrapping behavior. We covered these operators early on but this is the first time they're essential to correctness.
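The wrapping operators discard the information that an overflow happened. If you ever wanted ADD to set a carry flag (the VM in this episode does not), the `@addWithOverflow` builtin returns both the wrapped result and a 1-bit overflow indicator. A small sketch, with `AddResult` and `addWithCarry` as hypothetical names:

```zig
const std = @import("std");

// Hypothetical result type: the wrapped sum plus a carry-out bit.
const AddResult = struct { value: u16, carry: bool };

fn addWithCarry(a: u16, b: u16) AddResult {
    // @addWithOverflow returns a tuple: { wrapped result, overflow bit }.
    const r = @addWithOverflow(a, b);
    return .{ .value = r[0], .carry = r[1] == 1 };
}

pub fn main() void {
    const a: u16 = 0xFFFF;
    const b: u16 = 1;

    // Wrapping add: the result silently wraps around to 0.
    std.debug.print("a +% b = 0x{X:0>4}\n", .{a +% b});

    // Same sum, but the overflow is reported alongside the value.
    const r = addWithCarry(a, b);
    std.debug.print("value = 0x{X:0>4}, carry = {}\n", .{ r.value, r.carry });
}
```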
Little-endian byte order. Instructions are stored in memory as two bytes, low byte first. This is the same byte order x86 uses. When loadProgram stores a u16 word, the lower 8 bits go to address N and the upper 8 bits go to address N+1. When fetch reads them back, it reconstructs the u16 the same way. If you get the byte order wrong, every instruction decodes to garbage -- this is one of the most common bugs in emulator development.
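The byte order is easy to check in isolation. The sketch below uses `std.mem.writeInt` and `std.mem.readInt` from the standard library instead of the manual shifts in the VM; `storeWord` and `loadWord` are hypothetical wrappers, not functions from this episode. The last read deliberately uses the wrong byte order to show what the "garbage" looks like.

```zig
const std = @import("std");

// Hypothetical wrappers around the standard library's integer helpers.
fn storeWord(memory: []u8, addr: usize, word: u16) void {
    std.mem.writeInt(u16, memory[addr..][0..2], word, .little);
}

fn loadWord(memory: []const u8, addr: usize) u16 {
    return std.mem.readInt(u16, memory[addr..][0..2], .little);
}

pub fn main() void {
    var memory = [_]u8{0} ** 4;

    // Store 0x112A at address 0: low byte first, then high byte.
    storeWord(&memory, 0, 0x112A);
    std.debug.print("bytes: {X:0>2} {X:0>2}\n", .{ memory[0], memory[1] }); // 2A 11

    // Reading it back with the same byte order recovers the word.
    std.debug.print("word:  0x{X:0>4}\n", .{loadWord(&memory, 0)}); // 0x112A

    // Reading the same two bytes as big-endian swaps the halves.
    const swapped = std.mem.readInt(u16, memory[0..2], .big);
    std.debug.print("wrong: 0x{X:0>4}\n", .{swapped}); // 0x2A11
}
```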
PC advances before execution. The fetch increments PC by 2 (instruction size) before the instruction runs. This matters for CALL, which pushes the already-advanced PC so that RET lands on the instruction after the call. Jump targets are absolute: when JMP 10 executes, PC is simply overwritten with 10, no matter where the JMP itself sits. Because the target lives in the 8-bit immediate, jumps can only reach addresses 0-255. A relative jump design would add an offset to the post-increment PC instead, which is also valid but different.
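For comparison, here is roughly what the relative alternative could look like. This is a sketch of a design the VM does not use: `relativeTarget` is a hypothetical function that reinterprets the 8-bit immediate as a signed byte offset and adds it to the post-increment PC.

```zig
const std = @import("std");

// Sketch of a PC-relative branch target. `pc_after_fetch` is the PC value
// once fetch has already added 2; `imm` is reinterpreted as a signed byte
// offset in the range -128..127. Neither name comes from the article.
fn relativeTarget(pc_after_fetch: u16, imm: u8) u16 {
    const offset: i16 = @as(i8, @bitCast(imm)); // sign-extend 8 -> 16 bits
    return pc_after_fetch +% @as(u16, @bitCast(offset));
}

pub fn main() void {
    // A branch at 0x0C has PC = 0x0E after fetch; offset -8 lands on 0x06.
    const back: u8 = @bitCast(@as(i8, -8));
    std.debug.print("0x{X:0>4}\n", .{relativeTarget(0x0E, back)}); // 0x0006
    // Offset +4 skips the next two instructions.
    std.debug.print("0x{X:0>4}\n", .{relativeTarget(0x0E, 4)}); // 0x0012
}
```

The tradeoff: relative branches work anywhere in the 64KB address space but only reach about 64 instructions in either direction, while the absolute scheme used here reaches only the first 256 bytes but needs no offset arithmetic.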
Bounds checking. Every memory access checks bounds before reading or writing. An out-of-bounds fetch halts the VM rather than crashing with an index-out-of-range panic, and an out-of-range LOAD, STORE, or stack access is skipped. This is important because programs can contain bugs, and a VM should fail gracefully rather than taking down the host process.
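Skipping a bad access keeps the VM alive but also hides the bug. One possible alternative, sketched here under the assumption that you would rather surface the problem, is to return a Zig error. `MemError` and `readWord` are hypothetical names, not part of the VM above.

```zig
const std = @import("std");

const MemError = error{OutOfBounds};

// Hypothetical alternative to silently skipping a bad access: report it.
fn readWord(memory: []const u8, addr: u16) MemError!u16 {
    const a: usize = addr;
    if (a + 1 >= memory.len) return MemError.OutOfBounds;
    const lo: u16 = memory[a];
    const hi: u16 = memory[a + 1];
    return lo | (hi << 8); // little-endian, same as the VM's fetch
}

pub fn main() void {
    const memory = [_]u8{ 0x2A, 0x11, 0x00 };

    // Address 0 has two bytes available, so the read succeeds.
    if (readWord(&memory, 0)) |word| {
        std.debug.print("word at 0 = 0x{X:0>4}\n", .{word});
    } else |err| {
        std.debug.print("error at 0: {s}\n", .{@errorName(err)});
    }

    // Address 2 would need byte 3, which does not exist.
    if (readWord(&memory, 2)) |word| {
        std.debug.print("word at 2 = 0x{X:0>4}\n", .{word});
    } else |err| {
        std.debug.print("error at 2: {s}\n", .{@errorName(err)});
    }
}
```

A VM built this way could turn the error into a halt with a diagnostic, which makes faulty programs much easier to debug.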
Running a program
Time to use all of this. Let's write a program that computes 5 + 10 + 20 and stores the result in R0:
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer {
const check = gpa.deinit();
if (check == .leak) @panic("memory leak detected");
}
const allocator = gpa.allocator();
var vm = try VM.init(allocator);
defer vm.deinit();
// Program: R0 = 5 + 10 + 20
const program = [_]u16{
asm_builder.mov_imm(0, 5), // R0 = 5
asm_builder.mov_imm(1, 10), // R1 = 10
asm_builder.mov_imm(2, 20), // R2 = 20
asm_builder.add_reg(0, 1), // R0 = R0 + R1 = 15
asm_builder.add_reg(0, 2), // R0 = R0 + R2 = 35
asm_builder.halt(), // stop
};
vm.loadProgram(&program);
vm.run();
std.debug.print("R0 = {d}\n", .{vm.regs.get(0)});
std.debug.print("R1 = {d}\n", .{vm.regs.get(1)});
std.debug.print("R2 = {d}\n", .{vm.regs.get(2)});
std.debug.print("PC = {d}\n", .{vm.regs.pc});
}
Output:
R0 = 35
R1 = 10
R2 = 20
PC = 12
R0 holds 35 (5 + 10 + 20). R1 and R2 are unchanged after being set. PC = 12 because we executed 6 instructions of 2 bytes each (12 bytes total): fetch had already advanced PC past HLT before the VM halted.
Here's something more interesting -- a loop that sums the numbers 1 through 10:
const loop_program = [_]u16{
asm_builder.mov_imm(0, 0), // 0x00: R0 = 0 (accumulator)
asm_builder.mov_imm(1, 1), // 0x02: R1 = 1 (counter)
asm_builder.mov_imm(2, 10), // 0x04: R2 = 10 (limit)
// loop start at address 0x06:
asm_builder.add_reg(0, 1), // 0x06: R0 += R1
asm_builder.add_imm(1, 1), // 0x08: R1 += 1
asm_builder.cmp_reg(1, 2), // 0x0A: compare R1 with R2
asm_builder.jne(0x06), // 0x0C: if R1 != R2, jump back to loop
asm_builder.add_reg(0, 1), // 0x0E: add the final value (R1 == 10)
asm_builder.halt(), // 0x10: done
};
After execution R0 = 55 (1+2+3+4+5+6+7+8+9+10). The loop runs with R1 going from 1 to 9 (adding each to R0), then falls through when R1 equals R2 (10), and we add that last value explicitly. This is the classic sum formula n*(n+1)/2 computed iteratively -- and we're doing it by shuffling bits around in memory. Kind of beautiful, honestly ;-)
Notice how we manually computed the branch target address (0x06). Every instruction is 2 bytes, so instruction 0 is at address 0x00, instruction 1 at 0x02, instruction 2 at 0x04, and the loop body starts at instruction 3 = address 0x06. This is tedious and error-prone -- it's exactly the problem an assembler solves. In the next episode we'll build a proper two-pass assembler with labels so you can write jne loop_start instead of jne 0x06.
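Until labels exist, a tiny helper can at least take the multiplication out of your hands. The sketch below assumes a hypothetical `addrOf` function that turns an instruction index into a byte address. Since jump targets are stored in the 8-bit immediate, it also rejects at compile time any index whose address would not fit.

```zig
const std = @import("std");

// Hypothetical helper: byte address of the Nth instruction (2 bytes each).
// Jump targets must fit the 8-bit immediate, so anything past instruction
// 127 is rejected at compile time.
fn addrOf(comptime index: usize) u8 {
    const addr = index * 2;
    if (addr > 0xFF) @compileError("jump target does not fit in 8 bits");
    return @intCast(addr);
}

pub fn main() void {
    std.debug.print("instruction 3   -> 0x{X:0>2}\n", .{addrOf(3)}); // 0x06
    std.debug.print("instruction 127 -> 0x{X:0>2}\n", .{addrOf(127)}); // 0xFE
}
```

With it, the loop branch could be written as jne(addrOf(3)), which still requires counting instructions but no longer requires doubling them.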
A subroutine example: CALL and RET
The CALL and RET instructions let us write reusable subroutines. CALL pushes the return address (current PC) onto the stack and jumps to the subroutine. RET pops the return address and jumps back:
const call_program = [_]u16{
// Main program
asm_builder.mov_imm(0, 7), // 0x00: R0 = 7
asm_builder.mov_imm(1, 3), // 0x02: R1 = 3
asm_builder.call(0x0A), // 0x04: call subroutine at 0x0A
asm_builder.halt(), // 0x06: done (R0 should be 21)
asm_builder.nop(), // 0x08: padding
// Subroutine at 0x0A: multiply R0 by R1
asm_builder.mul_reg(0, 1), // 0x0A: R0 = R0 * R1
asm_builder.ret(), // 0x0C: return to caller
};
When CALL executes at address 0x04, PC has already advanced to 0x06 (the next instruction). That value (0x06) gets pushed to the stack. Then PC jumps to 0x0A (the subroutine). The MUL executes, then RET pops 0x06 from the stack and sets PC to it. Execution continues at the HLT instruction. R0 ends up holding 21 (7 * 3).
The NOP at 0x08 is just padding that separates the main program from the subroutine; nothing ever executes it. A real assembler would handle address calculation automatically and you wouldn't need it. But when hand-assembling, padding NOPs are your friend.
Printing register and memory state
For debugging it's useful to dump the VM state after execution:
fn dumpRegisters(regs: *const RegisterFile) void {
std.debug.print("\n--- Register Dump ---\n", .{});
var i: u3 = 0;
while (true) {
std.debug.print(" R{d} = 0x{X:0>4} ({d})\n", .{
@as(u8, i),
regs.get(i),
regs.get(i),
});
if (i == 7) break;
i += 1;
}
std.debug.print(" PC = 0x{X:0>4}\n", .{regs.pc});
std.debug.print(" SP = 0x{X:0>4}\n", .{regs.sp});
std.debug.print(" Flags: Z={} N={} C={}\n", .{
@as(u1, @intFromBool(regs.flags.zero)),
@as(u1, @intFromBool(regs.flags.negative)),
@as(u1, @intFromBool(regs.flags.carry)),
});
}
fn dumpMemory(memory: []const u8, start: usize, count: usize) void {
std.debug.print("\n--- Memory [{X:0>4}..{X:0>4}] ---\n", .{
start,
start + count - 1,
});
var i: usize = 0;
while (i < count) : (i += 2) {
const addr = start + i;
if (addr + 1 >= memory.len) break;
const lo: u16 = memory[addr];
const hi: u16 = memory[addr + 1];
const word = lo | (hi << 8);
const instr = Instruction.fromU16(word);
const op: Opcode = @enumFromInt(instr.opcode);
std.debug.print(" 0x{X:0>4}: 0x{X:0>4} ({s})\n", .{
addr,
word,
@tagName(op),
});
}
}
The dumpMemory function does something interesting -- it decodes each 16-bit word back into an instruction and prints the opcode name. This is the seed of a disassembler (which we'll flesh out properly in episode 61). For now it helps verify that our encoding is correct: if you load mov_imm(0, 42) into memory and the dump shows mov, you know the opcode bits are in the right place.
The register dump prints each value in both hex and decimal. Hex is essential for checking bit patterns -- if R0 should be 0x0023 (35 decimal) after adding 5+10+20, seeing the hex confirms there's no high-byte corruption. The flags display shows the current state of the condition flags, which is crucial when debugging branch logic.
Testing the encoder
Tests are what keep this from being a hand-wavy mess. We need to verify that our encoder produces the exact bit patterns we expect, and that the decoder extracts the correct fields back out. Let's write tests that check specific known values:
const std = @import("std");
const testing = std.testing;
test "encode MOV R0, #42 produces correct bit pattern" {
const instr = Instruction.encodeImm(.mov, 0, 42);
const word = instr.toU16();
// Decode back and verify fields
const decoded = Instruction.fromU16(word);
const op: Opcode = @enumFromInt(decoded.opcode);
try testing.expectEqual(Opcode.mov, op);
try testing.expectEqual(@as(u3, 0), decoded.dst);
try testing.expectEqual(@as(u1, 1), decoded.mode);
try testing.expectEqual(@as(u8, 42), decoded.getImm());
}
test "encode ADD R3, R5 roundtrips correctly" {
const instr = Instruction.encodeReg(.add, 3, 5);
const word = instr.toU16();
const decoded = Instruction.fromU16(word);
const op: Opcode = @enumFromInt(decoded.opcode);
try testing.expectEqual(Opcode.add, op);
try testing.expectEqual(@as(u3, 3), decoded.dst);
try testing.expectEqual(@as(u1, 0), decoded.mode);
try testing.expectEqual(@as(u3, 5), decoded.getSrc());
}
test "HLT encodes to opcode 0" {
const instr = Instruction.encodeNoArgs(.hlt);
const word = instr.toU16();
// HLT should be opcode 0 in the top 4 bits
try testing.expectEqual(@as(u16, 0x0000), word);
}
test "NOP encodes to opcode 0xF" {
const instr = Instruction.encodeNoArgs(.nop);
const word = instr.toU16();
const decoded = Instruction.fromU16(word);
const op: Opcode = @enumFromInt(decoded.opcode);
try testing.expectEqual(Opcode.nop, op);
}
test "all opcodes roundtrip through encode/decode" {
inline for (0..16) |i| {
const op: Opcode = @enumFromInt(i);
const instr = Instruction.encodeImm(op, 0, 0);
const decoded = Instruction.fromU16(instr.toU16());
const decoded_op: Opcode = @enumFromInt(decoded.opcode);
try testing.expectEqual(op, decoded_op);
}
}
test "immediate values 0-255 survive encoding" {
var i: u16 = 0;
while (i < 256) : (i += 1) {
const imm: u8 = @truncate(i);
const instr = Instruction.encodeImm(.mov, 0, imm);
const decoded = Instruction.fromU16(instr.toU16());
try testing.expectEqual(imm, decoded.getImm());
}
}
test "all register pairs roundtrip" {
var dst: u4 = 0;
while (dst < 8) : (dst += 1) {
var src: u4 = 0;
while (src < 8) : (src += 1) {
const d: u3 = @truncate(dst);
const s: u3 = @truncate(src);
const instr = Instruction.encodeReg(.mov, d, s);
const decoded = Instruction.fromU16(instr.toU16());
try testing.expectEqual(d, decoded.dst);
try testing.expectEqual(s, decoded.getSrc());
}
}
}
The inline for in the "all opcodes" test is worth pointing out. It unrolls the loop at compile time, so i is a comptime-known constant in each iteration and every opcode gets its own copy of the assertion. If an opcode fails to roundtrip, expectEqual reports the expected and actual values, which tells you which opcode broke rather than just "one of the 16 iterations failed". We covered inline for semantics in episode 9.
The exhaustive test for immediate values checks all 256 possible values (0-255). If there's a bit-shift error in the encoding, at least one of these will fail. Same for the register pair test -- all 64 combinations of source and destination must survive the encode/decode roundtrip. These kinds of exhaustive tests are cheap to write, fast to run, and catch entire classes of bugs in one go.
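One gap worth noting: roundtrip tests still pass if the encoder and decoder agree on the wrong layout. Pinning a few hand-computed words catches that. The sketch below uses a compact stand-in called `Word` (an assumption, so the block compiles on its own) with the same field order as Instruction; the expected values are worked out from the documented bit positions.

```zig
const std = @import("std");
const testing = std.testing;

// Compact stand-in for the article's Instruction layout.
const Word = packed struct {
    payload: u8, // bits 0-7
    mode: u1, // bit 8
    dst: u3, // bits 9-11
    opcode: u4, // bits 12-15
};

test "MOV R0, #42 is 0x112A" {
    // 0001 000 1 00101010
    const w = Word{ .opcode = 0x1, .dst = 0, .mode = 1, .payload = 42 };
    try testing.expectEqual(@as(u16, 0x112A), @as(u16, @bitCast(w)));
}

test "ADD R3, R5 is 0x26A0" {
    // 0010 011 0 101 00000
    const w = Word{ .opcode = 0x2, .dst = 3, .mode = 0, .payload = @as(u8, 5) << 5 };
    try testing.expectEqual(@as(u16, 0x26A0), @as(u16, @bitCast(w)));
}

test "NOP is 0xF000" {
    // 1111 000 0 00000000
    const w = Word{ .opcode = 0xF, .dst = 0, .mode = 0, .payload = 0 };
    try testing.expectEqual(@as(u16, 0xF000), @as(u16, @bitCast(w)));
}
```

The same three assertions can be written against Instruction.encodeImm, encodeReg, and encodeNoArgs in the real test file.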
Testing the VM
Encoding tests verify the format. VM tests verify the behavior -- that instructions actually DO what they should:
test "VM: MOV immediate loads value" {
const allocator = testing.allocator;
var vm = try VM.init(allocator);
defer vm.deinit();
const prog = [_]u16{
asm_builder.mov_imm(0, 99),
asm_builder.halt(),
};
vm.loadProgram(&prog);
vm.run();
try testing.expectEqual(@as(u16, 99), vm.regs.get(0));
}
test "VM: ADD register adds correctly" {
const allocator = testing.allocator;
var vm = try VM.init(allocator);
defer vm.deinit();
const prog = [_]u16{
asm_builder.mov_imm(0, 30),
asm_builder.mov_imm(1, 12),
asm_builder.add_reg(0, 1),
asm_builder.halt(),
};
vm.loadProgram(&prog);
vm.run();
try testing.expectEqual(@as(u16, 42), vm.regs.get(0));
}
test "VM: CMP sets zero flag on equal values" {
const allocator = testing.allocator;
var vm = try VM.init(allocator);
defer vm.deinit();
const prog = [_]u16{
asm_builder.mov_imm(0, 5),
asm_builder.mov_imm(1, 5),
asm_builder.cmp_reg(0, 1),
asm_builder.halt(),
};
vm.loadProgram(&prog);
vm.run();
try testing.expect(vm.regs.flags.zero);
try testing.expect(!vm.regs.flags.negative);
}
test "VM: JNE branches when not equal" {
const allocator = testing.allocator;
var vm = try VM.init(allocator);
defer vm.deinit();
const prog = [_]u16{
asm_builder.mov_imm(0, 3), // 0x00
asm_builder.mov_imm(1, 5), // 0x02
asm_builder.cmp_reg(0, 1), // 0x04: 3 != 5, zero=false
asm_builder.jne(0x0A), // 0x06: should jump
asm_builder.mov_imm(2, 99), // 0x08: skipped
asm_builder.halt(), // 0x0A: lands here
};
vm.loadProgram(&prog);
vm.run();
// R2 should NOT be 99 (the mov was skipped)
try testing.expectEqual(@as(u16, 0), vm.regs.get(2));
}
test "VM: CALL and RET work together" {
const allocator = testing.allocator;
var vm = try VM.init(allocator);
defer vm.deinit();
const prog = [_]u16{
asm_builder.mov_imm(0, 7), // 0x00
asm_builder.call(0x08), // 0x02: call sub at 0x08
asm_builder.halt(), // 0x04: return here
asm_builder.nop(), // 0x06: padding
// subroutine at 0x08:
asm_builder.add_imm(0, 3), // 0x08: R0 += 3
asm_builder.ret(), // 0x0A: return
};
vm.loadProgram(&prog);
vm.run();
try testing.expectEqual(@as(u16, 10), vm.regs.get(0));
}
test "VM: loop counts down to zero" {
const allocator = testing.allocator;
var vm = try VM.init(allocator);
defer vm.deinit();
const prog = [_]u16{
asm_builder.mov_imm(0, 5), // 0x00: R0 = 5 (counter)
asm_builder.mov_imm(1, 0), // 0x02: R1 = 0
// loop at 0x04:
asm_builder.sub_imm(0, 1), // 0x04: R0 -= 1
asm_builder.cmp_reg(0, 1), // 0x06: R0 == 0?
asm_builder.jne(0x04), // 0x08: if not, loop
asm_builder.halt(), // 0x0A
};
vm.loadProgram(&prog);
vm.run();
try testing.expectEqual(@as(u16, 0), vm.regs.get(0));
}
Each test is minimal -- one behavior per test, clear setup, clear assertion. The JNE test is particularly instructive: it puts a mov_imm(2, 99) after the branch. If the branch works correctly, that instruction gets skipped and R2 stays at 0. If the branch is broken, R2 becomes 99. You can read the test as a specification of the instruction's behavior.
The CALL/RET test verifies the full stack-based subroutine mechanism. The main program sets R0 to 7, calls a subroutine that adds 3, and returns. R0 should be 10. If the return address wasn't saved/restored correctly, the program would either crash (invalid PC) or skip the HLT and keep running. Both outcomes are detectable.
What we learned
- An instruction set architecture (ISA) defines which operations a processor supports -- we built one with 16 instructions in 4 opcode bits, covering arithmetic, comparison, branching, memory access, and stack operations
- Register files give the CPU fast storage -- our 8 general-purpose registers plus PC, SP, and flags mirror the structure of real processors, scaled down to teaching size
- Instruction encoding packs opcodes and operands into fixed-size binary words -- our 16-bit format uses a mode bit to distinguish register-register from register-immediate operations
- Packed structs in Zig make bit-level encoding natural -- @bitCast between the struct and u16 is zero-cost, and the compiler enforces field widths at the type level
- A virtual machine is a fetch-decode-execute loop -- fetch the next instruction from memory at PC, decode the bit fields, execute the operation, repeat until HLT
- Branch instructions (JMP, JEQ, JNE) change PC instead of incrementing it, enabling loops and conditional logic from just two building blocks: comparison and conditional jump
- CALL/RET enable subroutines by pushing and popping the return address on a stack -- this is how every function call in every language ultimately works at the hardware level
- Exhaustive tests over all opcodes, all register combinations, and all immediate values catch encoding bugs that spot checks would miss
This is the first episode of Project G. We have a working instruction encoder and a VM that executes programs, but hand-computing addresses for every program is tedious and error-prone. Next time we build a proper two-pass assembler that reads text mnemonics (MOV R0, #42) and produces encoded binary -- with label support so you can write JNE loop_start instead of JNE 0x06. That will close the gap between human-readable assembly and machine code, which is what assemblers have been doing since the 1950s.
Thanks, and until next time!