Overview

This documentation does not reflect the actual current implementation state.

Words

Wherever possible, we keep everything as a 64-bit value. Throughout the specifications of the VM, a word should be interpreted to mean a 64-bit value. Similarly, a half word (or hword) should be interpreted to mean a 32-bit value.

Instructions

See Is.md

Registers

Each core has a bank of 8 general purpose registers, as well as three internal registers. The general purpose registers are called a, b, c, d, e, f, x, and y. The internal registers are called l, u, and n.

  • l: exec pointer
  • u: flags and stuff
  • n: stack pointer

Privilege

The VM has a few different privilege levels that can be used to isolate processes.

Memory

Memory is word addressable, and uses a word as its address space. This gives a theoretical maximum memory space of 2^64 words of memory. At 8 bytes per word, that is 2^67 bytes, or 128 exbibytes of data, which is large.

2^10 * 2^10 * 2^10 * 2^10 * 2^10 * 2^10 * 2^4 * 2^3 = 2^67 bytes
^ K    ^ M    ^ G    ^ T    ^ P    ^ E    ^ 16  ^ 8 (64 bits -> 8 bytes)
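
As a quick sanity check of the address-space arithmetic, here is a minimal Python sketch. It assumes 8-byte words and binary (2^10-based) prefixes; the constant names are illustrative and not part of the VM.

```python
# Sanity check of the address-space arithmetic (assumes 8-byte words).
WORD_BITS = 64
BYTES_PER_WORD = WORD_BITS // 8      # a 64-bit word holds 8 bytes
ADDRESSABLE_WORDS = 2 ** WORD_BITS   # one word per possible 64-bit address

total_bytes = ADDRESSABLE_WORDS * BYTES_PER_WORD
total_eib = total_bytes // 2 ** 60   # 1 EiB = 2^60 bytes

print(total_bytes == 2 ** 67)  # True
print(total_eib)               # 128
```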

Architecture

Registers

x   -> 1000       (4-bit word encoding)
x   -> 1110_1000  (8-bit full encoding)
xh0 -> 1100_1000
xh1 -> 1101_1000
xq0 -> 1000_1000
xq3 -> 1011_1000
xb0 -> 0000_1000
xb7 -> 0111_1000
16 registers                       16
16 registers * 2 half indexes    + 32  = 48
16 registers * 4 quarter indexes + 64  = 112
16 registers * 8 byte indexes    + 128 = 240

Could theoretically be coded as 8 bits... is saving the 1 bit worth it? 🤔

00000000 .. 01111111 -> pb0 .. fb7 (0rrrriii)
10000000 .. 10111111 -> pq0 .. fq3 (10rrrrii)
11000000 .. 11011111 -> ph0 .. fh1 (110rrrri)
11100000 .. 11101111 -> p   .. f   (1110rrrr)

...or a slightly easier to parse version...

00000000 .. 01111111 -> pb0 .. fb7 (0iiirrrr)
10000000 .. 10111111 -> pq0 .. fq3 (10iirrrr)
11000000 .. 11011111 -> ph0 .. fh1 (110irrrr)
11100000 .. 11101111 -> p   .. f   (1110rrrr)
11110000 ..          -> Impossible (used for decoding)
11110001 .. 11111011 -> undefined behavior
11111100 .. 11111110 -> fwi        (full word immediate)
11111111             -> None       (rarely useful, but worth having)

...which is nice because the physical register is always in the same spot! Oh, and the index is just the remainder of dividing the highest 4 bits by the number of register indexes.
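
The "remainder" trick can be sketched in Python as follows. This is only an illustration of the second ("easier to parse") layout; the function name and the returned tuple shape are hypothetical, not part of the spec.

```python
def decode_register_operand(byte):
    """Split an 8-bit register operand into (width, index, register).

    Assumes the 0iiirrrr / 10iirrrr / 110irrrr / 1110rrrr layout.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("operand must fit in 8 bits")
    high, register = byte >> 4, byte & 0xF
    if high < 0b1000:
        width, index_count = "byte", 8
    elif high < 0b1100:
        width, index_count = "qword", 4
    elif high < 0b1110:
        width, index_count = "hword", 2
    elif high == 0b1110:
        width, index_count = "word", 1
    else:
        raise ValueError("0b1111xxxx is not a register operand")
    # The index is the high nibble modulo the number of indexes.
    return width, high % index_count, register


print(decode_register_operand(0b1101_1000))  # ('hword', 1, 8)
```

The physical register is always the low nibble, so only the width and index depend on the high nibble.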

Integers

  • i8
  • u8
  • i16
  • u16
  • i32
  • u32
  • i64
  • u64

Integers are encoded in the usual way. Unsigned integers count up starting from zero, and signed integers are stored in two's complement fashion.

Floats

  • f32
  • f64

Floating point numbers are stored using the typical IEEE 754 standard.

Words

  • byte (8 bits)
  • qword (16 bits)
  • hword (32 bits)
  • word (64 bits)

Words are just generic bits of data, without any preassigned meaning. They can be whatever you want! Might be text, might represent a color, or something more!

Immediates

Word immediate

|--------|--------|--------|--------|--------|--------|--------|--------|
 i....... ........ ........ ........ ........ ........ ........ ........

Hword immediate

|--------|--------|--------|--------|
 i....... ........ ........ ..r.....

|--------|--------|--------|--------|
 i....... ........ ........ ........

|--------|--------|--------|--------|
 00000000 00000000 i....... ........

|--------|--------|--------|--------|
 00000000 00000000 00000000 i.......
  • i is the binary content of the immediate value
  • r is an unsigned 6-bit value that indicates how much i should be left-rotated to form the true value of the encoded immediate (only applicable to hword immediates that encode a word value).

The general logic behind this encoding is...

  • If the size of the word being stored is less than or equal to the available size for encoding the immediate, it should be stored exactly, aligned to the least significant digits, and padded with zeroes in the unused most significant digits, including cases where the value is a signed integer.

  • If the size of the word being stored is greater than the available size for encoding the immediate, then the n least significant bits should be used to store an unsigned integer indicating how many bits to the left the partial value should be rotated, where s is the size of the word in bits and n is ⌈log2(s)⌉ (so n is 6 for a 64-bit word). The remaining available bits should be used to store exact binary data.

A few examples...

  • An i8 with a value of -1 would be encoded as 00000000 00000000 00000000 11111111
  • An i64 or u64 with a value of 1 could be encoded as 00000000 00000000 00000000 01000000
  • An i64 or u64 with a value of 0xf0 could be encoded as 00000000 00000000 00000011 11000100
  • A u64 with a value of 0x8000000000000001 could be encoded as 00000000 00000000 00000000 11111111
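
The word-valued examples above can be checked with a small Python sketch. It assumes the rotation amount r sits in the low 6 bits, the partial value i sits in the remaining 26 bits, and the rotation is performed on a 64-bit value; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def rotl64(value, amount):
    """Rotate a 64-bit value left by `amount` bits."""
    amount %= 64
    return ((value << amount) | (value >> (64 - amount))) & MASK64


def decode_word_from_hword(encoded):
    """Recover a 64-bit value from a 32-bit hword immediate."""
    rotation = encoded & 0x3F  # r: the 6 least significant bits
    partial = encoded >> 6     # i: the remaining 26 bits
    return rotl64(partial, rotation)


print(hex(decode_word_from_hword(0b01000000)))           # 0x1
print(hex(decode_word_from_hword(0b00000011_11000100)))  # 0xf0
print(hex(decode_word_from_hword(0b11111111)))           # 0x8000000000000001
```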

Caveats

  • f64 values cannot be stored in an hword immediate, and must be addressed as a fwi.

In general, the structure of the impl names is {opname}_{datatype}_{operands}.

  • datatype is usually one of...

    • i8
    • u8
    • i16
    • u16
    • i32
    • u32
    • i64
    • u64
    • b (byte, 8 bits)
    • q (quarter-word, 16 bits)
    • h (half-word, 32 bits)
    • w (word, 64 bits)
  • operand is usually one of...

    • i (immediate value, encoded in the instruction)
    • r (register)
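
As a rough illustration of this naming scheme, the sketch below splits an impl name into its parts. The function is hypothetical, and it assumes that each character of the operands part names one operand and that names without a datatype or operands (such as slp) are allowed.

```python
def parse_impl_name(name):
    """Split '{opname}_{datatype}_{operands}' into its three parts."""
    parts = name.split("_")
    opname = parts[0]
    datatype = parts[1] if len(parts) > 1 else None
    # One character per operand: 'i' (immediate) or 'r' (register).
    operands = tuple(parts[2]) if len(parts) > 2 else ()
    return opname, datatype, operands


print(parse_impl_name("add_i8_rir"))  # ('add', 'i8', ('r', 'i', 'r'))
print(parse_impl_name("slp"))         # ('slp', None, ())
```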

Instructions

add

  • add_i8_rir
  • add_i8_rrr
  • add_u8_rir
  • add_u8_rrr
  • add_i16_rir
  • add_i16_rrr
  • add_u16_rir
  • add_u16_rrr
  • add_f32_rir
  • add_f32_rrr
  • add_i32_rir
  • add_i32_rrr
  • add_u32_rir
  • add_u32_rrr
  • add_f64_rir
  • add_f64_rrr
  • add_i64_rir
  • add_i64_rrr
  • add_u64_rir
  • add_u64_rrr

and

  • and_b_rrr
  • and_q_rrr
  • and_h_rrr
  • and_w_rrr

br

  • br_x_i

cmp

  • cmp_b_r
  • cmp_b_rr
  • cmp_q_r
  • cmp_q_rr
  • cmp_h_r
  • cmp_h_rr
  • cmp_w_r
  • cmp_w_rr

div

  • div_i8_rirr
  • div_i8_rrrr
  • div_u8_rirr
  • div_u8_rrrr
  • div_i16_rirr
  • div_i16_rrrr
  • div_u16_rirr
  • div_u16_rrrr
  • div_f32_rir
  • div_f32_rrr
  • div_i32_rirr
  • div_i32_rrrr
  • div_u32_rirr
  • div_u32_rrrr
  • div_f64_rir
  • div_f64_rrr
  • div_i64_rirr
  • div_i64_rrrr
  • div_u64_rirr
  • div_u64_rrrr

exp

  • exp_i8_rir
  • exp_i8_rrr
  • exp_u8_rir
  • exp_u8_rrr
  • exp_i16_rir
  • exp_i16_rrr
  • exp_u16_rir
  • exp_u16_rrr
  • exp_f32_rir
  • exp_f32_rrr
  • exp_i32_rir
  • exp_i32_rrr
  • exp_u32_rir
  • exp_u32_rrr
  • exp_f64_rir
  • exp_f64_rrr
  • exp_i64_rir
  • exp_i64_rrr
  • exp_u64_rir
  • exp_u64_rrr

halt

  • halt_x_x

mul

  • mul_i8_rir
  • mul_i8_rrr
  • mul_u8_rir
  • mul_u8_rrr
  • mul_i16_rir
  • mul_i16_rrr
  • mul_u16_rir
  • mul_u16_rrr
  • mul_f32_rir
  • mul_f32_rrr
  • mul_i32_rir
  • mul_i32_rrr
  • mul_u32_rir
  • mul_u32_rrr
  • mul_f64_rir
  • mul_f64_rrr
  • mul_i64_rir
  • mul_i64_rrr
  • mul_u64_rir
  • mul_u64_rrr

mv

  • mv_b_rr
  • mv_q_rr
  • mv_h_rr
  • mv_w_rr

not

  • not_b_rr
  • not_q_rr
  • not_h_rr
  • not_w_rr

or

  • or_b_rrr
  • or_q_rrr
  • or_h_rrr
  • or_w_rrr

pop

  • pop_w_r

push

  • push_w_r

put

  • put_b_i
  • put_b_r
  • put_w_r

rotl

  • rotl_b_ri
  • rotl_b_rr
  • rotl_q_ri
  • rotl_q_rr
  • rotl_h_ri
  • rotl_h_rr
  • rotl_w_ri
  • rotl_w_rr

rotr

  • rotr_b_ri
  • rotr_b_rr
  • rotr_q_ri
  • rotr_q_rr
  • rotr_h_ri
  • rotr_h_rr
  • rotr_w_ri
  • rotr_w_rr

shiftl

  • shiftl_b_ri
  • shiftl_b_rr
  • shiftl_q_ri
  • shiftl_q_rr
  • shiftl_h_ri
  • shiftl_h_rr
  • shiftl_w_ri
  • shiftl_w_rr

shiftr

  • shiftr_b_ri
  • shiftr_b_rr
  • shiftr_q_ri
  • shiftr_q_rr
  • shiftr_h_ri
  • shiftr_h_rr
  • shiftr_w_ri
  • shiftr_w_rr

slp

  • slp

sl

  • sl_b_rir
  • sl_b_rrr
  • sl_q_rir
  • sl_q_rrr
  • sl_h_rir
  • sl_h_rrr
  • sl_w_rir
  • sl_w_rrr

sr

  • sr_b_rir
  • sr_b_rrr
  • sr_q_rir
  • sr_q_rrr
  • sr_h_rir
  • sr_h_rrr
  • sr_w_rir
  • sr_w_rrr

ser

  • ser_b_rir
  • ser_b_rrr
  • ser_q_rir
  • ser_q_rrr
  • ser_h_rir
  • ser_h_rrr
  • ser_w_rir
  • ser_w_rrr

sub

  • sub_i8_rir
  • sub_i8_rrr
  • sub_u8_rir
  • sub_u8_rrr
  • sub_i16_rir
  • sub_i16_rrr
  • sub_u16_rir
  • sub_u16_rrr
  • sub_f32_rir
  • sub_f32_rrr
  • sub_i32_rir
  • sub_i32_rrr
  • sub_u32_rir
  • sub_u32_rrr
  • sub_f64_rir
  • sub_f64_rrr
  • sub_i64_rir
  • sub_i64_rrr
  • sub_u64_rir
  • sub_u64_rrr

xor

  • xor_b_r
  • xor_b_rrr
  • xor_q_r
  • xor_q_rrr
  • xor_h_r
  • xor_h_rrr
  • xor_w_r
  • xor_w_rrr

Prologue

  • curly braces ({ and }) denote a fixed set of options to choose from, usually separated by commas
    • {a,b} would mean: in this location, must either be "a" or be "b"
    • {a} would mean: in this location, must be "a"
  • numeric represents i8, u8, i16, u16, f32, i32, u32, f64, i64, and u64
    • often used as {numeric} to mean: in this location, must be one of the numeric data types
    • an f, an i, or a u can also be used in place of numeric in places where the data size can be inferred (usually from the operands of an instruction)
  • integer represents i8, u8, i16, u16, i32, u32, i64, and u64
  • float represents f32 or f64

add.{numeric} $r1 #i

add.{numeric} $r1 #i $rr

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 0000000s rr...... r1...... i....... ........ ........ ........

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 0000000s rr...... 11110000 r1...... 00000000 00000000 00000000
 i....... ........ ........ ........ ........ ........ ........ ........
  • s indicates whether the operands are signed (1) or not (0)
  • r1 The first operand register
  • i The inlined immediate value
  • rr The register to store the result in — Defaults to r1 if omitted

Adds together the values from r1 and i, stores the result in rr.

add.{numeric} $r1 $r2

add.{numeric} $r1 $r2 $rr

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 0000001s r1...... r2...... rr...... 00000000 00000000 00000000
  • s indicates whether the operands are signed (1) or not (0)
  • r1 The first operand register
  • r2 The second operand register
  • rr The register to store the result in — Defaults to r1 if omitted

Adds together the values from r1 and r2, stores the result in rr.
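
A minimal sketch of packing the register form into a single word, assuming the leftmost byte of the diagram is the most significant byte and that each register operand is already an 8-bit register encoding. The function name is hypothetical.

```python
def encode_add_rrr(signed, r1, r2, rr=None):
    """Pack `add $r1 $r2 $rr` into one 64-bit word (register form)."""
    if rr is None:
        rr = r1  # the result register defaults to r1 when omitted
    for operand in (r1, r2, rr):
        if not 0 <= operand <= 0xFF:
            raise ValueError("register operands must fit in 8 bits")
    opcode_low = 0b0000_0010 | (1 if signed else 0)  # 0000001s
    return int.from_bytes(
        bytes([0x00, opcode_low, r1, r2, rr, 0, 0, 0]), "big"
    )


# 0xE8, 0xE9, and 0xEA are example 8-bit operand encodings (1110rrrr).
print(hex(encode_add_rrr(True, 0xE8, 0xE9, 0xEA)))  # 0x3e8e9ea000000
```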

div.{float} $r1 #i

div.{float} $r1 #i $rr

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 0000000s r1...... i....... ........ ........ ........ rr......

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 0000000s r1...... 11111110 rr...... 00000000 00000000 11110000
 i....... ........ ........ ........ ........ ........ ........ ........
  • s indicates whether the operands are signed (1) or not (0)
  • r1 The first operand register
  • i The inlined immediate value
  • rr The register to store the result in — Defaults to r1 if omitted

Divides the value from r1 by the value from i, stores the result in rr.

div.{float} $r1 $r2

div.{float} $r1 $r2 $rr

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 0000000s r1...... r2...... rr...... 00000000 00000000 00000000
  • s indicates whether the operands are signed (1) or not (0)
  • r1 The first operand register
  • r2 The second operand register
  • rr The register to store the result in — Defaults to r1 if omitted

Divides the value from r1 by the value from r2, stores the result in rr.

div.{integer} $r1 #i $rr -

div.{integer} $r1 #i - $rm

div.{integer} $r1 #i $rr $rm

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 000000ms r1...... i....... ........ ........ ........ rr......

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 0000000s r1...... 11111110 rr...... rm...... 00000000 11110000
 i....... ........ ........ ........ ........ ........ ........ ........

Divides the value from r1 by the value from i, stores the quotient in rr and the remainder (otherwise called the modulus) in rm.

div.{integer} $r1 $r2

div.{integer} $r1 $r2 $rr -

div.{integer} $r1 $r2 - $rm

div.{integer} $r1 $r2 $rr $rm

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 0000001s r1...... r2...... rr...... rm...... 00000000 00000000
  • s indicates whether the operands are signed (1) or not (0)
  • r1 The first operand register
  • r2 The second operand register
  • rr The result storage register for the quotient — Defaults to r1 if omitted
  • rm The result storage register for the remainder — Defaults to r2 if omitted

Divides the value from r1 by the value from r2, stores the quotient in rr and the remainder (otherwise called the modulus) in rm.

General design

Many instructions have variants (denoted by a . and the variant name). As an example, the typical signed integer addition instruction is written as add.i64. The instruction is add and the variant is i64, meaning it is intended to operate on signed 64-bit values. Some variants can be deduced from the register types used as operands, but others, such as those distinguishing signed from unsigned arithmetic, must be written explicitly.

Syntax

The descriptions given make use of a variant of Lunar Assembly (LA) to show the options of how each instruction might be written in assembly by an engineer (and in turn assembled to Lunar Machine-code (LM)).

{a,b,c,...} means one of any of the comma separated values

add.{i64,u64} -> add.i64 | add.u64
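
To make the brace notation concrete, here is a small Python sketch that expands a pattern into every form it stands for. It is purely illustrative (the function name is made up) and assumes braces are never nested.

```python
import re
from itertools import product


def expand_options(pattern):
    """Expand every {a,b,...} group in a pattern into concrete strings."""
    pieces = re.split(r"\{([^{}]*)\}", pattern)
    # Odd-indexed pieces are the contents of a {...} group.
    choices = [
        piece.split(",") if index % 2 else [piece]
        for index, piece in enumerate(pieces)
    ]
    return ["".join(combo) for combo in product(*choices)]


print(expand_options("add.{i64,u64}"))  # ['add.i64', 'add.u64']
```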

:_ means one of any of the general purpose registers (a, b, c, d, e, f, x, y)

:! means one of any of the internal registers (l, u, n)

:a means the a register, :b means the b register, etc. :l means the l internal register, :u means the u internal register, etc.

# means any number (in any of the supported forms)

  • Decimal numbers can be used unprefixed
  • Hex numbers should use the 0x prefix
  • Binary numbers should use the 0b prefix

Binary representation

Opcode

Lun is designed to have a fairly simple instruction set and, most importantly, to be (relatively) easy for a human to understand, since it's mostly just for fun. As such, the opcode refers to the most significant two bytes of the first word of an instruction. Variants should be determined by the next four most significant bits (or in some cases, the two most significant bits of the opcode, which determine instruction size).

Registers

A register is always represented by 4 bits.

0000 l
0001 u
0010 n
------
1000 x
1001 y
1010 a
1011 b
1100 c
1101 d
1110 e
1111 f

It is worthwhile to note 2 things:

  • The binary representation of registers a-f corresponds with their representation in hexadecimal, i.e. a in hexadecimal is 1010 in binary, which is the binary value that represents the a register.
  • All of the general purpose registers are represented with a most-significant bit of 1. This also means that all of the internal registers are represented with a most-significant bit of 0.
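
Both observations can be checked mechanically. The sketch below mirrors the table above as a Python dict; the dict and the helper function are illustrative names only.

```python
# 4-bit register codes, copied from the table above.
REGISTER_CODES = {
    "l": 0b0000, "u": 0b0001, "n": 0b0010,
    "x": 0b1000, "y": 0b1001,
    "a": 0b1010, "b": 0b1011, "c": 0b1100,
    "d": 0b1101, "e": 0b1110, "f": 0b1111,
}


def is_general_purpose(code):
    """General purpose registers have the most significant bit set."""
    return bool(code & 0b1000)


# a-f encode as their own hexadecimal digit values.
print(all(REGISTER_CODES[name] == int(name, 16) for name in "abcdef"))  # True
print(is_general_purpose(REGISTER_CODES["x"]))  # True
print(is_general_purpose(REGISTER_CODES["n"]))  # False
```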

Instruction size

Instructions are all stored as words. Some instructions span multiple words, up to 5. The number of words in an instruction is encoded as part of the opcode, which is always contained in the first word.

00 - 1 word
01 - 2 words
10 - 4 words
11 - 5 words
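
A decoder could read the size from the first word roughly like this. The sketch assumes the two size bits are the two most significant bits of the first word (the top of the opcode); the names are hypothetical.

```python
# Size bits -> number of words, as listed above.
WORDS_BY_SIZE_BITS = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 5}


def instruction_word_count(first_word):
    """Return how many 64-bit words the instruction occupies."""
    size_bits = (first_word >> 62) & 0b11
    return WORDS_BY_SIZE_BITS[size_bits]


print(instruction_word_count(0x2000_8000_A000_0005))  # 1
print(instruction_word_count(0x6000_8000_A000_0000))  # 2
```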

Arithmetic

clr|xor :_

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 00000100 r.......{00000000 00000000 00000000 00000000 00000000}

: r...  The register to clear

Shorthand for xor :_ :_ :_ (useful for zeroing a register)

xor :_ :_ :_

|--------|--------|--------|--------|--------|--------|--------|--------|
 00000000 00000101 r1...... r2...... rr......{00000000 00000000 00000000}

: r1..  The first operand register
: r2..  The second operand register
: rr..  The result storage register

Performs an exclusive or of the values from the first and second given registers and stores the result in the third register.

Memory

i.{at,by} :_ #

i.at :_ :_

i.by :_ [label]

|--------|--------|--------|--------|--------|--------|--------|--------|
 00100000 00000000 10000000 ri..0000 dddddddd dddddddd dddddddd dddddddd

: ri.. The register to inload from memory
: d    The offset from the address of the word containing this instruction

|--------|--------|--------|--------|--------|--------|--------|--------|
 00100000 00000000 11000000 ri..r@.. 00000000 00000000 00000000 00000000

: ri.. The register to inload from memory
: r@.. The register containing the address in memory to read from

|--------|--------|--------|--------|--------|--------|--------|--------|
 01100000 00000000 10000000 ri..0000 00000000 00000000 00000000 00000000
 @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@

: ri.. The register to inload from memory
: @    The address in memory to read from

Reads a value from an address in memory into the specified register.
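
The two single-word layouts can be packed as in the sketch below. It assumes the leftmost byte of each diagram is the most significant, that registers use their 4-bit codes, and that the offset d is stored as 32 raw bits (its signedness is not specified here, so only non-negative values are accepted). The function names are hypothetical.

```python
def encode_inload_offset(ri, offset):
    """Pack the offset form: 00100000 00000000 10000000 ri..0000 d(32)."""
    if not 0 <= ri <= 0xF:
        raise ValueError("ri must be a 4-bit register code")
    if not 0 <= offset <= 0xFFFF_FFFF:
        raise ValueError("offset must fit in 32 bits")
    return (0x20 << 56) | (0x80 << 40) | (ri << 36) | offset


def encode_inload_register(ri, r_addr):
    """Pack the register form: 00100000 00000000 11000000 ri..r@.. 0(32)."""
    for code in (ri, r_addr):
        if not 0 <= code <= 0xF:
            raise ValueError("registers must be 4-bit codes")
    return (0x20 << 56) | (0xC0 << 40) | (ri << 36) | (r_addr << 32)


# Load into register a (1010), 5 words past this instruction.
print(hex(encode_inload_offset(0b1010, 5)))            # 0x200080a000000005
# Load into register a from the address held in register b (1011).
print(hex(encode_inload_register(0b1010, 0b1011)))     # 0x2000c0ab00000000
```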

o.{at,by} :_ #

o.at :_ :_

o.by :_ [label]

|--------|--------|--------|--------|--------|--------|--------|--------|
 00010000 00000000 10000000 ro..0000 dddddddd dddddddd dddddddd dddddddd

: ro.. The register to outload to memory
: d    The offset from the address of the word containing this instruction

|--------|--------|--------|--------|--------|--------|--------|--------|
 00010000 00000000 11000000 ro..r@.. 00000000 00000000 00000000 00000000

: ro.. The register to outload to memory
: r@.. The register containing the address in memory to write to

|--------|--------|--------|--------|--------|--------|--------|--------|
 01010000 00000000 10000000 ro..0000 00000000 00000000 00000000 00000000
 @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@ @@@@@@@@

: ro.. The register to outload to memory
: @    The address in memory to write to

Writes a value from the specified register into memory at an address.

Registers

  • One of the registers could be used to store the thread/core number? Timers?
  • Instructions to push/pop entire register set states on and off of the stack

Hypervisor

  • Enable a "hypervisor" to schedule bounded work on cores (i.e. must return eventually)
  • Limit its access to certain parts of memory

Threading

  • Enable a single VM to have multiple "cores"

Cores

  • In addition to a register bank, I think each core should have some working memory? Perhaps like, 4kb?

Bus

  • What would a minimal "bus" implementation look like?

Sys

  • What should the Lun binary/executable format look like?

Instructions

  • Should instructions have condition predicates built in (like ARM)?

  • Instructions should be as wide reaching as possible. If something can already be done with a single instruction, don't make a more specific one. Sugar and shorthands can be handled at assembly time, so there's no need to make the decoding and runtime logic more complex than it needs to be by duplicating functionality.