Complete month of February 2024

It has been a while...

Love is not dead! It just got distracted.


Yes, I have been gone for a while. More than a year, in fact. The project – lovem – is not dead, however. In fact, I even have multiple posts already written, that I just need to publish. So let's start doing that. After this short intermission, I will publish an additional entry in the journey, that will take us further along the path to creating our VM.

To be quite honest – I dated this entry back to yesterday. The reason is that my journal, as I currently run it, does not really support multiple entries on the same day. Yes, I could simply add a time to the publication date, but that breaks continuity. And I don't plan to release multiple entries on the same day normally, as I want to keep the pace moderate. One post every two or three days is what I aim for, just the way I used to have it.

A few things have changed in the meantime. For reasons that I have no desire to explain, I have removed the link to my Twitter account from the journal, and replaced it with a link to my Mastodon account. You can find me under @kratenko@chaos.social there. I also used to announce new entries over Twitter. I guess I will move that over to Mastodon as well. I guess we will see how that goes.

But now let's get back to the journey. We will next implement a simple feature that allows the VM to limit the processing time of a program – which can be very useful, especially when running user-supplied code inside the machine. Building an endless loop inside a Turing-complete (or not even) language is quite easy. Having an embedded device stuck in an endless loop is often a catastrophe...

Stop right there, that's far enough!

We introduce an optional execution limit to our VM.


Since we have goto, we can write looping programs. With if* we have potentially looping programs as well. Both of these open the potential for endless loops. There are situations in which endless loops are required. But often they are something to be avoided.

Looping a long time

Let us look at a little program:

pgm/long-loop.lva
## Looping a looooong time.
## This program will not run forever, but you will not see it terminate either.
  push_u8 0
loop:
  push_u8 1
  add
  dup
  ifgt loop
  pop
  fin

Someone messed up the loop condition there. If you run this program, it will run for a long time. We start at zero and add 1 to the value until our number is no longer greater than 0. That sounds impossible to reach for normal people, but programmers will know better. Eventually we will reach the integer overflow, and our signed integer will wrap around from its highest possible value to the lowest possible one. But do remember what type we currently use to store our values: i64. So how big is that highest number?

9223372036854775807

Is that a lot? That depends. Last entry I had my program loop for 1 million rounds. It took my modern laptop about half a second. So reaching that number should take 9223372036854.775807 times as long, that is around 4611686018427 seconds or just about 146135 years. Is that a lot?
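As a quick sanity check of that estimate, here is a small sketch of the arithmetic. The function name `estimated_years` is made up for this illustration, and it assumes the measured rate from above (1 million rounds in about half a second) and a year of 365.25 days.

```rust
/// Estimates how many years the loop would need for `rounds` rounds.
/// Assumption: 1 million rounds take about 0.5 seconds, as measured above.
fn estimated_years(rounds: i64) -> f64 {
    let seconds_per_round = 0.5 / 1_000_000.0;
    let seconds = rounds as f64 * seconds_per_round;
    // Assumption: one year is 365.25 days long.
    let seconds_per_year = 365.25 * 24.0 * 60.0 * 60.0;
    seconds / seconds_per_year
}
```

Calling `estimated_years(i64::MAX)` gives a value a little above 146135, which matches the number in the text.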

Oh, and by the way, the Rust professionals reading this will have spotted a potentially false claim there. As long as we run our program in debug mode, there will be no integer wraparound; instead the program will panic. If we build our Rust program in release mode, we will have integer wraparound, and will (theoretically) eventually reach the end of our loop. But that is beside the point.
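To make the two behaviours visible without depending on the build profile, Rust's integer types offer explicit methods. The following is only a sketch with made-up helper names (`step_checked`, `step_wrapping`); it is not how the lovem `add` operation is implemented. It shows what happens at the top of the i64 range under each policy.

```rust
/// Adds 1 and reports overflow as `None` instead of panicking or wrapping.
fn step_checked(n: i64) -> Option<i64> {
    n.checked_add(1)
}

/// Adds 1 and wraps around on overflow, which is what a plain `+`
/// does in a default release build.
fn step_wrapping(n: i64) -> i64 {
    n.wrapping_add(1)
}
```

With these, `step_checked(i64::MAX)` yields `None`, while `step_wrapping(i64::MAX)` yields `i64::MIN`, the lowest possible value.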

Limited execution

The reason I started writing lovem is that I need an embeddable lightweight VM to execute programmable handlers when certain events occur on my constrained embedded devices. So we are talking about some form of user-generated content that is executed as a program! We can never trust those programs to be solid. We need a way to limit execution, so that the device has the possibility to terminate those programs. There is an easy way to achieve that with what we already have: we put a limit on the number of operations the VM will execute.

We add a few lines to our VM's main loop:

src/vm.rs
// Loop going through the whole program, one instruction at a time.
loop {
    // Log the vm's complete state, so we can follow what happens in console:
    if self.trace {
        println!("{:?}", self);
    }
    // Fetch next opcode from program (increases program counter):
    let opcode = self.fetch_u8(pgm)?;
    // Limit execution by number of instructions that will be executed:
    if self.instruction_limit != 0 && self.op_cnt >= self.instruction_limit {
        return Err(RuntimeError::InstructionLimitExceeded);
    }
    // We count the number of instructions we execute:
    self.op_cnt += 1;
    // If we are done, break loop and stop execution:
    if opcode == op::FIN {
        break;
    }
    // Execute the current instruction (with the opcode we loaded already):
    self.execute_op(pgm, opcode)?;
}

And of course we also add that new RuntimeError::InstructionLimitExceeded and a new field pub instruction_limit: usize, to our VM struct.
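Stripped of everything else, the limit check boils down to a counter and a comparison. Here is a minimal, self-contained sketch of just that part. The `OpCounter` type and its `tick` method are hypothetical stand-ins for this illustration; in lovem the two fields live directly in the VM struct, and the real error enum has more variants.

```rust
#[derive(Debug, PartialEq)]
enum RuntimeError {
    InstructionLimitExceeded,
}

/// Hypothetical stand-in for the two VM fields involved in the check.
struct OpCounter {
    op_cnt: usize,
    /// Maximum number of instructions to execute; 0 means unlimited.
    instruction_limit: usize,
}

impl OpCounter {
    /// Called once per instruction, before it is executed.
    fn tick(&mut self) -> Result<(), RuntimeError> {
        if self.instruction_limit != 0 && self.op_cnt >= self.instruction_limit {
            return Err(RuntimeError::InstructionLimitExceeded);
        }
        self.op_cnt += 1;
        Ok(())
    }
}
```

With a limit of 3, the first three calls to `tick` succeed and the fourth returns the error, leaving `op_cnt` at 3. With a limit of 0, the check never triggers.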

lovas gets a new optional parameter:

src/bin/lovas.rs
#[clap(long, default_value_t = 1000000, help = "Limit max number of instructions allowed for execution. 0 for unlimited.")]
instruction_limit: usize,

And we need to pass that to the VM in the run() function:

src/bin/lovas.rs
/// Executes a program in a freshly created lovem VM.
fn run(pgm: &Pgm, args: &Cli) -> Result<()> {
    // Create our VM instance.
    let mut vm = VM::new(args.stack_size);
    vm.trace = args.trace;
    vm.instruction_limit = args.instruction_limit;
    let start = Instant::now();
    let outcome = vm.run(&pgm.text);
    let duration = start.elapsed();
...

And, well, that's it. We now have an optional execution limit that defaults to 1 million instructions.

Testing it

kratenko@jotun:~/git/lovem$ cargo run --bin lovas -- -r pgm/long-loop.lva --print
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/lovas -r pgm/long-loop.lva --print`
Pgm { name: "pgm/long-loop.lva", text: [2, 0, 2, 1, 16, 3, 37, 255, 249, 1, 255] }
Runtime error!
Runtime=142.400812ms
op_cnt=1000000, pc=7, stack-depth=2, watermark=2
Error: InstructionLimitExceeded

We can adjust it easily:

kratenko@jotun:~/git/lovem$ cargo run --bin lovas -- -r pgm/long-loop.lva --print --instruction-limit=100
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/lovas -r pgm/long-loop.lva --print --instruction-limit=100`
Pgm { name: "pgm/long-loop.lva", text: [2, 0, 2, 1, 16, 3, 37, 255, 249, 1, 255] }
Runtime error!
Runtime=19.096µs
op_cnt=100, pc=7, stack-depth=2, watermark=2
Error: InstructionLimitExceeded

And we can just as well disable it completely:

kratenko@jotun:~/git/lovem$ cargo run --bin lovas -- -r pgm/long-loop.lva --print --instruction-limit=0
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/lovas -r pgm/long-loop.lva --print --instruction-limit=0`
Pgm { name: "pgm/long-loop.lva", text: [2, 0, 2, 1, 16, 3, 37, 255, 249, 1, 255] }

Good luck waiting for this one. I hope you know how to terminate a running program on your system...

We have variables!

A stack alone is not that mighty. But now we can stow data away.


I implemented variables for the VM. And I did it in a way that will freak out programmers who have only ever worked with high-level languages in well-behaved environments – we now have variable support, but for global variables only.

Why would I do that? Well, it was easy. You might be surprised how easy it was. And it helps a lot in having something useful. For what I am going for, it would actually be a viable thing, too. You could do a lot. But don't worry, I want local variables, too.

Tell me where to stick it

Variables need to live somewhere. When I first talked about stack machines, I said that "no other direct manipulations of the stack [were] allowed [but push or pop]." We will now see why I added the word "direct" there.

Variables hold values; words, to be more precise. We have an entity that can hold an arbitrary number of words: the stack. So, what is the idea? When I write a program, I will know how many variables it will need. Actually, our assembler can now work that out for us. When I pass the program to the VM for execution, it looks at that number and pushes that many zeros on the stack. Then it marks the current stack position as the new bottom. It does that with the newly introduced special Frame Base Register (FB).

What's with that funny name? This is something I will need later, when I introduce real function calls inside the VM. A call will create a new frame that is somewhat like a new local execution environment. This will also allow for local variables (told ya, I want those). But for now we have up to 256 global variables at our disposal. That is quite a bit.

Variable operations

There are two new operations for handling global variables:

  • store: pop a value from the stack and store it in the global variable identified by the 1-byte oparg.
  • load: read value from the global variable identified by the 1-byte oparg and push it to the stack.

Variables in the assembler

This took more work than the changes in the VM. That is good, because we want to hide complexity away from the VM. The assembler runs on a powerful computer, and typically programs are run more often than they are assembled/compiled. I want named variables in assembler source. The VM works only with numbers to identify them. Our assembler translates that for us.

store and load each take the name of a variable as argument. When the assembler finds a new variable name, it is assigned a number (starting at 0). We actually just chuck them into a Vec and run through it every time. We only support 256 variables, so there is no need to optimise there. It's fast enough. The index number is written as a u8, that is, as a single-byte oparg. I leave it to you to look at the new source code in asm.rs this time. It is not too hard, and you should know enough Rust by now.
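If you want a rough idea before opening asm.rs, the lookup described above could look something like the following. This is a sketch written for this paragraph, not the code from asm.rs; the `VarTable` type and its `index_of` method are hypothetical names.

```rust
/// Hypothetical table mapping variable names to their index numbers.
struct VarTable {
    names: Vec<String>,
}

impl VarTable {
    fn new() -> Self {
        VarTable { names: Vec::new() }
    }

    /// Returns the index for `name`, assigning the next free one if the
    /// name is new. Returns `None` once all 256 indices are taken.
    fn index_of(&mut self, name: &str) -> Option<u8> {
        // Linear search; fine for at most 256 entries.
        if let Some(pos) = self.names.iter().position(|n| n.as_str() == name) {
            return Some(pos as u8);
        }
        if self.names.len() >= 256 {
            return None;
        }
        self.names.push(name.to_string());
        Some((self.names.len() - 1) as u8)
    }
}
```

The first name seen gets index 0, the second gets index 1, and asking for a known name again returns its existing index. The number of variables of the program is then simply the length of the vector.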

A new Program

There is more information to store for a Program now than only the text (aka the bytecode): the global variables. The information we store is just the number of variables the program has. That is all we need; we are not interested in their names. And it is the bytecode's responsibility to access the correct variables.

But since we now need that information in the VM, we finally change the parameter passed to run() from &[u8] to &Pgm. That is what caused the most changes inside vm.rs. The real additions are few.

Variables in the VM

The VM itself gets a new field: fb: usize. That is the frame base register, and it currently does nothing but point to the position inside the stack behind the last global variable. So with zero variables, nothing changes. We also add RuntimeError::InvalidVariable.

Initialising the VM now includes making space for the variables:

src/vm.rs
// create global variables in stack:
for _ in 0..pgm.vars {
    self.push(0)?;
}
self.fb = pgm.vars as usize;

Popping values now needs to respect the frame base register, so it now looks like this:

src/vm.rs
/// Tries to pop a value from the value stack, respecting the frame base.
fn pop(&mut self) -> Result<i64, RuntimeError> {
    if self.stack.len() > self.fb {
        Ok(self.stack.pop().unwrap())
    } else {
        Err(RuntimeError::StackUnderflow)
    }
}

And we need operation handlers, of course:

src/vm.rs
op::STORE => {
    let idx = self.fetch_u8(pgm)?;
    if idx >= pgm.vars {
        Err(RuntimeError::InvalidVariable)
    } else {
        let v = self.pop()?;
        self.stack[idx as usize] = v;
        Ok(())
    }
},
op::LOAD => {
    let idx = self.fetch_u8(pgm)?;
    if idx >= pgm.vars {
        Err(RuntimeError::InvalidVariable)
    } else {
        self.push(self.stack[idx as usize])?;
        Ok(())
    }
},

That's it. We now support variables!
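To play with the idea in isolation, the pieces above can be put together into a tiny stand-alone model. This is a simplified sketch and not the lovem VM: the `MiniVm` type is made up, there is no bytecode and no stack size limit, and store and load take the index directly as a parameter.

```rust
#[derive(Debug, PartialEq)]
enum RuntimeError {
    StackUnderflow,
    InvalidVariable,
}

/// Hypothetical miniature of the VM: only the stack and the frame base.
struct MiniVm {
    stack: Vec<i64>,
    fb: usize,
}

impl MiniVm {
    /// Reserves `vars` zeroed slots at the bottom of the stack.
    fn new(vars: u8) -> Self {
        MiniVm { stack: vec![0; vars as usize], fb: vars as usize }
    }

    fn push(&mut self, v: i64) {
        self.stack.push(v);
    }

    /// Pops a value, but never reaches below the frame base.
    fn pop(&mut self) -> Result<i64, RuntimeError> {
        if self.stack.len() > self.fb {
            Ok(self.stack.pop().unwrap())
        } else {
            Err(RuntimeError::StackUnderflow)
        }
    }

    /// Pops a value and writes it into global variable `idx`.
    fn store(&mut self, idx: u8) -> Result<(), RuntimeError> {
        if idx as usize >= self.fb {
            return Err(RuntimeError::InvalidVariable);
        }
        let v = self.pop()?;
        self.stack[idx as usize] = v;
        Ok(())
    }

    /// Reads global variable `idx` and pushes its value.
    fn load(&mut self, idx: u8) -> Result<(), RuntimeError> {
        if idx as usize >= self.fb {
            return Err(RuntimeError::InvalidVariable);
        }
        let v = self.stack[idx as usize];
        self.push(v);
        Ok(())
    }
}
```

After `MiniVm::new(1)`, pushing 7 and calling `store(0)` leaves the working stack empty, so a further `pop` fails with `StackUnderflow` even though the vector still holds one word. A `load(0)` followed by `pop` gives back the 7.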

Show me your values

I added another operation with the opname out. It pops a value from the stack and prints it to stdout. This is not an operation that you would normally want in your VM. Output should be generated by function calls. But we don't have those, yet. I want something to easily show values during development, so you can see what happens, without always using --trace. We can always remove it, later. There is nothing new to that operation, so I won't discuss the code here.

A new program!

pgm/duplicate.lva
## A program demonstrating use of variables.
start:
    # val = 1
    push_u8 1
    store val   # variable is declared implicitly here. We only have one type: i64
    # for loop, 5 rounds:
    push_u8 5
loop:
    # val = val * 2:
    load val
    push_u8 2
    mul
    store val
    # check loop counter:
    push_u8 1
    sub
    dup
    ifgt loop
end:
    pop
    # output final value of val
    load val
    out
    fin

The program is documented with comments. And you might have noticed that I define labels that I never use. I just want to structure the program and name its parts. We don't have functions, so I use what we have.

kratenko@jotun:~/git/lovem$ cargo run --bin lovas -- -r pgm/duplicate.lva --print
   Compiling lovem v0.0.13 (/home/kratenko/git/lovem)
    Finished dev [unoptimized + debuginfo] target(s) in 2.66s
     Running `target/debug/lovas -r pgm/duplicate.lva --print`
Pgm { name: "pgm/duplicate.lva", text: [2, 1, 4, 0, 2, 5, 5, 0, 2, 2, 18, 4, 0, 2, 1, 17, 3, 37, 255, 242, 1, 5, 0, 6, 255], vars: 1 }
Out: 32 (@46)
Terminated.
Runtime=18.156µs
op_cnt=47, pc=25, stack-depth=1, watermark=4

It outputs a 32. That is good, because we start with a 1 and multiply it by 2 five times. We can write programs!

Oh... and a bugfix

I found out that I introduced a bug when writing the parsing of label definitions. I looked for the colon : before I removed the comments. So a line with no label definition, but with a comment containing a colon, produced a parsing error.

## this was fine
:label # this was fine
:another # even this : was fine
## but this would produce an error: just a colon in a comment

I fixed that by removing comments from lines first.
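The order of the two steps is the whole fix. Here is a minimal sketch of it; this is not the code from asm.rs, the function names `strip_comment` and `parse_label` are made up, and it assumes the `name:` form used in the example programs, with the label standing in front of the colon.

```rust
/// Cuts off everything from the first `#` on.
fn strip_comment(line: &str) -> &str {
    match line.find('#') {
        Some(pos) => &line[..pos],
        None => line,
    }
}

/// Returns the label defined on this line, if there is one.
/// Comments are removed first, so a colon inside a comment is ignored.
fn parse_label(line: &str) -> Option<&str> {
    let code = strip_comment(line);
    code.split_once(':')
        .map(|(name, _rest)| name.trim())
        .filter(|name| !name.is_empty())
}
```

With this order, `parse_label("loop:  # comment")` finds the label `loop`, while a pure comment line that happens to contain a colon yields `None`.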