Crocodile is a virtual machine I wrote for a project of mine called VirtualGuard. I ended up not using it because, as you will see, it has a lot of bugs and compatibility issues. Even so, I still believe that CrocodileVM's protection was pretty good, and that it was well ahead of any 1:1 virtualizer. It was heavily inspired by KoiVM, so thanks to its creator for making that beast of a virtualizer.
After writing a few 1:1 virtual machines, I wanted to make things a bit more complicated. My first attempts were some really rough implementations of a pseudo register-based VM. I say pseudo because it was essentially a set of odd conventions layered on top of the opcodes, not an actual register implementation. Going back to CrocodileVM, I realized that the only way to truly make a VM secure is to not have a “ratio”. People write virtualizers with 1:1, 2:1, 1:2, 1:1.5 conversions, and so on. If there is a fixed ratio, the conversion is easy to reverse in every one of these cases. As the name says, it is an x-to-y conversion: a known number of original instructions always maps to a known number of VM instructions.
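To make the “ratio” point concrete, here is a minimal Python sketch of a hypothetical 1:1 virtualizer. The opcode table, its byte values, and the `virtualize`/`devirtualize` names are invented for illustration and are not taken from CrocodileVM or KoiVM. Because every CIL opcode maps to exactly one VM opcode, undoing the translation only takes inverting a lookup table.

```python
# Hypothetical opcode table for a 1:1 virtualizer: each CIL opcode maps to
# exactly one VM opcode byte. The byte values are made up for illustration.
CIL_TO_VM = {
    "ldc.i4": 0x01,
    "dup": 0x02,
    "add": 0x03,
    "ret": 0x04,
}


def virtualize(cil):
    """Translate (opcode, *operands) tuples into (vm_byte, *operands) tuples."""
    return [(CIL_TO_VM[op], *args) for op, *args in cil]


def devirtualize(vm_code):
    """Undo virtualize() by inverting the opcode table.

    The mapping is one-to-one, so the inverse is just a reversed dict and
    nothing about the original instruction stream is lost.
    """
    vm_to_cil = {byte: op for op, byte in CIL_TO_VM.items()}
    return [(vm_to_cil[byte], *args) for byte, *args in vm_code]


program = [("ldc.i4", 12), ("dup",), ("add",), ("ret",)]
vm_code = virtualize(program)
print(vm_code)                            # [(1, 12), (2,), (3,), (4,)]
print(devirtualize(vm_code) == program)   # True
```

A fixed 2:1 or 1:2 scheme only changes the table so that it maps instruction sequences instead of single opcodes. Reversing it is still a lookup.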
The concepts in KoiVM seriously piqued my interest at the time, particularly the idea of having what is essentially a compiler, with everything that entails. Building something like KoiVM would take a serious amount of time and knowledge, but I wanted to try it. I had no real knowledge of proper compiler architecture, or even of how KoiVM really worked at the time, but I started anyway.
By reading through the source code of KoiVM and other projects, I built a general understanding of basic compiler structure. The compiler in CrocodileVM worked as follows:
In this VM’s runtime, I wanted to use a few key concepts.
The runtime would not use any normal types in its code. Normally you would define values as integers, floats, strings, and so on. In CrocVM’s runtime, there is a single object type that I refer to as a VMValue. Almost all functions use this VMValue, and it acts almost as a proxy for a type definition.
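The single-value-type idea can be sketched in Python as a tagged value. The post does not describe CrocVM's actual VMValue layout, so the `Kind` tags, the `VMValue.of` constructor, and the `vm_add` handler below are hypothetical names chosen for illustration. What the sketch shows is that handlers never work with native types directly. They receive VMValues and decide what to do from the tag.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    # Hypothetical tag set; CrocVM's real VMValue may track more kinds.
    INT = auto()
    FLOAT = auto()
    STRING = auto()


@dataclass(frozen=True)
class VMValue:
    """The one runtime value type: a raw payload plus a tag saying what it is."""
    kind: Kind
    payload: object

    @staticmethod
    def of(raw: object) -> "VMValue":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(raw, bool):
            raise TypeError("bool is not modelled in this sketch")
        if isinstance(raw, int):
            return VMValue(Kind.INT, raw)
        if isinstance(raw, float):
            return VMValue(Kind.FLOAT, raw)
        if isinstance(raw, str):
            return VMValue(Kind.STRING, raw)
        raise TypeError(f"unsupported type: {type(raw).__name__}")


def vm_add(a: VMValue, b: VMValue) -> VMValue:
    """An opcode handler: VMValues in, VMValue out, dispatching on the tags."""
    if Kind.STRING in (a.kind, b.kind):
        raise TypeError("add is not defined for strings in this sketch")
    if Kind.FLOAT in (a.kind, b.kind):
        return VMValue(Kind.FLOAT, float(a.payload) + float(b.payload))
    return VMValue(Kind.INT, a.payload + b.payload)


result = vm_add(VMValue.of(12), VMValue.of(12))
print(result.kind.name, result.payload)  # INT 24
```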
In the CrocVM compiler, stack variables and locals are one and the same. Some instructions are removed entirely, such as occurrences of the ‘dup’ opcode, which duplicates the value on top of the stack. Once the code is converted to a register format, it becomes apparent that instructions like this are obsolete, because a duplicated value is just a second read of the same register.
In CIL:
ldc.i4 12 // push 12 onto the stack
dup // duplicate the top value, so the stack now holds 12 twice
add // pop both values, add them, and push the sum
ret // return the value on top of the stack
The same code in register form:
mov eax, 12 // move 12 into register eax
add eax, eax // add eax to itself, storing the result in eax
ret eax // return the value in eax
Note: this example could be constant-folded down to return 24.
You can see in this example that the dup can be removed, because the register system essentially turns everything into a local variable. This is a strength: many different inputs can compile to the same output, so some information from the original IL cannot be recovered.
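Here is a short Python sketch of the dup elimination and the many-to-one effect. It is a hypothetical translator for straight-line code only, not CrocVM's compiler. The opcode subset, the `to_registers` name, and the r0, r1, ... register naming are assumptions. Locals are mapped straight onto registers, following the idea that stack variables and locals are one and the same. Unlike the eax example, it emits three-address code with a fresh register for each result, so it never overwrites a register that another stack slot still refers to.

```python
def to_registers(cil):
    """Translate a tiny CIL subset into three-address register code.

    The evaluation stack is simulated symbolically: each slot holds the
    *name* of the register containing that value, not the value itself.
    Locals are bound directly to registers, so dup, stloc and ldloc emit
    no instructions at all. Branches are not handled.
    """
    code = []
    stack = []    # register names, not values
    locals_ = {}  # local index -> register name
    counter = 0

    def fresh():
        nonlocal counter
        reg = f"r{counter}"
        counter += 1
        return reg

    for op, *args in cil:
        if op == "ldc.i4":
            reg = fresh()
            code.append(f"mov {reg}, {args[0]}")
            stack.append(reg)
        elif op == "dup":
            stack.append(stack[-1])          # same register, referenced twice
        elif op == "stloc":
            locals_[args[0]] = stack.pop()   # the local now *is* that register
        elif op == "ldloc":
            stack.append(locals_[args[0]])
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()  # top of stack is the rhs
            dst = fresh()
            code.append(f"add {dst}, {lhs}, {rhs}")
            stack.append(dst)
        elif op == "ret":
            code.append(f"ret {stack.pop()}")
        else:
            raise NotImplementedError(op)
    return code


with_dup = [("ldc.i4", 12), ("dup",), ("add",), ("ret",)]
with_local = [("ldc.i4", 12), ("stloc", 0), ("ldloc", 0), ("ldloc", 0),
              ("add",), ("ret",)]

print(to_registers(with_dup))  # ['mov r0, 12', 'add r1, r0, r0', 'ret r1']
print(to_registers(with_dup) == to_registers(with_local))  # True
```

Both inputs produce the same three instructions, so nothing in the output tells you whether the original IL used dup or a local variable.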
I’m not entirely sure I should have made this a key point, but in CrocVM the stack is barely used, because CrocVM relies on registers instead. The opcodes ‘stloc’ and ‘ldloc’ are actually replaced with null. When tracing where a variable came from, the only thing that really matters is which register it was assigned to. Once the virtualizer has figured out which converted register a ldloc or stloc refers to, it can replace the instruction with a direct reference to that register. As for the stack itself, it is only used for calls.
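The idea that the stack is only needed for calls can be illustrated with a hypothetical calling convention, sketched in Python below. The post does not describe how CrocVM actually passed arguments. The `VM.call` method, the `double` callee, and the push-arguments/pop-result convention are all assumptions. Inside each method every value lives in a register, and the stack only carries values across the call boundary.

```python
from typing import Callable, Dict, List

Registers = Dict[str, int]  # hypothetical per-method register file


class VM:
    def __init__(self) -> None:
        # The only stack in this sketch: it carries call arguments and results.
        self.stack: List[int] = []

    def call(self, caller: Registers, callee: Callable[["VM"], None],
             arg_regs: List[str], dst: str) -> None:
        # Caller side: push argument values out of its registers.
        for reg in arg_regs:
            self.stack.append(caller[reg])
        callee(self)
        # The callee leaves exactly one result on the stack; move it into a register.
        caller[dst] = self.stack.pop()


def double(vm: VM) -> None:
    # Callee side: pop the argument into its own registers, compute using
    # registers only, then push the result back for the caller.
    regs: Registers = {"r0": vm.stack.pop()}
    regs["r1"] = regs["r0"] + regs["r0"]
    vm.stack.append(regs["r1"])


vm = VM()
main_regs: Registers = {"r0": 12}
vm.call(main_regs, double, ["r0"], "r1")
print(main_regs["r1"], vm.stack)  # 24 []
```

With several arguments, the callee would pop them in the reverse of the order they were pushed. A real VM would also need to handle void calls and instance receivers.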
I did not finish this post. Instead, I continued with the development of VirtualGuard and ended up rewriting it entirely. For more info, view the projects page!