Writing a register based .NET virtualizer, and how it was cracked.
2023-3-30

#Introduction

Crocodile is a virtual machine I wrote for a project of mine called VirtualGuard. I did not end up using it because, as you will see, it has a lot of bugs and compatibility issues. Even so, I still believe the protection CrocodileVM offered was pretty good, and that it was a ways ahead of any 1:1 virtualizer. It was heavily inspired by KoiVM, so thanks to its creator for making that beast of a virtualizer.

#Context

After writing a few 1:1 virtual machines, I wanted to make things a bit more complicated. This first came with some really rough implementations of a pseudo register-based VM. I say pseudo because it was essentially the equivalent of adding a bunch of weird conventions for opcodes, and not an actual register implementation. Going back to CrocodileVM, I realized that the only way to truly make a VM secure is to not have a “ratio”. People write virtualizers that have 1:1 conversions, 2:1, 1:2, 1:1.5, etc. If there is a ratio, there is an easy conversion back to the original code, because, as the name implies, it is a fixed x-to-y mapping.
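To show why a fixed ratio is weak, here is a minimal Python sketch of devirtualizing a 1:1 VM. The opcode numbers and the `VM_TO_CIL` table are made up for illustration and do not come from any real virtualizer; the point is only that once an attacker has recovered the table, undoing the virtualization is a lookup.

```python
# Hypothetical 1:1 opcode table: each VM opcode stands for exactly one CIL opcode.
VM_TO_CIL = {0xA1: "ldc.i4", 0xB2: "dup", 0xC3: "add", 0xD4: "ret"}


def devirtualize(bytecode):
    """Recover CIL from 1:1 VM bytecode with nothing more than a table lookup."""
    return [(VM_TO_CIL[opcode], *operands) for opcode, *operands in bytecode]


virtualized = [(0xA1, 12), (0xB2,), (0xC3,), (0xD4,)]
recovered = devirtualize(virtualized)
```

A 2:1 or 1:2 scheme only changes the lookup from single opcodes to short fixed patterns, so the same approach still applies.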

#More Context (moving along though)

The concepts present in KoiVM seriously piqued my interest at this time: the idea of having what is essentially a compiler, with all that that entails. Building something like KoiVM would take a serious amount of time and knowledge, but I wanted to try it. With no real knowledge of proper compiler architecture, or even of how KoiVM really worked at the time, I started.

#The Compiler

By reading through the source code of KoiVM and other projects, I built up a general understanding of basic compiler structure. The compiler in CrocodileVM worked as follows:

  • The target method’s body is parsed into blocks of code, based on branching and exception handlers.
  • An “AST” is generated. “AST” is an acronym for “Abstract Syntax Tree”. In my implementation, this was essentially a phase where I would convert each instruction into an expression. Each expression would have some analysis done on it, to figure out which other instructions it links to. This is necessary because the .NET runtime fundamentally just runs instructions, and does not require an understanding of what the code does. To convert the original MSIL into something it’s not, I need a full understanding of what the code body actually does, and not just a single-instruction understanding.
  • The AST is then used to create an IR (intermediate representation). In this VM, this is a set of relatively general instructions. The main purpose of the IR in my VM is to be a space where registers can be allocated and unnecessary instructions pruned. This was probably the lengthiest part of my development, because all of the main conversions happened here.
  • This newly created IR is then used to create the final VMIL. In my VMIL, I had more verbosely typed instructions. For example, in my IR I could have the instruction ‘MOV EAX, MY_METHOD’. In my VMIL, it would be known that the operand is a member, so the operand is encoded and the instruction becomes ‘MOV_MEMBER EAX, 123456’. Some possible variants of MOV in my VMIL were: ‘MOV_CONST’, ‘MOV_VMDATA’ and ‘MOV_MEMBER’.
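The last step above, turning a generic IR ‘MOV’ into a typed VMIL instruction, can be sketched roughly like this in Python. This is not CrocodileVM’s actual code: the operand classes (`Const`, `Member`, `VmData`), the `lower_mov` function and the `member_tokens` lookup table are all hypothetical names, and I am assuming a member is encoded by a simple table lookup.

```python
from dataclasses import dataclass


# Hypothetical operand kinds for the IR; names are illustrative only.
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Member:
    name: str


@dataclass(frozen=True)
class VmData:
    offset: int


def lower_mov(dest, operand, member_tokens):
    """Pick a typed VMIL opcode for a generic IR 'MOV dest, operand'."""
    if isinstance(operand, Const):
        return ("MOV_CONST", dest, operand.value)
    if isinstance(operand, VmData):
        return ("MOV_VMDATA", dest, operand.offset)
    if isinstance(operand, Member):
        # Assumption: members are replaced by a number from a lookup table.
        return ("MOV_MEMBER", dest, member_tokens[operand.name])
    raise TypeError(f"unsupported operand: {operand!r}")


tokens = {"MY_METHOD": 123456}
vmil = lower_mov("EAX", Member("MY_METHOD"), tokens)
```

Here `vmil` ends up as the tuple form of ‘MOV_MEMBER EAX, 123456’ from the example above.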

#Strengths

In this VM’s runtime, I wanted to use a few key concepts.

#Designated VM Objects

The runtime would not use any normal objects in its code. For example, normally you could define things as integers, floats, strings, whatnot. In CrocVM’s runtime, there is one object that I refer to as a VMValue. Almost all functions use this VMValue, and it functions almost as a proxy for a type definition.
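As a rough idea of what a single wrapper type looks like, here is a small Python sketch. Only the name `VMValue` comes from CrocVM; the methods, the `handle_add` handler and the register dictionary are assumptions made up for this example.

```python
class VMValue:
    """One wrapper the runtime passes around instead of raw ints, floats or strings."""

    __slots__ = ("_raw",)

    def __init__(self, raw):
        self._raw = raw

    def add(self, other):
        # Handlers only ever see VMValue, never the underlying type.
        return VMValue(self._raw + other._raw)

    def unwrap(self):
        return self._raw


def handle_add(registers, dest, src):
    """Hypothetical handler for 'add dest, src' working purely on VMValue objects."""
    registers[dest] = registers[dest].add(registers[src])


registers = {"eax": VMValue(12)}
handle_add(registers, "eax", "eax")
```

Because every handler has the same VMValue-in, VMValue-out shape, someone reading the runtime gets very little type information from the handler signatures alone.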

#No Locals + Irreversible Conversions

In the CrocVM compiler, stack variables and locals are one and the same. Some things are removed entirely, such as occurrences of the ‘dup’ opcode. The ‘dup’ opcode duplicates the value on top of the stack. Once code is converted to a register format, instructions like this become obsolete.

  • Original:
    ldc.i4 12 // push 12 to the stack
    dup // duplicates 12 value, so stack now has two 12 values
    add // add these two and push output to stack
    ret // return
    
  • IR:
    mov eax, 12 // move value 12 into register eax 
    add eax, eax // add the value in eax into the value located in eax 
    ret eax // return the value in eax
    

Note: this example could be folded down further into ‘return 24’.

You can see in this example that the dup can be removed, because the register system essentially turns everything into a local variable. This is a strength because many different inputs can produce the same output, which means some information from the original IL cannot be recovered.
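Here is a minimal Python sketch of how a ‘dup’ can disappear during conversion. It is not the CrocVM compiler: it handles only straight-line code with four opcodes, and the function name and `r0`, `r1` register names are made up. The trick is a compile-time stack that holds register names instead of values, so ‘dup’ just references the same register twice.

```python
from itertools import count


def to_register_ir(il):
    """Translate a tiny IL subset (ldc.i4, dup, add, ret) into register IR."""
    stack = []  # compile-time stack of register names, not runtime values
    ir = []
    ids = count()

    def new_reg():
        return f"r{next(ids)}"

    for op, *args in il:
        if op == "ldc.i4":
            reg = new_reg()
            ir.append(("mov", reg, args[0]))
            stack.append(reg)
        elif op == "dup":
            # Nothing is emitted: the same register is simply referenced twice.
            stack.append(stack[-1])
        elif op == "add":
            src = stack.pop()
            dest = stack.pop()
            if dest in stack:
                # dest is still needed by a later instruction, so copy it first.
                copy = new_reg()
                ir.append(("mov", copy, dest))
                dest = copy
            ir.append(("add", dest, src))
            stack.append(dest)
        elif op == "ret":
            ir.append(("ret", stack.pop()))
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return ir


original = [("ldc.i4", 12), ("dup",), ("add",), ("ret",)]
ir = to_register_ir(original)
```

For the example above, `ir` contains a ‘mov’, an ‘add’ and a ‘ret’, with no trace of the ‘dup’.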

#Barely any stack usage

I’m not entirely sure I should have made this a key point, but in CrocVM the stack is barely used. This is because of CrocVM’s usage of registers. The opcodes ‘stloc’ and ‘ldloc’ are actually replaced with null, meaning no instruction is emitted for them. This is because when tracing where a variable came from, the only thing that really matters is which register it was assigned to. Once the virtualizer can figure out the register behind an ldloc or stloc, it can directly replace the instruction with a reference to that register. Going back to stack usage, the stack is only used for calling.
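The same idea can be sketched for locals. This is a simplified Python illustration, not CrocVM’s code: it only handles straight-line code, the `convert_locals` name and the `local_reg` map are hypothetical, and it conservatively copies a register before overwriting it whenever a local still points at it (a real compiler would use liveness analysis to avoid some of those copies).

```python
from itertools import count


def convert_locals(il):
    """Translate ldc.i4/stloc/ldloc/add/ret into register IR without emitting loads or stores."""
    stack = []      # compile-time stack of register names
    local_reg = {}  # local index -> register that currently holds its value
    ir = []
    ids = count()

    for op, *args in il:
        if op == "ldc.i4":
            reg = f"r{next(ids)}"
            ir.append(("mov", reg, args[0]))
            stack.append(reg)
        elif op == "stloc":
            local_reg[args[0]] = stack.pop()  # bookkeeping only, nothing emitted
        elif op == "ldloc":
            stack.append(local_reg[args[0]])  # bookkeeping only, nothing emitted
        elif op == "add":
            src = stack.pop()
            dest = stack.pop()
            if dest in stack or dest in local_reg.values():
                # Conservative: the register may be read again, so work on a copy.
                copy = f"r{next(ids)}"
                ir.append(("mov", copy, dest))
                dest = copy
            ir.append(("add", dest, src))
            stack.append(dest)
        elif op == "ret":
            ir.append(("ret", stack.pop()))
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return ir


method = [
    ("ldc.i4", 5), ("stloc", 0),
    ("ldc.i4", 7), ("stloc", 1),
    ("ldloc", 0), ("ldloc", 1), ("add",), ("ret",),
]
ir = convert_locals(method)
```

Four of the eight original instructions were loads and stores of locals, and none of them show up in `ir`; they only updated the register map while the code was being converted.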

I did not finish this post. Instead, I continued with the development of VirtualGuard and ended up rewriting it entirely. For more info, view the projects page!