Writing a .NET Virtualization Engine (Part 1)
2023-3-14

#Introduction

Code virtualization is a relatively modern obfuscation technique that works differently from most others. Typical obfuscation techniques transform the code in ways that make it harder to comprehend. Virtualization, on the other hand, translates the original code into a completely new, potentially unique instruction set that only an interpreter embedded in the program can execute. Done right, this can make the code incredibly difficult to reverse engineer. In this mini-series of articles, I will describe how you can build your own code virtualizer for programs written using the .NET framework. Throughout these articles, we will refer to said code virtualizer as a “Virtualization Engine” (flashy, right?).

#What will be showcased in this article?

This article discusses the big-picture ideas behind writing a virtualization engine and provides a basic understanding of how a hypothetical virtualizer could work.

#Why are you not providing any code support?

The intention of this mini-series is for you to be able to build your own engine. For me, writing an engine was a very ambitious project: it took a lot of research, debugging, and frustration to build my first VM. I intend for this to be a comprehensive guide for competent developers who are interested in creating their own unique implementation of code virtualization. That being said, feel free to contact me if you have any questions about the content of this article or the ones that follow.

#Prerequisites

Before we begin, please ensure you are familiar with the following concepts:

  • The evaluation stack and its role in executing .NET IL.
  • A basic understanding of MSIL opcodes. (An MSIL opcode reference will be your main resource for building handlers.)
  • A lot of patience.
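To make the stack's role concrete: in a Release build, a method like `static int F(int a, int b) => a + b * 2;` typically compiles to `ldarg.0`, `ldarg.1`, `ldc.i4.2`, `mul`, `add`, `ret`. The sketch below is written in Python purely for readability (it is not .NET code) and traces how each instruction pushes and pops values. The helper `run_il` and the `(opcode, operand)` tuple format are hypothetical; short forms such as `ldarg.0` are modeled as a general opcode plus an operand, and 32-bit overflow is ignored.

```python
def run_il(instructions, args):
    """Evaluate a tiny, hypothetical subset of IL on an explicit evaluation stack."""
    stack = []
    for opcode, operand in instructions:
        if opcode == "ldarg":            # push the argument at index `operand`
            stack.append(args[operand])
        elif opcode == "ldc.i4":         # push an integer constant
            stack.append(operand)
        elif opcode in ("add", "mul"):   # pop two values, push the result
            right = stack.pop()          # the top of the stack is the right-hand operand
            left = stack.pop()
            stack.append(left + right if opcode == "add" else left * right)
        elif opcode == "ret":            # the return value is whatever is on top
            return stack.pop()
        else:
            raise NotImplementedError(opcode)
        print(f"{opcode:<7} stack = {stack}")
    raise RuntimeError("reached the end of the method without a ret")

# IL for: static int F(int a, int b) => a + b * 2;
il = [("ldarg", 0), ("ldarg", 1), ("ldc.i4", 2), ("mul", None), ("add", None), ("ret", None)]
result = run_il(il, args=[3, 4])
print(result)  # 11
```

Every handler you write later will manipulate a stack like this one, which is why the evaluation stack is the first prerequisite.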

#Key Terms and Concepts

  1. IL
    • The list of instructions that represent the .NET code you would like to virtualize.
  2. VMIL
    • The translated instructions that will be run by your virtual machine.
  3. Runtime (RT)
    • The interpreter that knows how to run your VMIL. This is the virtual machine.
  4. Engine
    • Handles all translation of the original IL into the VMIL.
  5. Instruction Handler (Handler)
    • Located in the RT, a handler performs the operation associated with a specific VMIL opcode ID.
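Because handlers are selected by ID, nothing forces VMIL IDs to match the real MSIL opcode values. One way to get a “potentially unique language” is to shuffle the IDs for every protected build. The Python sketch below is hypothetical (`SUPPORTED_IL` and `make_opcode_map` are illustrative names, not part of any real tool): it assigns each supported IL opcode a distinct random byte. The fixed seeds are only for reproducibility; a real engine would presumably pick a fresh seed per build and build the runtime's handler table from the same map.

```python
import random

# Hypothetical subset of MSIL opcodes that this toy VM supports.
SUPPORTED_IL = ["ldarg", "ldc.i4", "add", "mul", "ret"]

def make_opcode_map(seed):
    """Give each supported IL opcode a distinct, randomly chosen VMIL opcode ID (0-255)."""
    rng = random.Random(seed)                         # deterministic for a given seed
    ids = rng.sample(range(256), len(SUPPORTED_IL))   # sampling without replacement: no ID collisions
    return dict(zip(SUPPORTED_IL, ids))

# Two "builds" of the same program will (almost certainly) get different encodings.
build_a = make_opcode_map(seed=1)
build_b = make_opcode_map(seed=2)
print(build_a)
print(build_b)
```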

#Structure

Every virtualizer consists of two major parts:

#Engine

As previously mentioned, the engine handles all translation of the IL into your VMIL. It is oftentimes significantly more complicated than the runtime, although this depends on the complexity of your virtual machine. The stages for a basic virtualizer could be as follows:

  1. Injection
    • The VM runtime is injected into the target executable.
  2. Translation
    • All of the instructions in a target method will be translated into the virtual machine’s instruction set (VMIL).
  3. Patching
    • The bodies of the virtualized methods will be removed and replaced with a call that enters the virtual machine. The VMIL will also be encoded and stored in the target executable as VMData.
  4. Saving
    • The target executable is saved to disk, now containing the virtual machine and the virtualized code!
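The translation and patching stages can be sketched as two small functions: one maps IL opcodes to VMIL IDs, the other serializes the result into a byte blob that could be stored as VMData. This is a minimal Python illustration under assumed conventions (a 1-byte opcode ID, optionally followed by a 4-byte little-endian operand); the opcode IDs, `translate`, and `encode` are hypothetical. Injection and saving depend on whichever assembly-rewriting library you use, so they are omitted.

```python
import struct

# Hypothetical VMIL opcode IDs for one build of the engine.
OPCODE_MAP = {"ldarg": 0x3A, "ldc.i4": 0x91, "add": 0x07, "mul": 0xC4, "ret": 0x5E}
# VMIL IDs that are followed by a 4-byte integer operand.
OPERAND_IDS = {OPCODE_MAP["ldarg"], OPCODE_MAP["ldc.i4"]}

def translate(il_body):
    """Translation: map each (IL opcode, operand) pair to a (VMIL ID, operand) pair."""
    vmil = []
    for opcode, operand in il_body:
        if opcode not in OPCODE_MAP:
            raise NotImplementedError(f"no VMIL equivalent for {opcode!r}")
        vmil.append((OPCODE_MAP[opcode], operand))
    return vmil

def encode(vmil):
    """Patching (data half): serialize VMIL into the byte blob stored as VMData."""
    out = bytearray()
    for vm_id, operand in vmil:
        out.append(vm_id)                         # 1-byte opcode ID
        if vm_id in OPERAND_IDS:
            out += struct.pack("<i", operand)     # 4-byte little-endian signed operand
    return bytes(out)

# IL for: static int F(int a, int b) => a + b * 2;
il = [("ldarg", 0), ("ldarg", 1), ("ldc.i4", 2), ("mul", None), ("add", None), ("ret", None)]
vm_data = encode(translate(il))
print(vm_data.hex())  # 3a000000003a010000009102000000c4075e
```

Branch instructions are deliberately left out: their operands refer to IL offsets, so a real translator would have to remap them to VMIL positions once the whole body has been translated.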

#Runtime

The runtime is the actual virtual machine that understands your custom instruction set. It contains an entry point into the VM that begins execution of the virtualized instructions. A typical VM runtime entry routine could look as follows:

  1. Extract the VMData.
  2. Initialize the context (e.g. the evaluation stack, arguments, and instruction pointer) for the handlers to use.
  3. Loop through all instructions in the virtualized method’s body:
    • Locate the handler associated with the instruction.
    • Invoke said handler with the instruction’s operand.
  4. Return the method’s result (if any) to the caller.
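The entry routine maps onto a small dispatch loop. This Python sketch is hypothetical throughout (the opcode IDs, `Context`, `decode`, `vm_entry`, and the byte layout of 1-byte IDs with optional 4-byte little-endian operands are all illustrative assumptions): it decodes the VMData, creates a context holding the arguments, evaluation stack, and instruction pointer, then repeatedly looks up the handler for each opcode ID and invokes it with the operand.

```python
import struct

# Hypothetical opcode IDs; they must match the IDs the engine used when encoding.
LDARG, LDC_I4, ADD, MUL, RET = 0x3A, 0x91, 0x07, 0xC4, 0x5E
OPERAND_IDS = {LDARG, LDC_I4}  # IDs followed by a 4-byte operand

class Context:
    """Per-invocation state shared by all handlers."""
    def __init__(self, args):
        self.args = list(args)
        self.stack = []
        self.ip = 0          # index of the next VMIL instruction
        self.done = False
        self.result = None

def h_ldarg(ctx, operand):
    ctx.stack.append(ctx.args[operand])

def h_ldc_i4(ctx, operand):
    ctx.stack.append(operand)

def h_add(ctx, _):
    right, left = ctx.stack.pop(), ctx.stack.pop()  # top of stack is the right operand
    ctx.stack.append(left + right)

def h_mul(ctx, _):
    right, left = ctx.stack.pop(), ctx.stack.pop()
    ctx.stack.append(left * right)

def h_ret(ctx, _):
    ctx.result = ctx.stack.pop() if ctx.stack else None  # void methods leave nothing
    ctx.done = True

HANDLERS = {LDARG: h_ldarg, LDC_I4: h_ldc_i4, ADD: h_add, MUL: h_mul, RET: h_ret}

def decode(vm_data):
    """Turn the VMData blob back into a list of (opcode ID, operand) pairs."""
    body, pos = [], 0
    while pos < len(vm_data):
        vm_id = vm_data[pos]
        pos += 1
        operand = None
        if vm_id in OPERAND_IDS:
            (operand,) = struct.unpack_from("<i", vm_data, pos)
            pos += 4
        body.append((vm_id, operand))
    return body

def vm_entry(vm_data, args):
    body = decode(vm_data)                 # 1. extract the VMData
    ctx = Context(args)                    # 2. initialize the context
    while not ctx.done:                    # 3. run until a ret handler finishes
        vm_id, operand = body[ctx.ip]
        ctx.ip += 1                        #    advance first so branch handlers can overwrite ip
        HANDLERS[vm_id](ctx, operand)      #    locate and invoke the handler
    return ctx.result                      # 4. return

# Hypothetical VMData for: static int F(int a, int b) => a + b * 2;
vm_data = bytes.fromhex("3a000000003a010000009102000000c4075e")
print(vm_entry(vm_data, [3, 4]))  # 11
```

Advancing `ip` before invoking the handler is one design choice among several; it lets a future branch handler simply overwrite `ctx.ip` with its target.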

#Conclusion

Congratulations! You have finished the first article in this series on writing a virtualization engine. I hope it has deepened your understanding not only of what goes into a code virtualizer, but also of the fundamentals behind it! In the next article, we will get into some special cases you may run into when emulating certain instructions in the MSIL instruction set. I will also share some ideas regarding optimization and custom instructions. Happy coding!