Code virtualization is a relatively modern method of obfuscation, and it works differently from most others. Typical obfuscation techniques modify the existing code in ways that make it harder to comprehend. Virtualization, on the other hand, translates the original code into a completely new, potentially unique language. Done right, this can make the code incredibly difficult to reverse engineer. In this mini-series of articles, I will describe how you can make your own code virtualizer for a program written using the .NET framework. We will refer to said code virtualizer as a “Virtualization Engine” (flashy, right?) throughout these articles.
This article will discuss the macro (big picture) ideas behind writing a virtualization engine. It will provide a basic understanding of how a hypothetical virtualizer could work.
The intention of this mini-series is for you to be able to build your own engine. For me, writing an engine was a very ambitious project. It took a lot of research, debugging, and frustration to build my first VM. I intend for this to be a comprehensive guide for a competent developer who is interested in creating their own unique implementation of code virtualization. That said, feel free to contact me if you have any questions about the content in this or the following articles.
Before we begin, please ensure you are familiar with the following concepts:
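Every virtualizer consists of two major parts: the engine, which translates the original IL into your custom instruction set, and the runtime, which is the virtual machine that executes the translated code.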
As previously mentioned, the engine handles all translation of the IL into your VMIL, the custom instruction set your VM understands. The engine is often significantly more complicated than the runtime, although that depends on the complexity of your virtual machine. The stages for a basic virtualizer could be as follows:
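To make the engine's job more concrete, here is a minimal sketch of the translation step alone. It is written in Python rather than C# purely for brevity, and everything in it is an assumption made for illustration: the four-instruction IL subset, the `make_opcode_map` and `translate` helpers, and the one-byte opcode encoding are hypothetical, not the layout of any real virtualizer. A real engine would read method bodies from a .NET assembly instead of a hand-written list.

```python
import random
import struct

# Hypothetical subset of IL mnemonics this toy engine knows how to translate.
IL_SUBSET = ["ldc.i4", "add", "mul", "ret"]


def make_opcode_map(seed):
    """Assign each IL mnemonic a shuffled one-byte virtual opcode.

    Shuffling per build is one way to make the VMIL differ between
    protected programs; the seed would normally be random.
    """
    rng = random.Random(seed)
    codes = list(range(len(IL_SUBSET)))
    rng.shuffle(codes)
    return dict(zip(IL_SUBSET, codes))


def translate(method_body, opcode_map):
    """Encode (mnemonic, operand) pairs as a flat VMIL byte string."""
    out = bytearray()
    for mnemonic, operand in method_body:
        out.append(opcode_map[mnemonic])
        if mnemonic == "ldc.i4":
            # The only instruction in this subset that carries an operand:
            # a little-endian 32-bit signed integer.
            out += struct.pack("<i", operand)
    return bytes(out)


# Toy method body computing (2 + 3) * 4.
body = [
    ("ldc.i4", 2),
    ("ldc.i4", 3),
    ("add", None),
    ("ldc.i4", 4),
    ("mul", None),
    ("ret", None),
]

opcode_map = make_opcode_map(seed=1234)
vmil = translate(body, opcode_map)
print(vmil.hex())
```

The resulting bytes only mean something to a runtime that knows the same opcode map, which is the property that makes virtualized code hard to read with standard .NET tooling.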
The runtime is the actual virtual machine that can understand your custom instruction set. It will contain a point of entry into the VM to begin execution of the virtualized instructions. A typical VM Runtime Entry Routine could look as follows:
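As a rough illustration of such an entry point, the sketch below sets up the VM state, runs a fetch-decode-execute loop over the virtualized bytes, and returns the result to the caller. It is a Python sketch under stated assumptions, not a definitive design: the opcode values, the `vm_entry` name, and the stack-only machine model are all hypothetical, and it ignores details a real .NET runtime must handle, such as 32-bit overflow, locals, branches, and calls.

```python
import struct

# Hypothetical virtual opcodes; a real engine would choose its own values.
VM_PUSH, VM_ADD, VM_MUL, VM_RET = 0x10, 0x11, 0x12, 0x13


def vm_entry(vmil, args=()):
    """Entry routine: initialise state, dispatch instructions, return a value."""
    stack = list(args)  # evaluation stack, seeded with the caller's arguments
    ip = 0              # virtual instruction pointer into the VMIL bytes

    while True:
        opcode = vmil[ip]
        ip += 1

        if opcode == VM_PUSH:
            # Operand: little-endian 32-bit signed integer after the opcode.
            (value,) = struct.unpack_from("<i", vmil, ip)
            ip += 4
            stack.append(value)
        elif opcode == VM_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)  # int32 wraparound is not modelled here
        elif opcode == VM_MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == VM_RET:
            # Hand the top of the stack (if any) back to the original caller.
            return stack.pop() if stack else None
        else:
            raise ValueError(f"unknown virtual opcode {opcode:#x} at offset {ip - 1}")


# Hand-assembled program for (2 + 3) * 4.
program = (
    bytes([VM_PUSH]) + struct.pack("<i", 2)
    + bytes([VM_PUSH]) + struct.pack("<i", 3)
    + bytes([VM_ADD])
    + bytes([VM_PUSH]) + struct.pack("<i", 4)
    + bytes([VM_MUL, VM_RET])
)

result = vm_entry(program)
print(result)
```

In a protected program, the original method body would typically be replaced by a small stub that calls an entry routine like this one with the method's arguments and the location of its virtualized instructions.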
Congratulations! You have finished the first article in this series on writing a virtualization engine. I hope it has deepened your understanding of not only what goes into a code virtualizer, but also some of the fundamentals behind one. In the next article, we will get into some unique cases that you may run into when attempting to emulate certain instructions in the MSIL instruction set. I will also provide some ideas regarding optimization and custom instructions. Happy coding!