Dynamic binary instrumentation (DBI) with DynamoRIO

This blog introduces dynamic binary instrumentation (DBI) and guides you through building your own DBI tool with the open-source DynamoRIO framework on Windows 11.
DBI enables powerful runtime analysis and modification of binaries, which is critical for malware analysis, security auditing, reverse engineering, and performance profiling, all without access to source code.
Learn about DynamoRIO’s strengths, its practical applications in evading anti-analysis techniques, and step-by-step instructions for developing and testing your own instrumentation clients.
Explore hands-on examples in a GitHub repository with sample code to jumpstart your own research and tooling.

Binary instrumentation involves inserting code into compiled executables to monitor, analyze, or modify their behavior, either at runtime (dynamic) or before execution (static), without altering the original source code. Tools like DynamoRIO, Intel PIN, Valgrind, Frida, and QBDI are commonly used in the field. Static binary instrumentation (SBI) injects code before a binary runs, typically by modifying the file on disk, whereas dynamic binary instrumentation (DBI) operates in memory while the program runs. These techniques are widely used for profiling, debugging, tracing, security analysis, and reverse engineering.

Introduction to DynamoRIO

DynamoRIO (DR) is a mature, well-maintained, and frequently updated open-source DBI framework. HP and the Massachusetts Institute of Technology (MIT) developed the first version in collaboration around 2000. Derek Bruening, DynamoRIO’s lead developer, described the main concept in his PhD dissertation in 2004, and further information about the history of DR can be found here.

The main reasons why Talos uses DR for Windows- and Linux-based malware analysis are its low performance impact at execution time, its excellent transparency (the target application does not recognize that it is instrumented), and its open-source license. Table 1 provides a brief comparison of some of the common instrumentation frameworks used in the industry. Please consider this a general reference only, as the comparison might be biased by our use cases and may not remain accurate over time due to the ongoing development of the different frameworks. There is also no single best toolkit, as the right choice depends on the use case.

Note that how well an instrumentation framework works always depends on the user code running on top of it. Even the best framework cannot fix bad user code.

Feature	DynamoRIO	Intel PIN	Frida
Type	DBI	DBI	Dynamic Runtime Instrumentation via API Hooking
Instrumentation granularity	Basic blocks and instructions	Instruction-level (very fine-grained)	Function-level and instruction-level (via memory hooks)
Language (API)	C/C++	C/C++	JavaScript, Python, C
Target platforms	Windows, Linux (limited macOS, Android forks)	Windows, Linux (x86/x64 only)	Windows, Linux, macOS, Android, iOS
Architecture support	x86, x64, ARM (partial), AArch64 (forks)	x86, x64	x86, x64, ARM, ARM64
License	Open source (BSD-like)	Proprietary (free for non-commercial)	Open-source core and commercial license (Frida Pro)
Performance overhead	Medium (2–10× depending on tool complexity)	High (10–20× or more with deep instrumentation)	High (especially with many hooks or on mobile)
Transparency (anti-debug evasion)	Medium (code caching may leak)	Medium to low (can be fingerprinted)	Low (easily detectable by injected libraries or syscalls)
Best use cases	Runtime analysis, instrumentation, sandboxing	Deep instruction analysis, academic research	API hooking, mobile analysis, debugging, live patching
Shellcode detection feasibility	Excellent (module-level execution monitoring)	Good, but more effort needed	Limited (good for allocation and hook, not raw exec detection)
Community and documentation	Active community, used in research and industry	Older, still maintained by Intel	Very active, large community, modern docs

Table 1. DBI framework comparison.

Why use DBI to analyze malware?

Ultimately, the possibilities are only limited by your creativity and malware technology knowledge. However, here are some examples:

Anti-anti-VM detection

Malware samples can be executed on real hardware and still be monitored and analyzed. Alternatively, VM detection functions can be patched at runtime to make sure the malware does not recognize it is running in a VM.

Anti-anti-analyzing techniques

Many of the simple but common anti-X techniques used by malware (such as anti-debugging, anti-emulation, anti-tamper, anti-disassembly, and self-modification) either do not detect DBI or have no impact on the DBI analysis. Many of the runtime code manipulation techniques the frameworks use are either transparent to the malware or hidden from it, or the malware simply does not try to find them. The latter probably applies to the majority of malware today.

De-obfuscation

For example, code traces and memory dumps based on certain conditions can give the analyst a better idea of what the malware is actually doing.

Finding interesting functions within the malware

It is relatively simple to build a shellcode execution detection tool with DBI to find a second stage in a packer: look for functions that allocate RWX memory, copy data into it, and later jump into these memory areas. You can also look for cryptographic constants, which might be hidden in Mixed-Boolean-Arithmetic functions and rebuilt at runtime, to find unpacking routines or string obfuscation functions. Another example is counting how often functions are called to find interesting functions that might be used to decode strings.
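The RWX heuristic can be thought of as a small state machine per memory region. The following is a minimal sketch in plain C that does not use the DR API; all names are hypothetical. In a real client, we assume the three events would be fed by wrapped allocation functions, memory write instrumentation, and basic block creation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not DynamoRIO API: follows one memory region through
 * the allocate -> write -> execute sequence that is typical for unpackers. */
typedef enum { REGION_ALLOCATED, REGION_WRITTEN, REGION_EXECUTED } region_state_t;

typedef struct {
    uintptr_t base;       /* start address of the RWX allocation */
    size_t size;          /* size of the allocation in bytes */
    region_state_t state; /* how far the region has progressed */
} rwx_region_t;

static bool region_contains(const rwx_region_t *r, uintptr_t addr)
{
    return addr >= r->base && addr - r->base < r->size;
}

/* Call when the target allocates writable and executable memory. */
void region_on_alloc(rwx_region_t *r, uintptr_t base, size_t size)
{
    assert(r != NULL);
    r->base = base;
    r->size = size;
    r->state = REGION_ALLOCATED;
}

/* Call for every memory write performed by the target. */
void region_on_write(rwx_region_t *r, uintptr_t addr)
{
    assert(r != NULL);
    if (region_contains(r, addr) && r->state == REGION_ALLOCATED)
        r->state = REGION_WRITTEN;
}

/* Call when execution reaches addr. Returns true if the address lies in a
 * region that was written to before, i.e. a likely second stage. */
bool region_on_exec(rwx_region_t *r, uintptr_t addr)
{
    assert(r != NULL);
    if (!region_contains(r, addr) || r->state == REGION_ALLOCATED)
        return false;
    r->state = REGION_EXECUTED;
    return true;
}
```

A client built on this idea would keep one such record per RWX allocation and dump the region the first time region_on_exec() returns true.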

Dumping memory and gathering runtime information

With DBI you can monitor the execution and get runtime values of registers or memory locations. You can also trigger on certain values of registers or other conditions to do dynamic memory dumps at runtime.

Automation

Build an unpacker or config extractor for frequently seen malware families.

Inside DynamoRIO

Before starting to write your own client, it is important to understand some basics about how DR works under the hood. DR is a process virtual machine (PVM). In the context of DR, this refers to a virtual execution environment that allows DR to dynamically instrument, monitor, and modify the behavior of a running application at the level of individual processes. DR operates entirely in user space (ignoring some experimental features). When DR starts a target application, it injects itself into the process and intercepts system calls. It takes control before the target application begins execution and starts copying the first basic block(s) of the target application into a code cache, which is a memory region DR has full control over. Then it redirects the execution flow to this code cache. This enables DR to monitor and modify the original instructions of the target application. It relocates addresses and modifies the target application code so that the code behaves semantically exactly as it would when executed natively (Figure 2).

Figure 1. DynamoRIO architecture.

For performance optimization, the basic block code cache is extended with a trace cache (Figure 3). DR monitors and measures the execution of basic blocks in the basic block cache and builds groups of basic blocks which are frequently executed in the same order. At a certain threshold, it combines these basic blocks and copies them into the trace cache, including the instrumentation code and some inline checks. In reality, the decision is not based on a simple counter; it depends on multiple factors. This technique speeds up execution because fewer context switches are necessary. Here, a context switch means a switch between the target application code and DR’s core and dispatching routines.
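To make the idea more concrete, here is a deliberately simplified toy model in plain C. It is not DR code: it treats the code cache as a small hash table keyed by the original block address and promotes a block once a plain execution counter reaches a threshold, which is exactly the simplification DR does not make. The table size, the threshold, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64   /* hypothetical cache size */
#define HOT_THRESHOLD 50 /* hypothetical promotion threshold */

typedef struct {
    uintptr_t tag;       /* original application address of the block */
    unsigned exec_count; /* how often the cached copy was executed */
    bool in_use;
    bool in_trace_cache; /* set once the block is considered hot */
} block_entry_t;

typedef struct {
    block_entry_t slots[CACHE_SLOTS];
    unsigned builds; /* how often a block had to be copied into the cache */
} code_cache_t;

void cache_init(code_cache_t *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
}

/* Models one execution of the block at address tag. The block is copied into
 * the cache on the first execution only; afterwards the cached copy is reused.
 * Returns NULL if the toy cache is full. */
block_entry_t *cache_execute(code_cache_t *c, uintptr_t tag)
{
    assert(c != NULL);
    size_t start = (size_t)(tag % CACHE_SLOTS);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        block_entry_t *e = &c->slots[(start + i) % CACHE_SLOTS];
        if (!e->in_use) {
            /* Cache miss: build the block once. */
            e->tag = tag;
            e->exec_count = 0;
            e->in_use = true;
            e->in_trace_cache = false;
            c->builds++;
        }
        if (e->tag == tag) {
            if (++e->exec_count >= HOT_THRESHOLD)
                e->in_trace_cache = true;
            return e;
        }
    }
    return NULL;
}
```

Executing the same block 50 times in this model results in a single build and one promotion, which is the effect that makes cached execution cheaper than translating a block again on every visit.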

All these operations should be transparent to (hidden from) the target application. To the program, it should appear as if it were running natively on hardware, unaware of and unaffected by any instrumentation. DR takes care of the following side effects:

Register states: DR saves and restores all registers it modifies so that the application sees the original values.
Stack and memory: Internal data used by DR (e.g., for managing its own state) is hidden from the application.
Instruction semantics: A transformed or instrumented instruction behaves identically to the original — no change in meaning or side effects.
Control flow: Even if DR adds trampolines or modifies jumps, the logical flow of execution remains the same from the program’s point of view.
Signals and exceptions: Any exceptions (e.g., SIGSEGV) or signals are handled in a way that preserves original behavior and context.

More details can be found here.

Writing a simple client

The development environment used:

Windows 11 Version 24H2 (Default Settings)
Visual Studio 2019, CMake (incl. in VS)
- The DR homepage recommends VS 2019 for building DR itself and for building clients against the latest DR version. VS 2022 might work in most cases, too, but if you want to be on the safe side, use VS 2019.
DR 11.3.0
- Download it here, then unzip it to a folder of your choice. These examples use “C:\tools\DynamoRIO-Windows-11.3.0”. Later 11.x versions should work, too.

The DR release package includes several tools for memory debugging, profiling, instrumentation, legacy CPU simulation, cache simulation and more. See the DR homepage for more details. In this blog post, we will not use these tools, but instead build our own.

When writing your own instrumentation client, you usually run it via “drrun.exe”, which is the DR loader application. The syntax for a 64-bit client and a 64-bit target application is shown below; for 32-bit, use the 32-bit version of drrun.exe from the “bin32” directory rather than the one from “bin64”.

<DYNAMORIO_INSTALL_DIR>\bin64\drrun.exe -c "<DIR_TO_YOUR_CLIENT>\client.dll" [client arg1, client arg2, …] -- "<TARGET_APP_DIR>\target_app.exe"

The DR client (“client.dll”) is the instrumentation code you are writing to do something with the target application (“target_app.exe”). The “target_app.exe” is the malware which we want to instrument.

While you can also write standalone tools which are not loaded via “drrun.exe”, it is quite complex and out of scope of this tutorial. There are several options to configure the instrumentation process (details here). For this blog we will choose the simplest way via “drrun.exe” and default configurations.

To test the tool chain we will first write a DR “hello world” client (simple_client1). You can find the code on our GitHub repository. All examples in the repository are organized as seen in Figure 4.

They all include the client source code (e.g., client.c), a corresponding CMake file (CMakeLists.txt), build32/64.bat scripts, which include the CMake build commands, and MSYS_build32/64.sh scripts for starting the build process on MSYS2. All of the source code is heavily commented and prints out debugging messages and information about the important steps happening during the instrumentation phases. The code is written to make it easy to understand how things work, not for performance or security. For example, it lacks some exception handling and input checks, which you might want to add if your client runs in a production environment.

IMPORTANT: Make sure you change the CMakeLists.txt file content depending on your installation. In other words, mainly verify/change the directories. Also verify the other scripts to ensure the directories and filenames match your environment.

The build process is usually very easy. Install Visual Studio 2019 incl. CMake, verify/edit the directories inside the scripts, and run the MSYS_build32/64.sh script in an MSYS2 shell. The script mainly does the following:

Start the “Developer Command Prompt for VS 2019”.
Execute the build32/64.bat batch file, which includes the build commands.

Note: Most scripts build “Release” versions. You can change the CMAKE_BUILD_TYPE parameter of the build commands in the build32/64.bat to “RelWithDebInfo” for a release with debug information or to “Debug” for a full debug release, but in most cases this is not necessary for normal troubleshooting.

Again, the client library needs to have the same bit width as the target application. In other words, if the target application is 32-bit, you need a 32-bit DR client. If the target application is 64-bit, the client needs to be 64-bit, too. If there is a mismatch, you will get the error message below (Figure 5). This tutorial will only focus on 64-bit apps and clients, but the GitHub repository also contains 32-bit versions of most demos.
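If you are unsure whether a sample is 32-bit or 64-bit, the Machine field of its PE header tells you. The following helper is a minimal sketch that is not part of DR or of the example repository. It expects the first bytes of the file in a buffer and only distinguishes x86 from x64; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian values byte by byte to stay independent of host
 * endianness and alignment. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 32 for x86, 64 for x64, and 0 if the buffer is not a PE image
 * or uses another machine type. */
int pe_bitness(const uint8_t *image, size_t len)
{
    assert(image != NULL);
    /* DOS header: "MZ" magic, e_lfanew (offset of the PE header) at 0x3C. */
    if (len < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return 0;
    uint32_t pe_off = read_le32(image + 0x3C);
    if (pe_off > len || len - pe_off < 6)
        return 0;
    if (memcmp(image + pe_off, "PE\0\0", 4) != 0)
        return 0;
    /* The Machine field follows directly after the 4-byte PE signature. */
    uint16_t machine = read_le16(image + pe_off + 4);
    if (machine == 0x014C) /* IMAGE_FILE_MACHINE_I386 */
        return 32;
    if (machine == 0x8664) /* IMAGE_FILE_MACHINE_AMD64 */
        return 64;
    return 0;
}
```

A wrapper script could use the result to choose between the drrun.exe in “bin32” and the one in “bin64”, together with the matching client build.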

Run the client via drrun.exe against a test program. For example:

"C:\tools\DynamoRIO-Windows-11.3.0\bin64\drrun.exe" -c ".\build\Release\simple_client.dll" -- ../testsamples/threads/x64/Release/threads.exe

If you can see the “Hello from DynamoRIO client” message after instrumentation of the test application, your development environment works and you can proceed to the next example, a client which will be a bit more useful. If something goes wrong, you can also run the target application with drrun.exe only (e.g., drrun.exe -- <target application>). This will tell you whether there is an issue with your client or whether you ran into a DR bug. Another useful drrun.exe switch is “-debug”; among other things, it finds memory leaks in your client. We are aware that one of the demo clients in the repository has a memory leak, but that is on purpose. See the source code for details.

First real instrumentation client

Let’s start to write a simple client that prints out all DLLs loaded by a process at run time. Look at the simple_client2 example from GitHub for the implementation details. Running the client looks like this:

Every client starts execution with the “dr_client_main” function. This is the start function of your client. Here we will do some initialization and you can find the callback register functions. See Figure 8 below for an example.

The callback functions are controlling the custom instrumentation process. In other words, DR is using callback functions for executing the code you want to use during instrumentation. The function names give you a good idea about what they do. For example, in Figure 8, the “drmgr_register_module_load_event” function is registering a callback function “event_module_load_trace_instr” which is then getting called every time a module (e.g., a DLL) is loaded by the target application at runtime. In our simple example code, the “event_module_load_trace_instr” function will just print out the name of the loaded module.
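If you have not worked with event-driven frameworks before, the pattern can be illustrated without DR at all. The sketch below is a toy model in plain C: the client registers a callback, and the framework calls it whenever the event fires. All names (toy_events_t, toy_register_module_load_event, and so on) are hypothetical and only mimic the shape of the real registration functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_CALLBACKS 8

/* Signature every module load callback of this toy framework must have. */
typedef void (*module_load_cb_t)(const char *module_name, void *user_data);

typedef struct {
    module_load_cb_t cb[MAX_CALLBACKS];
    void *user_data[MAX_CALLBACKS];
    size_t count;
} toy_events_t;

/* Client side: register a callback for the module load event. */
bool toy_register_module_load_event(toy_events_t *ev, module_load_cb_t cb,
                                    void *user_data)
{
    assert(ev != NULL && cb != NULL);
    if (ev->count == MAX_CALLBACKS)
        return false;
    ev->cb[ev->count] = cb;
    ev->user_data[ev->count] = user_data;
    ev->count++;
    return true;
}

/* Framework side: invoke all registered callbacks when a module is loaded. */
void toy_fire_module_load(const toy_events_t *ev, const char *module_name)
{
    assert(ev != NULL);
    for (size_t i = 0; i < ev->count; i++)
        ev->cb[i](module_name, ev->user_data[i]);
}

/* Example callback: prints the module name and counts the loaded modules. */
void count_module(const char *module_name, void *user_data)
{
    size_t *seen = user_data;
    printf("module loaded: %s\n", module_name);
    (*seen)++;
}
```

In a real client, the framework side is DR itself; you only write the callback and the registration call.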

Most of the functions you need are part of DR extensions. These extensions are part of the DR project and they are already included in the DR distribution package (the ZIP file you have downloaded). You do NOT need to install or configure them. They offer higher-level abstractions for commonly needed features, so you don’t have to implement them from scratch. These functions usually start with the name of the extensions (e.g., “drmgr_register_module_load_event” for drmgr functions). Here are some extensions with brief descriptions:

drreg: register stealing and allocating
drsyms: symbol table and debug information lookup
drcontainers: hash table, vector, and table
drmgr: basic instrumentation functions and tools
drwrap: function wrapping and replacing
drutil: memory tracing, string loop expansion
drx: multi-process management, misc. utilities
drsyscall: system call monitoring
drdecode: CPU decoding/encoding library
umbra: shadow memory framework

In this tutorial we will look at “drmgr” and “drwrap”. Let’s have a closer look at these extensions.

DynamoRIO Multi-Instrumentation Manager (drmgr)

It is a helper library designed to make writing DR clients easier, cleaner, and safer.

Drmgr offers the following functions:

Event registration: Simplifies registering for events like bb_event or thread_init
Thread-local storage (TLS): Safely manages TLS slots
Per-thread cleanup: Automatically frees thread-specific memory on exit
Safe initialization: Ensures extensions are initialized in the right order
Avoids conflicts: Handles shared slot allocation so multiple extensions don’t collide

The picture below shows some examples for event registration callback functions included in drmgr. They are called when the corresponding event occurs. The names are more or less self-explanatory.

Drmgr wraps several of the low-level functions from dr_events.h. For example, “drmgr_register_bb_instrumentation_event()” wraps “dr_register_bb_event()”. In general, it is simpler and/or safer to use the functions in drmgr.

The GitHub simple_client3 example is a simple code tracer. For the code trace, we register a callback function which is called for every instruction. To do this, use the insertion_func parameter of drmgr_register_bb_instrumentation_event(). The registered callback function and its subfunction then disassemble the instructions and print them out. The implementation details can be found in the simple_client3 project. The simple_client3 example also patches a function at runtime. The patching process is described in the next section.

Another function you probably always want to register in your client is the exit event callback function.

It is registered via the dr_register_exit_event() function and is used to clean up things when the target application process has exited. For example, it is used to call extension exit functions or free allocated memory (Figure 9).

DynamoRIO drwrap

Drwrap offers helper functions to simplify the runtime patching of the target application’s functions. For example, it can wrap a function before (pre) and/or after (post) its execution. This means you can use a pre-wrap function, which can manipulate the arguments handed over to the function, a post-wrap function, which can manipulate the return value of a function in the target application, or both. A good use case would be patching a VM detection routine or an anti-analysis function in malware which detects the instrumentation process. That being said, many of the typical anti-analysis tricks malware uses do not work under DR, or DR actively puts countermeasures in place. For most of the common anti-analysis tricks in malware, you do not need to patch anything. We will talk about the details later on in this blog.

A full example of instrumentation with drwrap can be found in simple_client3. From a high-level view, it is as simple as this: You first register a drwrap pre- or post-function (e.g., when the module which includes the function you want to patch is loaded). Then the registered function (wrap_post_function()) manipulates the function at runtime. In this example, it sets the return value to zero (Figure 10).
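Conceptually, a pre/post wrapper behaves like the plain C model below. This is not how drwrap is implemented (drwrap works through instrumentation rather than through a function pointer table), and every name in the sketch is hypothetical. It only shows the order of events: the pre-hook sees the argument, the original function runs, and the post-hook may overwrite the return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a wrapped function with optional pre- and post-hooks. */
typedef struct {
    int (*target)(int);        /* the function being wrapped */
    void (*pre)(int *arg);     /* may inspect or change the argument */
    void (*post)(int *retval); /* may inspect or change the return value */
} wrap_t;

/* Calls the target the way a wrapped call would behave. */
int wrap_call(const wrap_t *w, int arg)
{
    assert(w != NULL && w->target != NULL);
    if (w->pre != NULL)
        w->pre(&arg);
    int ret = w->target(arg);
    if (w->post != NULL)
        w->post(&ret);
    return ret;
}

/* Stand-in for a VM detection routine in the target: 1 means "VM detected". */
int is_running_in_vm(int unused)
{
    (void)unused;
    return 1;
}

/* Post-hook that forces the return value to zero, as in the example above. */
void force_zero(int *retval)
{
    *retval = 0;
}
```

With a wrap_t of { is_running_in_vm, NULL, force_zero }, the caller always sees 0, while the original routine still runs unmodified.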

This should be enough for a quick intro and overview on how to write DR clients. You can find many more examples and details in the GitHub repository.

If you wrote a useful client and want to share it with the community, we are happy to add it to the “3rdparty” directory. We are not responsible for the code in this directory, so use it at your own risk.

Vibe coding

Vibe coding for DR clients works alright. If you use it for simple functions, it often works. In Talos’ experience, vibe coding full and complex clients usually doesn’t work. In most cases, the client will just crash and you will spend more time fixing bugs than you saved. In the best case, it runs with sub-optimal side effects. However, AI can be an excellent tool to brainstorm which DR function to use for certain tasks or how to get started with a certain use case.

DynamoRIO and anti-analysis techniques

We have built a simple Anti-X pseudo malware without any malicious functions to test DR’s transparency and robustness against common anti-analysis techniques frequently seen in malware. You can find it in the Anti-X GitHub repository. It includes different anti-debugging techniques, self-modifying code, large loops, exceptions, and simple code verification techniques to detect hooks or breakpoints. The self-modifying code also changes the control flow graph at runtime to verify that DR can handle this. We provide the source code so you can verify that it does not do any harm to your machine.

Test case: shellcode

This test case decodes a shellcode at runtime and executes it.

Result: No problem for DR. The code is executed as expected. No difference to the native execution without instrumentation.

Test case: SSL/TLS traffic

In this test case we downloaded an info page about our external IP address from “https[://]ifconfig[.]me” to test whether or not we could intercept the TLS traffic.

Result: This works well on most machines we tested it on. Only on very recent Intel mobile CPUs did the download fail with error 12175 (ERROR_WINHTTP_SECURE_FAILURE) when the Anti-X application was instrumented. This error usually occurs if the verification of certain security parameters of the TLS connection fails (e.g., the certificate is not valid yet or has expired). Whether this is a new Windows anti-tamper feature or a DR bug is currently unknown. We are investigating this and will update the blog once we find the root cause. If you have an idea about the root cause or if this occurs on your machine as well, we would be happy to hear from you.

Test case: Code validation

In this test case, we verify the CRC32 of the bytes of a certain function in memory. If any byte of the function is changed (e.g., a debugger sets a breakpoint [0xCC]), the CRC32 differs from the value calculated over the unmodified function.

Result: The CRC32 value does not change if the target application is instrumented.
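The check itself is easy to reproduce. The sketch below assumes the standard reflected CRC-32 (polynomial 0xEDB88320); which variant the Anti-X sample actually uses is not stated here, and the function names are hypothetical. It computes the checksum over a byte range and compares it with a previously recorded value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit. */
uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the dropped bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Returns true if the code bytes still match the recorded checksum. */
bool code_is_unmodified(const uint8_t *code, size_t len, uint32_t expected_crc)
{
    return crc32_bytes(code, len) == expected_crc;
}
```

A software breakpoint replaces the first byte of an instruction with 0xCC, so the checksum changes and code_is_unmodified() returns false. Under DR, the instrumented copy of the function lives in the code cache, so a check over the original bytes sees no difference.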

Test case: Simple anti-debugger tests

The first test (IsDebuggerPresent) should be self-explanatory. The second uses the thread context object to check whether one of the debug registers (DR0-DR3) is set, which would indicate a hardware breakpoint.

Result: None of the tests detect that the application is instrumented.

Test case: Exceptions

In this test case, we trigger an exception which is handled by the application to investigate whether it has any side effects on the target application.

Result: No problem. The application executes the same way as it does without instrumentation.

Test case: Runtime measurement

This test checks if the execution time between two code sections is too long. This detects debuggers or anything that significantly delays the code execution.

Result: As long as you do not do foolish things, such as inserting too much code at runtime or instrumenting a large loop inside the target application, DR is usually fast enough to avoid detection. Of course, this depends on the kind of instrumentation you do and on how aggressively the timer threshold is set. DR usually slows execution down by a factor of 2 to 10 compared to native execution.
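A small worked example shows why the threshold matters. The sketch below is a hypothetical model in plain C: the timestamps are passed in as nanoseconds, and how the sample obtains them is left out. All numbers in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the time between two code sections exceeds the limit,
 * i.e. the check would flag a debugger or heavy instrumentation. */
bool timing_check_trips(uint64_t start_ns, uint64_t end_ns, uint64_t limit_ns)
{
    assert(end_ns >= start_ns);
    return end_ns - start_ns > limit_ns;
}

/* Estimated runtime of a code section under instrumentation, given its native
 * runtime and a slowdown factor (2 to 10 is the typical range named above). */
uint64_t instrumented_ns(uint64_t native_ns, unsigned slowdown_factor)
{
    return native_ns * slowdown_factor;
}
```

Assume a section that takes 2 ms natively. With a generous limit of 50 ms, even a tenfold slowdown (20 ms) passes. With an aggressive limit of 10 ms, a twofold slowdown (4 ms) still passes, but a tenfold slowdown is detected.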

Test case: Large loops

Result: Similar to the above, you don’t want to instrument something inside this loop, but anything outside is no problem.

Test case: Self-modifying code

In our self-modifying code, we are doing multiple things:

Pre-patching code: We patch the code shortly before it is executed.
Post-patching code: We patch the code after it has been executed, so the next time this function is called, the patched code is executed instead of the original code.
Interleaved jumps: We jump in between the bytes of an instruction, which turns those bytes into a different instruction. In our case, a “mov” instruction becomes a “jmp” instruction. This not only changes the instruction, but also the control flow graph (CFG) at runtime. We want to verify whether this has an impact on the basic block cache and trace cache mentioned above.

First, let’s have a closer look at the self-modifying code which we are using. Here you can see the pre-patch code. It converts the original “jmp end_label” instruction to a “mov rax,3” instruction. Once again, it is not only the instruction that is being changed; the control flow graph (CFG) is also being modified.

The next one is the post-patch, which is similar to the pre-patch and overwrites a “mov rax,1234” with a “mov rax,1”, but this modification is done after the code has been executed. This means that the next time the function is called, the values in rax and rbx are no longer equal, the later comparison fails, and the jump is not taken.

Last but not least, in the middle of the function, we do the interleave trick. The “test rax,rax” sets ZF=0 (RAX = 3), which means the “jz” is not taken and the “jnz” is taken. Some static disassemblers get confused by this and disassemble the bytes starting with “048h, 0C7h, …” as “mov rax, 0FFFFFFFF90900CEB”, because they do not realize that the first bytes of this instruction are never executed.

Only after manually converting the bytes does the disassembler show the jmp (= EB 0C), in other words the bytes we jnz’ed to. Again, we want to see if DR’s code cache generator or dispatcher gets confused by this.

Result: The self-modifications work as expected. DR executes everything as it would run on a real CPU. These self-modifications were no problem for DR.

The screenshot below is from the simple_client3 DR client run against the Anti-X target application mentioned above, which performs a code trace over the self-modifying function. We deleted all messages from the client output, except the instruction-related output.
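The interleave trick is easier to follow when you decode the same bytes from two different offsets. The sketch below uses a tiny, hypothetical decoder in plain C that only knows the two encodings involved. The full byte sequence (48 C7 C0 EB 0C 90 90) is reconstructed from the disassembly “mov rax, 0FFFFFFFF90900CEB” quoted above and is not copied from the sample’s source code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { INSN_UNKNOWN, INSN_MOV_RAX_IMM32, INSN_JMP_SHORT } insn_kind_t;

typedef struct {
    insn_kind_t kind;
    size_t length;   /* number of bytes the instruction occupies */
    int64_t operand; /* sign-extended immediate or jump displacement */
} insn_t;

/* Toy decoder: understands only "mov rax, imm32" (48 C7 C0 id) and
 * "jmp rel8" (EB cb). Everything else is reported as unknown. */
insn_t decode_at(const uint8_t *code, size_t len, size_t off)
{
    insn_t out = { INSN_UNKNOWN, 0, 0 };
    assert(code != NULL);
    if (off + 7 <= len && code[off] == 0x48 && code[off + 1] == 0xC7 &&
        code[off + 2] == 0xC0) {
        uint32_t imm = (uint32_t)code[off + 3] |
                       ((uint32_t)code[off + 4] << 8) |
                       ((uint32_t)code[off + 5] << 16) |
                       ((uint32_t)code[off + 6] << 24);
        out.kind = INSN_MOV_RAX_IMM32;
        out.length = 7;
        /* The 32-bit immediate is sign-extended to 64 bits. */
        out.operand = (int64_t)imm - ((imm & 0x80000000u) ? 0x100000000LL : 0);
    } else if (off + 2 <= len && code[off] == 0xEB) {
        uint8_t disp = code[off + 1];
        out.kind = INSN_JMP_SHORT;
        out.length = 2;
        /* The 8-bit displacement is signed as well. */
        out.operand = (int64_t)disp - ((disp & 0x80u) ? 0x100 : 0);
    }
    return out;
}
```

Decoding at offset 0 yields the “mov” a linear disassembler shows. Decoding at offset 3, where the “jnz” lands, yields a two-byte short jump with a displacement of 12 bytes. A DBI framework follows the actual execution, so it builds its basic block from offset 3.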

The left side is the instruction trace showing when the self modifying function was called the first time. The right side is the second call of the self modifying function.

Summary

DR offers a wide range of built-in capabilities to counter common malware anti-analysis techniques. However, some methods can still detect or circumvent DR; these were intentionally excluded from this blog to avoid aiding malware authors. In this post, we introduced how to use DR for malware analysis and demonstrated how it can help bypass typical anti-analysis measures with minimal effort. We hope this helps readers get started with DR and encourages you to contribute to our GitHub project. Have fun exploring DBI and DynamoRIO!

Cisco Talos Blog – Read More