Defeating BlackMatter's string obfuscation

<2021-08-20>

tl;dr:

This blog post focuses on the string obfuscation methods employed by the still relatively new ransomware BlackMatter and presents ways to decode those strings by leveraging Ghidra's scripting capabilities and the Unicorn engine for CPU emulation. The corresponding Ghidra scripts published on GitHub aim to aid future analyses of BlackMatter samples, which employ the following string obfuscation techniques:

Encoded strings and PE sections: The strings are stored in the .data-section and are XORed with a stream of pseudo-random numbers, which are generated by a linear congruential generator (LCG). This LCG is seeded with the first dword of the .rsrc-section. Each string is preceded by its length, which is specified as a dword and stored immediately before the encoded string.

Stack strings: Besides this, strings are constructed by placing dwords at adjacent memory regions on the stack. Those are decoded via a simple multi-byte XOR-operation with a constant dword.

Motivation

The threat actor behind BlackMatter is still up to mischief and has extorted several companies, as the actor's leak site suggests ¹. The previous blog post explained BlackMatter's API hashing mechanism and defeated it with Ghidra's scripting capabilities and pre-computed hash lists. However, reconstructing the imported function calls of a malicious program is just one of the obstacles to uncovering its behaviour and the author's intent. Another important step is to de-obfuscate all hidden strings contained in the binary. Since this is an important part of malware analysis that can provide valuable insight ², this post describes BlackMatter's string obfuscation mechanisms and provides scripts to automate their decoding.

Scope

This write-up deals with a BlackMatter-sample with SHA256-hash

7f6dd0ca03f04b64024e86a72a6d7cfab6abccc2173b85896fc4b431990a5984

This sample was already the subject of the previous blog post. It has a compilation timestamp of 23rd of July 2021 20:51:18 UTC and was published on MalwareBazaar on the 2nd of August 2021 ³. This blog post deals only with the string obfuscation mechanisms employed by BlackMatter and describes ways of automated de-obfuscation.

Encoded strings in the .data-section

Identification of the string deobfuscation function

There are 25 calls to a function at address 004064cc, which receives a byte array as an argument. All the data buffers passed to this function are global variables and live in the program's .data-section. A quick peek into those memory areas showed that the content has very high entropy and looks like garbage to the human eye. One indicator that the .data-section houses strings was that the function calls often happen just before API calls like RegCreateKeyExW and others.

Encoding scheme

A closer look at the above-mentioned function at address 004064cc (illustrated in fig. 1) revealed that the following steps are performed within its body:

Retrieve the size $n$ of the given buffer by looking at the preceding dword
Allocate $n$ bytes of memory
If the allocation succeeded
- Copy the content of the given buffer to the newly allocated memory chunk
- Call actual decoding function at 00401713
Return the pointer to the newly allocated (and now decoded) buffer

Figure 1: Setup for string decoding at `004064cc`
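The setup steps listed above can be sketched in Python. This is a minimal illustration and not the sample's code: `data` stands for the raw bytes of the .data-section, `offset` for the position of an encoded buffer within it, and `decode` for whatever decoding routine is applied afterwards. All names are hypothetical. The length dword is read as little-endian, as usual on x86.

```python
import struct


def read_length_prefixed(data: bytes, offset: int) -> bytes:
    """Return a copy of the buffer at `offset`; its length n is the dword stored right before it."""
    if offset < 4:
        raise ValueError("no room for a length dword before the buffer")
    # Step 1: retrieve the size n from the preceding little-endian dword
    (n,) = struct.unpack_from("<I", data, offset - 4)
    if offset + n > len(data):
        raise ValueError("length prefix points past the end of the data")
    # Steps 2 and 3: "allocate" n bytes and copy the encoded content
    return bytes(data[offset:offset + n])


def setup_and_decode(data: bytes, offset: int, decode) -> bytes:
    """Mirror the setup function: copy the buffer, then hand the copy to the decoder."""
    copy = read_length_prefixed(data, offset)
    return decode(copy)
```

Working on a copy matches the observed behaviour that the original bytes in the .data-section stay encoded, while the caller receives a pointer to a decoded buffer.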

As this is just a setup function, which prepares the decoding, a closer look at the function at address 00401713 – here labeled decodeWithLCG – is needed. Within this function a loop iterates over the buffer with a stepsize of four bytes. In each iteration a pseudo-random dword is generated, which is used to XOR the current four bytes as fig. 2 illustrates.
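The loop described above can be sketched as follows. This is a hedged illustration under stated assumptions: `keystream` is a hypothetical iterator that yields one pseudo-random dword per iteration, the dwords are applied in little-endian byte order, and a trailing chunk shorter than four bytes is XORed with the leading bytes of the next key dword. How the sample treats such a tail is not covered in this post.

```python
import struct
from typing import Iterator


def xor_with_dword_stream(buf: bytes, keystream: Iterator[int]) -> bytes:
    """XOR `buf` in steps of four bytes, drawing one key dword per step."""
    out = bytearray(buf)
    for i in range(0, len(out), 4):
        key = next(keystream) & 0xFFFFFFFF
        key_bytes = struct.pack("<I", key)  # little-endian, as on x86
        # Assumption: a short tail only consumes the first bytes of the key dword
        for j in range(min(4, len(out) - i)):
            out[i + j] ^= key_bytes[j]
    return bytes(out)
```

Because XOR is its own inverse, running the function twice with the same key stream returns the original bytes, which is handy for testing a decoder against self-made samples.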

The interesting part is the way the pseudo-random numbers are generated: the function at address 00401769, here labeled getRandViaLCG, implements a linear congruential generator (LCG). This is a fast and easy-to-implement method for generating a sequence of pseudo-random numbers. Wikipedia defines the LCG by the following recurrence relation:

$X_{n+1} = \left( a X_n + c \right)\bmod m$

where $X$ is the sequence of pseudorandom values, and

$m, 0 < m$ — the modulus

$a,0 < a < m$ — the "multiplier",

$c, 0 \le c < m$ — the "increment"

$X_0, 0 \le X_0 < m$ — the "seed" ⁴

The implementation in the BlackMatter-binary matches this formal description very well as fig. 3 illustrates:

Figure 3: Implementation of the linear congruential generator at `00401769`

Looking at the decompiled representation after having performed some variable renaming, the similarity is even more striking:

Figure 4: Decompiled linear congruential generator at `00401769`

Here the multiplier $a$ is 0x8088405, the increment $c$ is 1, and the seed $X_0$ is the first dword of the .rsrc-section. The only difference from the formula above is that, instead of a modulo operation, the calculations are performed with 64-bit integers and a binary shift by 32 bits is used to get the "remainder".
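To make the recurrence concrete, here is a minimal Python sketch of an LCG with the multiplier 0x8088405 and the increment 1. The modulus $2^{32}$ is an assumption chosen to model a 32-bit state. The sketch only reproduces the textbook recurrence and does not replicate the shift-based step of the sample's 64-bit arithmetic, so its output should not be expected to match the sample's key stream byte for byte.

```python
from typing import Iterator

MULTIPLIER = 0x8088405  # a, constant observed in the sample
INCREMENT = 1           # c, constant observed in the sample
MODULUS = 1 << 32       # m, assumption: 32-bit state


def lcg(seed: int, a: int = MULTIPLIER, c: int = INCREMENT, m: int = MODULUS) -> Iterator[int]:
    """Yield X_1, X_2, ... of the recurrence X_{n+1} = (a * X_n + c) mod m."""
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x
```

With the seed 0, the first three values are 0x1, 0x8088406 and 0xDC6DAC1F.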

Automated decoding

Having reversed this scheme and knowing that the function at address 004064cc is used as a setup function, it is relatively easy to decode all strings within the binary. @sysopfb published a Python 2 script that automatically decodes the .data-section and prints the results to stdout, which can be found here. Although this is very helpful, it does not aid much during the reversing process, because the code references are lost. Therefore a Ghidra script was created, which decodes all buffers that are passed to the "setup function" at address 004064cc, which initiates the decoding process. The script is hosted on GitHub as BlackMatterDecodeLCG.java. It performs the following steps to decode all obfuscated strings:

Ask the user for the name of the function, which sets up the buffer for the decoding
Retrieve the seed by reading the first dword of the .rsrc-section of the PE-file
Find all calls of the specified function
For each call:
- Retrieve the address of the passed buffer
- Read the preceding dword, which specifies the length $n$ of the data to decode
- Generate the $n$-element sequence of pseudo-random numbers and perform the XOR-operations
- Update the buffer's bytes in the memory
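Outside of Ghidra, the seed retrieval step can be approximated by parsing the PE headers directly. The following Python sketch uses only the standard library. It is not taken from the Ghidra script, and the function name is hypothetical. It reads the first dword of a section's raw data from the file on disk, assuming the section is present and has raw data.

```python
import struct


def first_dword_of_section(pe: bytes, name: bytes = b".rsrc") -> int:
    """Return the first little-endian dword of the raw data of section `name`."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)  # offset of the PE signature
    if pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header follows the 4-byte signature
    (num_sections,) = struct.unpack_from("<H", pe, e_lfanew + 6)
    (opt_header_size,) = struct.unpack_from("<H", pe, e_lfanew + 20)
    table = e_lfanew + 24 + opt_header_size  # start of the section table
    for i in range(num_sections):
        header = table + 40 * i  # each section header is 40 bytes long
        if pe[header:header + 8].rstrip(b"\x00") == name:
            (raw_ptr,) = struct.unpack_from("<I", pe, header + 20)  # PointerToRawData
            (value,) = struct.unpack_from("<I", pe, raw_ptr)
            return value
    raise ValueError("section not found: %r" % name)
```

Called on the bytes of a sample, for example `first_dword_of_section(open(path, "rb").read())` with a hypothetical `path`, it returns the value used as the seed $X_0$.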

After running this script all the strings in the .data-section are stored in cleartext and are accessible during further analysis. Unfortunately, there is more string obfuscation, encoded stackstrings to be specific, which are discussed in the next section.

Encoded stackstrings

Another string obfuscation technique used by BlackMatter is the construction of strings on the stack. Those stackstrings, as they are commonly called ⁵, are constructed by moving each character into adjacent stack addresses at runtime. BlackMatter goes beyond that and stores dword-values on the stack and decodes those values via an XOR-operation with a constant value. The following assembly code at 00407b6b illustrates this procedure.

Figure 5: Decoding of stackstrings at `00407b6b`
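Once the dwords and the XOR constant have been read from the disassembly, the decoding itself is straightforward. The following Python sketch illustrates it; the function name and the key value in the usage note are hypothetical and not taken from the sample.

```python
import struct
from typing import Iterable


def decode_stack_dwords(dwords: Iterable[int], key: int) -> bytes:
    """XOR each dword with the constant `key` and lay the results out as they sit on the stack."""
    return b"".join(
        struct.pack("<I", (d ^ key) & 0xFFFFFFFF)  # little-endian, as stored on x86
        for d in dwords
    )
```

For example, with the made-up key 0x11223344, `decode_stack_dwords([0x6C6C6157 ^ 0x11223344], 0x11223344)` yields `b"Wall"`. Depending on the string, the resulting bytes have to be interpreted as ASCII or as UTF-16LE.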

While there are various ways to counter those XORed stackstrings, code emulation is a convenient and interesting solution. FireEye's FLARE team released an IDA Pro script that provides this functionality ⁶ and builds on flare-emu, a library which "marries a supported binary analysis framework, such as IDA Pro or Radare2, with Unicorn’s emulation framework".

Since Ghidra is not supported so far, it seemed to be a worthwhile endeavour to set up the Unicorn engine for use from within a Ghidra script written in Java.

Setting up Unicorn for the use with Ghidra

First, the Unicorn engine has to be installed on the system:

# Clone Unicorn's repo
git clone https://github.com/unicorn-engine/unicorn.git

# Compile the code and install Unicorn's engine
cd unicorn
./make.sh
sudo ./make.sh install

Then build and install Unicorn's Java-bindings, which are already included in Unicorn's repository:

# Still inside the cloned unicorn directory
cd bindings/java
# This path change was needed for me
sed -i 's@#include "unicorn_Unicorn.h"@#include "unicorn/unicorn_Unicorn.h"@' unicorn_Unicorn.c
# Assemble the .jar and the .so
make jar
make lib

# Install the shared lib
sudo cp libunicorn_java.so /usr/lib/jni/

There are several options to add the unicorn.jar to Ghidra's build path, as discussed in this GitHub issue. Personally, I prefer to place external .jar-files that my scripts use inside the directory ${GHIDRA_INSTALL_DIR}/Ghidra/patch, like so:

# Add unicorn.jar to Ghidra's build path
sudo mv unicorn.jar ${GHIDRA_INSTALL_DIR}/Ghidra/patch

If you need code completion in Eclipse, add the .jar-file to your build path by right clicking on it and choosing "Add to build path".

Using Unicorn to decode stackstrings

After completing the above-mentioned steps, you should be able to import Unicorn in your Java classes by adding

import unicorn.*;

and be able to create Ghidra scripts which emulate code of the binary under analysis.

For a full but rather simple Python example refer to this script called bm_stackstrings.py. The general structure of the code is adapted from Jason Reaves' blog post on the decryption of BazaarLoader-strings with the help of Unicorn's Python-library ⁷, but the regex-pattern was adapted to BlackMatter's code blocks ⁸ and the script was ported to Python 3.

I took this approach as a base for a Ghidra script that emulates the instructions either in the code range selected in Ghidra's UI or in the range given by a start and an end address, which the user is asked for via a dialog. After the region of interest is defined, memory is allocated for the code segment and the stack segment. Afterwards the selected memory region is copied to the code segment and the stack segment is filled with zeros. Then the emulation of the code is kicked off. When the emulation has finished, the stack is read and scanned for the existence of a printable string. If one is found, it is set as a comment just before the start address. If you want to try it yourself or just have a peek into a full example of Unicorn's usage in a Ghidra script, refer to

https://github.com/jgru/grus-ghidra-scripts/blob/main/blackmatter/BlackMatterStackStrings.java.
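The last step, scanning the emulated stack for a printable string, can be illustrated with a small Python sketch. The heuristic shown here (runs of at least four printable characters, either as ASCII or as UTF-16LE) is an assumption made for illustration; the linked Ghidra script may use a different check.

```python
import re
from typing import List


def find_printable_strings(stack: bytes, min_len: int = 4) -> List[str]:
    """Return printable ASCII and UTF-16LE strings found in `stack`, ordered by offset."""
    # Runs of printable ASCII characters
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable characters, each followed by a zero byte (UTF-16LE)
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    hits = [(m.start(), m.group().decode("ascii")) for m in ascii_pat.finditer(stack)]
    hits += [(m.start(), m.group().decode("utf-16-le")) for m in wide_pat.finditer(stack)]
    return [text for _, text in sorted(hits)]
```

Since the stack segment is filled with zeros before the emulation starts, anything printable found afterwards was most likely written by the emulated instructions.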

Verdict

This blog post detailed the string obfuscations employed by the new ransomware BlackMatter.

To obfuscate strings in the .data-section, multi-byte XOR-operations with a stream of pseudo-random numbers are performed in the function at address 00401713. Those pseudo-random numbers are generated by a linear congruential generator, which is seeded with the first dword of the .rsrc-section of the binary.

In addition, BlackMatter employs stackstrings, which are XORed with a constant dword value. The CPU emulator Unicorn is a convenient way to decode those stackstrings quickly. The installation and usage of Unicorn's Java bindings from within a Ghidra script are described in detail in this write-up.

If you have any notes, errata, hints, feedback, etc., please send a mail to jan _at___ digital-investigations d0t info.

Appendix: Recovered strings

Here are some of the recovered strings for the crawlers of the search engines.

Recovered stack strings

Win32_ShadowCopy.ID='%s'
Global\%.8x%.8x%.8x%.8x
Times New Roman
Control Panel\Desktop
WallPaper
WallpaperStyle
Elevation:Administrator!new:{3E5FC7F9-9A51-4367-9063-A120244FBEC7}

Recovered strings from the `.data`-section

BlackMatter Ransomware encrypted all your files!
To get your data back and keep your privacy safe,
you must find %s file
and follow the instructions!

<snip>

Accept: */*
Connection: keep-alive
Accept-Encoding: gzip, deflate, br
Content-Type: text/plain
{
"bot_version":"%s",
"bot_id":"%s",
"bot_company":"%.8x%.8x%.8x%.8x%",
%s

{
"bot_version":"%s",
"bot_id":"%s",
"bot_company":"%.8x%.8x%.8x%.8x%",
"stat_all_files":"%u",
"stat_not_encrypted":"%u",
"stat_size":"%s",
"execution_time":"%u",
"start_time":"%u",
"stop_time":"%u"
SOFTWARE\Policies\Microsoft\Windows\OOBE
DisablePrivacyExperience
SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon
AutoAdminLogon
DefaultUserName
DefaultDomainName
DefaultPassword
bcdedit /set {current} safeboot network
bcdedit /deletevalue {current} safeboot
bootcfg /raw /a /safeboot:network /id 1
bootcfg /raw /fastdetect /id
SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce