Jan's Blog on DFIR, TI, REM,....

23 Oct 2021

# Gradual Evidence Acquisition From an Erroneous Drive

## tl;dr

A hard drive is a relatively fragile data store. After the first indicators of a drive failure have been discovered, the hard drive might die suddenly. This blog post therefore discusses the gradual acquisition of evidence from an erroneous drive by utilizing the synergy of the open source tools ddrescue and partclone. In order to spare the mechanics of the drive and acquire the most critical data first, partclone is used to create a so-called domain file, in which the used blocks of the file system are noted.

```shell
# Record actually used blocks in domainfile
partclone.[fstype] -s /dev/sdXY -D --offset_domain=$OFF_IN_BYTES \
  -o domainfile
```

This is the basis for ddrescue's recovery process, which uses a block-size-changing data recovery algorithm that will, for the moment, only cover these areas.

```shell
# Restrict rescue area via domainfile
ddrescue --domain-log domainfile /dev/sdX out.img mapfile
```

Afterwards, additional runs can be conducted to acquire the remaining sectors.

## Background

Since HDDs are rather sensitive mechanical components, it is not uncommon for them to exhibit read errors after a certain amount of usage, caused by wear and tear of the magnetic platters or by shock events that lead to mechanical damage inside the drive. So-called head crashes, which most commonly occur when the HDD is dropped during regular operation, might be lethal for a HDD and would require a complete dismantling of the drive in a specialized laboratory. Grinding sounds are typical for such a scenario and require an immediate stop of operation. However, minor shock events, and/or shocks occurring while the actuator arm is in its "parking position", might not lead to great physical damage, but result in mechanical malfunction and read/write errors. This regularly leads to clicking, knocking or ticking noises, which stem from the abnormal behaviour of the disk's read-and-write head as it repeatedly tries to read a sector.

If a hard disk makes noise, data loss is likely to occur in the near future or has already happened. A grinding or screeching noise should be an indicator to power down the device immediately and hand it over to a specialized data recovery laboratory, in order to secure the remaining evidence. Given minor clicking or knocking noises, one might try to recover the data with the help of specialized software as soon as possible, as discussed in this blog post.
## Acquisition of data from erroneous drives

### Standard approach with GNU ddrescue

GNU ddrescue is the go-to tool to perform data recovery tasks with open source tooling [1]. It maximizes the amount of recovered data by reading the unproblematic sectors first and scheduling areas with read errors for later stages, keeping track of all visited sectors in a so-called mapfile. ddrescue has an excellent and exhaustive manual to consult [2]. To get a first glimpse, ddrescue's procedure, which employs a block-size-changing algorithm, can be summarized as follows: by default, its operation is divided into four phases, of which the first and the last can be divided into passes, while each phase consults the mapfile to keep track of the status of each sector (or area).

1. Copying: Read non-tried parts, forwards and backwards, with increasing granularity in each pass. Record the blocks which could not be read as non-trimmed in the mapfile.
2. Trimming: Blocks which were marked as non-trimmed are trimmed in this phase, meaning they are read from the edge forward, sector by sector, until a read error is encountered. Then the sectors are read backwards from the edge at the block's end until a sector read fails, and the sectors in between are tracked as non-scraped in the mapfile.
3. Scraping: In this phase, each non-scraped block is scraped forward sector by sector, while unreadable sectors are marked as bad.
4. Retrying: Lastly, the bad sectors can be read n times with reversed direction on each try. This is disabled by default and can be enabled via the parameter --retry-passes=n.

Unreadable sectors are filled with zeros in the resulting image (or device). Using ddrescue with its sane default settings is as simple as running

```shell
ddrescue /dev/sdX out.img mapfile
```

In order to activate direct disk access and omit kernel caching, one must use -d/--idirect and set the sector size via -b/--sector-size.
An indicator for kernel caching is when the positions and sizes in the mapfile are always a multiple of the sector size [3].

```shell
# Check the disk's sector size
SECTOR_IN_BYTES=$(cat /sys/block/sdX/queue/physical_block_size)

# Run ddrescue with direct disk access
ddrescue -d -b $SECTOR_IN_BYTES /dev/sdX out.img mapfile
```

### Gradual approach by combining partclone and ddrescue

While the straightforward sector-by-sector copying of a failing HDD with ddrescue often yields good results, it might be very slow. Given that acquiring evidence after a damage is a race against the clock, because with every rotation of the platter the probability of an ultimate drive failure increases, one might want to ensure that critical data gets acquired first, by determining the actually used blocks of the file system and prioritizing those [4]. To accomplish this, the open source partition cloning tool partclone comes into (inter)play with ddrescue. partclone "provide[s] utilities to backup used blocks" and supports most of the widespread file systems, like ext{2,3,4}, btrfs, xfs, NTFS, FAT, exFAT and even Apple's HFS+ [5]. One of its features is the ability to list "all used blocks as domain file", so that "it could make ddrescue smarter and faster when dumping a partition" [4].

partclone operates in a similar manner to ddrutility's tool ddru_ntfsbitmap, which extracts the bitmap file from an NTFS partition and creates a domain file [6], but it works with other file systems as well, by looking at their block allocation structures to determine the used blocks and store those in the aforementioned domain mapfile [7]. The term rescue domain describes the "[b]lock or set of blocks to be acted upon" [8]. By specifying --domain-mapfile=file, ddrescue is restricted to look only at areas which are marked with a + [9].

#### Generating a domain mapfile

To generate a domain file, simply use partclone with the -D flag and specify the resulting domain file via -o:

```shell
partclone.[fstype] -s /dev/sdXY -D -o sdXY.mapfile
```

If you want to run ddrescue on the whole disk and not just the partition, in order to image the whole thing iteratively, it is necessary to use --offset_domain=N, which specifies the offset in bytes to the start of the partition.
This will be added to all position values in the resulting domain mapfile. To create such a file, use the following commands:

```shell
# Retrieve the offset in sectors
OFF_IN_SECTORS=$(mmls /dev/sdX | awk '{ if ($2 == "001") print $3 }')

# Retrieve sector size
SECTOR_IN_BYTES=$(mmls /dev/sdX | grep -P 'in\s\d*\-byte sectors' | \
  grep -oP '\d*')

# Calculate offset
OFF_IN_BYTES=$((OFF_IN_SECTORS * SECTOR_IN_BYTES))

# Create domain file
partclone.[fstype] -s /dev/sdXY -D --offset_domain=$OFF_IN_BYTES \
  -o domainfile
```

The resulting domain file looks like the following listing:

```shell
cat domainfile
# Domain logfile created by unset_name v0.3.13
# Source: /dev/sdXY
# Offset: 0x3E900000
# current_pos  current_status
0xF4240000     ?
#      pos        size  status
0x3E900000  0x02135000  +
0x40A35000  0x05ECB000  ?
0x46900000  0x02204000  +
0x48B04000  0x005FC000  ?
<snip>
```

The offset at the top denotes the beginning of the file system. The current_pos corresponds to the last sector used by the file system. Used areas are marked with a + and unused areas with a ? [7].

#### Acquiring the used blocks with ddrescue

To acquire with ddrescue only those areas which are actually used by the file system, and have therefore been denoted with a + in the domain file, use the following command:

```shell
# Clone only blocks, which are actually used (of part Y)
ddrescue --domain-log domainfile /dev/sdX out.img mapfile

# Check if acquisition was successful
fsstat -o $OFF_IN_SECTORS out.img
```


Since you already know the offset, you might omit cloning the partition table on the first run. After completion of the aforementioned command, you can be sure that the mission-critical file system blocks have been acquired, which can be double-checked by diffing the domain file and the mapfile, like this: diff -y domainfile mapfile.
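As a quick plausibility check, the amount of data covered by the prioritized run can be computed from the domain file itself. The following sketch sums up all areas marked as used (+); the sample file mirrors the listing above (filename and values are illustrative), and plain shell arithmetic handles the hexadecimal numbers:

```shell
# Sample domain file as in the listing above (illustrative values)
cat > domainfile.sample <<'EOF'
# Domain logfile created by unset_name v0.3.13
# Offset: 0x3E900000
0xF4240000 ?
0x3E900000 0x02135000 +
0x40A35000 0x05ECB000 ?
0x46900000 0x02204000 +
0x48B04000 0x005FC000 ?
EOF

# Sum the sizes of all areas marked as used ('+');
# comment lines and the current_pos line are skipped
total=0
while read -r pos size status; do
  case "$pos" in ''|'#'*) continue ;; esac
  [ "$status" = "+" ] && total=$(( total + size ))
done < domainfile.sample

echo "$total bytes to acquire in the prioritized run"
```

The same loop run against a real domain file gives a rough lower bound for how long the first, prioritized ddrescue pass will keep the ailing drive busy.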

#### Acquiring the remaining blocks with ddrescue

This way the additional sectors of the disk, which might contain previously deleted data, can be imaged in a subsequent and lengthy run without having to fear a definitive drive failure too much. To do this, simply supply ddrescue the mapfile generated in the previous run, without restricting the rescue domain this time, so that it will add the remaining blocks, which were either zero-filled or omitted entirely:

```shell
# Clone remaining blocks
ddrescue /dev/sdX out.img mapfile

# Check result by inspecting the partition table
mmls out.img
```


After the completion of this procedure, which is a fragile process in its own right, some kind of integrity protection should be employed, even though the source media itself could not be hashed. For example, this could be done by hashing the artifacts and signing the resulting file, which contains the hashes.
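A minimal sketch of such an integrity protection step could look as follows. Dummy files stand in for out.img and mapfile so the snippet is runnable anywhere; on a real case you would point the commands at the acquired artifacts, and the signing step assumes a GnuPG key is available on your workstation:

```shell
# Work in a scratch directory with stand-ins for the real artifacts
cd "$(mktemp -d)"
printf 'image data' > out.img
printf 'map data'   > mapfile

# Hash all artifacts into one digest file ...
sha256sum out.img mapfile > hashes.sha256

# ... which can be verified at any later point
sha256sum -c hashes.sha256

# Sign the digest file, so tampering would be detected
# (requires a private key; uncomment on your workstation):
# gpg --armor --detach-sign hashes.sha256
```

Signing the digest list rather than the (huge) image keeps the signing step fast while still anchoring the integrity of every artifact.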

## Summary

The present blog post discussed the usage of ddrescue as well as the gradual imaging of damaged drives. In order to acquire mission-critical data first and in a timely manner, partclone was used to determine the blocks which are actually used by the file system residing on the partition in question. This information was recorded in a so-called domain file and fed to ddrescue via the command line parameter --domain-log, so that the tool limits its operation to the blocks specified therein. Afterwards, another lengthy run could be initiated to image the remaining sectors.

# Logical imaging with AFF4-L

<2021-08-03>

## tl;dr

Using AFF4-L containers is an efficient and forensically sound way of storing selectively imaged evidence. There exist two open source implementations to perform this task: c-aff4 and pyaff4. To image a directory with aff4.py, run:

```shell
python3 aff4.py --verbose --recursive --create-logical \
  $(date +%Y-%m-%d)_EVIDENCE_1.aff4 /path/to/dir
```

If you have to acquire logical files on Windows systems, c-aff4 can be used with PowerShell in a convenient way to recurse the directory of interest:

```powershell
Get-ChildItem -Recurse C:\Users\sysprog\Desktop\ | ForEach {$_.FullName} |
  .\aff4imager.exe --input '@' `
  --output .\$(Get-Date -UFormat '+%Y-%m-%dT%H%M%S')_EVIDENCE_2.aff4
```

Using the @-sign as the input filename (after --input) makes aff4imager read the list of files to acquire from stdin and place those in the resulting container, specified by --output. To list the acquired files stored in the container and the corresponding metadata, run:

```powershell
# List files in container, --list
.\aff4imager.exe -l .\container.aff4

# View metadata, --view
.\aff4imager.exe -v .\container.aff4
```

For the full documentation on aff4imager.exe's usage, refer to the official documentation at http://docs.aff4.org/en/latest/.

While this worked well for me, aff4imager.exe does not seem to be an overall mature tool. Once I got an error when I tried to image a large directory with the -i '@' syntax on a Debian system. Another time I was very surprised that aff4imager.exe (Commit 657dc28) truncates an existing output file without asking for approval, although the help page states it would append to an existing container by default. So please take this warning seriously, use the tool only after extensive testing, and consider contributing some bug fixes eventually.

## Verdict

Logical imaging is nowadays often the preferred way of acquiring digital evidence. AFF4-L seems to be the best container format defined in an open standard. pyaff4 and aff4imager (provided by c-aff4) are two open source tools to image files logically "into" AFF4-L containers. The present blog post gave an introduction into their usage. Unfortunately, the AFF4-L container format and the corresponding imaging tools do not seem to get the momentum and attention of the open source community they deserve, which might change with rising popularity.

## Footnotes:

4: Schatz, B. L. (2019). AFF4-L: a scalable open logical evidence container. Digital Investigation, 29, S143-S149. https://www.sciencedirect.com/science/article/pii/S1742287619301653

5: Ibid, p. 15

9: The oneliner-version is: Get-ChildItem -Recurse C:\Users\sysprog\Desktop\ | ForEach {$_.FullName} | .\aff4imager.exe -i '@' -o .\$(Get-Date -UFormat '+%Y-%m-%dT%H%M%S')_EVIDENCE_2.aff4

# Analyzing VM images

<2021-06-20>

## tl;dr

Virtualization is everywhere nowadays. So when you have to perform an analysis of an incident, you often come across virtual hard disks in the form of sparsely allocated VMDKs, VDIs, QCOW2s and the like. To inspect the data in the virtual machine images, you have several options. guestmount is a helpful tool to perform a logical inspection of the file system, while qemu-nbd is a good choice if you want to work on a raw block device without having to convert a sparsely allocated virtual disk into a raw image or to rely on proprietary software.

## Motivation

Popular open source forensic analysis suites like Sleuthkit cannot handle sparsely allocated VMDKs and several other virtual machine image formats, like VDI, QCOW2 etc. Since virtualized systems are part of almost every investigation, a conversion of some kind is often needed, which can be cumbersome if you decide to convert each and every piece of evidence to raw format or .E01. Furthermore, you might depend on proprietary tools like vmware-mount, which are not freely available.

Maybe you heard that Sleuthkit can handle VMDKs. This is partly true, since it can handle only the so-called flat format and not sparsely allocated VMDKs. To check whether you could ingest the .vmdk file in question, you have to look at the so-called text descriptor within the .vmdk, which describes the layout of the data in the virtual disk [1] and starts at offset 0x200.
It may look like the following snippet:

```shell
# Disk DescriptorFile
version=1
CID=7cecb317
parentCID=ffffffff
createType="monolithicSparse"

# Extent description
RW 4194304 SPARSE "NewVirtualDisk.vdi.vhd.vmdk"

# The disk Data Base
<snip>
```

Within this descriptor you have to look for the "createType" field, which specifies whether the disk is flat or monolithic and whether it was sparsely allocated or is of fixed size. To conveniently check for this, use the following command, which greps for the string within the byte range between the offsets 0x200 and 0x400:

```shell
xxd -s 0x200 -l 0x200 file.vmdk | xxd -r | strings | grep -i sparse
```

If it is a sparsely allocated .vmdk file, it is recommended by some renowned practitioners in the forensics community to use qemu-img to convert it to a raw image [2] in order to be able to conduct a proper post-mortem analysis.

```shell
# Check the size of the resulting raw image
cat evidences/srv.ovf | grep '<Disk'

# Convert .vmdk to raw
qemu-img convert evidences/srv.vmdk ./srv.dd
```

While this works well, it is a time- as well as space-consuming endeavour. So I have been looking for alternative solutions.

## Inspecting sparsely-allocated virtual disks without conversion

### Using vmware-mount

One solution is, as already mentioned above, the usage of VMware's tool vmware-mount, which creates a flat representation of the disk at a given mount point when the -f option is specified. This approach, however, requires the use of a piece of proprietary software named Virtual Disk Development Kit (VDDK), which is not easily available (and of course not in Debian's package archives ;)).

### Using guestmount for content inspection

If you only need to look at and extract certain files, guestmount of the libguestfs library is a handy solution. Libguestfs is the library for accessing and modifying virtual machine images [3]. After installing it via the package manager of your distribution, you can inspect the partitions inside the virtual machine image in question by using virt-filesystems. Once you have identified the partition of your interest, you can mount it via guestmount – read-only of course – on a mount point of your forensics workstation. Alternatively you might explore it interactively with guestfish – the so-called guest filesystem shell. (If you install libguestfs-rescue, the tool virt-rescue comes already with TSK installed, but it is a bit cumbersome to use imho.)

So you might use the following commands to get started with guestmount:

```shell
# Install it via apt
sudo apt-get install libguestfs-tools  # Debian/Ubuntu

# Check which partition to mount
virt-filesystems -a disk.img -l

# Mount it via guestmount on a mount point of the host
guestmount -a evidences/srv.vmdk -m /dev/sda2 --ro ./mntpt

# Alternatively: Inspect it interactively with guestfish
guestfish --ro -a evidences/srv.vmdk -m /dev/sda2
```

At times just accessing the file system might not be enough. If this is the case, you might look at the following solution.

### Using qemu-nbd

To access the sparsely allocated virtual disk image as a raw device, it is advisable to use qemu-nbd, which is the QEMU Disk Network Block Device Server. Install the containing package named qemu-utils via apt and load the NBD kernel module via modprobe, then use qemu-nbd to expose the virtual disk image as a read-only block device – in the following example /dev/nbd0. Then you can work on it with Sleuthkit or your favorite FS forensics tool.

```shell
# Install qemu-nbd
sudo apt install qemu-utils

# Load NBD kernel module
sudo modprobe nbd

# Check, that it was loaded
sudo lsmod | grep nbd

# Use QEMU Disk Network Block Device Server to expose .vmdk as NBD
# Note: make it read-only!
sudo qemu-nbd -r -c /dev/nbd0 evidences/srv.vmdk

# Check partition table
sudo mmls /dev/nbd0

# Access partitions directly
sudo fsstat /dev/nbd0p2

# Disconnect NBD
sudo qemu-nbd -d /dev/nbd0

# Remove kernel module
sudo rmmod nbd
```

IMHO this is the quickest and most elegant way of performing post-mortem analysis or triage on a sparsely allocated virtual machine disk image. There might be some alternatives, though.

### Specific tasks with vminspect

Another interesting solution is vminspect, a set of tools developed in Python for disk forensic analysis. It provides APIs and a command line tool for analysing disk images of various formats, relying on libguestfs. It focuses on the automation of virtual disk analysis and on safely supporting multiple file systems and disk formats. On the one hand it is not as generic as the previously presented solutions, but it offers some specific capabilities helpful in forensic investigations, like extracting event timelines of NTFS disks or parsing of Windows Event Log files. To make it tasty for you and get you going, refer to the following command snippets taken from vminspect's documentation [4]:

```shell
# Compare two registries
vminspect compare --identify --registry win7.qcow2 win7zeroaccess.qcow2

# Extract the NTFS USN journal
vminspect usnjrnl_timeline --identify --hash win7.qcow2

# Parse eventlogs
vminspect eventlog win7.qcow2 C:\\Windows\\System32\\winevt\\Logs\\Security.evtx
```

If you know a better way of doing this or want to leave any notes, proposals or comments, please contact me under ca473c19fd9b81c045094121827b3548 at digital-investigations.info.

## Footnotes:

# Dump Linux process memory

<2021-05-27>

## tl;dr

If you need to acquire the process memory of a process running on a Linux system, you can use gcore [1] to create a core file or alternatively retrieve its memory areas from /proc/<PID>/maps and use GDB [2] itself to dump the contents to a file.
For a convenient way to do this, refer to a basic shell script hosted as a gist named dump_pmem.sh [3].

## Motivation

It is well known that process memory contains a wealth of information; therefore it is often needed to inspect the memory contents of a specific process. Since I wanted to write autopkgtests for the continuous integration of memory forensics software packaged as Debian packages, I was looking for a convenient way to dump the process memory (preferably with on-board equipment).

## One-liner solution

I found a neat solution from A. Nilsson on serverfault.com [4], which I enhanced to create a single output file. Basically it reads all memory areas from the proc filesystem, which is a pseudo-filesystem providing an interface to kernel data structures [5], and then utilizes gdb's memory dumping capability to copy those memory regions into a file [6]. To use the one-liner solution, which is a bit ugly indeed, just modify the PID and run the following command:

```shell
sudo su -; \
PID=2633; \
grep rw-p /proc/${PID}/maps \
  | sed -n 's/^\([0-9a-f]*\)-\([0-9a-f]*\) .*$/\1\t\2/p' \
  | while read start stop; \
    do sudo gdb --batch --pid ${PID} \
         -ex "append memory ${PID}.dump 0x$start 0x$stop" > /dev/null 2>&1; \
    done;
```


Note that GDB has to be available on the system, whereas the glibc sources are not required.
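The first stage of the one-liner can be tried without root and without gdb, since a process may read its own maps file. The following sketch applies the same sed expression to /proc/self/maps (which, inside the command substitution, refers to the process reading it, i.e. the grep below; good enough for a format demonstration):

```shell
# Extract start/end addresses of writable private (rw-p) mappings;
# same sed expression as in the one-liner above
regions=$(grep rw-p /proc/self/maps \
  | sed -n 's/^\([0-9a-f]*\)-\([0-9a-f]*\) .*$/\1\t\2/p')

# The first few address pairs gdb would be asked to dump
echo "$regions" | head -n 3
```

Each output line is a tab-separated start/stop pair, exactly what the while read start stop loop of the one-liner consumes.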

## Script dump_pmem.sh

Furthermore, I created a basic shell script, which can be found in the aforementioned gist.

It simplifies the dumping process and creates an additional acquisition log (which is printed to stderr). This is how you use it:

```shell
sudo ./dump_pmem.sh

Usage:
dump_pmem.sh <PID>

Example:
./dump_pmem.sh 1337 > 1337.dmp
```


Note that root permissions are obviously needed and a process ID has to be supplied as positional argument. The resulting output has to be redirected to a file. The informational output printed to stderr looks like the following snippet:

```shell
2021-05-27T08:48:34+02:00       Starting acquision of process 1337
2021-05-27T08:48:34+02:00       Proc cmdline: "opensslenc-aes-256-cbc-k-p-mdsha1"
2021-05-27T08:48:34+02:00       Dumping 55a195984000 - 55a19598c000
2021-05-27T08:48:34+02:00       Dumping 55a19598c000 - 55a19598e000
<snip>
2021-05-27T08:48:36+02:00       Dumping 7f990d714000 - 7f990d715000
2021-05-27T08:48:37+02:00       Dumping 7ffe3413f000 - 7ffe34160000
2021-05-27T08:48:37+02:00       Resulting SHA512: cb4e949c7b...
```


Note that the script currently does not perform zero-padding to recreate the virtual address space as seen by the process.
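To illustrate what that zero-padding would entail: between two non-adjacent regions from the log above there is a gap that would have to be represented (ideally as a sparse hole instead of literal zero bytes) to preserve the original virtual addresses. A small arithmetic sketch, using two addresses taken from the log:

```shell
# End of the last heap-side region and start of the stack-side region,
# as seen in the acquisition log above
END_PREV=0x7f990d715000
START_NEXT=0x7ffe3413f000

# Bytes of padding needed between the two dumped regions
GAP=$(( START_NEXT - END_PREV ))
echo "$GAP bytes (~$(( GAP / 1024 / 1024 / 1024 )) GiB) of padding"
```

The gap amounts to hundreds of gigabytes of address space, which is why such a reconstruction only makes sense with sparse files (e.g. via truncate/seek) rather than by writing the zeros out.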

gcore is part of the GNU debugger gdb, see https://manpages.debian.org/buster/gdb/gcore.1.en.html