Disk Imaging Drag Race
The following text is an excerpt from a paper co-written by Jonathan Farbowitz, Flaminia Fortunato, Caroline Gil and myself. The paper was written as part of a year-long collaborative research project on disk imaging. An abridged version of the paper will eventually be available from the American Institute for Conservation’s Electronic Media Review. The full version of the paper will be made available online; details are to be determined.
I tracked the results of the tests described below in a spreadsheet, which you can download here.
Because this is an excerpt from a larger paper, if you’re not familiar with disk imaging, this might not be the most friendly place to start. For a list of resources on the topic, check out the MoMA Media Conservation Initiative’s website: https://www.mediaconservation.io/disk-imaging.
There are many applications to choose from when creating a disk image. As part of developing the disk imaging workflow at the Hirshhorn, five common disk imaging applications were tested. These applications were chosen partly because of the media conservation team’s familiarity with them and partly because of their prevalence in the field of digital preservation. To assess the performance of different disk imaging software, the same 32GB USB 3 Samsung flash drive, containing 2.42GB of data in a FAT32 file system, was imaged using ddrescue (v.1.24), Guymager (v.0.8.8), FTK Imager for Mac (v.3.1.1 CLI), FTK Imager for Windows (v.3.2.4.6), and Tableau Forensic Imager (v.1.2.1).
The tests were conducted in the following technical environment: a Mac Pro (Late 2013) running OS X El Capitan (v.10.11), with VirtualBox (v.5.2.16) and a Windows 10 Parallels Desktop virtual machine.
Guymager was run in a BitCurator 2.0.6 virtual machine (based on Ubuntu 18.04 “Bionic”) under VirtualBox. FTK Imager for Windows and Tableau Forensic Imager were run in the Windows 10 Parallels Desktop virtual machine. ddrescue and FTK Imager for Mac were run natively on OS X El Capitan (v.10.11).
These tests were performed as part of developing a disk imaging workflow at the Hirshhorn, and their results should be viewed in that context. The eccentricities of a particular computer system can significantly influence results. While these results will hopefully help others, they should not take the place of further testing or individual evaluation.
Perhaps the most surprising result of the comparisons at the Hirshhorn was that there were very few surprises at all. It is logical that five applications performing the same function on the same machine would produce very similar results in a similar amount of time. Still, one might expect more varied performance from subtle differences in the software, in the operating systems they require, or in the need to virtualize those operating systems. However, the Hirshhorn’s tests yielded similar results across the five applications.
Write Blocker Comparison
The use of a write-blocker, sometimes called a forensic bridge, to prevent inadvertent alteration of a digital storage device is a best practice in media conservation. To better understand this tool’s impact on disk imaging software, each application evaluated by the Hirshhorn was tested in three configurations:
- without a write-blocker;
- with a USB 2.0 Tableau Forensic Bridge (model T8-r2);
- with a USB 3.0 Tableau Forensic Bridge (model T8u).
As shown in the table below, labeled “Forensic Bridge Effects on Disk Imaging,” without a write-blocker each program produced a disk image of the 32GB Samsung flash drive in about 50 minutes. FTK Imager for Mac and ddrescue took somewhat longer: 1 hour, 6 minutes, and 1 second for FTK Imager for Mac, and 1 hour, 9 minutes, and 18 seconds for ddrescue.
To be clear, imaging a volume without a write-blocker is not recommended, because connecting the volume to a computer could inadvertently alter the targeted data. The speed of acquisition without a forensic bridge is nonetheless notable, because every imaging time increased with the introduction of the USB 2.0 Tableau Forensic Bridge (model T8-r2). Of course, placing a USB 2.0 forensic bridge between a USB 3.0 drive and a USB 3.0 port is inefficient. USB 2.0 is a slower interface than USB 3.0, so introducing it will inevitably slow data transfer. However, the tests showed that the USB 2.0 bridge affected some applications far more than others. Creating a raw disk image with Guymager took only about 26 minutes longer with the USB 2.0 write-blocker. ddrescue, by contrast, took an additional 1 hour and 46 minutes, roughly 2.5 times as long as without a forensic bridge.
Finally, with the USB 3.0 forensic bridge (Tableau model T8u), imaging times for all applications were comparable to those recorded without a write-blocker. These tests also showed that creating an EWF disk image (regardless of compression) took slightly longer than creating a raw disk image. The difference was essentially negligible: 2 minutes at most.
Forensic Bridge Effects on Disk Imaging
| Application | Write-blocker interface | Format | Time to image (hh:mm:ss) |
|---|---|---|---|
| ddrescue | None | raw | 01:09:18 |
| FTK Imager for Mac | None | raw | 01:06:01 |
| FTK Imager for Mac | None | E01 - compression 5 | 01:07:32 |
| FTK Imager for Windows | None | raw | 00:49:44 |
| FTK Imager for Windows | None | E01 - compression 9 | 00:49:49 |
| Guymager | None | raw | 00:49:16 |
| Guymager | None | Guymager EWF - compression “fast” | 00:49:47 |
| ddrescue | USB 2.0 | raw | 02:55:24 |
| FTK Imager for Mac | USB 2.0 | E01 - compression 5 | 02:54:33 |
| FTK Imager for Windows | USB 2.0 | E01 - compression 9 | 01:42:13 |
| Guymager | USB 2.0 | raw | 01:15:29 |
| Tableau Imager | USB 2.0 | raw | 01:39:00 |
| ddrescue | USB 3.0 | raw | 00:50:24 |
| FTK Imager for Mac | USB 3.0 | E01 - compression 5 | 00:50:06 |
| FTK Imager for Windows | USB 3.0 | E01 - compression 5 | 00:48:59 |
| Guymager | USB 3.0 | Guymager EWF - compression “fast” | 00:48:53 |
| Tableau Imager | USB 3.0 | E01 - no compression | 00:48:47 |
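The slowdowns discussed above can be recomputed directly from the table. The short Python sketch below is only an arithmetic aid. The dictionary holds a few raw-image rows copied from the table. The helper names (`to_seconds`, `to_hms`, `slowdown`) are illustrative and not part of any imaging tool.

```python
# A few raw-image rows copied from the "Forensic Bridge Effects on Disk Imaging" table.
TIMES = {
    ("ddrescue", "None"): "01:09:18",
    ("ddrescue", "USB 2.0"): "02:55:24",
    ("Guymager", "None"): "00:49:16",
    ("Guymager", "USB 2.0"): "01:15:29",
}


def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a non-negative number of seconds back into hh:mm:ss."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def slowdown(app: str, bridge: str, baseline: str = "None") -> tuple:
    """Return (extra time as hh:mm:ss, ratio of the two times) for a bridge versus the baseline."""
    base = to_seconds(TIMES[(app, baseline)])
    other = to_seconds(TIMES[(app, bridge)])
    return to_hms(other - base), other / base


for app in ("ddrescue", "Guymager"):
    extra, ratio = slowdown(app, "USB 2.0")
    print(f"{app}: +{extra} with the USB 2.0 bridge ({ratio:.2f}x)")
# ddrescue: +01:46:06 with the USB 2.0 bridge (2.53x)
# Guymager: +00:26:13 with the USB 2.0 bridge (1.53x)
```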
Compression Comparison
Apart from the write-blocker, another choice media conservators face when evaluating disk imaging software is compression. FTK Imager, Tableau Forensic Imager, and Guymager all offer levels of compression when creating EWF disk images. These options let the user trade imaging speed against the size of the resulting image. This is a logical trade-off: the more the data is compressed, the longer the compression takes.
The tools describe their compression levels differently:
- The FTK Imager user manual offers a practical but nonspecific explanation. It states that level “1” is the “fastest, least compressed” option and that “9” is the “smallest file, slowest to create.”
- Tableau Forensic Imager provides a relative definition. Its “Maximum Speed” compression is equivalent to FTK Imager’s “1,” and its “Minimum Size” compression is equivalent to FTK’s “9.”
- Guymager offers the most information about its three options: “Fast,” “Best,” and “Empty.” Its configuration file is stored by default at “/etc/guymager/guymager.cfg.” There, “Empty” compression is said to do “no compression, except if a block contains zero bytes only. Such blocks are replaced by their compressed equivalent.” The other two options, “Fast” and “Best,” are defined as using “Fast Z” and “Best Z” compression, respectively.
The “Z” refers to zlib, a widely used software library that implements the DEFLATE compression algorithm (“Zlib” 2018).
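The data being compressed matters as much as the setting. To illustrate this, the following Python sketch compresses three synthetic 32 KiB chunks with the standard library’s `zlib` module at levels 0, 1, 5, and 9. For illustration only, it assumes that the imaging tools’ settings correspond roughly to zlib levels. The vendors do not document such a mapping beyond what is quoted above.

```python
import os
import zlib

CHUNK_SIZE = 32 * 1024  # 32 KiB; assumed here as a typical EWF chunk size


def compressed_sizes(chunk: bytes, levels=(0, 1, 5, 9)) -> dict:
    """Return {zlib level: compressed length in bytes} for one chunk of data.

    zlib level 0 only stores the data, 1 is the fastest, and 9 aims for the smallest output.
    """
    return {level: len(zlib.compress(chunk, level)) for level in levels}


zero_chunk = bytes(CHUNK_SIZE)         # unused space that really has been zeroed
random_chunk = os.urandom(CHUNK_SIZE)  # stand-in for leftover, non-zero flash contents
text_chunk = (b"Disk imaging drag race. " * 2000)[:CHUNK_SIZE]  # highly repetitive data

for name, chunk in (("zeros", zero_chunk), ("random", random_chunk), ("text", text_chunk)):
    print(name, compressed_sizes(chunk))
```

Running it shows that the zero chunk shrinks to a tiny fraction of its size at any level above 0. The random chunk does not shrink at any level; zlib even adds a few bytes of framing. In this sketch, the choice between “fastest” and “smallest” matters far less than what the chunk contains.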
To determine the impact of these compression options, the same 32 GB flash drive was imaged with FTK Imager, Tableau Forensic Imager, and Guymager. For FTK’s numeric scale, the top, middle, and bottom of the scale were used to gauge the breadth of the spectrum. Similarly, the most and least compressed options offered by Guymager and Tableau Imager were tested. The aim was to determine how quickly the test drive could be imaged and how much compression could be achieved. Testing this variable was meant to enable an informed choice of compression setting by answering three questions:
- How big a difference is there between level 1 and level 9 compression?
- How much more time does a more compressed disk image take to create?
- How much smaller will the disk image be if a media conservator chooses to invest that additional time?
The 32 GB drive contained just under 2.5 GB of data, so the software was expected to compress a great deal of empty space on the drive. The results of these tests are summarized in the table below:
Disk Imaging Software Compression Comparison
| Application | Compression setting | Time to image (hh:mm:ss) | File size (GB) |
|---|---|---|---|
| FTK Imager for Mac | "0" | 01:08:23 | 32.1 |
| FTK Imager for Mac | "5" | 01:07:32 | 22.68 |
| FTK Imager for Mac | "9" | 01:07:46 | 22.67 |
| FTK Imager for Windows | "1" | 00:49:37 | 23.2 |
| FTK Imager for Windows | "5" | 00:49:43 | 21 |
| FTK Imager for Windows | "9" | 00:49:49 | 20.9 |
| Guymager | Empty | 00:48:44 | 32.1 |
| Guymager | Fast | 00:48:53 | 23.2 |
| Tableau Imager | "Maximize Speed" | 00:49:02 | 21.6 |
| Tableau Imager | "Minimum Size" | 00:49:05 | 21.0 |
| Tableau Imager | "None" | 00:48:47 | 29.8 |
In the tests performed at the Hirshhorn, changing these settings had little effect on the imaging applications’ speed or on the resulting size of the disk image. This was surprising, and at first seemed to contradict the described functionality of the settings. In almost every case, the “fastest” setting took about as long as the most compressed setting (a difference of 12 seconds or less). In the case of Guymager, the “Fast” setting actually took nine seconds longer than “Empty.” With so much empty space on the drive, why was the software unable to create smaller disk images?
The root cause of this inconsistency was never verified, but it may lie with the drive that was selected rather than with the software. Although the drive contained only 2.42 GB of data, the imaging software may not have read its unused space as “empty.” If that unused space did not actually contain zeros, the software could not have compressed it efficiently, and the compression options would have been largely ineffective.
The Tableau Imager software documentation underscores the impact the data being imaged has on the imaging software, stating that the “compression often yields the greatest benefit on images which have unused or blank regions, for example, 0s, 0xff’s, etc” (Tableau Imager Software 2019). Guy Voncken, the developer of the Guymager software, explained that
“For ‘empty’ to work there must be whole chunks filled with only zeroes. A chunk is the block size used in EWF, it very often is 32K. So, only if there is a block with 32768 zero bytes then the ‘empty’ compression jumps in and replaces that chunk by a precomputed z-compressed-zero-chunk” (Voncken and Colloton 2018).
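Voncken’s description suggests a quick way to estimate how much an image could benefit from “Empty” compression. Read a raw (dd-style) image in 32 KiB chunks and count the chunks that consist entirely of zero bytes. The Python sketch below does this. The function name and the fixed 32,768-byte chunk size are assumptions made for illustration; real EWF writers may use a different chunk size.

```python
CHUNK_SIZE = 32 * 1024  # 32,768 bytes, the chunk size Voncken describes as common for EWF


def count_zero_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> tuple:
    """Return (all-zero chunks, total chunks) for a raw disk image read sequentially."""
    zero_count = total = 0
    with open(path, "rb") as image:
        while True:
            chunk = image.read(chunk_size)
            if not chunk:  # end of the image
                break
            total += 1
            # Only a chunk made entirely of zero bytes would qualify for 'Empty' compression.
            if chunk.count(0) == len(chunk):
                zero_count += 1
    return zero_count, total
```

For example, `count_zero_chunks("usb_stick.dd")` (a hypothetical raw image path) returns a pair of counts. A drive that is mostly unused but has few all-zero chunks would point to leftover, non-zero data in its unused space.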
For any number of reasons, the available space on a drive may not contain 0s. One of the most likely explanations is that the drive previously held data that was later deleted. The flash drive used for testing was formatted with the FAT32 file system. When a file is deleted from a FAT file system, its directory entry is marked as deleted and its clusters are marked as free in the File Allocation Table. The sectors the file occupied become available for overwriting, but the data remains at the same physical address until the space is needed for new data (“Undeletion” 2019). If data had previously been stored on the drive and then deleted, those sectors would still not read as only zeros. They would therefore not benefit from the disk imaging software’s compression. Voncken also indicated that the type of drive may be the culprit, as “SSDs and sticks are generally problematic for the ‘empty’ setting, as even brand new devices often are not initialised with zeros (unlike HDDs)” (Voncken and Colloton 2018).
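A deliberately simplified model can illustrate the effect of deletion on a FAT volume. The Python sketch below is a toy and not a FAT32 implementation. `ToyFatVolume` is a hypothetical class whose “allocation table” is a dictionary. Deleting a file only returns its clusters to the free list and leaves the bytes in place, as on a FAT volume.

```python
CLUSTER_SIZE = 4  # toy cluster size in bytes, kept tiny so the example stays readable


class ToyFatVolume:
    """A deliberately simplified model of deletion on a FAT-style volume (not real FAT32)."""

    def __init__(self, clusters: int):
        self.data = bytearray(CLUSTER_SIZE * clusters)  # the "physical" storage, all zeros
        self.table = {}                                # file name -> list of cluster numbers
        self.free = list(range(clusters))              # clusters available for new data

    def write(self, name: str, payload: bytes) -> None:
        needed = -(-len(payload) // CLUSTER_SIZE)      # ceiling division
        clusters = [self.free.pop(0) for _ in range(needed)]
        for i, cluster in enumerate(clusters):
            piece = payload[i * CLUSTER_SIZE:(i + 1) * CLUSTER_SIZE]
            start = cluster * CLUSTER_SIZE
            self.data[start:start + len(piece)] = piece
        self.table[name] = clusters

    def delete(self, name: str) -> None:
        # Only the bookkeeping changes: the clusters become free, but their bytes stay put.
        self.free.extend(self.table.pop(name))

    def zero_clusters(self) -> int:
        """Count clusters whose bytes are all zero."""
        return sum(
            1
            for c in range(len(self.data) // CLUSTER_SIZE)
            if not any(self.data[c * CLUSTER_SIZE:(c + 1) * CLUSTER_SIZE])
        )


volume = ToyFatVolume(clusters=8)
volume.write("old_file.txt", b"leftover data")  # 13 bytes occupy 4 clusters
volume.delete("old_file.txt")
print(volume.table, volume.zero_clusters())      # {} 4
```

After deletion the table is empty and all eight clusters are free again, but only four of them are zero-filled. An imaging tool that reads the whole device still sees the old bytes.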
Regardless of the cause, Voncken suggested a way to better test the software’s compression functions (Voncken and Colloton 2018). Use the “dd” command to create a large file containing only zeros on the test drive, then erase that file, in an attempt to “zero out” any unused space on the drive. After following the instructions Voncken provided on the Guymager wiki, the drive was re-imaged. The aim was to verify the hypothesis that the unused sectors simply did not contain zeros and would now benefit from the disk imaging software’s compression.
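Voncken’s dd-based instructions on the Guymager wiki are not reproduced here. The sketch below is a rough, hypothetical Python equivalent of the same idea. It fills the free space of a mounted volume with zeros and then deletes the filler files. Three assumptions are worth flagging:
- The function `zero_free_space` and its file names are illustrative.
- FAT32 cannot store a single file of 4 GiB or larger, so the zeros are spread across several files.
- The sketch writes to the volume, so it is only appropriate for a test drive like the one used here, never for a collection object.

```python
import errno
import os

FAT32_MAX_FILE = 2**32 - 1      # FAT32 cannot store a single file of 4 GiB or larger
BLOCK = bytes(1024 * 1024)      # zeros are written 1 MiB at a time


def zero_free_space(mount_point, max_bytes=None):
    """Fill free space on a mounted test volume with zeros, then delete the filler files.

    max_bytes caps the total written (None means "until the volume is full").
    Returns the number of zero bytes written before the filler files were removed.
    """
    written, paths, disk_full = 0, [], False
    try:
        while not disk_full and (max_bytes is None or written < max_bytes):
            path = os.path.join(mount_point, f"zerofill_{len(paths):04d}.bin")
            paths.append(path)
            # buffering=0 so a full volume is reported by write() rather than on close()
            with open(path, "wb", buffering=0) as out:
                in_file = 0
                while in_file + len(BLOCK) <= FAT32_MAX_FILE:
                    if max_bytes is not None and written >= max_bytes:
                        break
                    try:
                        count = out.write(BLOCK)
                    except OSError as err:
                        if err.errno != errno.ENOSPC:
                            raise
                        disk_full = True
                        break
                    written += count
                    in_file += count
                    if count < len(BLOCK):  # a short write also means the volume is full
                        disk_full = True
                        break
                os.fsync(out.fileno())  # push the zeros to the device before deleting
    finally:
        for path in paths:
            if os.path.exists(path):
                os.remove(path)
    return written
```

Deleting the filler files afterward leaves the zeros in place, for the same reason that deleted data lingers on a FAT volume. The `max_bytes` argument allows a small trial run before filling an entire drive.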
The results of the tests, displayed in the table below, show that following Voncken’s advice did change the disk imaging software’s performance. That said, the resulting image files were still much larger than expected. The disk images shrank by an average of approximately three gigabytes:
- Tableau Imager went from 21.0 GB to 18.2 GB.
- FTK Imager went from 20.9 GB to 18.2 GB.
- Guymager went from 23.2 GB to 20.1 GB.
Even so, the smallest disk images were still 15.78 GB larger than the 2.42 GB of data stored on the volume. It is possible that the intervention, which consisted of creating a large file containing only zeros using the dd command, was not fully successful. The cause of the larger file size remains unresolved. The volatility of reads from unused sectors on flash media may be the source of the issue (Voncken and Colloton 2018). Regardless, these tests demonstrate that the effectiveness of disk imaging software’s compression options may vary from drive to drive and depends on the data being captured.
Maximizing Flash Storage Compression
| Application | Intervention | Compression setting | File size (GB) |
|---|---|---|---|
| FTK Imager for Windows | Pre-zeroing | "9" | 20.9 |
| FTK Imager for Windows | Post-zeroing | "9" | 20.9 |
| Guymager | Pre-zeroing | Fast | 23.2 |
| Guymager | Post-zeroing | Fast | 20.1 |
| Tableau Imager | Pre-zeroing | "Minimum Size" | 21.0 |
| Tableau Imager | Post-zeroing | "Minimum Size" | 18.2 |
Sidecar File Comparison & Verification
Each of the five disk imaging applications reviewed in the Hirshhorn’s tests produces a sidecar “info” file that describes the disk imaging process. The software produces these files automatically, typically with extensions like .info or .txt, and they accompany the disk image output. All of the applications record the start and end time of the acquisition process, but that is where their similarities end.
- The ddrescue output differs most from the other four. This makes sense, given that ddrescue is a data recovery tool rather than one designed specifically for digital forensics. It does not include checksums for the image, but it does record certain technical metadata that some of the other tools do not, such as the block size of the file system or specific errors encountered.
- The Guymager output is the most thorough. It includes the most fields and the most specific description of both the “host” machine and the “target” device, and it lists the commands that were run to collect this information.
Perhaps most significantly, Guymager includes an option to automate source verification. It re-hashes the source volume after imaging and includes the checksums of both the volume and the image in the output. The developer added this feature after observing that “hard drives containing bad sectors may deliver different data each time a bad sector is read” (Voncken 2015).
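Guymager’s source verification can be approximated for a raw image with a few lines of Python. The sketch below is hypothetical and is not Guymager’s code. It assumes a raw (dd-style) image, because the file hash of an EWF container is not the hash of the acquired data; for EWF files, libewf’s `ewfverify` tool serves that purpose. It defaults to SHA-256, but any algorithm name accepted by `hashlib.new` can be passed in.

```python
import hashlib


def hash_file(path, algorithm="sha256", block_size=1024 * 1024):
    """Hash a raw image file or a block device by reading it sequentially in blocks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for block in iter(lambda: stream.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_acquisition(source, image, acquisition_hash, algorithm="sha256"):
    """Re-hash both the source and the raw image and compare each with the acquisition hash."""
    source_hash = hash_file(source, algorithm)  # second full read of the (write-blocked) source
    image_hash = hash_file(image, algorithm)
    return {
        "source matches acquisition": source_hash == acquisition_hash,
        "image matches acquisition": image_hash == acquisition_hash,
    }
```

Sometimes the re-read source no longer matches the acquisition hash while the image still does. That is the situation this kind of check is meant to catch: a drive that returns different data on a second read.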
Guymager’s transparency and level of technical detail highlight a difference between this free and open source application and the other forensic tools (Tableau and FTK) assessed in this research project. Both FTK and Tableau include “verification” in their outputs, but this verification does not include hashing the entire source volume as a disk.
The research team contacted technical support at AccessData, the company that develops FTK Imager, to better understand the software’s verification options. Like the other disk imaging software tested, FTK Imager computes a hash while it is imaging the volume, which the info file describes as the “computed hash.” It also provides a “Report Hash.” After three inquiries, technical support defined the process of creating the Report Hash as follows: “When the image is complete, it is verified and this is the hash value of that verification. It should match that of the computed hash.” When asked for clarification, the technical support professional’s manager replied that “Verifying the image goes out and scans the drive/folder you just imaged and compares that hash with the hash of the contents of the newly created image….It hashes ALL the contents (files and subfolders) of the drive or location you imaged” (Harmon 2019). Pressed again for details, the Advanced Product Support Engineer admitted, “I am not sure of the details” (Harmon 2019).
AccessData’s and Guymager’s verification functions represent different philosophies of verification, and either may be more appropriate for a particular use. If both functions were desired, either tool could be augmented with a simple additional step that performs the other function manually. Functionality aside, the lack of clarity and specificity in this interaction contrasted strongly with interactions with Guy Voncken, the developer of Guymager. The research team corresponded with him about the compression algorithm Guymager uses and about troubleshooting timeout errors. Voncken typically responded to very specific questions within the same day, explained his thinking, provided examples, and offered potential solutions to challenges with the software.
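As one example of such an additional step, a file-level comparison can complement a disk-level hash. The Python sketch below is hypothetical and does not reconstruct FTK Imager’s “Report Hash.” It builds a manifest of SHA-256 checksums for every file under a directory and reports files that are missing or differ between two manifests. The two directories could be, for example, a write-blocked source volume and the same volume mounted from its disk image. Because it works on the logical file system, it cannot detect differences in unallocated space, including deleted data.

```python
import hashlib
import os


def file_manifest(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digest = hashlib.new(algorithm)
            with open(full_path, "rb") as stream:
                for block in iter(lambda: stream.read(1024 * 1024), b""):
                    digest.update(block)
            manifest[os.path.relpath(full_path, root)] = digest.hexdigest()
    return manifest


def compare_manifests(original, copy):
    """Return (files present on only one side, files whose checksums differ), both sorted."""
    missing = sorted(set(original) ^ set(copy))
    changed = sorted(path for path in set(original) & set(copy) if original[path] != copy[path])
    return missing, changed
```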
Conclusion
The choice of which disk imaging tools to adopt and incorporate into a workflow involves weighing an array of variables, many of which will be specific to an institution. Other departments often dictate which operating system is used, existing hardware is commonly inherited from predecessors, and budgetary considerations are always pertinent. An institution’s digital asset management system or storage limitations can also shape its digital preservation actions. Specific metadata fields may be of value to different collecting institutions for any number of reasons, so the significance of a particular field is best assessed case by case. In general, functions and technical metadata that a particular application does not provide automatically can still be produced manually or through custom scripting.
Practice differs between institutions. The Guggenheim has used different programs depending on which computers, how much hard drive space, and which write-blocking interfaces were available at the time. The Hirshhorn has had success with different applications for different types of media. The motive for disk imaging may also differ from volume to volume, so the most appropriate tool may depend on the situation:
- the physical volume may be of great significance;
- only the logical file system may be of concern;
- disk imaging may simply be performed for research purposes.
The tests conducted at the Hirshhorn found similar functionality and performance across the five disk imaging applications reviewed. Every application was equally unable to significantly compress the unused space on the test drive, again demonstrating fairly consistent behavior across the tools tested. Even where these tools differ, in verification or automated metadata creation, many of the same results can be achieved by augmenting one’s workflow with additional steps or tools, if necessary. Of course, the functionality of these tools cannot be viewed in a vacuum. The way a drive structures the data it stores, how the file system organizes that data, and the data itself are all significant to understanding the process of disk imaging. In the same way, the disk image must exist within an institution and a repository that will have their own needs and limitations. The choice of disk imaging software should be made with all of these factors in mind.
References
Harmon, Brandon. 2019. AccessData Support Request #1084755. Conversation with Eddy Colloton.
Tableau Imager Software v.1.2.1. n.d. “E01 Format Specific Options.” Guidance Software. Accessed August 5, 2019.
“Undeletion.” 2019. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Undeletion&oldid=883845962.
Voncken, Guy. 2015. “Guymager’s Source Verification.” Guymager Wiki. March 21, 2015. https://sourceforge.net/p/guymager/wiki/Guymager%27s%20source%20verification/.
Voncken, Guy, and Eddy Colloton. 2018. “Guymager / Wiki / Guymager’s Source Verification.” December 13, 2018. https://sourceforge.net/p/guymager/wiki/Home/#fd33/e1e8/0a96.
“Zlib.” 2018. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Zlib&oldid=875113857.