NMAAHC Disk Imaging Workshop
In the summer of 2021, my friend and colleague Caroline Gil & I taught a two-day workshop on disk imaging for the media archiving team at the National Museum of African American History and Culture (NMAAHC). The workshop focused on the process of disk imaging physical media: the hardware and software involved, quality assurance procedures for disk images, and, of course, use cases for disk images in a museum collection.
Gil and I have been talking disk imaging for quite some time now. Along with Flaminia Fortunato and Jonathan Farbowitz, we wrote a research paper for the Electronic Media Review titled “Towards Best Practices In Disk Imaging: A Cross-Institutional Approach,” which focused on when and how conservators working in art museums can use disk images to their advantage to preserve media artworks. The winter before the NMAAHC workshop, Gil and I co-taught a course remotely for the NYU Moving Image Archiving and Preservation (MIAP) program, the MA program we both attended. The syllabus for that course is available from the MIAP website, currently under Semester 4, Spring 2021. Farbowitz, Gil and I also co-authored a chapter on disk imaging for Deena Engel and Joanna Phillips’ upcoming book, Conservation of Time-Based Media Art. So when Blake McDowell, Media Archivist for the NMAAHC and my Smithsonian colleague, reached out asking if we would like to create a two-day workshop on disk imaging, Caroline and I were ready to go.
Workshop Agenda
Day One
Introduction to Disk Imaging
What is a disk image? How is it structured?
Glossary of Terms
Sectors, file systems, physical vs. logical, digital forensics, etc.
EWF vs Raw
Overview of both formats
Optical Media Basics
Media specifications, disk imaging software and hardware
Floppy Disk Basics
Media specifications, disk imaging software and hardware
Day Two
Hard Drives
Media specifications, disk imaging software and hardware
Tools: Hardware & Software
FREDs, Guymager, libewf, write blockers, ESD protection, BitCurator, and more
QC and Reporting of a Disk Image
Verification, mounting, and more
Analysis, Metadata and Reporting
Beyond “it’s data about data”
Troubleshooting
Hex code, offsets, and diff’ing
We divided the workshop into lectures, which were followed by Q&A sessions. I’ve shared a brief description of each lecture below, along with the slides and speaker notes as a downloadable PDF. Each day closed with disk imaging activities, which I won’t dive into in this post.
Day One
Introduction to Disk Imaging
Our first day began with an introduction to disk imaging and disk images themselves. Many of the workshop participants had interacted with disk images in one way or another, so much of this was review. As with all of the lectures I’ll describe in this post, you can download the slides and speaker notes as a PDF.
Glossary of Terms
Disk imaging software is full of vocabulary and jargon that the average archivist is unlikely to have encountered. Some of these terms are used more broadly in computer science to describe specific parts of storage media or file systems, and others are entirely unique to disk imaging. The Digital Archival traNsfer, iNgest, and packagiNg Group, or DANNNG!, has a great resource for tackling this challenge: the Digital Archives Technical Glossary. A complete list of resources is available at the bottom of this post. In this lecture, Caroline and I introduced terms we would use (and further define) throughout the workshop. Download the slides and speaker notes as a PDF, here.
EWF vs Raw
It may seem early in the workshop to dive into the two leading disk image formats used for preservation, Expert Witness Format (EWF) and raw, but we knew the topic was of great interest to our audience. Besides, with these formats and terms defined, we could refer to them throughout the rest of our lectures and keep expanding on their costs and benefits. Download the slides and speaker notes as a PDF, here.
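One practical difference between the two formats shows up in the first few bytes of the file. A raw image is a plain byte-for-byte copy with no container, so it begins with whatever was at the start of the source media, while an EWF segment file begins with a fixed signature documented in Joachim Metz’s EWF specification (listed in the resources). The Python sketch below sniffs for that signature. It assumes EWF version 1 (E01) files only, and the function names are our own illustration, not part of any tool; "raw" here really only means "not E01," so confirm with a tool such as disktype.

```python
import sys

# Assumption: EWF version 1 (E01) segment files. Per the libewf EWF
# specification, each segment file starts with "EVF" followed by the bytes
# 0x09 0x0d 0x0a 0xff 0x00. Newer Ex01 (EWF2) files use a different
# signature and are not recognized by this sketch.
EWF1_SIGNATURE = b"EVF\x09\x0d\x0a\xff\x00"


def looks_like_ewf(header: bytes) -> bool:
    """True if the first bytes of a file match the E01 signature."""
    return header[: len(EWF1_SIGNATURE)] == EWF1_SIGNATURE


def guess_image_format(path: str) -> str:
    """Return 'ewf' or 'raw'.

    A raw image has no header of its own, so 'raw' only means the file
    lacks the E01 signature.
    """
    with open(path, "rb") as f:
        header = f.read(len(EWF1_SIGNATURE))
    return "ewf" if looks_like_ewf(header) else "raw"


if __name__ == "__main__":
    # Hypothetical usage: python sniff_format.py disk01.E01 disk02.img
    for image_path in sys.argv[1:]:
        print(image_path, guess_image_format(image_path))
```

The lack of a header is part of why raw images are so widely readable, and the container is part of how EWF can carry acquisition metadata and hashes alongside the data.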
Optical Media Basics
In three separate lectures we outlined how data is organized both physically and logically on different storage media. In the first, we focused on optical media. The lecture describes the standardization of various optical media formats, the file systems associated with optical media, conservation risks to the physical media, and format-specific disk imaging tools commonly used with optical media. Download the slides and speaker notes as a PDF, here.
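As one example of logical organization on optical media, an ISO 9660 file system reserves its first sixteen 2,048-byte logical sectors as a system area, so its volume descriptors start at byte offset 32,768. A primary volume descriptor has type code 1, the identifier CD001, and a 32-byte volume identifier at byte 40. The Python sketch below reads a few of those fields from an .iso file. It assumes a plain ISO 9660 image with 2,048-byte sectors; a raw 2,352-byte-sector BIN image or a UDF-only disc would need different handling, and the function names are hypothetical.

```python
import sys

SECTOR_SIZE = 2048                   # ISO 9660 logical sector size
FIRST_DESCRIPTOR = 16 * SECTOR_SIZE  # sectors 0-15 are the system area


def _text(field: bytes) -> str:
    # Identifier fields are padded with spaces (sometimes NULs).
    return field.decode("ascii", errors="replace").rstrip(" \x00")


def parse_primary_volume_descriptor(sector: bytes) -> dict:
    """Pull a few fields out of one 2,048-byte primary volume descriptor."""
    if sector[1:6] != b"CD001" or sector[0] != 1:
        raise ValueError("not an ISO 9660 primary volume descriptor")
    return {
        "system_id": _text(sector[8:40]),
        "volume_id": _text(sector[40:72]),
        # "Both-endian" fields: the little-endian copy comes first.
        "volume_space_blocks": int.from_bytes(sector[80:84], "little"),
        "logical_block_size": int.from_bytes(sector[128:130], "little"),
    }


def read_pvd(path: str) -> dict:
    """Walk the volume descriptor set until the primary descriptor (type 1)."""
    with open(path, "rb") as f:
        offset = FIRST_DESCRIPTOR
        while True:
            f.seek(offset)
            sector = f.read(SECTOR_SIZE)
            if len(sector) < SECTOR_SIZE or sector[1:6] != b"CD001":
                raise ValueError("no ISO 9660 volume descriptor set found")
            if sector[0] == 1:
                return parse_primary_volume_descriptor(sector)
            if sector[0] == 255:  # volume descriptor set terminator
                raise ValueError("no primary volume descriptor before terminator")
            offset += SECTOR_SIZE


if __name__ == "__main__":
    for iso_path in sys.argv[1:]:
        print(iso_path, read_pvd(iso_path))
```

Multiplying volume_space_blocks by logical_block_size gives the size the file system claims for itself, which can be compared against the size of the image file as a quick completeness check.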
Floppy Disk Basics
The second media-specific lecture of the workshop was all about floppy disks. Archival collections are rife with floppies! From personal digital archives to media art in museum collections, floppies hold data from decades of computing. Just as with optical media, we outlined the format’s technical properties, how data is stored on the disk, and our preferred tools for imaging floppies. Download the slides and speaker notes as a PDF, here.
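One of those technical properties is disk geometry. On an IBM PC-formatted floppy, capacity is cylinders × heads × sectors per track × 512 bytes per sector, so a 3.5-inch high-density disk holds 80 × 2 × 18 × 512 = 1,474,560 bytes. A sector image of a standard PC floppy should therefore be one of a handful of exact sizes, which makes file size a quick sanity check. The Python sketch below encodes a few common PC formats (our own, non-exhaustive table; disks in non-PC formats, such as early Macintosh or Amiga disks, don’t fit this model) and converts a cylinder/head/sector (CHS) address into a logical block address (LBA).

```python
import os
import sys
from typing import Optional, Tuple

BYTES_PER_SECTOR = 512

# (cylinders, heads, sectors per track) for common IBM PC floppy formats.
PC_FLOPPY_GEOMETRIES = {
    "5.25-inch DD (360 KB)": (40, 2, 9),
    "5.25-inch HD (1.2 MB)": (80, 2, 15),
    "3.5-inch DD (720 KB)": (80, 2, 9),
    "3.5-inch HD (1.44 MB)": (80, 2, 18),
}


def image_size(geometry: Tuple[int, int, int]) -> int:
    """Expected size in bytes of a sector image with this geometry."""
    cylinders, heads, sectors_per_track = geometry
    return cylinders * heads * sectors_per_track * BYTES_PER_SECTOR


def identify_by_size(size_in_bytes: int) -> Optional[str]:
    """Name of the PC format whose image would be exactly this size, if any."""
    for name, geometry in PC_FLOPPY_GEOMETRIES.items():
        if image_size(geometry) == size_in_bytes:
            return name
    return None


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int = 2, sectors_per_track: int = 18) -> int:
    # Cylinders and heads count from 0; CHS sector numbers count from 1.
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


if __name__ == "__main__":
    for image_path in sys.argv[1:]:
        size = os.path.getsize(image_path)
        print(image_path, size, identify_by_size(size) or "non-standard size")
```

An image that matches none of these sizes is not necessarily wrong, but it is a cue to check whether the disk uses another format or whether the imaging tool skipped or padded unreadable sectors.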
Day Two
Hard Drives
The last of our three format lectures focused on hard drives. The lecture begins with basics, like the difference between hard disk drives and solid state drives, transitions into the different connections and cables you will need when imaging drives, and concludes with recommendations for documenting hard drives and their contents prior to disk imaging. Download the slides and speaker notes as a PDF, here.
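Documenting a hard drive’s contents often starts with its partition layout. On drives using the classic Master Boot Record (MBR) scheme, the first 512-byte sector ends with the bytes 0x55 0xAA and holds a four-entry partition table at offset 446; each 16-byte entry records a status byte, a partition type code, a starting LBA, and a sector count. The Python sketch below parses that table from the first sector of a raw image. It is illustrative only: it ignores extended partitions and does not interpret GPT disks (which show a single protective entry of type 0xEE in the MBR). Tools such as disktype or The Sleuth Kit’s mmls do this far more thoroughly.

```python
import struct
import sys
from typing import List, NamedTuple


class PartitionEntry(NamedTuple):
    index: int          # 1-4, position in the MBR table
    bootable: bool      # status byte 0x80
    type_code: int      # e.g. 0x07, 0x0B, 0x83, 0xEE
    start_lba: int
    sector_count: int


def parse_mbr(sector0: bytes) -> List[PartitionEntry]:
    """Parse the four primary partition entries from an MBR sector."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) at offset 510")
    entries = []
    for i in range(4):
        entry = sector0[446 + 16 * i: 446 + 16 * (i + 1)]
        status, type_code = entry[0], entry[4]
        # Starting LBA and sector count are little-endian 32-bit integers.
        start_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if type_code == 0:  # unused slot
            continue
        entries.append(PartitionEntry(i + 1, status == 0x80, type_code,
                                      start_lba, sector_count))
    return entries


def read_partitions(path: str) -> List[PartitionEntry]:
    with open(path, "rb") as f:
        return parse_mbr(f.read(512))


if __name__ == "__main__":
    for image_path in sys.argv[1:]:
        for part in read_partitions(image_path):
            print(image_path, part)
```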
Tools: Hardware & Software
Diving into the nitty-gritty of disk imaging, this lecture details both the hardware tools (everything from anti-static bags to FREDs) and the software tools (including how to install libewf on macOS) used to create disk images. Download the slides and speaker notes as a PDF, here.
Quality Assurance of a Disk Image
Because quality control and quality assurance are baked into the process of disk imaging, we discussed the two together, covering the importance of fixity verification, the process of mounting disk images, and even a bit about virtualization. Download the slides and speaker notes as a PDF, here.
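At its core, fixity verification means recomputing a checksum over the image and comparing it with the value recorded at acquisition, for example in the .info file Guymager writes alongside an image. The Python sketch below does that comparison for a raw image with the standard library’s hashlib, reading in chunks so multi-gigabyte images don’t have to fit in memory. The script name and arguments are hypothetical placeholders; for EWF images, libewf’s ewfverify checks the hash stored inside the image instead.

```python
import hashlib
import sys
from typing import Iterable


def digest_chunks(chunks: Iterable[bytes], algorithm: str = "md5") -> str:
    """Feed byte chunks into a hashlib algorithm and return the hex digest."""
    digest = hashlib.new(algorithm)
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_digest(path: str, algorithm: str = "md5",
                chunk_size: int = 1024 * 1024) -> str:
    """Hash a (possibly very large) file 1 MiB at a time."""
    with open(path, "rb") as f:
        return digest_chunks(iter(lambda: f.read(chunk_size), b""), algorithm)


def hashes_match(actual: str, expected: str) -> bool:
    """Compare hex digests, ignoring case and stray whitespace."""
    return actual.lower() == expected.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_fixity.py disk01.img <expected md5>
    image, expected = sys.argv[1], sys.argv[2]
    actual = file_digest(image)
    print("PASS" if hashes_match(actual, expected) else "FAIL", actual)
```

Passing "sha1" or "sha256" as the algorithm covers the other hash types imaging tools commonly record.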
Analysis, Metadata and Reporting
Describing a disk image and its contents is an important aspect of the disk imaging process, but the convoluted nature of some tools’ reports can make the process challenging. To address this need, we compared the sidecar files created by the most popular disk imaging tools in cultural heritage and described command line tools that can extract useful metadata from disk images, such as disktype, The Sleuth Kit, and fiwalk. Download the slides and the speaker notes as a PDF, here.
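fiwalk, for instance, writes its report as Digital Forensics XML (DFXML, linked in the resources), in which each file becomes a fileobject element with children such as filename, filesize, and hashdigest. The Python sketch below pulls a simple inventory out of such a report using only the standard library. It is a hedged example: it matches element names in any namespace, assumes the fileobject/filename/filesize/hashdigest structure just described, and leaves a field blank when it is missing, so check it against the output of your own fiwalk version before relying on it.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def dfxml_inventory(xml_data: Union[str, bytes]) -> List[Dict[str, str]]:
    """Return one dict per <fileobject> with filename, size, and any hashes."""
    root = ET.fromstring(xml_data)
    rows = []
    # "{*}" matches a tag in any (or no) namespace; requires Python 3.8+.
    for fileobject in root.iterfind(".//{*}fileobject"):
        row = {
            "filename": fileobject.findtext("{*}filename", default=""),
            "filesize": fileobject.findtext("{*}filesize", default=""),
        }
        # e.g. <hashdigest type="md5">...</hashdigest> becomes row["md5"]
        for hashdigest in fileobject.iterfind("{*}hashdigest"):
            row[hashdigest.get("type", "hash").lower()] = (hashdigest.text or "").strip()
        rows.append(row)
    return rows


if __name__ == "__main__":
    for report_path in sys.argv[1:]:
        with open(report_path, "rb") as f:
            for row in dfxml_inventory(f.read()):
                print(row)
```

Writing these rows out with the csv module would give a spreadsheet-friendly file list to accompany the disk image.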
Troubleshooting
We rounded out our series of lectures with a bit of an “odds and ends” section dedicated to our strategies for troubleshooting problems with disk images themselves and with the process of creating them. Download the slides and speaker notes as a PDF, here.
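When two images of the same media don’t match (two passes over a degrading floppy, say), a common next step is to find where they differ and translate byte offsets into sector numbers, the kind of comparison a visual tool like vbindiff (listed in the resources) makes interactive. The Python sketch below reports differing byte ranges between two files as hex offsets along with the sectors they fall in. The 512-byte sector size is an assumption to change for media such as data CDs, which use 2,048-byte sectors, and the byte-by-byte loop is only practical for small images like floppies.

```python
import sys
from typing import Iterator, Optional, Tuple

SECTOR_SIZE = 512  # assumption: change to 2048 for most optical data discs


def differing_ranges(a: bytes, b: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte offsets, end exclusive, where a and b differ.

    Bytes past the end of the shorter input count as differences.
    """
    length = max(len(a), len(b))
    start: Optional[int] = None
    for offset in range(length):
        same = offset < len(a) and offset < len(b) and a[offset] == b[offset]
        if not same and start is None:
            start = offset
        elif same and start is not None:
            yield start, offset
            start = None
    if start is not None:
        yield start, length


def describe(start: int, end: int, sector_size: int = SECTOR_SIZE) -> str:
    """Format a range as hex offsets plus the sectors it touches."""
    first, last = start // sector_size, (end - 1) // sector_size
    return (f"0x{start:08x}-0x{end - 1:08x} "
            f"({end - start} bytes, sectors {first}-{last})")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_images.py pass1.img pass2.img
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        a, b = fa.read(), fb.read()
    for start, end in differing_ranges(a, b):
        print(describe(start, end))
```

Differences clustered within a few sectors often point to specific unreadable spots on the media, while a difference at every offset suggests the two files are not images of the same thing at all.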
Conclusion
Thanks for your interest in our disk imaging workshop. I hope this documentation of the lectures Caroline Gil and I gave is helpful. Below is a list of resources we compiled for further disk imaging research. And of course, feel free to reach out with questions!
Resources
Disk Imaging - General
“Archival Science, Digital Forensics, and New Media Art” by Dianne Dietrich and Frank Adelstein: https://doi.org/10.1016/j.diin.2015.05.004
“Archiving Computer-Based Artworks” by Jonathan Farbowitz: http://resources.conservation-us.org/emg-review/volume-5-2017-2018/farbowitz/
BitCurator Users Google Group: https://groups.google.com/g/bitcurator-users
DANNNG Working Group: https://dannng.github.io/
DFXML on Github: https://github.com/simsong/dfxml
“Digital Curation at Work: Modeling Workflows for Digital Archival Materials” by Colin Post, Alexandra Chassanoff, Christopher Lee, Andrew Rabkin, Yinglong Zhang, Katherine Skinner, and Sam Meister: https://ieeexplore.ieee.org/document/8791228/
Digital Curation Google Group: https://groups.google.com/g/digital-curation
“Digital Forensics and Born-Digital Content in Cultural Heritage Collections” by Matthew G. Kirschenbaum, Richard Ovenden, Gabriela Redwine, and Rachel Donahue: https://www.clir.org/pubs/reports/pub149/
“Digital Forensics and Preservation” by Jeremy Leighton John: http://www.dpconline.org/component/docman/doc_download/810-dpctw12-03pdf
Digital Forensics Wiki: https://forensics.wiki/
“Digital Forensics XML and the DFXML Toolset” by Simson Garfinkel: https://simson.net/clips/academic/2012.DI.dfxml.pdf
“Disk Image Content Model and Metadata Analysis” developed by AVPS for Harvard Library: https://wiki.harvard.edu/confluence/display/digitalpreservation/Disk+Image+Formats
“Disk Imaging,” MoMA Media Conservation Initiative: https://www.mediaconservation.io/disk-imaging
“EWF Specification” by Joachim Metz: https://github.com/libyal/libewf/blob/master/documentation/Expert%20Witness%20Compression%20Format%20%28EWF%29.asciidoc
“Expert Witness Disk Image Format (EWF) Family,” Library of Congress: https://www.loc.gov/preservation/digital/formats/fdd/fdd000406.shtml
“Forensically Sound Mac Acquisition In Target Mode” by Paul Henry: https://digital-forensics.sans.org/blog/2011/02/02/forensically-sound-mac-acquisition-target-mode
“Guymager’s Source Verification” by Guy Voncken: https://sourceforge.net/p/guymager/wiki/Guymager%27s%20source%20verification/
“Towards Best Practices In Disk Imaging: A Cross-Institutional Approach” by Eddy Colloton, Jonathan Farbowitz, Flaminia Fortunato, and Caroline Gil: https://resources.culturalheritage.org/emg-review/volume-6-2019-2020/colloton/
“‘Tell Us about Your Digital Archives Workstation’: A Survey and Case Study” by Elvia Arroyo-Ramírez, Kelly Bolding, Faith Charlton, and Allison Hughes: https://elischolar.library.yale.edu/jcas/vol5/iss1/16
“Visual Binary Diff” by Christopher Madsen: https://www.cjmweb.net/vbindiff/
Optical Media
“CDs Are Not Forever: The Truth About CD/DVD Longevity” by Tina Sieber: http://www.makeuseof.com/tag/cds-truth-cddvd-longevity-mold-rot/
“Computer Hard Disks and Diskettes” by Christopher Dicks for the Canadian Conservation Institute: http://canada.pch.gc.ca/eng/1456340763236
“Getting Public Radio’s Legacy Off Ageing Rewritable CDs: An Interview with WNYC’s John Passmore” by Trevor Owens: https://blogs.loc.gov/thesignal/2014/02/getting-public-radios-legacy-off-ageing-rewritable-cds-an-interview-with-wnycs-john-passmore/
“How Long Can You Store CDs and DVDs and Use Them Again?” by Fred R. Byers: http://www.clir.org/pubs/reports/pub121/sec4.html
“How Long Is Long Term Data Storage” by Barry Lunt: http://www.imaging.org/ist/publications/reporter/articles/REP26_3_4_ARCH2011_Lunt.pdf
“An Introduction to Optical Media Preservation” by Alexander Duryee: http://journal.code4lib.org/articles/9581
“ISO Disk Image File Format,” Sustainability of Digital Formats, Library of Congress: https://www.loc.gov/preservation/digital/formats/fdd/fdd000348.shtml
“NIST/Library of Congress (LoC) Optical Disc Longevity Study” by Anna O. Nhan, Slattery, F. Byers, A. Klepchukov, J. Zheng, C. Shahani, M. Youket, E. Eusman, and N. Olson: http://www.loc.gov/preservation/resources/rt/NIST_LC_OpticalDiscLongevity.pdf
“An Optical Media Preservation Strategy for New York University’s Fales Library & Special Collections” by Annie Schweikert: https://archive.nyu.edu/handle/2451/43877
“Preserving Write-Once DVDs: Producing Disc Images, Extracting Content, and Addressing Flaws and Errors” by Morgan Morel and George Blood: http://www.digitizationguidelines.gov/audio-visual/documents/Preserve_DVDs_BloodReport_20140901.pdf
“To Image or Copy – The Compact Disc Digital Audio Dilemma” by Alice Prael: http://campuspress.yale.edu/borndigital/2016/12/20/to-image-or-copy-the-compact-disc-digital-audio-dilemma/
Floppy Disks
“The Archivist’s Guide to KryoFlux”: https://github.com/archivistsguidetokryoflux/archivists-guide-to-kryoflux
“A Dogged Pursuit: Capturing Forensic Images of 3.5″ Floppy Disks” by Dorothy Waugh: https://practicaltechnologyforarchives.org/issue2_waugh/
“Floppy Disk Data Separator Design Guide for the DP8473” by National Semiconductor Corporation: http://info-coach.fr/atari/hardware/_fd-hard/AN-505.pdf
“KryoFlux Support Forum”: https://forum.kryoflux.com/
“Project KryoFlux—Part 2: Why Bother With It?” by Gough Lui: http://goughlui.com/2013/04/21/project-kryoflux-part-3-recovery-in-practise/