Written by Sarah Lake, Junior Digital Archivist
Context and Goals
Since August 2019, John Richan (Digital Archivist) and Sarah Lake (Junior Digital Archivist) have been working on a project to bridge gaps in digital archives processing workflows using open-source digital forensics applications found in the BitCurator Environment. This project, which is funded in part by a grant from the Young Canada Works Program, is part of the development of RMA’s Digital Preservation Lab, which is set to launch in 2020.
From the BitCurator Consortium website:
The BitCurator Environment is a Ubuntu-derived Linux distribution geared towards the needs of archivists and librarians. It includes a suite of open source digital forensics and data analysis tools to help collecting institutions process born-digital materials. BitCurator supports positive digital preservation outcomes using software and practices adopted from the digital forensics community.
In the BitCurator Environment you can:
- Create forensic disk images: Disk images packaged with metadata about devices, file systems, and the creation process.
- Analyze files and file systems: View details on file system contents from a wide variety of file systems.
- Extract file system metadata: File system metadata is a critical link in the chain of custody and in records of provenance.
- Identify sensitive information: Locate private and sensitive information on digital media and prepare materials for access.
- Locate and remove duplicate files: Know what files to keep and what can be discarded.
Dedicated workstation for all digital archives processing
Before undertaking this project, we were working with a variety of digital processing tools scattered over different computers. Some commonly used digital forensics tools only run on Windows, while others, like those in the BitCurator Environment, only run on Linux. We were using BitCurator tools through a virtual machine set up on a MacBook, and often had to transfer files to and from the Digital Archivist’s Windows computer to be able to use programs such as FTK Imager and DROID.
This workflow was problematic both from an efficiency standpoint, because of the added steps and wasted time, and from a preservation standpoint. Every time we copy files to another computer, there is a risk of data being modified. Also, while BitCurator is able to function in a virtual machine, we often experienced lag and bugs as a result of this setup. BitCurator documentation suggests opting for a native installation to avoid these issues.
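To illustrate the preservation risk described above, here is a minimal sketch of a checksum (fixity) comparison that could reveal whether files were altered, lost or added during a transfer between computers. It is an illustration only, not part of our documented workflow: the function names `sha256_of`, `build_manifest` and `compare_manifests` are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(before, after):
    """Return sorted lists of (missing, added, changed) relative paths."""
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(
        name for name in set(before) & set(after) if before[name] != after[name]
    )
    return missing, added, changed
```

In practice, one would build a manifest of the source folder before the transfer, build a second manifest of the copy afterwards, and pass both to `compare_manifests`. Three empty lists mean that every file arrived with identical content. Note that this only checks file contents; it does not detect changes to file system metadata such as timestamps.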
The first step to solve this problem was to purchase a dedicated computer that met the recommended specifications for installing BitCurator. Then, with the help of online documentation and advice from user group forums, we were able to do a native install of BitCurator alongside Windows in a dual-boot system. Being able to start the computer in Windows or in Ubuntu means that all of our digital processing tasks are now consolidated into a single workstation. Not only does BitCurator run more smoothly than it did in the virtual machine, but we can also access files from both the BitCurator and Windows partitions of the computer without moving them, which avoids the risk of modifying or losing data in transfer.
Closing gaps in digital processing workflow
Through a cycle of discussion, testing and revision with RMA staff, we have produced a first set of digital processing workflows intended to train staff working with BitCurator tools. These workflows enable us to examine and appraise the contents of obsolete media carriers like floppy disks, Zip disks and optical media in an efficient and standardized way, while ensuring the long-term authenticity and integrity of the data in our care, and providing access to born-digital materials without compromising the confidentiality of sensitive information.
Using Brunnhilde, a command-line tool written by Concordia Digital Preservation Librarian Tim Walsh, we are now able to extract files from disk images, identify file formats, document technical metadata and run a virus scan on the contents all with a single command, whereas we were previously using four or five different tools to accomplish all of these tasks. Also, the Bulk Reviewer tool allows us to automatically identify and isolate files containing potentially sensitive information. Although it’s not yet perfect and still requires significant manual revision, it’s a crucial first step towards providing access to born-digital materials.
As we put these workflows into practice, we will continue to refine and adapt them for new situations or issues that we encounter. We expect that this project will serve as a model for us to follow as we further expand our digital preservation practices in the context of the launch of our new lab in 2020.