Inside the Archive #49: World Digital Preservation Day 2025 part one
A special post celebrating the 10th anniversary of the BFI National Archive’s digital preservation infrastructure.

Digital Preservation Infrastructure at the BFI
In the early 2010s, the BFI National Archive had been doing digital preservation work for some years. It followed good practice storage methods, such as keeping two copies of the digital media files on data tapes and storing them in separate locations, although the processes were managed manually. When we made a bid to the National Lottery to fund a mass film digitisation project (eventually leading to Britain on Film on BFI Player) we knew that we had to move up a gear: digitising 5,000 films from the collection would create a huge digital preservation challenge that could not be managed manually.
We conceived a Digital Preservation Infrastructure (DPI), which is a set of integrated systems and services designed to manage the full preservation lifecycle of digital moving image collections. This lifecycle can be broken down into distinct stages:
- acquisition of born-digital files or digitisation of physical objects
- validation and quality control
- documentation in our Collections Information Database
- storage to automated data tape libraries
- creation of viewing copies for BFI staff and public access via the Mediatheque at BFI Southbank and the online platform BFI Player
- data migration, both from obsolete or at-risk file formats to new formats and from one generation of data tape to the next
The procurement process began in 2014 and the first data was ingested to DPI in November 2015. In the ten years since then, we have placed 20 Petabytes of data under preservation in DPI.
DPI is a combination of technology, standards, policies and workflows
We created a new secure network at the archive’s Conservation Centre, initially with fast 10Gbps connectivity, which has since been upgraded to an even faster 100Gbps. We also deployed high-spec Windows and Mac workstations, along with film and audio scanners, enabling the archive’s expert conservation teams to digitise videotape, film, and audio into high-quality digital media files.
We did the same for the team that manages born-digital acquisitions from digital cinema, broadcast TV and other digital content creation streams that the archive collects. We implemented a fork of the BBC’s Redux system for automating the recording of television off-air, and used it to record 18 channels of UK TV. We built a cluster of high-performance Linux servers, running Python code (using open source tools) to manage the enormous volumes of data flowing through these new workflows. We also implemented a media asset management solution to allow BFI staff to find and access the digital collections.
Finally, we built a resilient data storage solution using two robotised data tape libraries on site. Each of the libraries uses a different data tape format, to mitigate risks of data tape supply. A third copy is stored offsite, over 50 miles away, for disaster recovery. This meets the level 4 standard for data storage as described in the US National Digital Stewardship Alliance’s Levels of Digital Preservation – with three copies in geographically separate locations.

Alongside these technology components, we also defined new policies, standards and workflows for the digital files that we expect from digitisation, both in-house and outsourced, and from acquisitions from donors to the collection. The community of moving image digital preservation experts that coalesced around the No Time To Wait conference series were invaluable allies as we started to imagine what best practice would look like for the archive. We committed to the open, lossless FFV1 Matroska combination as the preferred format for videotape digitisation, and for digital preservation of the DPX files that get created in our film scanning workflows.
We learned to use the amazing open source tooling that archive-focussed developers from MediaArea have created to help archives achieve robust long-term preservation of FFV1 Matroska. Underpinning everything is FFmpeg, an open source framework for managing audiovisual media. It is impossible to imagine high volume, complex and standards-compliant moving image digital preservation in DPI without FFmpeg. We are so grateful to the communities who maintain and develop these open source tools.
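As an illustration of what an FFV1 Matroska transcode can look like, here is a minimal Python sketch that builds an FFmpeg command line. The flag values (FFV1 version 3, intra-only frames, 16 slices with per-slice CRCs) are common choices in archival practice and are assumptions here, not a record of the settings used in DPI. The function names and file paths are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def build_ffv1_command(source: Path, target: Path) -> list[str]:
    """Return an FFmpeg argument list for a lossless FFV1/Matroska transcode."""
    return [
        "ffmpeg",
        "-i", str(source),
        "-c:v", "ffv1",    # lossless FFV1 video codec
        "-level", "3",     # FFV1 version 3 (assumed setting)
        "-g", "1",         # every frame is a keyframe (intra-only)
        "-slices", "16",   # split each frame into slices (assumed value)
        "-slicecrc", "1",  # store a CRC per slice for error detection
        "-c:a", "copy",    # pass the audio through without re-encoding
        str(target),       # the .mkv extension selects the Matroska container
    ]


def transcode(source: Path, target: Path) -> None:
    """Run the transcode; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(build_ffv1_command(source, target), check=True)


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    cmd = build_ffv1_command(Path("tape_capture.mov"), Path("tape_capture.mkv"))
    print(shlex.join(cmd))
```

Building the command as a list, rather than a single string, avoids shell quoting problems when file names contain spaces.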
We created semi-automated mass ingest workflows for the 1.4 million digitised photographs from the BFI National Archive Screencraft collection, stewarding one of the most important film-focussed photograph collections in the world into digital preservation. We created similar mass ingest workflows for the moving image data, using Python to orchestrate a complex set of microservices to guide the data through the lifecycle described at the start. And we committed to an open-source-by-default policy for the codebase, publishing everything to our Data & Digital Preservation department GitHub unless prevented by sensitivity or security considerations.
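The idea of orchestrating small services in sequence can be sketched in a few lines of Python. This is a toy illustration, not the BFI's published code: the stage names loosely mirror the lifecycle list earlier in this post, and every class, function and identifier below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Asset:
    """A digital file moving through a (hypothetical) preservation pipeline."""
    filename: str
    history: list[str] = field(default_factory=list)


Stage = Callable[[Asset], Asset]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(asset: Asset) -> Asset:
        asset.history.append(name)
        return asset
    return stage


# Stage names are assumptions based on the lifecycle described in this post.
PIPELINE: list[Stage] = [
    make_stage("validate"),
    make_stage("document"),
    make_stage("store_to_tape"),
    make_stage("make_viewing_copy"),
]


def run_pipeline(asset: Asset, stages: list[Stage] = PIPELINE) -> Asset:
    """Pass the asset through each stage in order; an exception halts the run."""
    for stage in stages:
        asset = stage(asset)
    return asset


if __name__ == "__main__":
    result = run_pipeline(Asset("example.mkv"))
    print(result.history)
```

In a real workflow each stage would call out to a tool or service and raise an error on failure, so that a file never reaches storage without passing validation first.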

What has changed in the ten years of DPI?
A lot has changed in the last ten years, most notably:
- data size: file sizes have increased due to higher resolutions such as 4K and greater bit depths such as 16-bit, but this growth is offset by our choice of encoding format, FFV1, whose lossless compression maintains full quality while reducing file size by up to 60%
- network speed: we started with a 10Gbps network in 2015, but increased it to 100Gbps to handle the high volumes of data flowing through the workflows
- off-air television recording: in 2022, the BBC Redux system was reaching end-of-life, so we created our own solution using Python and open source technologies like FFmpeg and VLC. It is called STORA – System for Television Off-air Recording and Archiving – and it records around 170,000 broadcasts every year
- data migration: we began migrating DPI data in 2020 and completed it in 2022. This involved moving data from LTO6 tapes to LTO8 in one library, and from IBM 3592 JD to JE in another. The process is managed by the data tape infrastructure to ensure a bit-perfect copy, meaning every 0 and 1 is confirmed as accurately copied
- media asset management: in 2022, we took control of this part of DPI after using a commercial product for seven years. The BFI’s Technology & Digital Transformation directorate built a solution called DPI Browser, allowing colleagues to search, view and download digital collections for their work
- digital collecting: the breadth of born-digital content collected by the BFI National Archive has grown since DPI launched. In 2022, we made an agreement with Netflix to collect and preserve a selection of their UK productions; in 2023, we did the same with Amazon Prime Video. Since 2024, we have also been collecting web video under the National Lottery-funded Our Screen Heritage project
- cyber security: the DPI network was secured from the beginning, but attacks on cultural heritage organisations have increased dramatically. Following the cyber attack on the British Library and its detailed incident report, we took further steps to fortify DPI against future threats.
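The "bit-perfect copy" idea in the data migration item above is commonly checked by comparing checksums of the source and the copy. In DPI this verification is handled by the data tape infrastructure itself, so the following is only an illustrative sketch of the principle. The choice of SHA-256 and all function names are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256",
                  chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large media files never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_perfect_copy(original: Path, copy: Path) -> bool:
    """Return True when both files produce the same checksum."""
    return file_checksum(original) == file_checksum(copy)


if __name__ == "__main__":
    # Demonstrate with small temporary files standing in for media files.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.bin"
        good = Path(tmp) / "good_copy.bin"
        bad = Path(tmp) / "bad_copy.bin"
        source.write_bytes(b"\x00\x01" * 1000)
        good.write_bytes(source.read_bytes())
        bad.write_bytes(b"\x00\x01" * 999 + b"\x00\x00")  # one changed byte
        print(is_bit_perfect_copy(source, good))
        print(is_bit_perfect_copy(source, bad))
```

A single changed byte produces a completely different checksum, which is why this kind of comparison is a practical stand-in for checking every 0 and 1 individually.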
What’s next for DPI and digital preservation in general?
Our next data migration – this time moving around 20 Petabytes of data – is scheduled to begin in 2028 and complete in 2030 or 2031, moving the national collections data from LTO9 and IBM 3592 JE to the data tape generations current at that point. This will safeguard the data until the migration in 2036 or 2037, which may be the final migration to data tape, if the prophets of advanced technology for data storage are right. Among the experimental formats for high volume, affordable and environmentally sustainable data storage are DNA and ceramic.
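To give a sense of scale, the sustained data rate implied by moving 20 petabytes in two to three years can be estimated with some simple arithmetic. This sketch assumes decimal petabytes (10^15 bytes), 365-day years, round-the-clock operation and a single pass over the data. None of these assumptions come from the migration plan itself, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_PETABYTE = 10**15  # decimal petabyte (an assumption)
BYTES_PER_MEGABYTE = 10**6


def sustained_rate_mb_per_s(petabytes: float, years: float) -> float:
    """Average megabytes per second needed to move the data in the given time."""
    total_bytes = petabytes * BYTES_PER_PETABYTE
    seconds = years * 365 * SECONDS_PER_DAY
    return total_bytes / seconds / BYTES_PER_MEGABYTE


if __name__ == "__main__":
    for years in (2, 3):
        rate = sustained_rate_mb_per_s(20, years)
        print(f"{years} years: about {rate:.0f} MB/s sustained")
```

Under these assumptions the average works out at roughly 317 MB/s for a two-year migration and roughly 211 MB/s for three years, before allowing for verification passes, multiple copies or downtime.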
The BFI National Archive’s digital collections will continue to grow, from both digitisation of physical collections and acquisition of born-digital content, and we’ll strive to create scalable, sustainable workflows and solutions to manage the data through the technology changes ahead. Our inspiration is the preservation of physical film from the earliest moments in film history – with the surviving films from Queen Victoria’s reign preserved so effectively and now available to view on BFI Player. We will work to make sure that the new digital content entering the national collection is as carefully preserved, making it accessible for centuries ahead.
– Stephen McConnachie, Head of Data and Digital Preservation.
The Inside the Archive blog is supported by the BFI Screen Heritage Fund, awarding National Lottery funding.
