A few centuries from now, someone could dig up a silver halide film reel in an abandoned coal mine inside the Arctic Circle and stare at the code of Microsoft MS-DOS with the same curiosity millennials today have for Egyptian hieroglyphics. The GitHub Archive Program's Arctic Code Vault in Norway has just been loaded with a backup of open source software repositories designed to last at least 1,000 years.
“We believe it is worthwhile to preserve all the open source software because so much of our life today depends on open source software, whether it is my cell phone that I use to order groceries or watch a movie or communicate with my family and my friends all around the world,” reasoned Thomas Dohmke, Vice President for Special Projects at GitHub.
In a video call with indianexpress.com, Dohmke said the vault in the Svalbard archipelago, next to the Global Seed Vault, was part of GitHub’s layered approach to archiving all open source software. The vault deep in the permafrost forms the cold layer, where a snapshot taken on February 2, 2020 (2/2/2020) has been stored. Dohmke said the backup taken that day was saved on a couple of hard drives and shipped to GitHub’s Norwegian partner Piql, which wrote it onto film reels. “Two weeks back, they shipped the files to the vault,” he said, expressing some disappointment at not being able to be there for the occasion because of Covid travel restrictions. The other layers are a hot layer, which is a live streaming backup, and a warm layer, where backups are saved monthly and quarterly.
But planning an archive to last a millennium is much more than a tech challenge. Besides keeping in mind that today's technology might make no sense to future generations, there are even more basic questions to consider, such as which language to use to future-proof the archive. “At GitHub we are all software developers and not archiving experts. So we looked for a panel of advisors and partners who have been advising us for the last year of what the right approaches to archive data would be,” explained Dohmke.
So GitHub is now working with archaeologists, archivists, linguists and scientists to figure out the best way to approach the problem. It is also getting help from the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, the Arctic World Archive, Microsoft Research, the Bodleian Library and Stanford Libraries, he added.
“We looked at who else was doing something similar and found this company in Norway already offering archiving solutions. So we didn’t invent the archive, we found someone already doing it.” The old coal mine where the vault is situated already holds archives from Unicef and the Vatican Library. It also sits on high ground, which protects it from flooding caused by, say, rising sea levels or melting Arctic ice. The film reels used in GitHub's archiving project were stress-tested to check whether they could survive a thousand years.
To ensure that whoever finds these archives centuries from now can make sense of the project, GitHub brought in advisors from the Library of Alexandria in Egypt to help it understand the cues archaeologists and linguists will look for. “Another benefit is that the film we have used is readable to the human eye,” he explained, so you do not need compatible technology to make sense of the code itself. There is also an introductory guide to the archive, written in English with translations into Hindi, Simplified Chinese, Arabic and Spanish. The guide explains what is stored and includes an index of where everything is located.
Dohmke said that in a thousand years software development would likely be very different, and people would have forgotten how things are done today. This is why GitHub has included a “tech tree” that introduces readers to how software is developed now. “It’s basically like a whole library of books that explain all those basics. So in theory you know in a thousand years you should be able to make good use of all the software in the tech stack that we have today,” he explained.
To make storage easier, the code itself has been converted into a type of binary barcode that, according to Dohmke, can be decoded without a machine. “Even if they have no idea about software development and no idea about technology, they should be able to understand what’s in the archives,” he said.
While the archive does include forks of software, GitHub has filtered out certain file types, such as binaries and .exe files. “All open source code that has had any activity in the last year, or some likes, then they are in the archive,” he said, explaining what has been included. All of that adds up to about 180 film reels, or roughly 20 terabytes of data on hard drives.