An article in The Conversation on how your Internet data is rotting has stayed with me. Did you know that many MySpace users were heartbroken to learn that the platform had lost over 50 million files uploaded between 2003 and 2015? Google keeps hinting that it will soon limit storage space for photos, potentially deleting billions of photographs from users who are over the limits, and put Gmail accounts behind a paywall, potentially causing a loss of emails.
Data loss, whether deliberate or unintentional, is a loss of information, of artifacts, of digital letters and correspondence, and of visual treasures whose absence from our history, specifically our family’s history, is incalculable and heartbreaking.
The article reminds us that acid-free paper may last 500 years or longer if preserved properly. Magnetic media such as hard drives, by contrast, last only three to five years.
There is also the problem of software preservation: how can people today or in the future interpret WordPerfect or WordStar files from the 1980s when the original software companies have stopped supporting them or gone out of business?
A nonprofit called the Internet Archive is preserving snapshots of the web on an ongoing basis, but mostly for top-level public HTML webpages such as The New York Times website and Facebook, not for the underlying content files. As of last fall, its Wayback Machine held over 450 billion pages in 25 petabytes of data. By that estimate, this represents about 0.0003% of the total internet.
Universities, governments and scientific societies are struggling to preserve scientific data in a hodgepodge of archives, such as the U.K.’s Digital Preservation Coalition, MetaArchive, or the now-disbanded collaborative Digital Preservation Network. Preservation is hard and expensive in time, money and equipment. To be most useful, data not only has to be stored, but also hosted in a form that is accessible and available for future reuse.
Did you know that the very first website in the world, created at CERN by Tim Berners-Lee’s team, was lost? Was it considered too insignificant to preserve? In 2013, twenty years after CERN made the web’s technology freely available on 30 April 1993, CERN undertook a mission to restore that first web page, an important part of our legacy in this new era of online computing.
Thanks to the Internet Archive and the Wayback Machine, along with the work of archives and other digital preservation projects, data is being stored, but I would guess that barely 5 percent, likely much less, is preserved, especially when it comes to our personal family history online. I’m just guessing, but the thought can keep one up at night.
Begin by backing up your data, especially your genealogy research, in multiple places and by more than one method. Ensure that at least one solid copy is stored off-site.
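For readers comfortable with a little scripting, here is a minimal sketch of that advice in Python. It copies one file into several backup folders and checks each copy against the original using a SHA-256 checksum. The function names `sha256_of` and `back_up` are my own for illustration, and the script assumes your backup locations (an external drive, a synced cloud folder) show up as ordinary folders on your computer. It is a starting point, not a replacement for a proper backup tool.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read 1 MiB at a time so large files do not fill memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: Iterable[Path]) -> Dict[Path, bool]:
    """Copy `source` into each destination folder and verify every copy.

    Returns a mapping of each copied file to True if its checksum
    matches the original, or False if the copy differs.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        # Create the backup folder if it does not exist yet.
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        # copy2 also preserves timestamps where the file system allows it.
        shutil.copy2(source, copy)
        results[copy] = sha256_of(copy) == expected
    return results
```

You might call it as `back_up(Path("family-tree.ged"), [Path("/Volumes/Backup/genealogy"), Path.home() / "CloudDrive" / "genealogy"])`, where the file name and both folder paths are hypothetical placeholders for your own. Any copy that comes back marked `False` did not match the original and should be copied again.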
Then, consider donating to the causes protecting and preserving our digital data. Yes, we need a right to privacy, but we also need to ensure that our digital heritage is preserved so that future generations can understand how we lived and why we did what we did, and can learn from our mistakes and successes.