Mobile and Portable Digitization Experiments in The National Archives

Photographing yearbook with portable studio and digital camera.

Genealogists have been building their own digitization kits and photo studios, portable and otherwise, since the earliest days of photography. Digital cameras and mobile devices have made life easier, and there is new hope on the horizon for making digitization not only more accessible, but faster and easier still.

The National Archives in the UK recently tested a new portable “digitisation on demand” system using a mobile phone and portable lighting studio kit. Both the mobile app and portable system are still in development, but the results could be a game changer for genealogical societies and individuals who want to digitize their research, materials, and inventories.

The mobile app, DocScan, is currently available only for Android devices. There are many apps with similar names, so look for the one from HofApps.

The portable lighting box system is the ScanTent. It consists of a “pop-up” tent with an LED lighting strip box that clips in with magnets, and a platform at the top for the mobile device.

There are a variety of similar apps and light box systems available, so what makes this one different?

First, there is the tent system. Its reflective surfaces should improve illumination of the paper while blocking the ambient light responsible for reflections and color casts. Most portable light boxes use white and black nylon fabrics, which offer some bounce and absorption, but this material appears to be better suited to paper and photographic images. According to the specs, the included LED light is a non-destructive light source, which means it won’t damage the delicate paper materials placed inside, and the lights use polarization filters to reduce reflections. Other systems offer only photo lights, which have their own challenges, though polarizing filters can be added to the lights or to the camera lens to reduce reflections and glare. My only worry is the size. While this unit is designed for books and papers, there is no reason why it shouldn’t work with those precious old photo albums that are too delicate and oversized to put on a flatbed scanner. Once the prototype is finalized, hopefully they will offer it in various sizes.

Second is the mobile app, which is the truly ground-breaking tech.

According to The National Archives article, this system was tested during the week leading up to International Archives Day in cooperation with The National Archives of Finland and The State Archives of Zurich, all testing and discussing new archival digitization tools. They described the advantages of the DocScan app:

DocScan is ideal for those who wish to work hands free with the ScanTent because it has an auto-shoot feature that will take a photo every time a page is turned. DocScan also gives users the option to upload their images directly to the Transkribus platform, where they can be used as training data for Automated Text Recognition.

Transkribus is a transcription platform which enables the automated recognition, transcription and searching of both printed and handwritten historical documents of any date, language or style. The software is at the centre of the READ project, an EU-funded initiative which aims to revolutionise access to archival material through the development and dissemination of Automated Text Recognition and other cutting-edge tools. DocScan and the ScanTent have been developed by the Computer Vision Lab at the Technical University of Vienna, as part of the READ project. By facilitating the digitisation of historical documents, they too aim to enhance the accessibility of global cultural heritage.

The READ project (Recognition and Enrichment of Archival Documents) is an international project funded by the European Union’s Horizon 2020 research and innovation program. Its funding for the development of digitization tools could help bring this technology to your local library, genealogical society, or even your home.

The article explained that The National Archives have been testing Transkribus’s Handwritten Text Recognition (HTR) on their collection of wills, chosen because wills tend to be more carefully handwritten than letters and general manuscripts. According to the success stories published by Transkribus, their training models cover multiple languages and handwritten script types, and the Character Error Rate (CER) for most of their projects is about 10%, or roughly 90% accuracy. In a project transcribing the handwriting of Foucault, the French philosopher whose handwriting I could barely read, they had an 8% CER, which means about 92% accuracy. The National Archives project with wills gives them hope of similar rates. I’ve found about the same or even worse rates with some OCR software and apps, but these rates will vary with the complexity of the handwriting, typeface, and document quality.
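For readers curious how a figure like “10% CER” is arrived at, here is a minimal sketch, assuming the usual definition: the number of single-character edits (insertions, deletions, substitutions) needed to turn the machine transcription into the correct text, divided by the length of the correct text. The function names and the sample strings are my own illustrations, not anything taken from Transkribus, and the “accuracy” shorthand used above is simply one minus this rate.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `reference` into `hypothesis`."""
    # previous[j] holds the distance between the reference prefix handled
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character dropped from the reference
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # substitution, or a match if cost is 0
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the correct (reference) text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: the correct text and an imagined machine reading.
    correct = "last will and testament"
    machine = "lost will and testamemt"
    cer = character_error_rate(correct, machine)
    print(f"CER: {cer:.1%}, rough accuracy: {1 - cer:.1%}")
```

Two wrong characters in a 23-character line already produce a CER of nearly 9%, which gives a feel for what a 10% rate looks like on the page. Note that the rate can exceed 100% when the machine output is much longer than the correct text, so “accuracy” is only a convenient approximation.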

More traditional glass plate book and document scanner in use at RootsTech in FamilySearch booth.

Traditional book and page scanners can cost thousands of dollars. Home systems run in the hundreds. When you add up all the associated costs, a home setup can easily reach $1000 with a good quality camera (and/or your mobile phone or device), quality lights, stands, etc. Many people work with their flatbed printer/scanners, but the resolution is typically lower than a camera’s, and a flatbed isn’t appropriate for delicate archival papers, books, or oversized materials.

Another alternative is to send your archival material out for scanning with a commercial service. Prices have come down tremendously as more and more people use such services, but the costs can still add up quickly, and there is always a risk in transporting delicate materials.

When archives such as The National Archives, libraries, and even genealogy societies confront their digitization options, the costs and human-power required are often overwhelming. Obtaining an affordable and easy-to-use system could inspire more confidence in such efforts.

I currently have hundreds and hundreds of scanned pages of typewritten content, and scores of handwritten documents, waiting for digitization and conversion to text (OCR). Imagine the ability to quickly and easily digitize this material for my personal use. Throw in easy-to-use and affordable machine translation and I’d be singing very happy tunes.

Reading through the description of some law students building a book scanner from a DIY kit gives you a taste of how far people have gone to avoid the high costs associated with non-destructive book scanning.

Their attempt is not the only one. David Landon’s Easy Book Scanner is estimated to cost less than £200 (about US$260). The Book Scanning YouTube Channel with David Landon offers step-by-step instructions for building one from plastic plumbing pipes, plexiglass, ankle weights, a household light, a cheap portable wardrobe with cover, an aluminum casserole pan, pipe insulation, two inexpensive digital cameras with infrared-controlled shutters, a remote firing trigger, and bike tripod mounts. It’s such a simple and clean-looking system that I’m even giving thought to building one myself.

The biggest challenge to genealogy research is access. Access to records and documents. Access to locations housing such records and documents. Access to the funds for such excursions. And access to the information to know that these records and documents even exist. Digitization and open access to archival records are essential to overcoming those barriers.

I often hear people claim that they can’t take their research back further than the 1700s because no documentation exists. That’s just not true. It often does exist, but in too fragile a condition to digitize by conventional means. New technology is changing our perspective on our written past. Artificial intelligence is being applied to the Vatican’s archives. X-ray scanners are reading burned and ancient papers at the Morgan Library and Museum in New York and around the world, as in the 2016 announcement of the oldest hand-written passages from the Hebrew Bible, dating back to the 3rd or 4th Century AD. And the Diamond Light Source X-ray facilities in England are scanning charred bits of documents from 2,000 years ago. By reaching deep into the past with modern digitization methods, we may not learn something specific about an ancestor, but we are learning more about the past in which our ancestors survived every day.

The more we digitize the records of the past, the more accessible they become, and the better protected. We can seal these precious bits of parchment and paper away from the ravages of time and use, and explore the digital images and the facts and information within them endlessly.

Products like the ScanTent and DocScan, combined with the power of services like Transkribus, could really transform the digitization process and the industry around it. I sure hope so, and I’m looking forward to playing with it myself some day soon.

Here is more information on book scanning and digitization processes.
