Ellen Nantau

— Touring the Turing Digital Archive for References to Fiction and Myth

Saint Mary’s University, Halifax, NS / September 2022

I pictured doing archive work as poring over delicate, aged documents under dimmed lights; as lightly blowing the dust from a long-lost page; as the gentle pull of expectation and the soft thrill of discovery. Less Indiana Jones, more Rachel Weisz before she meets Brendan Fraser in The Mummy. Of course, exploring the Turing Digital Archive – my first opportunity to do archival research – was nothing like I expected. As I searched for references to fictional, mythic, or literary influences within Alan Turing’s publications, the experience felt too contemporary and too ordinary. Instead of risking a topple from a tall ladder or a paper cut turned septicemic, I sat at my computer in my tiny home office and tried to ignore the click of my husband’s keyboard. I tried to pretend that my itching nose was due to dusty parchments rather than my Pomeranian’s summer shedding and that my eyes were blurring from flickering lights rather than my monitor’s liquid crystal display.

Simultaneously, the experience was not modern enough. As my husband Chris pointed out, it was ironic and wrong that Alan Turing’s archive was not digitally searchable. When said husband – a software engineer – offered to convert Turing’s handwritten and typewritten documents into a searchable format, I was doubtful. How could machine learning, even with all of its magic, “decipher” the scrawl that I was struggling to make out? And was this cheating? Still, there was an uncomfortable lack of poetry to the fact that Turing’s works were not fully digitized. Would the man himself not be disgusted to see me go word by word through his works when there is a technological alternative? Was this not the exact sort of work that he would have envisioned a machine performing? Such rationalization is a magic of the human mind; I said yes. The result of the endeavor was more than a searchable set of documents; it was a lesson in the nature of archives and of Alan Turing’s intelligent, objective machine. (1)

I cannot say that I perfectly understand what we did or even that I could reproduce it. The TL;DR, though, is that a machine learning program took an image and made it into text. Picture into words; PDF into JSON. In doing so, it changed more than just the nature of the original digital file; it changed the words themselves.

OCR’ed documents generally do not offer a perfect conversion, especially initially. Each letter appearing in the image of text is algorithmically matched to the letter of the alphabet it most likely represents. Some are matched with high certainty. For example, the program will be able to say with 98% certainty that my handwritten “cat” is C-A-T, just as image-recognition software would take a picture of a feline and name it a “cat.” This can certainly go awry, just as some image software has failed to recognize a cow when the creature is placed on a street instead of in a grassy field. In our case, the blurrier the image and the less uniform the lettering, the less certain the results. Old, stained, blurry, or handwritten documents in particular caused problems. For instance, on one page the program decided that every “e” in Turing’s chapter of The Programmer’s Handbook was actually an “o,” leaving me to navigate some unusual sentences. (2) While the intended wording was still decipherable to a human mind (another of the brain’s magics), the physical representation of Turing’s words – the things intended to convey his meaning – changed. At the same time, the digital conversion of the documents changed the process of deriving meaning from Turing’s words and from the archive itself, just as archival work changes the nature of the documents being worked upon.

Machine learning programs are perhaps the closest we have come to realizing Turing’s original idea of the intelligent machine. In this case, the technology transformed Turing’s words into a block of text that was no longer intended to be read in full; after all, this was technically the intent of my project, even if I did not initially acknowledge that aspect of what I was trying to do. Under this lens, Turing’s works were reduced to a single search term, a tiny piece of red text in a sea of white words, the red to be read, the white to be perused or ignored as desired. I chose how much I “bothered” to read, how much of the context around that red, highlighted word I deigned to ingest. Is something lost with the loss of that context? Certainly. What is gained is an irony. While Turing’s intelligent machine was based upon the “objective” discipline of mathematics and has come to be viewed as objective in turn, the result of this ML endeavor is a narrowed scope of context for his own works. And the less of Turing’s actual words I read, the more context my own brain supplied. Like any archivist, I created my own narrative to fill the gaps in the understanding of Turing I was creating, but in the bit-encoded world of the truly digital archive the focus zooms in even further. If the typical archivist can only see the iceberg sitting above the waves, I was choosing to ignore even the water lapping at its base.

In the end, the mechanized searcher did not lend objectivity to the human archivist. While we enter an archive to learn, to stitch together a truth, what we end up doing is constructing a narrative. We adjust the context. We read the documents that answer our specific questions and skim those that seem irrelevant to our needs. In entering an archive, we don’t become archeologists or librarians, working day in and day out to preserve the past; we become creatives, picking and choosing – even if on a subconscious level – what works and what doesn’t. We are less the characters Indiana or Evelyn and more Stephens Spielberg or Sommers. In the digital archive, though, the context we are required to ingest to find what we need becomes less, and the parts our minds fill in become more. In either case we are like Turing. We trick ourselves into believing we will produce something objective and true even as, for better or for worse, we dream, expect, and create.

(1) I feel the need to add here that Chris and I did consult the archive’s terms of use prior to setting out to convert its documents. The OCR process was undertaken as a personal experiment, rather than something to sell or to share with others. The entire archive was not converted, as the purpose of the exercise was for us to experiment with the process rather than to produce a work of completionism.

(2) Of course, there are ways to help the program “learn,” so, as Turing says, “Tho boginnor will do woll to ask for advice.” (p. 2, AMT/B/32)