How the Mysteries of the Vatican Secret Archives Are Being Revealed by Artificial Intelligence


Somewhere within the Vatican exists the Vatican Secret Archives, whose 53 miles of shelving contain more than 600 collections of account books, official acts, papal correspondence, and other historical documents. Though its holdings date back to the eighth century, it has in the past few weeks come to worldwide attention. This has brought about all manner of jokes about the plot of Dan Brown’s next novel, but also important news about the technology of manuscript digitization. It seems a project to get the contents of the Vatican Secret Archives digitized and online has made great progress cracking a problem that once seemed impossibly difficult: turning handwriting into computer-searchable text.

That project, In Codice Ratio, is “developing a full-fledged system to automatically transcribe the contents of the manuscripts” that uses not the standard method of optical character recognition (OCR), which depends on the spaces between letters, but a new way that can handle connected cursive and calligraphic letters. Their method, in the lingo of the field, “is to govern imprecise character segmentation by considering that correct segments are those that give rise to a sequence of characters that more likely compose a Latin word. We have designed a principled solution that relies on convolutional neural networks and statistical language models.”
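To make the idea in that quotation concrete, here is a minimal sketch of how a language model can arbitrate between competing segmentations. It is not the project's code: the tiny word-frequency table, the per-character probabilities (stand-ins for what a convolutional network might output), and the function names are all assumptions made up for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: relative frequencies standing in for a
# statistical language model of Latin. The numbers are invented.
LATIN_WORD_FREQ: Dict[str, float] = {"et": 0.5, "in": 0.3, "deus": 0.15, "dens": 0.05}

# One candidate segmentation: a (character, classifier probability) pair per segment.
Candidate = List[Tuple[str, float]]


def word_log_prob(word: str, floor: float = 1e-6) -> float:
    """Log-probability of a word; unknown words get a small floor value."""
    return math.log(LATIN_WORD_FREQ.get(word, floor))


def reading(candidate: Candidate) -> str:
    """Join the characters of a candidate segmentation into a word."""
    return "".join(ch for ch, _ in candidate)


def score(candidate: Candidate) -> float:
    """Combine visual evidence with how plausible the resulting word is."""
    visual = sum(math.log(p) for _, p in candidate)
    return visual + word_log_prob(reading(candidate))


def best_reading(candidates: List[Candidate]) -> str:
    """Return the word spelled by the highest-scoring segmentation."""
    return reading(max(candidates, key=score))


# Three invented segmentations of the same stretch of ink.
candidates: List[Candidate] = [
    [("d", 0.9), ("e", 0.8), ("u", 0.6), ("s", 0.9)],
    [("d", 0.9), ("c", 0.85), ("u", 0.6), ("s", 0.9)],
    [("d", 0.9), ("e", 0.8), ("i", 0.7), ("i", 0.7), ("s", 0.9)],
]
print(best_reading(candidates))
```

On visual evidence alone, the second candidate ("dcus") edges out the first, but it spells no word in the toy lexicon, so the combined score prefers "deus". That is the sense in which correct segments are the ones that "more likely compose a Latin word."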

This is a job, in other words, for artificial intelligence, but in partnership with human intelligence, and the scientists behind In Codice Ratio have harnessed a seldom-tapped source of it: high-school students. Their special OCR software, writes the Atlantic’s Sam Kean, works by “dividing each word into a series of vertical and horizontal bands and looking for local minimums,” that is, “the thinner portions, where there’s less ink (or really, fewer pixels). The software then carves the letters at these joints.” But the software “needs to know which groups of chunks represent real letters and which are bogus,” and so “the team recruited students at 24 schools in Italy to build the project’s memory banks,” manually separating the letters the system had properly recognized from those over which it had stumbled.
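The carving step Kean describes can be sketched in a few lines. What follows is a simplified illustration, not the team's software: it looks only at vertical bands (columns) of a tiny binary image, treats a pixel value of 1 as ink, and uses function names invented for this example.

```python
from typing import List

Image = List[List[int]]  # rows of pixels; 1 = ink, 0 = blank


def column_ink(image: Image) -> List[int]:
    """Count the ink pixels in each column of a binary word image."""
    return [sum(col) for col in zip(*image)]


def cut_points(profile: List[int]) -> List[int]:
    """Return the columns holding strictly less ink than both neighbours.

    Flat-bottomed valleys are ignored in this toy version.
    """
    return [
        x
        for x in range(1, len(profile) - 1)
        if profile[x] < profile[x - 1] and profile[x] < profile[x + 1]
    ]


def split_word(image: Image, cuts: List[int]) -> List[Image]:
    """Slice the image into vertical chunks at the given cut columns."""
    bounds = [0] + cuts + [len(image[0])]
    return [[row[a:b] for row in image] for a, b in zip(bounds, bounds[1:])]


# Two blobs of ink joined by a thin connecting stroke.
word: Image = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
]
profile = column_ink(word)
chunks = split_word(word, cut_points(profile))
print(profile, cut_points(profile), len(chunks))
```

The thinnest column of the connecting stroke becomes the joint, and the word falls into two chunks. Real pages would produce many spurious joints, which is why the chunks then have to be judged as real letters or bogus ones.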

And so the students became the system’s “teachers,” improving its ability to extract the content of handwriting, and not just handwriting but vast quantities of archaic handwriting, with every click they made. The encouraging results thus far mean that it probably won’t be long before large portions of the Vatican Secret Archives (which, contrary to its awkwardly translated name, is so far from secret that it even has its own official website) will finally become easy to browse, search, copy, paste, and analyze. So they may, in the fullness of time, prove a fruitful resource indeed to writers of Catholicism-centric thrillers like Brown, who, after all, has already gone public with his enthusiasm for manuscript digitization.

Related Content:

Explore 5,300 Rare Manuscripts Digitized by the Vatican: From The Iliad & Aeneid, to Japanese & Aztec Illustrations

Behold 3,000 Digitized Manuscripts from the Bibliotheca Palatina: The Mother of All Medieval Libraries Is Getting Reconstructed Online

3,500 Occult Manuscripts Will Be Digitized & Made Freely Available Online, Thanks to Da Vinci Code Author Dan Brown

Based in Seoul, Colin Marshall writes and broadcasts on cities and culture. His projects include the book The Stateless City: a Walk through 21st-Century Los Angeles and the video series The City in Cinema. Follow him on Twitter at @colinmarshall or on Facebook.


Open Culture was founded by Dan Colman.