Back in 2014, we brought to your attention an image archive rivaling the largest of its kind on the web: the Internet Archive Book Images collection at Flickr. There, you’ll find millions of “public domain images, all extracted from books, magazines and newspapers published over a 500 year period.”
At the time, the collection contained 2.6 million public domain images, but “eventually,” we noted in a previous post, “this archive will grow to 14.6 million images.” Well, it has almost doubled in size since our first post, and it now features over 5.3 million images, thanks again to Kalev Leetaru, who headed the digitization project while on a Yahoo-sponsored fellowship at Georgetown University.
Rather than using optical character recognition (OCR), as most digitization software does to scan only the text of books, Leetaru’s code reversed the process, extracting the images the Internet Archive’s OCR typically ignores. Thousands of graphic illustrations and photographs await your discovery in the searchable database. Type in “records,” for example, and you’ll run into the 1917 ad in “Colombia Records for June” (top) or the creepy 1910 photograph above from “Records of big game: with their distribution, characteristics, dimensions, weights, and horn & tusk measurements.” Two of many gems amidst utilitarian images from dull corporate and government record books.
Search “library” and you’ll arrive at a fascinating assemblage, from the fashionable room above from 1912’s “Book of Home Building and Decoration,” to the rotund, mournful, soon-to-be carved pig below from 1882’s “The American Farmer: A Complete Agricultural Library,” to the nifty Nautilus drawing further down from an 1869 British Museum of Natural History publication. To see more images from any of the sources, simply click on the title of the book that appears in the search results. The organization of the archive could use some improvement: as yet millions of images have not been organized into thematic albums, which would greatly streamline browsing through them. But it’s a minor gripe given the number and variety of free, public domain images available for any kind of use.
Moreover, Leetaru has planned to offer his code to institutions, telling the BBC, “Any library could repeat this process. That’s actually my hope, that libraries around the world run this same process of their digitized books to constantly expand this universe of images.” Scholars and archivists of book and art history and visual culture will find such a “universe of images” invaluable, as will editors of Wikipedia. “What I want to see,” Leetaru also said, “is… Wikipedia have a national day of going through this [collection] to illustrate Wikipedia articles.”
Short of that, individual editors and users can sort through images of all kinds when they can’t find freely available pictures of their subject. And, of course, sites like Open Culture—which rely mainly on public domain and creative commons images—benefit greatly as well. So, thanks, Internet Archive Book Images Collection! We’ll check back later and let you know when they’ve grown even more.