The Internet Archive Will Digitize & Preserve Millions of Academic Articles with Its New Database, “Internet Archive Scholar”

Open access publishing has, indeed, made academic research more accessible, but in “the move from physical academic journals to digitally-accessible papers,” Samantha Cole writes at Vice, it has also become “more precarious to preserve…. If an institution stops paying for web hosting or changes servers, the research within could disappear.” At least a couple hundred open access journals vanished in this way between 2000 and 2019, a new study published on arXiv found. Another 900 journals are in danger of meeting the same fate.

The journals in peril include scholarship in the humanities and sciences, though many publications may only be of interest to historians, given the speed at which scientific research tends to move. In any case, “there shouldn’t really be any decay or loss in scientific publications, particularly those that have been open on the web,” says study co-author Mikael Laakso, information scientist at the Hanken School of Economics in Helsinki. Yet, in digital publishing, there are no printed copies in university libraries, catalogued and maintained by librarians.

To fill the need, the Inter­net Archive has cre­at­ed its own schol­ar­ly search plat­form, a “full­text search index” that includes “over 25 mil­lion research arti­cles and oth­er schol­ar­ly doc­u­ments” pre­served on its servers. These col­lec­tions span dig­i­tized and orig­i­nal dig­i­tal arti­cles pub­lished from the 18th cen­tu­ry to “the lat­est Open Access con­fer­ence pro­ceed­ings and pre-prints crawled from the World Wide Web.” Con­tent in this search index comes in one of three forms:

  • public web content in the Wayback Machine web archives, either identified from historic collecting, crawled specifically to ensure long-term access to scholarly materials, or crawled at the direction of Archive-It partners
  • dig­i­tized print mate­r­i­al from paper and micro­form col­lec­tions pur­chased and scanned by Inter­net Archive or its part­ners
  • gen­er­al mate­ri­als on the col­lec­tions, includ­ing con­tent from part­ner orga­ni­za­tions, uploads from the gen­er­al pub­lic, and mir­rors of oth­er projects

The project is still in “alpha” and “has several bugs,” the site cautions, but it could, when it’s fully up and running, become part of a much-needed revolution in academic research, provided that the major academic publishers don’t find some legal pretext to shut it down.

Academic publishing boasts one of the most rapacious legal business models on the global market, and one of the most exploitative: a double standard in which scholars freely publish and review research for the public benefit (ostensibly) and very often on the public dime, while private intermediaries rake in astronomical sums for themselves with paywalls. The open access model has changed things, but the only way to truly serve the “best interests of researchers and the public,” neuroscientist Shaun Khoo argues, is through public infrastructure and fully non-profit publication.

Maybe Inter­net Archive Schol­ar can go some way toward bridg­ing the gap, as a pub­licly acces­si­ble, non-prof­it search engine, dig­i­tal cat­a­logue, and library for research that is worth pre­serv­ing, read­ing, and build­ing upon even if it does­n’t gen­er­ate share­hold­er rev­enue. For a deep­er dive into how the Archive built its for­mi­da­ble, still devel­op­ing, new data­base, see the video pre­sen­ta­tion above from Jef­fer­son Bai­ley, Direc­tor of Web Archiv­ing & Data Ser­vices. And have a look at Inter­net Archive Schol­ar here. It cur­rent­ly lacks advanced search func­tions, but plug in any search term and pre­pare to be amazed by the incred­i­ble vol­ume of archived full text arti­cles you turn up.

Josh Jones is a writer and musi­cian based in Durham, NC. Fol­low him at @jdmagness

Open Culture was founded by Dan Colman.