The Internet Archive Will Digitize & Preserve Millions of Academic Articles with Its New Database, “Internet Archive Scholar”

Open access publishing has, indeed, made academic research more accessible, but in “the move from physical academic journals to digitally-accessible papers,” Samantha Cole writes at Vice, it has also become “more precarious to preserve…. If an institution stops paying for web hosting or changes servers, the research within could disappear.” At least a couple hundred open access journals vanished in this way between 2000 and 2019, a new study published on arXiv found. Another 900 journals are in danger of meeting the same fate.

The journals in peril include scholarship in the humanities and sciences, though many publications may only be of interest to historians, given the speed at which scientific research tends to move. In any case, “there shouldn’t really be any decay or loss in scientific publications, particularly those that have been open on the web,” says study co-author Mikael Laakso, information scientist at the Hanken School of Economics in Helsinki. Yet, in digital publishing, there are no printed copies in university libraries, catalogued and maintained by librarians.




To fill the need, the Internet Archive has created its own scholarly search platform, a “fulltext search index” that includes “over 25 million research articles and other scholarly documents” preserved on its servers. These collections span digitized and original digital articles published from the 18th century to “the latest Open Access conference proceedings and pre-prints crawled from the World Wide Web.” Content in this search index comes in one of three forms:

  • public web content in the Wayback Machine web archives (web.archive.org), either identified from historic collecting, crawled specifically to ensure long-term access to scholarly materials, or crawled at the direction of Archive-It partners
  • digitized print material from paper and microform collections purchased and scanned by Internet Archive or its partners
  • general materials on the archive.org collections, including content from partner organizations, uploads from the general public, and mirrors of other projects

The project is still in “alpha” and “has several bugs,” the site cautions, but it could, when it’s fully up and running, become part of a much-needed revolution in academic research; that is, if the major academic publishers don’t find some legal pretext to shut it down.

Academic publishing boasts one of the most rapacious legal business models on the global market, and one of the most exploitative: a double standard in which scholars freely publish and review research for the public benefit (ostensibly), very often on the public dime, while private intermediaries rake in astronomical sums for themselves with paywalls. The open access model has changed things, but the only way to truly serve the “best interests of researchers and the public,” neuroscientist Shaun Khoo argues, is through public infrastructure and fully non-profit publication.

Maybe Internet Archive Scholar can go some way toward bridging the gap, as a publicly accessible, non-profit search engine, digital catalogue, and library for research that is worth preserving, reading, and building upon even if it doesn’t generate shareholder revenue. For a deeper dive into how the Archive built its formidable, still developing, new database, see the video presentation above from Jefferson Bailey, Director of Web Archiving & Data Services. And have a look at Internet Archive Scholar here. It currently lacks advanced search functions, but plug in any search term and prepare to be amazed by the incredible volume of archived full text articles you turn up.

Related Content:

The Internet Archive Makes 2,500 More Classic MS-DOS Video Games Free to Play Online: Alone in the Dark, Doom, Microsoft Adventure, and Others

Libraries & Archivists Are Digitizing 480,000 Books Published in 20th Century That Are Secretly in the Public Domain

The Boston Public Library Will Digitize & Put Online 200,000+ Vintage Records

Josh Jones is a writer and musician based in Durham, NC. Follow him at @jdmagness

The National Emergency Library Makes 1.5 Million Books Free to Read Right Now

The coronavirus has closed libraries in countries all around the world. Or rather, it’s closed physical libraries: with each week of struggle against the epidemic that goes by, more book resources open up to the public on the internet. Most recently, we have the Internet Archive’s opening of the National Emergency Library, “a collection of books that supports emergency remote teaching, research activities, independent scholarship, and intellectual stimulation while universities, schools, training centers, and libraries are closed.” While the “national” in the name refers to the United States, where the Internet Archive operates, anyone in the world can read its nearly 1.5 million books, immediately and without waitlists, from now “through June 30, 2020, or the end of the US national emergency, whichever is later.”

“Not to be sneezed at is the sheer pleasure of browsing through the titles,” writes The New Yorker’s Jill Lepore of the National Emergency Library, going on to mention such volumes as How to Succeed in Singing, Interesting Facts about How Spiders Live, and An Introduction to Kant’s Philosophy, as well as “Beckett on Proust, or Bloom on Proust, or just On Proust.” A historian of America, Lepore finds herself reminded of the Council on Books in Wartime, “a collection of libraries, booksellers, and publishers, founded in 1942.” On the premise that “books are useful, necessary, and indispensable,” the council “picked over a thousand volumes, from Virginia Woolf’s The Years to Raymond Chandler’s The Big Sleep, and sold the books, around six cents a copy, to the U.S. military.” By practically giving away 120 million copies of such books, the project “created a nation of readers.”




In fact, the Council on Books in Wartime created more than a nation of readers: the American “soldiers and sailors and Army nurses and anyone else in uniform” who received these books passed them along, or even left them behind in the far-flung places they’d been stationed. Haruki Murakami once told the Paris Review of his youth in Kobe, “a port city where many foreigners and sailors used to come and sell their paperbacks to the secondhand bookshops. I was poor, but I could buy paperbacks cheaply. I learned to read English from those books and that was so exciting.” Seeing as Murakami himself later translated The Big Sleep into his native Japanese, it’s certainly not impossible that an Armed Services Edition counted among his purchases back then.

Now, in translations into English and other languages as well, we can all read Murakami’s work (novels like Norwegian Wood and Kafka on the Shore, short-story collections like The Elephant Vanishes, and even the memoir What I Talk About When I Talk About Running) free at the National Emergency Library. The most popular books now available include everything from Margaret Atwood’s The Handmaid’s Tale to the Kama Sutra, Dr. Seuss’s ABC to Alvin Schwartz’s Scary Stories to Tell in the Dark (and its two sequels), Chinua Achebe’s Things Fall Apart to, in disconcerting first place, Sylvia Browne’s End of Days: Predictions and Prophecy About the End of the World. You’ll even find, in the original French as well as English translation, Albert Camus’ existential epidemic novel La Peste, or The Plague, featured earlier this month here on Open Culture. And if you’d rather not confront its subject matter at this particular moment, you’ll find more than enough to take your mind elsewhere. Enter the National Emergency Library here.

Related Content:

800 Free eBooks for iPad, Kindle & Other Devices

The Internet Archive “Liberates” Books Published Between 1923 and 1941, and Will Put 10,000 Digitized Books Online

11,000 Digitized Books From 1923 Are Now Available Online at the Internet Archive

Free: You Can Now Read Classic Books by MIT Press on Archive.org

Enter “The Magazine Rack,” the Internet Archive’s Collection of 34,000 Digitized Magazines

Use Your Time in Isolation to Learn Everything You’ve Always Wanted To: Free Online Courses, Audio Books, eBooks, Movies, Coloring Books & More

Based in Seoul, Colin Marshall writes and broadcasts on cities, language, and culture. His projects include the book The Stateless City: a Walk through 21st-Century Los Angeles and the video series The City in Cinema. Follow him on Twitter at @colinmarshall or on Facebook.

The Internet Archive Makes 2,500 More Classic MS-DOS Video Games Free to Play Online: Alone in the Dark, Doom, Microsoft Adventure, and Others

Back in 2015 we let you know that the Internet Archive made 2,400 computer games from the era of MS-DOS free to play online: titles like Commander Keen, Scorched Earth, and Prince of Persia may have brought back fond 1990s gaming memories, as well as promised hours of more such enjoyment here in the 21st century. That set of games included Id Software’s Wolfenstein 3D, which created the genre of the first-person shooter as we know it, but the Internet Archive’s latest DOS-game upload, an addition of more than 2,500 titles, includes its follow-up Doom, which took computer gaming itself to, as it were, a new level.

The Internet Archive’s Jason Scott calls this “our biggest update yet, ranging from tiny recent independent productions to long-forgotten big-name releases from decades ago.” After detailing some of the technical challenges he and his team faced in getting many of the games to work properly in web browsers on modern computers — “a lot has changed under the hood and programs were sometimes only written to work on very specific hardware and a very specific setup” — he makes a few recommendations from this newest crop of games.

Scott’s picks include Microsoft Adventure, the DOS version of the very first computer adventure game; the 1960s-themed racer Street Rod; and Super Munchers, one in a line of educational titles all of us of a certain generation will remember from our classroom computers. Oddities highlighted by classic game enthusiasts around the internet include Mr. Blobby, based on the eponymous character from the BBC comedy show Noel’s House Party; the undoubtedly thrilling simulator President Elect – 1988 Edition; and Zool, the only ninja-space-alien platformer sponsored by lollipop brand Chupa Chups.

This addition of 2,500 computer games to the Internet Archive also brings in no few undisputed classics whose influence on the art and design of games is still felt today: Alone in the Dark, for example, progenitor of the entire survival-horror genre; Microsoft Flight Simulator, inspiration for a generation of pilots; and SimCity 2000, inspiration for a generation of urban planners. Among the adventure games, one of the strongest genres of the MS-DOS era, we have Discworld, based on Terry Pratchett’s comedic fantasy novels, and from the mind of Harlan Ellison the somewhat less comedic I Have No Mouth and I Must Scream. One glance at the Internet Archive’s updated computer game collection reveals that, no matter how many games you played in the 90s, you’ll never be able to play them all.

Get more information on the new batch of games at the Internet Archive.

via Boing Boing

Related Content:

The Internet Arcade Lets You Play 900 Vintage Video Games in Your Web Browser (Free)

Free: Play 2,400 Vintage Computer Games in Your Web Browser

Play a Collection of Classic Handheld Video Games at the Internet Archive: Pac-Man, Donkey Kong, Tron and MC Hammer

1,100 Classic Arcade Machines Added to the Internet Arcade: Play Them Free Online

Based in Seoul, Colin Marshall writes and broadcasts on cities, language, and culture. His projects include the book The Stateless City: a Walk through 21st-Century Los Angeles and the video series The City in Cinema. Follow him on Twitter at @colinmarshall or on Facebook.

Libraries & Archivists Are Digitizing 480,000 Books Published in 20th Century That Are Secretly in the Public Domain

Image by Jason “Textfiles” Scott, via Wikimedia Commons

All books in the public domain are free. Most books in the public domain are, by definition, on the old side, and a great many aren’t easy to find in any case. But the books now being scanned and uploaded by libraries aren’t quite so old, and they’ll soon get much easier to find. They’ve fallen through a loophole because their copyright-holders never renewed their copyright, but until recently the technology wasn’t quite in place to reliably identify and digitally store them.

Now, though, as Vice’s Karl Bode writes, “a coalition of archivists, activists, and libraries are working overtime to make it easier to identify the many books that are secretly in the public domain, digitize them, and make them freely available online to everyone.” These were published between 1923 and 1964, and the goal of this digitization project is to upload all of these surprisingly out-of-copyright books to the Internet Archive, a glimpse of whose book-scanning operation appears above.




“Historically, it’s been fairly easy to tell whether a book published between 1923 and 1964 had its copyright renewed, because the renewal records were already digitized,” writes Bode. “But proving that a book hadn’t had its copyright renewed has historically been more difficult.” You can learn more about what it takes to do that from this blog post by New York Public Library Senior Product Manager Sean Redmond, who first crunched the numbers and estimated that 70 percent of the titles published over those 41 years may now be out of copyright: “around 480,000 public domain books, in other words.”

The first important stage is the conversion of copyright records into the XML format, a large part of which the New York Public Library has recently completed. Bode also mentions a software developer and science fiction author named Leonard Richardson who has written Python scripts to expedite the process (including a matching script to identify potentially non-renewed copyrights in the Internet Archive collection) and a bot that identifies newly discovered secretly public-domain books daily. Richardson himself underscores the necessity of volunteers to take on tasks like seeking out a copy of each such book, “scanning it, proofing it, then putting out HTML and plain-text editions.”
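The matching step Bode describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Richardson’s actual scripts or the New York Public Library’s real XML schema: the element and attribute names (`entry`, `regnum`, `year`), the sample records, and the helper names `normalize` and `potentially_unrenewed` are all hypothetical. The idea is simply to take registrations from the years in question and flag any whose registration number never shows up among the renewals.

```python
import xml.etree.ElementTree as ET

# Hypothetical registration records. The element and attribute names are
# invented for illustration and are NOT the actual NYPL schema.
REGISTRATIONS_XML = """
<registrations>
  <entry regnum="A100001" year="1931"><title>Example Title One</title></entry>
  <entry regnum="A100002" year="1940"><title>Example Title Two</title></entry>
  <entry regnum="A100003" year="1952"><title>Example Title Three</title></entry>
</registrations>
""".strip()

# Hypothetical set of registration numbers that appear in renewal records.
RENEWED_REGNUMS = {"a-100002"}


def normalize(regnum):
    """Uppercase and drop non-alphanumerics so IDs compare reliably."""
    return "".join(ch for ch in regnum.upper() if ch.isalnum())


def potentially_unrenewed(registrations_xml, renewed_regnums,
                          first_year=1923, last_year=1964):
    """Return (regnum, title) pairs with no matching renewal record."""
    renewed = {normalize(r) for r in renewed_regnums}
    root = ET.fromstring(registrations_xml)
    candidates = []
    for entry in root.iter("entry"):
        year = int(entry.get("year"))
        # Only the 1923-1964 window discussed in the article is considered.
        if not first_year <= year <= last_year:
            continue
        if normalize(entry.get("regnum")) not in renewed:
            candidates.append((entry.get("regnum"), entry.findtext("title")))
    return candidates


candidates = potentially_unrenewed(REGISTRATIONS_XML, RENEWED_REGNUMS)
for regnum, title in candidates:
    print(regnum, title)
```

A title flagged this way is only a candidate. A missing match may just mean a record was transcribed differently, which is why the sketch normalizes identifiers before comparing them and why human checking still matters.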

This work is now happening at American libraries and among volunteers from organizations like Project Gutenberg. The Internet Archive’s Jason Scott has also pitched in with his own resources, recently putting out a call for more help on the “very boring, VERY BORING (did I mention boring)” project of determining “which books are actually in the public domain to either surface them on or help make a hitlist.” Of course, many more obviously stimulating tasks exist even in the realm of digital archiving. But then, each secretly public-domain book identified, found, scanned, and uploaded brings humanity’s print and digital civilizations one step closer together. Whatever comes out of that union, it certainly won’t be boring.

via Vice

Related Content:

Public Domain Day Is Finally Here!: Copyrighted Works Have Entered the Public Domain Today for the First Time in 21 Years

11,000 Digitized Books From 1923 Are Now Available Online at the Internet Archive

British Library to Offer 65,000 Free eBooks

Download for Free 2.6 Million Images from Books Published Over Last 500 Years on Flickr

Free: You Can Now Read Classic Books by MIT Press on Archive.org

The Library of Congress Launches the National Screening Room, Putting Online Hundreds of Historic Films

Based in Seoul, Colin Marshall writes and broadcasts on cities, language, and culture. His projects include the book The Stateless City: a Walk through 21st-Century Los Angeles and the video series The City in Cinema. Follow him on Twitter at @colinmarshall or on Facebook.

1,100 Classic Arcade Machines Added to the Internet Arcade: Play Them Free Online

Once we could hardly imagine such things as video games. Then, all of a sudden, they appeared, though for years we had to go out to bars (and later, purpose-built “arcades” filled with video game machines) in order to play them, and we paid money to do so. When they came into our homes in the form of consoles we could hook up to our television sets, we at first felt only disappointment: these versions of Space Invaders, Donkey Kong, and Defender neither looked nor felt much like the originals into which we’d pumped so many coins. But only now that the technology in our homes has long since surpassed most of the technology outside them can we play faithful reproductions of all our old favorite games without going out to the arcade.

Not that many arcades still stand, although the Internet Archive has made up for that absence by building the Internet Arcade, which we previously featured here on Open Culture a few years ago. Having made it possible for us to play an enormous variety of classic arcade games free in our web browsers, the Internet Archive looks on its way to creating not just the largest arcade in existence but an infinite arcade, the kind that Borges would have imagined had he grown up in the video-game age. Just last week, developments in the software that powers it allowed the Internet Archive to add more than a thousand new machines to the Internet Arcade, from games for which we had to wait in line back in the day to obscurities on which few of us have ever even laid eyes, let alone hands, before.

“The majority of these newly-available games date to the 1990s and early 2000s, as arcade machines both became significantly more complicated and graphically rich,” writes the Internet Archive’s Jason Scott, “while also suffering from the ever-present and home-based video game consoles that would come to dominate gaming to the present day. Even fervent gamers might have missed some of these arcade machines when they were in the physical world, due to lower distribution numbers and shorter times on the floor.” You can explore the new wing of the Internet Arcade here, some of whose most popular games include Puzzle Bobble (better known in the West as Bust-a-Move), X-Men, Metal Slug 5, Teenage Mutant Ninja Turtles: Turtles in Time, and Street Fighter Alpha 2. Maybe their sound and graphics no longer wow us as once they did, but the years have done nothing to diminish their fun factor, and for many of us, not having to spend our quarters will always be a feeling to savor.

Related Content:

The Internet Arcade Lets You Play 900 Vintage Video Games in Your Web Browser (Free)

Free: Play 2,400 Vintage Computer Games in Your Web Browser

Play a Collection of Classic Handheld Video Games at the Internet Archive: Pac-Man, Donkey Kong, Tron and MC Hammer

Based in Seoul, Colin Marshall writes and broadcasts on cities and culture. His projects include the book The Stateless City: a Walk through 21st-Century Los Angeles and the video series The City in Cinema. Follow him on Twitter at @colinmarshall or on Facebook.

Enter the Pulp Magazine Archive, Featuring Over 11,000 Digitized Issues of Classic Sci-Fi, Fantasy & Detective Fiction

Pulp Fiction will likely hold up generations from now, but the resonance of its title may already be lost to history. Pulp magazines, or “the pulps,” as they were called, once held special significance for lovers of adventure stories, detective and science fiction, and horror and fantasy. Acquiring the name from the cheap paper on which they were printed, pulp magazines might be said, in large part, to have shaped the pop culture of our contemporary world, publishing respected authors like H.G. Wells and Jules Verne and many an unknown newcomer, some of whom became household names (in certain houses), like Isaac Asimov, Arthur C. Clarke, and Philip K. Dick.

Beginning in the late 19th century, the pulps opened up the publishing space that became flooded with comic books and popular novels like those of Stephen King and Michael Crichton in the latter half of the twentieth century.




They varied widely in quality and subject matter, but all shared certain preoccupations. Sexual taboos are explored in their naked essence or through various genre devices. Monsters, aliens, and other features of the “weird” predominate, as do the forerunners of DC and Marvel’s superhero empires in characters like the Shadow and the Phantom Detective.

Unlike higher-rent “slicks” or “glossies,” pulp magazines had license to go places respectable publications feared to tread. Genre fiction now spawns multimillion-dollar franchises, one after another, purged of much of the pulps’ salacious content. But paging through the thousands of back issues available at the Pulp Magazine Archive will give you a sense of just how outré such magazines once were, a quality that survived in the underground comics and zines of the 60s and beyond and in genre tabloids like Scream Queens.

The enormous archive contains over 11,000 digitized issues of such titles as If, True Detective Mysteries, Witchcraft and Sorcery, Weird Tales, Uncensored Detective, Captain Billy’s Whiz Bang, and Adventure (“America’s most exciting fiction for men!”). It also features early celebrity rags like Movie Pictorial and Hush Hush, and retrospectives like Dirty Pictures, a 1990s comic reprinting the often quite misogynist pulp art of the 30s.

There’s great science fiction, no small amount of creepy teen boy wish-fulfillment, and lots of lurid, noir appeals to fantasies of sex and violence. Swords and sorcery, guns and trussed-up pin-ups, and plenty of creature features. The pulps were once mass culture’s id, we might say, and they have now become its ego.

Enter the Pulp Magazine Archive here.

Related Content:

Enter a Huge Archive of Amazing Stories, the World’s First Science Fiction Magazine, Launched in 1926

Free: 355 Issues of Galaxy, the Groundbreaking 1950s Science Fiction Magazine

Isaac Asimov Recalls the Golden Age of Science Fiction (1937-1950)

Josh Jones is a writer and musician based in Durham, NC. Follow him at @jdmagness

The Boston Public Library Will Digitize & Put Online 200,000+ Vintage Records

It may be a great irony that our age of cultural destruction and—many would argue—decline also happens to be a golden age of preservation, thanks to the very new media and big data forces credited with dumbing things down. We spend ample time contemplating the losses; archival initiatives like The Great 78 Project, like so many others we regularly feature here, should give us reasons to celebrate.

In a post this past August, we outlined the goals and methods of the project. Centralized at the Internet Archive (that magnanimous citizens’ repository of digitized texts, recordings, films, etc.), the project contains several thousand carefully preserved 78rpm recordings, which document the distinctive sounds of the early 20th century from 1898 to the late 1950s.




Thanks to partners like preservation company George Blood, L.P. and the ARChive of Contemporary Music, we can hear many thousands of records from artists both famous and obscure in the original sound of the first mass-produced consumer audio format.

Just a few days ago, the Internet Archive announced that they would be joined in the endeavor by the Boston Public Library, who, writes Wendy Hanamura, “will digitize, preserve” and make available to the public “hundreds of thousands of audio recordings in a variety of historical formats,” including not only 78s, but also LPs and Thomas Edison’s first recording medium, the wax cylinder. “These recordings have never been circulated and were in storage for several decades, uncatalogued and inaccessible to the public.”

The process, notes WBUR, “could take a few years,” given the sizable bulk of the collection and the meticulous methods of the Internet Archive’s technicians, who labor to preserve the condition of the often fragile materials, and to produce a number of different versions, “from remastered to raw.” The object, says Boston Public Library president David Leonard, is to “produce recordings in a way that’s interesting to the casual listener as well as to the hard-core music listener in the research business.”

Thus far, only two recordings from BPL’s extensive collections have become available—a 1938 recording called “Please Pass the Biscuits, Pappy (I Like Mountain Music)” by W. Lee O’Daniel and His Hillbilly Boys and Edvard Grieg’s only piano concerto, recorded by Freddy Martin and His Orchestra in 1947. Even in this tiny sampling, you can see the range of material the archive will feature, consistent with the tremendous variety the Great 78 Project already contains.

While we can count it as a great gain to have free and open access to this historic vault of recorded audio, it is also the case that digital archiving has become an urgent bulwark against total loss. Current recording formats instantly spawn innumerable copies of themselves. The physical media of the past existed in finite numbers and are subject to total erasure with time. “The simple fact of the matter,” archivist George Blood tells the BPL, “is most audiovisual recordings will be lost. These 78s are disappearing left and right. It is important that we do a good job preserving what we can get to, because there won’t be a second chance.”

via WBUR

Related Content:

25,000+ 78RPM Records Now Professionally Digitized & Streaming Online: A Treasure Trove of Early 20th Century Music

The British Library’s “Sounds” Archive Presents 80,000 Free Audio Recordings: World & Classical Music, Interviews, Nature Sounds & More

BBC Launches World Music Archive

Josh Jones is a writer and musician based in Durham, NC. Follow him at @jdmagness
