AUSTIN, Texas—As much as subscription services want you to believe otherwise, not everything can be found on Amazon or Netflix. Want to read Brett Kavanaugh buddy Mark Judge’s old book, for instance (or even their now-infamous yearbook)? Curious to watch a bunch of vintage smoking ads? How about perusing the largest collection of Tibetan Buddhist literature in the world? There’s one place to turn today, and it’s not Google or any pirate sites you may or may not frequent.
“I’ve got government video of how to wash your hands or prep for nuclear war,” says Mark Graham, director of the Wayback Machine at the Internet Archive. “We could easily make a list of .ppt files in all the websites from .mil, the Military Industrial PowerPoint Complex.”
Graham recently talked with several small groups of attendees at the 2018 Online News Association conference, and Ars was lucky enough to be part of one. He later made a full presentation to the conference, which is now available in audio form. And the immediate takeaway is that the scale of the Internet Archive today may be as hard to fathom as the scale of the Internet itself.
The longtime non-profit’s physical space remains easy to comprehend, at least, so Graham starts there. The main operation now runs out of an old church (pews still intact) in San Francisco, with the Internet Archive today employing nearly 200 staffers. The archive also maintains a nearby warehouse for storing physical media—not just books, but things like vinyl records, too. That’s where Graham jokes the main unit of measurement is “shipping container.” The archive gets that much material every two weeks.
The organization currently stands as the second-largest scanner of books in the world, behind only Google. Graham put the current total above four million. The archive even has a wishlist for its next 1.5 million scans, including anything cited on Wikipedia. Yes, the Wayback Machine is in the process of making sure you’re not finding 404s during any Wikipedia rabbit hole (Graham recently told the BBC that Wayback bots have restored nearly six million pages lost to link rot as part of that effort). Today, books published prior to 1923 are free to download through the Internet Archive, and a lot of the stuff published afterward can be borrowed as a digital copy.