Alexandria 2.0: One Millionaire's Quest to Build the Biggest Library on Earth

Brewster Kahle never had to work again after selling his company to Amazon for a quarter-billion dollars in the dot-com boom. But he then began working on building the world's biggest digital library, earning him a spot in the Internet Hall of Fame.

Here’s the problem with libraries. They catch on fire really easily. As such, they were the prized targets of the invading hordes of antiquity – the model collections of knowledge of their times, whose only fault was their inherent flammability. They were one-man, one-torch jobs. But the hordes didn’t prize the library only for how powerfully it burned. Back in those days, if you wanted to kill a culture, you killed its library. All it took was one chucklehead with a flaming stick to annihilate thousands of years of accumulated knowledge. And it happened often. “If this is what happens to libraries, make copies,” says Brewster Kahle.

And it's Kahle's impulse to copy and preserve that prompted the Internet Society to induct the serial entrepreneur and digital archivist into the Internet Hall of Fame on April 23 in its inaugural year.

Kahle took the library of libraries -- the internet -- and made a couple of copies of it, and keeps making copies. One he keeps in servers in San Francisco, the other in mirror servers in Alexandria, where the world’s most famous library burned 2,000 years ago. (His data survived the Egyptian revolution unscathed.)

Through the Wayback Machine, you can see what the web looked like in 1996. And 1997. And 2011.

It’s just one arm of Kahle’s ambitious goal to provide the world with universal access to all knowledge.

His vehicle is the Internet Archive, a nonprofit organization Kahle founded in 1996, the same year he started analytics firm Alexa Internet, a pioneer in collaborative filtering, which he sold in 1999 to Amazon for $250 million.

“I won the internet lottery,” Kahle says.

Or, more aptly, the internet won the internet lottery. Since selling Alexa, Kahle has grown the Internet Archive, which he refers to as Alexandria 2.0, into a massive digital repository that has not only made copies of the internet, but has made available 200,000 e-books (and digitizes 1,000 more each day), 100,000 concert recordings, and some 700,000 films.

All are available online for free.

The quest pitted him against the lords of copyright and their swarming lawyers, against internet giants like Google, and against the prevailing wisdom that information must have an owner -- and a price.

Operating out of a palatial old Christian Science church in San Francisco’s Richmond district, the Internet Archive is the web’s most extensive library. It draws funding both from donations and from traditional libraries that pay to have their collections digitized by Archive workers, who carefully turn each page of a book for the scanners.

And if these employees stick around for three years, they’ll get a sculpture of their likeness in between the pews in the building’s main hall. It’s Kahle’s own private army of information gatherers, a la Qin Shi Huang’s Terracotta Warriors, only much more colorful and cheerful.

“I’m a digital librarian,” says Kahle, “which seems a little retro for somebody who started companies, and who lives in the Bay Area, and does high tech and designs supercomputers. But I think that some of the roles and traditions that evolved before the electronic stuff make a lot of sense. Like publishers that sell books, libraries that buy books, libraries that lend books.”

So with one foot nudging the radically progressive, open distribution of information, and the other foot planted in respect for certain reactionary industry tendencies, Kahle has for 16 years now struck a balance between giving things away for free and avoiding infuriating the most powerful intellectual property owners in the world.

Take, for instance, the 200,000 e-books housed in the Open Library, an offshoot of the Internet Archive. Here, users digitally borrow the donated and purchased books scanned into the system either by Kahle’s team or by participating libraries. But only one person is given access to each book for up to two weeks, unless rights have been purchased for multiple copies. It’s a seemingly antiquated system, but it keeps the rights holders from mutiny.

“The thing that I’ve learned out of operating a library of everything is people don’t want to feel like they’re being taken advantage of,” says Kahle. “If they do feel like they’re being taken advantage of, they’ll throw things at you. They’ll throw laws at you, they’ll try to take you down any which way they can if they feel like they’re being taken advantage of. So the key thing for me is to stay on the other side of that line.”

Forty miles south of the Internet Archive’s offices, Google was exercising no such caution in its mad dash to build its competing Google Books into a mammoth digital library -- copyright niceties be damned.

As the Internet Archive concentrated on out-of-print and out-of-copyright works, Google drew the ire of the Authors Guild, which cried copyright infringement as Google copied and used snippets of in-copyright books without asking permission. Having spent seven years scrapping with the internet giant -- at one point settling, only to have a judge toss out the settlement -- the Guild earlier this month claimed damages of $750 per book and demanded an end to copying books without permission, though many online activists maintain that Google has a fair use right to scan and display snippets of books.

“I think [Google is] backing off,” says Kahle. “I think they learned something from that one. Or at least they feel a little burned from that one.”

There’s the issue of image here, of course. As a nonprofit, the Internet Archive enjoys an inherent public trustworthiness, while profit-minded Google has tarnished its "Don't Be Evil" mantra with various transgressions of late. The differences between the two projects turn on the specter of rampant monetization of information and whether people will trust a single corporation to command the world’s collected knowledge.

“If there’s something I’ve got as sort of a theme in my career, or point of view,” says Kahle, “is the only way to make things kind of interesting and grow, is to have no central points of control. If you have any central points of control, somebody will evolve to exploit it.”

As far as information exchange goes, it doesn’t get more decentralized than BitTorrent, the file-sharing protocol best known for giving panic attacks to Hollywood and the recording industry. On Aug. 7, the Internet Archive announced it would host 1.4 million torrent files, in addition to the actual public domain data that the files point to -- nearly a petabyte of it.

As these torrents spread, they create a sort of distributed preservation system, essentially guaranteeing the survival of the information, save for a worldwide EMP blast. Should the Internet Archive’s collection for some reason be destroyed, the data will persist, distributed on hard drives around the world.

The Internet Archive also operates the other way around, absorbing distributed collections badly in need of organization. Its largest and most famous such collection was the brainchild of one of Kahle’s interns, who in 2002 suggested that the Archive approach the prolific recorders of Grateful Dead shows, the Deadheads, and offer them unlimited free storage for their tapes. With the blessing of the band, the Internet Archive’s collection of fan-taped Grateful Dead shows has grown to 9,000 recordings.

The Archive has since opened its servers to other distributed audio collections, and today boasts a stockpile of some 5,000 bands.

“We often look at these sort of institutions like the Library of Congress or even the Internet Archive,” says Kahle, “and we forget that a lot of the preservation really comes about through amateur librarians and archivists.

“But this role of a distributed preservation system for things that we love, it’s like the end of Fahrenheit 451, when people would be walking around and they were a particular book. I think people were walking around and starting to be particular collections.”

But there’s also information that has no fan base -- say, a rare 300-year-old manuscript on badger hunting.

It’s this difficult-to-access analog media that’s the real challenge for the Internet Archive. The technology to catalog the digital stuff is there, Kahle says, but the overwhelming quantity of physical media poses a daunting task. In particular, the Archive is still wading through the 20th century. The explosion of media in the 1900s, along with severely tightened copyright laws, makes for material that is both overwhelmingly plentiful and legally restricted.

So the Archive takes whatever it can get. No information is too obscure -- Kahle just got back from Bali, where he helped digitize everything ever written in Balinese. And nothing is wasted -- every physical book that is digitized is sent across the San Francisco Bay to Richmond, where it’s added to one of many climate-controlled shipping containers.

So far Kahle has archived 500,000 books, with another 500,000 in process. Though he admits he’ll never get there, Kahle wants to collect one of every book ever written.

“I think it’s a supply problem,” he says. “It’s not a demand problem. People want it.... People aren’t really stupid out there. They may be very particular, very peculiar, and they may not be interested in the things you are, or maybe even vote the same way you do, but they’re interested in what they’re interested in.”

So grows the second library of Alexandria, a collection with something for everyone. Except for the invading hordes. Not that they’d have any idea where to begin lighting fires.