How I Learned to Stop Worrying and Love the Panopticon
An imperfect forgettery
Meatspace ASCII, the revered printed word, has many things going for it:
It's high-resolution: Whether scrawled with a toddler's crayon or hammered out by a quaint, humming Selectric's print-ball, a traditionally printed word is an order of magnitude sharper and better-defined than the phosphors marching across your screen.
It requires no specialized reader: A printed word can be read by any literate human being during daylight hours without any particular technological assist, specialized readers, or even electricity.
It is hard to make obsolete: Printed works don't staledate the way that electronic words do. It's difficult to apply "digital rights management" schemes to the printed word that will stymie generations to come with bizarre cryptosystems that seek to circumvent posterity.
As someone in possession of tens of thousands of books, I understand why people get misty and sentimental about dead-tree libraries. As someone who has moved twice in the past 18 months, I feel compelled to point out that the printed word has a couple of major downsides:
It is fragile: We print books on the same substrate we employ for cleaning our nether regions after excreting. Think about that for a second: Paper is considered degradable enough to flush billions of sheets of it down the crapper every day, and yet we entrust our precious words to a material that auto-incinerates if you put it into contact with oxygen.
Well, so what? We've got mass production techniques that will let us preserve our most important documents by making millions of copies of them. Which brings us to the next problem:
It is bulky: Moving-box companies sell specialized shipping boxes for books, boxes that are smaller than all the other species of boxen. That's because books are freakin' heavy. They're made from trees!
By contrast, of course, bytes are pretty manageable. They've got their own degradability issues -- CDs, magnetic tape, flash, and platters all fall apart pretty quickly -- but that's OK, because bytes are not only comparatively tiny (I can carry 50 novels on my 3-ounce PDA, or 7,000 novels on my 6.5-ounce iPod), but they get tinier every year.
Every year, storage media increases in density, decreases in size, and gets cheaper. I can fit all the hard drives of all the computers I've owned, plus all the floppies for all the computers that I owned before hard drives were common, onto the hard drive of my latest laptop, with storage to spare. Hell, most of that stuff will fit on my iPod! The data that previously occupied a roomful of storage devices now fits comfortably in my pocket.
In a world of degradable storage, replicating copies is the surest way to guarantee longevity. Whether your data is in atoms or bits, the more copies you make of it and the more widely you disperse it, the greater the likelihood that your data will persist forever. (That's why Jaron Lanier jokingly proposed encoding printed matter into the DNA of the notoriously prolific cockroach, as a means of ensuring archives through a nuclear war and beyond.)
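The arithmetic behind "more copies, better odds" can be sketched in a few lines. This is a back-of-the-envelope illustration, not a model of any real archive: the function name `survival_probability` and the loss figures are made up for the example, and it assumes every copy is lost independently of the others (which is exactly why wide dispersal matters).

```python
def survival_probability(copies, loss_chance):
    """Chance that at least one copy survives.

    Assumption: each copy is lost independently with probability
    `loss_chance` over the period we care about. Both numbers are
    hypothetical inputs, not measurements.
    """
    if copies < 1:
        return 0.0
    # All copies must be lost for the work to vanish.
    return 1.0 - loss_chance ** copies


if __name__ == "__main__":
    # Even with a (made-up) 90 percent chance of losing any single copy,
    # the odds improve quickly as the number of copies grows.
    for n in (1, 5, 50):
        print(n, round(survival_probability(n, 0.9), 4))
```

If the copies all sit in the same basement, the independence assumption fails and the extra copies buy much less.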
With bulky printed words, only the commercially successful (and hence prolific) and very lucky works are likely to survive the voyage through history. All the words we write try to crowd into the lifeboat, but only a lucky few survive.
The historical forgettery is something of a blessing, though. Many's the word that's been penned, in casual correspondence or published works, that is best forgotten. I know that I've written a few things I'd rather no one ever saw. Much of it is embarrassing; most of it is banal. History flenses away the great bulk of utterance and leaves behind a barely manageable archive that we can get our heads around.
Words-as-bytes need not be forgotten! Storage is cheap, storage is compact, and the lifeboat has got plenty of room for every jot and tittle keyed into the Internet. Brewster Kahle built an archive holding several copies of the Web, captured at different times, using off-the-shelf PCs and standard drives.
This is a good thing, but it's also a pain in the ass. Our embarrassing excesses, drunken rants, typos and brain farts and flames no longer vanish into our subconscious, but rather hang around like embarrassing relatives, undeniably ours, with us forever.
There's an upside, of course. The enduring presence of our publicly stated positions acts as an accountability system, making us own up to our errors and perhaps encouraging us to think carefully before putting our fingers on our keyboards. Old Usenet clients used to have a standard warning that would appear the first time you used Usenet to send a message, a dire warning to the effect that your words were about to pass from your computer and onto the computers of thousands of other people, and are you really sure that you've expressed yourself adequately?
Jonathan Lethem's Motherless Brooklyn features Lionel Essrog, a private detective with Tourette's Syndrome whose obsessive-compulsive illness makes him ideal for long, boring stakeouts and wiretap parties. Once the compulsion to listen for a keyword in the soup of a rambling conversation or to continually re-check a staked-out doorway for a suspect has been planted in Lionel's Tourettic brain, he is unable to do anything except listen and watch until the compulsion has been satisfied.
Boring, repetitive, endless tasks don't actually require someone with a compulsive disorder to do them; computers can do them just fine. A computer can sieve through the torrent of packets passing over the Internet and look for keywords like "terrorism" and "anthrax" and "fissile" and "child-porn," then flag them for later consideration by law-enforcement officials at spooky three-letter agencies.
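To see how little machinery this kind of sieve needs, here is a minimal sketch. It is not how any real interception system works; the names `WATCH_WORDS` and `flag_messages` are invented for illustration, and the watch list is simply the set of keywords mentioned above.

```python
import re

# Hypothetical watch list, taken from the keywords named in the text.
WATCH_WORDS = ("terrorism", "anthrax", "fissile", "child-porn")

# One case-insensitive pattern. The lookarounds stop a keyword from
# matching when it is only part of a longer word.
PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(w) for w in WATCH_WORDS) + r")(?!\w)",
    re.IGNORECASE,
)


def flag_messages(messages):
    """Return (index, sorted matched keywords) for each message with a hit."""
    flagged = []
    for i, text in enumerate(messages):
        hits = sorted({m.group(0).lower() for m in PATTERN.finditer(text)})
        if hits:
            flagged.append((i, hits))
    return flagged


if __name__ == "__main__":
    sample = [
        "Lunch at noon?",
        "A news story about the anthrax scare",
        "Fissile material and TERRORISM in one lecture",
    ]
    for index, hits in flag_messages(sample):
        print(index, hits)
```

Note that both flagged samples are perfectly innocent: a news story and a lecture. The sieve matches words, not intent.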
Law enforcement doesn't really need any specialized equipment to surveil the average netizen. Google does it better than anything else possibly could (dirty snitch), and it doesn't cost a cent.
But Google only acts on the public data that human beings are free to link to and that the Googlebot is free to discover. Private documents (email, instant messages, internal memos) are off-limits to Google. Even if you manually poured them down the Googlebot's throat, the absence of incoming or outgoing links to these documents means that they won't be placed in any meaningful context in the Googleverse.
Increasingly, law-enforcement agencies are pushing for (or owning up to) the creation of really creepy spyware projects like Echelon, Magic Lantern, and Carnivore, systems that are placed on your computer, at your ISP or at a major Internet backbone, and used to indiscriminately capture all of the data they encounter, shunting it off to shadowy bunkers where the secret masters of the universe can use it to shine a light up the skirts of your privacy and, possibly, that of criminals, too.
People are, rightfully, very upset about all of this. Continuous wiretapping of the entire Internet is a revolting idea, something like the Panopticon, a prison where the warders can see your every move from perfect obscurity. It's enough to make you want to draw your blinds and curl up under the sofa.
AltaVista for them, Google for us
But what do they do with all of that data that they collect? Filter it for keywords? Fat chance. The volume of false positives (e.g., people talking about child pornography who aren't child pornographers) far exceeds the volume of actual criminal activity. Even creaky old Lycos gave up on plain-old keyword matching a long, long time ago.
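The false-positive problem is a matter of base rates, and a rough sketch makes the point. Every number below is invented for illustration (nobody publishes real figures for this), and `flagged_breakdown` is a hypothetical helper, not part of any actual system.

```python
def flagged_breakdown(total_messages, criminal_rate, hit_rate, false_alarm_rate):
    """Split keyword hits into true hits and false alarms.

    All inputs are assumptions for the sake of the arithmetic:
      criminal_rate    - fraction of messages that are actually criminal
      hit_rate         - fraction of criminal messages the filter catches
      false_alarm_rate - fraction of innocent messages the filter flags
    Returns (true_hits, false_hits, share_of_flags_that_are_real).
    """
    criminal = total_messages * criminal_rate
    innocent = total_messages - criminal
    true_hits = criminal * hit_rate
    false_hits = innocent * false_alarm_rate
    flagged = true_hits + false_hits
    share_real = true_hits / flagged if flagged else 0.0
    return true_hits, false_hits, share_real


if __name__ == "__main__":
    # Made-up scenario: a million messages, 1 in 100,000 criminal,
    # a filter that catches 90 percent of them and wrongly flags
    # 1 in 1,000 innocent messages.
    true_hits, false_hits, share_real = flagged_breakdown(
        1_000_000, 0.00001, 0.9, 0.001
    )
    print(round(true_hits), round(false_hits), round(share_real, 4))
```

Under those assumed numbers, about nine real hits are buried under roughly a thousand false alarms, so fewer than one flag in a hundred is worth anyone's time. Rare targets plus even a tiny false-alarm rate will do that to any keyword filter.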
Maybe they manually check it. After all, that approach worked for Yahoo, right? Oh, right, it didn't work. Scratch that.
Then they must use some hybrid approach: human editors and AI (Artificial Intelligence or Almost Implemented, take your pick) working in concert to tweeze out the most relevant material as quickly and efficiently as possible.