 Published on O'Reilly Network (http://www.oreillynet.com/)

Platform Independent: Marshall McLuhan vs. Marshalling Regular Expressions

by Andy Oram

Glosses on Marshall McLuhan must chase one of the strangest trajectories of any public figure. In less than a decade, McLuhan went from Catholic university professor to media celebrity to cliché. His very career proved one of the theses for which he is best known (although not the points he most desired to make known): that the new television culture blankets the world with impressions that hit all viewers simultaneously and that are not subject to analysis or dissection but merely to reaction. We no longer preserve the neat perspective that distinguishes figure from ground; we have only a field of impressions.

McLuhan's books struck the public in the 1960s (a time when figure and ground switched places for many) with the projective force he had assigned to print media. This explosion was followed by an implosion driven by the iconic force of the electronic age. His ideas gained currency through television appearances, jokes, a scene in Woody Allen's movie Annie Hall, and not least the conscious homage paid to his ideas by the advertising industry. If you missed McLuhan's ideas going out, don't worry--you certainly received them coming back in.

Recently, I began to research and re-evaluate McLuhan. The impetus was a surprise I had not known how to deal with for several years: the success of a book called Mastering Regular Expressions by Jeffrey Friedl. Now that Friedl has just finished writing, and I editing, a second edition, this is a good historical moment to integrate the phenomenon into our understanding of the role computer processing plays in social development.

Every medium, technology, and concept, no matter how modest, has a social context.

Regular expressions, from a practical standpoint, are tools for handling text. No one processes a JPEG with regular expressions. Television is totally off the agenda. A McLuhanite would take one look at the subject matter for regular expressions and declare them irrelevant.

Yet Mastering Regular Expressions came out and became an instant hit. The Perl community (where regular expressions had taken hold most strongly at the time) treated Friedl as a hero. His talk at the first O'Reilly Perl Conference filled a large hall right up to the back doors. We sold out all copies of his book at the conference, even though it had been released six months before, and brought in another batch of copies that promptly sold out as well. Five years after publication and 22 years after the death of McLuhan, the first edition still sells several hundred copies per month and is continually recommended on mailing lists and in journal articles.

Not only Friedl's book but regular expressions themselves have marched to commanding heights that no other language can point to. Originally marginal outside of Perl and a few old Unix tools, regular expressions now appear as standard offerings in virtually every modern language--including the .NET framework, Java, Python, and PHP--and turn up as conveniences in major utilities such as the Apache web server, the MySQL database engine, and the Postfix and Exim mail transfer agents. (The author of Exim, Philip Hazel, developed a popular regular expression library used in several tools.)

What does the success of regular expressions have to do with McLuhan? Simply put, the technology and Friedl's book seem to embody everything McLuhan said was passé: they celebrate and support a reverence for text that McLuhan expected current generations to abandon. The actual message, as I will show, is more subtle and enhances McLuhan's work substantially.

McLuhan was the Shannon of sociology, placing his bets on the primacy of communication over content.

A thesis concerning the obsolescence of print or text culture, and the ascension of electronic media, would seem extremely odd coming from an English professor. But to McLuhan, such a provocative statement was merely a vehicle for a deeper message: that we are leaving behind the age dominated by the logical argumentation and specialized analysis characteristic of text study. At best, such study could moderate and soften the impact of the new electronic media.

On the summer morning in 1440 when Johann Gutenberg rose from his bed in Strasbourg to print the first trial leaflet on his printing press, the sun rose on an essentially holistic world of oral, aural, and tactile participation. The primary experience of every human being was face to face and responsive--one did not simply take information in, but dialogued with the person offering it. Manuscripts, richly impressed with the hand and personality of the scribe, did not alter the primary nature of information as a kind of speech or activity.

Space and time in those days were gloriously diverse, to the extent that each location and each moment held its own meaning. Besides the grand distinction between the heaven and the earth, the earth itself--and the diverse tribes within it--clearly differed from place to place. There was no uniformity or redundancy.

Into this multisensuous world Gutenberg threw a shocking anomaly: several thousand Bibles that were all alike, each containing several thousand As and Bs that were all alike. The notion that one could subdivide an experience and make it precisely repeatable and predictable has thrown into commotion every society where movable type has become commonplace. As McLuhan says in The Gutenberg Galaxy, "the most obvious character of print is repetition." Among its consequences, according to McLuhan, are:

All this is so essential and pervasive that it would never have been examined by anybody, but for an equally fundamental revolution taking place now with electronic media: the telegraph, the telephone, radio, phonograph, television, and (the medium to which the final chapter of McLuhan's Understanding Media is devoted) the digital computer.

In contrast to print culture, our growing electronic culture features instantaneous, non-verbal impressions that are taken in whole and that reverberate with the emotions of the viewers; a breakdown of traditional cause-and-effect reasoning, replaced by a re-ascendance of myth as the driving force behind decisions and actions; a similar substitution of holistic experience for print culture's focus on individual components of the situation (print culture's separation of figure from ground); and a weakening of Western individualism in favor of a renewed group or tribal membership, created by the simultaneous exposure of millions of people to the same images.

Computers and automation are playing a major role in this transformation of society, according to McLuhan, because they speed up events so much that traditional sequences are replaced with simultaneous activities (he would have been quite comfortable with computer clusters), and because humans have to deal with the outcomes of computation as a whole rather than with the individual steps of which they were made.

In later years, McLuhan became fascinated with new research on the hemispheres of the brain, and associated the dominance of the left brain with print culture while associating right-brain thinking both with the pre-print oral culture and the emerging electronic media culture.

It is hard to discern McLuhan's moral values among his torrents of literary references and comparisons. Certainly, he blamed Gutenberg for the emergence of Protestantism and other historic trends he disliked. Private correspondence also reveals that he hoped the electronic age would bring people spiritual revelation. On the other hand, his view of television can be gleaned from the wish expressed in The Gutenberg Galaxy that we "mastered the nature and effects of all our technologies, instead of being pushed around by them." He shows even less reserve in his posthumous Laws of Media, which darkly states that television watching causes an "impulse...towards anarchy and lawlessness."

While the pre-eminent medium of McLuhan's time was television, had he managed to experience the Internet he would undoubtedly have seized on it and declared it even more characteristic of the sketchiness, immediacy, and intense user engagement engendered by new media. During McLuhan's lifetime, digital networks existed only as instruments of business, and he could comment on them only from that standpoint. Now that the Internet is a personal and widely shared experience, many have pointed out its new-media traits: its adaptation to short, quick-breaking information, its reliance on rumor, and its awkward presentation formats.

Regular expressions extend the reach of text, and therefore inexorably change how we sense the text.

As noted earlier, regular expressions are supremely a textual medium. They represent the complete conquest of text. They become a world contender when fortified by Unicode, which is now supported to some degree by most computer languages as well as regular expression packages. Unicode gratifies the alphabetic print culturist's ultimate fantasy by regularizing all linguistic expressions in ordered, discrete abstractions.

The renewed importance assigned by computer programmers, perhaps surprisingly, to the old medium of text reflects the intrusion of the Internet into a field of electronic media previously focused on entertainment. The Internet has raised the retrieval of textual and numeric information (such as news, weather, and financial data) to a mass phenomenon.

Like calculus (which McLuhan considered a conquest of the tactile area of numbers) regular expressions anticipate the unpredictable and bring repeatability to the immeasurable. A simple * (which means "zero or more of the preceding item") compresses everything from zero to infinity into a calculable scheme.
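For readers who have not met the star, a minimal sketch in Python may help (the article names no language, so Python and the sample strings are assumptions made for illustration). One pattern accepts zero, one, or ten thousand repetitions alike:

```python
import re

# "ab*c" reads: an "a", then zero or more "b" characters, then a "c".
pattern = re.compile(r"ab*c")

def matches_whole(text: str) -> bool:
    """Return True when the entire string fits the pattern."""
    return pattern.fullmatch(text) is not None

# Hypothetical sample strings: no "b" at all, one, several, ten thousand,
# and one string that breaks the pattern with a "d".
samples = ["ac", "abc", "abbbbbc", "a" + "b" * 10_000 + "c", "adc"]
results = [matches_whole(s) for s in samples]
```

Only the last sample is rejected; the pattern never has to say how many repetitions it expects.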

But let us look more closely at this *. It challenges the precision of text. It is neither an A nor a B, and therefore cannot be found in Gutenberg's box of type. Its location cannot even be fixed.

McLuhan writes in Understanding Media that "the clock visually separates time from space." But the computer's millisecond-driven clock destroys all time on a human scale in "the electronic age, which found that instant speeds abolish time and space." In the same way, while text parses and subdivides thought, * dissolves and absorbs all text. Gutenberg separated oral speech into figure and ground, but * combines them again. Like the electron in its post-Newtonian atom shell, * ranges freely and resides nowhere.

With traditional text compilation (using yacc or similar tools), text is parsed in iterative steps. It is broken down into the smallest possible atoms called tokens and processed in figure/ground fashion with an intense attention to the relationship of each token to its context--the classic scientific method promoted (according to McLuhan) by print culture.

Used tentatively, as a beginner would use them, regular expressions may seem just an added convenience to the traditional lexicon of token-processing tools. A garden-variety use of regular expressions might be "extract the text between the fourth and fifth colons in a line," a perfectly natural operation that, for instance, can obtain a user's real name from a Unix system's password file.
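The colon-counting task can be sketched in a few lines of Python. The sample password-file line and the helper name real_name are invented for illustration, and real systems often pack further comma-separated details into this field, which the sketch does not try to strip:

```python
import re

# Skip four "anything up to and including a colon" fields, then capture
# the text that runs up to the fifth colon.
FIFTH_FIELD = re.compile(r"^(?:[^:]*:){4}([^:]*):")

def real_name(passwd_line: str):
    """Return the text between the fourth and fifth colons, or None."""
    match = FIFTH_FIELD.match(passwd_line)
    return match.group(1) if match else None

# Hypothetical entry in the traditional seven-field passwd layout.
line = "jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/sh"
name = real_name(line)
```

Used this way, the regular expression behaves much like a field splitter, which is exactly the beginner's view described above.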

A reader of Friedl's book may well begin it with such tasks in mind, but a world-altering shift in thinking takes place by the time he or she progresses beyond the third chapter. It may occur gradually and intermittently, because Friedl takes care to present it through quiet demonstration and example, gingerly pushing forward the reader's transformation from different angles--but it definitely occurs.

Used to their fullest, regular expressions ignore figure/ground. They operate holistically. They swallow the entire text--sometimes tens of thousands of characters in one fell swoop--and create an impression of it. When you are processing a concept like "find a quote-delimited string, but not where either quote lies inside a comment," the result is a function of the whole text, not of individual characters.
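One common way to approach the quoted-string problem is to let a single pattern consume comments and strings in the same left-to-right pass, keeping only the strings. The sketch below assumes a made-up syntax in which comments run from # to the end of the line and strings are double-quoted with backslash escapes; neither detail comes from the article or from Friedl's book.

```python
import re

# Alternative 1 swallows a whole comment, so quotes inside it are never seen.
# Alternative 2 swallows a whole string, so a "#" inside it is never seen.
TOKEN = re.compile(r'#[^\n]*|"((?:[^"\\]|\\.)*)"')

def quoted_strings(text: str):
    """Return the contents of quoted strings that lie outside comments."""
    found = []
    for match in TOKEN.finditer(text):
        if match.group(1) is not None:  # None means a comment matched
            found.append(match.group(1))
    return found

# Hypothetical input: a quote pair inside a comment, a "#" inside a string.
sample = 'name = "alpha"  # not "beta"\nurl = "a#b"\n'
strings = quoted_strings(sample)
```

Whether a given quotation mark counts depends on everything that precedes it in the text, which is the sense in which the result is a function of the whole.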

Gutenberg set his type one character at a time; regular expressions combine characters into their conceptual functions. You can extract an XML tag by entering <[^>]+>, which appears to fulfill the print-culture's goals of isolating and dissecting an object. But <[^>]+> is fuzzy, matching any XML tag rather than a fixed sequence of text.
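A short Python sketch shows the fuzziness at work on an invented document fragment. The pattern is deliberately naive: a > character inside an attribute value, for instance, would end the match early.

```python
import re

# "<", then one or more characters that are not ">", then ">".
TAG = re.compile(r"<[^>]+>")

# Hypothetical fragment, made up for this example.
doc = '<book id="42"><title>Mastering Regular Expressions</title></book>'
tags = TAG.findall(doc)
```

Four different sequences of characters are returned by one pattern that names none of them.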

The elusive tension between print-culture analysis and electronic-culture holism gives Mastering Regular Expressions its power to intrigue. The book itself celebrates print culture in a myriad ways. The writing is precise enough to reward careful readers and to prepare them for the dual job required by the technology: to analyze the effects of each regular expression and to analyze the text which it is parsing. One must possess a print-culture's training to compare (\d)+ to (\d+) and determine the differences in their side effects.
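The difference between the two patterns can be seen in Python, whose re module follows the common convention that a repeated group keeps only its last iteration. The sample text is invented for illustration.

```python
import re

text = "order 2024"  # hypothetical sample text

# (\d)+ : the group holds one digit and is repeated, so after the match
# it remembers only the final digit it captured.
repeated_group = re.search(r"(\d)+", text)

# (\d+) : the repetition sits inside the group, so the group captures
# the whole run of digits.
grouped_run = re.search(r"(\d+)", text)

overall = (repeated_group.group(0), grouped_run.group(0))
captured = (repeated_group.group(1), grouped_run.group(1))
```

Both patterns match the same four characters overall; they differ only in what the parentheses leave behind, which is the side effect a careful reader has to track.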

As another sign of its obeisance to print culture, Mastering Regular Expressions digs into every available cranny in the print-maker's toolbox. Fonts, special characters, and page layout are all put to work; Friedl's mastery over the dominant sense of a print culture--the visual sense--is evident.

But even in his superficial concerns, Friedl departs from McLuhan's characterization of print culture. When he explores the meaning of uppercase and lowercase, the question of whether the accent on the é in cliché is integral to the é or separate from it, or the problem of recognizing a space character among its "dozen or so" different Unicode representations, these are not truly literary concerns.
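These concerns become concrete in a few lines of Python. This is only a sketch using the standard unicodedata and re modules; the four code points are a small sample chosen for illustration, not Friedl's list.

```python
import re
import unicodedata

# Two spellings of the same word: a single precomposed é (U+00E9) versus
# a plain "e" followed by a combining acute accent (U+0301).
composed = "clich\u00e9"
decomposed = "cliche\u0301"

differ_raw = composed != decomposed
lengths = (len(composed), len(decomposed))
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# A few of the many code points that Python's \s treats as a space:
# SPACE, NO-BREAK SPACE, EM SPACE, IDEOGRAPHIC SPACE.
spaces = ["\u0020", "\u00a0", "\u2003", "\u3000"]
all_are_space = all(re.fullmatch(r"\s", ch) is not None for ch in spaces)

# Case-insensitive matching has to know that É and é are a pair.
case_folds = re.fullmatch("CLICH\u00c9", composed, re.IGNORECASE) is not None
```

The two spellings compare as unequal until they are normalized, which is why the question of whether the accent belongs to the letter cannot be dismissed as pedantry.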

An interest in typography is not the same as an interest in text. Members of print culture are interested in a passage's sense as abstracted from its appearance. Just as printers believed they were preserving all aspects of text as they transferred it from manuscript to plates of type, a member of the print culture gets impatient discussing a space character.

Friedl stresses both the scientific method and a less formal feel for context, calling the marshalling of regular expressions an art. It's not important whether his readers get the precise meaning of every sentence, because understanding comes on gradually over the course of many pages of examples, tests, and metaphors.

His repeated admonitions to pay attention to context--to the ground that supports the figure--may prove irritating to someone trapped by print culture, wanting the figure to be isolated from its ground, fixed, and broken down to atoms. Friedl's process may, in contrast, conform naturally to the expectations of someone who swims in electronic culture.

Thus, regular expressions confirm the thesis presented by McLuhan in Laws of Media: "When pushed to the limits of its potential...the new form will tend to reverse what had been its original characteristics." Text in the age of regular expressions reverses its most fundamental characteristics of division, isolation, and specialization. McLuhan would have been enthralled with regular expressions because they expose the whole within the parts. Their painstaking pursuit and cataloging of individual, discrete alphabetic characters leads to the dissolution of the figure/ground distinction that McLuhan attributed to alphabetic text.

What will the emerging culture look, feel, sound like?

The success of Mastering Regular Expressions should help assuage our McLuhan panic. We are not condemned to lose our reason and be caught up in a polyglot tele-babble of visceralism. We can have our media cake and eat it too.

McLuhan portrayed electronic media as an assault against reasoned choice. We swallow everything that comes across the radio waves; we can no more differentiate and filter television images than a newborn baby can distinguish what is put in its mouth. Infantilism reigns within mass media, as viewed from the vantage point of the 1960s. But with digital processing, we can become finicky eaters indeed. Now we analyze, we extract, we rotate and scale.

The key lies in choosing our tribe carefully. Instead of McLuhan's vision of a resurgent oral/tactile culture of television, we can embrace hacker culture. With the help of regular expressions and other digital processing, individuals can shape media into what they want.

Writing in the 1960s and 1970s, McLuhan can be forgiven for ignoring hacker culture. But this extension of human capability may, in classic McLuhanesque fashion, alter our relations with media and society. Hacker culture will attract malnourished seekers of oral community much more than the cynical cheeriness of television or the cell phones to which so many cling like a lifeline to community.

New media always make it easier for users to express themselves. That is inherent in their newness, for otherwise no one would bother to adopt them. The Internet extends the traditional human abilities to see, to speak, and to manipulate. The revolution is not so much one of content but of distribution. Computers allow the manipulation of old content and old media in unanticipated ways.

McLuhan says that media cause the world to change just as relationships between our senses change. And digitization certainly fits the model.

Given open standards, easy scripting languages, and cheap, versatile devices, digitization could allow users a degree of control over content never before imaginable in history. Conversely, given welded-case devices and access controls, it could allow the owners of content a degree of control over users never before imaginable in history.

A closed, unprogrammable device fits McLuhan's most dire assessment of automation and its numbing effect. But once a hacker breaks open the device and reprograms it, he reclaims not only the device itself but all media with which it comes in contact. We have seen the potential of new media. Let us now reach out and grasp it.

Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.

Copyright © 2009 O'Reilly Media, Inc.