ONJava.com -- The Independent Source for Enterprise Java

Two Lesser-Known Java APIs: Regular Expressions and JavaServer Pages

by Ian F. Darwin

Regular Expressions

In Java Cookbook, I expose you to APIs that you might not even know existed for Java. One example of this is the Regular Expression API. Regular expressions (or REs) are those cryptic but expressive little pattern-matching commands that you see Unix and Perl people flinging about with gay abandon. They generally are used for extremely powerful and fast searches of large volumes of text to find a particular pattern.

Here's an example from Java Cookbook: Suppose you have been on the Internet for a few years and have been faithful about saving all your correspondence, just in case you (or your lawyers, or the prosecution) need a copy. The result is that you have a 50MB disk partition dedicated to saved mail. And let's further suppose that you remember that there is one letter, somewhere in there, from someone named Angie or Anjie. Or was it Angy? But you don't remember what you called it or where you stored it. Obviously, you will have to go look for it.

But while some of you try to open up all 15,000,000 documents in a word processor, I'll find it with just one simple command. Any system that provides regular expression support will allow me to search for the pattern An[^ dn] in all the files.

The "A" and the "n" in An[^ dn] match themselves, in effect finding words that begin with "An", while the cryptic [^ dn] requires the "An" to be followed by a character other than a space (to eliminate the very common English word "an" at the start of a sentence) or "d" (to eliminate the common word "and") or "n" (to eliminate Anne, Announcing, etc.).
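The same pattern can be tried from Java. What follows is only a minimal sketch, assuming the java.util.regex package that ships with SDK 1.4 and later; the class name AngiePattern, the method mentions, and the sample lines are invented for illustration and are not taken from Java Cookbook.

```java
import java.util.regex.Pattern;

class AngiePattern {
    // The pattern from the text: "An" followed by any character except space, 'd' or 'n'.
    static final Pattern ANGIE = Pattern.compile("An[^ dn]");

    // True if the pattern occurs anywhere in the line (a search, not a whole-line match).
    static boolean mentions(String line) {
        return ANGIE.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Lunch with Angie on Friday",   // true:  "Ang"
            "Anjie sent the contract",      // true:  "Anj"
            "An apple a day",               // false: "An" followed by a space
            "And then there were none",     // false: "An" followed by 'd'
            "Anne called again",            // false: "An" followed by 'n'
        };
        for (String s : samples) {
            System.out.println(mentions(s) + "  " + s);
        }
    }
}
```

Note that the pattern is a filter rather than an exact test: a line containing "Answer" or "Any" would also be reported, which is fine when the goal is to narrow thousands of files down to a handful.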

Has your word processor gotten past its splash screen yet? Well, it doesn't matter because I've already found the missing file. On Unix, I just typed the command grep 'An[^ dn]' * and found the answer. (There are several Grep programs for non-Unix systems including, of course, one written in Java, which I present in Java Cookbook.)
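To give a feel for what a grep-like search involves, here is a rough sketch in Java. It is not the Grep program from Java Cookbook; the class name MiniGrep and the sample text are made up, and it assumes java.util.regex rather than the Jakarta package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MiniGrep {
    // Returns, in order, the lines that contain a match for the pattern.
    static List<String> grep(Pattern pattern, List<String> lines) {
        List<String> hits = new ArrayList<String>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> mail = Arrays.asList(
            "Dear Ian,",
            "An update from Anne.",
            "Regards, Angy");
        for (String hit : grep(Pattern.compile("An[^ dn]"), mail)) {
            System.out.println(hit);   // prints "Regards, Angy"
        }
    }
}
```

In a real tool the lines would come from files (for example via java.nio.file.Files.readAllLines on a recent JDK), and the pattern would be compiled once and reused for every file, as it is here.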


REs are widely used in Unix (including Linux and BSD) tools and utilities and in major scripting languages (including awk, Perl, and Python). But they were missing altogether from the standard Java API prior to SDK 1.4. The Regular Expression API discussed in Chapter 4, "Pattern Matching with Regular Expressions," of Java Cookbook comes from Jakarta, the Apache Software Foundation's Java projects page.

The Jakarta project has two different RE APIs, the Regexp one used in Java Cookbook and the more comprehensive ORO API. Sun Microsystems' Java SDK 1.4 introduces yet another Regular Expression API in the package java.util.regex. While java.util.regex will likely predominate in the long term, SDK 1.4 is still in "Early Access," so for the next six to twelve months, many Java developers will be using 1.3 and can use the Apache package.


Almost everything you learn about REs from Java Cookbook will still apply if you switch to using java.util.regex. Most of the changes are in the API, which differs somewhat. At the simplest level, you need only change code like:

RE r = new RE(patt);
if (r.match(line)) {
    // line matches pattern ...
}

to:

Pattern r = Pattern.compile(patt);
if (r.matcher(line).find()) {
    // line matches pattern ...
}

(And you need to change the import, of course.)
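One detail is easy to trip over when moving to the standard API: a Matcher offers both find(), which searches for the pattern anywhere in the input, and matches(), which succeeds only if the whole input matches. The sketch below illustrates the difference, assuming you want grep-like "anywhere in the line" behavior; the class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    // Like grep: true if the pattern occurs anywhere in the line.
    static boolean containsMatch(String patt, String line) {
        return Pattern.compile(patt).matcher(line).find();
    }

    // True only if the pattern accounts for the entire line.
    static boolean matchesWholeLine(String patt, String line) {
        return Pattern.compile(patt).matcher(line).matches();
    }

    public static void main(String[] args) {
        String patt = "An[^ dn]";
        String line = "A note from Angie";

        System.out.println(containsMatch(patt, line));      // true
        System.out.println(matchesWholeLine(patt, line));   // false
        System.out.println(matchesWholeLine(patt, "Ang"));  // true

        // A Matcher can also report what matched and where.
        Matcher m = Pattern.compile(patt).matcher(line);
        if (m.find()) {
            System.out.println(m.group() + " at index " + m.start());  // Ang at index 12
        }
    }
}
```

For a search over many lines you would compile the Pattern once and call matcher(line) on it for each line, rather than recompiling as these small helper methods do.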

Most of the syntax of the pattern string is the same, though there are a few minor differences in the advanced features.

Regular expressions can save a great deal of time spent doing low-level coding, at a very slight cost in processing time. Regular expressions originated in mathematical theory and were first applied to text-matching by the early developers of Unix. Experience on Unix and with Perl has shown that REs make a big contribution to programmer productivity. It's time for Java developers to reap the same gain. Chapter 4 in Java Cookbook gets you started.

JavaServer Pages

Another API you might not have paid attention to is JavaServer Pages (JSP). You might not realize how easy it can be to build a Web page with Java actions embedded in it. Here is the traditional "hello world" program written as a JSP:

<body bgcolor="white">
<p>Hello from Java at <%= new java.util.Date() %>.</p>

A JSP is read and rewritten as a Java servlet, which is then compiled and run for you. Servlets, in case you haven't heard, are the original means of writing Java code for use in a Web server. Of course, there is more to JSP than this. There are several means of embedding Java code: a simple expression like the Date shown above (whose value gets printed), complete executable statements, and declarations of fields and methods in the generated servlet class. These combine to make it easy to get started with JSP.
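To make the "rewritten as a servlet" step a little more concrete, here is a deliberately simplified sketch of the kind of code that translation produces for the page above. It is an illustration only: a real container generates a servlet class that depends on the servlet API and differs from one container to another, whereas this stand-in (the hypothetical class HelloJspSketch) writes to a plain PrintWriter so that it can be run on its own.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for the servlet a JSP container might generate.
class HelloJspSketch {
    // Plays the role of the generated service method: template text is
    // written out verbatim, and each <%= ... %> expression is evaluated
    // and its value printed in place.
    static void render(PrintWriter out) {
        out.println("<body bgcolor=\"white\">");
        out.print("<p>Hello from Java at ");
        out.print(new java.util.Date());   // from <%= new java.util.Date() %>
        out.println(".</p>");
    }

    public static void main(String[] args) {
        StringWriter page = new StringWriter();
        PrintWriter out = new PrintWriter(page);
        render(out);
        out.flush();
        System.out.print(page);
    }
}
```

In the same spirit, complete statements embedded in the page would be copied into the body of that method, and declarations would become fields and methods of the generated class.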

However, as time goes by, the page becomes crowded with a lot of Java code; an HTML-aware Web designer (or HTML-aware software) may become confused by it. To solve this problem, JSPs allow you to use ordinary JavaBeans as embedded components. One of the neat features of JSPs is that they allow you to copy all of the parameters from an HTML form into a JavaBean with a single statement:

<jsp:setProperty name="customerBean" property="*"/>

Many ordinary classes can be used as JavaBeans; the main requirements are a public no-argument constructor and set and get methods for each property.
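Here is a rough idea of what property="*" does on your behalf. This is a sketch under stated assumptions, not the container's actual code: CustomerBean, its name and email properties, and the SetAllProperties helper are all hypothetical, a plain Map stands in for the request parameters, and only String properties are handled (a real container also converts values to other property types).

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

class SetAllProperties {
    // For each form parameter, call the matching setXxx(String) method if the bean has one.
    static void populate(Object bean, Map<String, String> params) {
        for (Map.Entry<String, String> e : params.entrySet()) {
            String prop = e.getKey();
            if (prop.isEmpty()) {
                continue;
            }
            String setter = "set" + Character.toUpperCase(prop.charAt(0)) + prop.substring(1);
            try {
                Method m = bean.getClass().getMethod(setter, String.class);
                m.invoke(bean, e.getValue());
            } catch (NoSuchMethodException ignored) {
                // Parameters with no matching property are skipped.
            } catch (ReflectiveOperationException ex) {
                throw new IllegalStateException("Could not set property " + prop, ex);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> form = new LinkedHashMap<String, String>();
        form.put("name", "Angie");
        form.put("email", "angie@example.com");
        form.put("submit", "Send");   // no setSubmit(), so it is ignored

        CustomerBean customer = new CustomerBean();
        populate(customer, form);
        System.out.println(customer.getName() + " <" + customer.getEmail() + ">");
        // prints "Angie <angie@example.com>"
    }
}

// Hypothetical bean: a public no-argument constructor plus get/set pairs.
class CustomerBean {
    private String name;
    private String email;

    public CustomerBean() { }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}
```

The point of the naming convention is visible in populate: because every property follows the set/get pattern, generic code can fill in a bean it has never seen before, using nothing but the parameter names.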

There is one additional type of Java component that must be custom-written for use within JSPs. These are called JSP Custom Actions. Their API can appear a bit convoluted at first, but there is some sense to it.

To show the many features of JSP, I have written a complete Web site, called JabaDot, using servlets and JavaServer Pages. JabaDot's name is an homage to Java (but not called Java due to trademark reasons) and SlashDot, the original user-fed news Web site. The entire source code for JabaDot is included with the downloadable code on O'Reilly's Examples page for Java Cookbook. You'll also learn more about how the JabaDot site works in Java Cookbook.

Regular expressions and JavaServer Pages are only two of the APIs that I discuss in Java Cookbook. The book covers a wide range of standard and optional APIs, and it will help you (with explanations and code examples) to write Java programs on almost any platform: client, Web server, or stand-alone server. Hence the book's subtitle: Solutions and Examples for Java Developers.

Ian F. Darwin has worked in the computer industry for three decades, with Unix since 1980, with Java since 1995, and with OpenBSD since 1998. He wrote the freeware file(1) command used on Linux and BSD, and he is the author of O'Reilly's Java Cookbook and Checking C Programs with Lint, as well as over 70 articles and several courses (both university and commercial) on C and Unix. In addition to programming and consulting, Ian teaches Unix, C, and Java for Learning Tree International, one of the world's largest technical-training companies.

O'Reilly & Associates recently released (June 2001) Java Cookbook.