A Beautiful Regex Matcher... In Haskell

by chromatic

I've been reading Beautiful Code, picking out chapters here and there as I have time. While reading Brian Kernighan's explanation of Rob Pike's regular expression program from The Practice of Programming, I had an idle thought. "Hey, that's a highly recursive program with complex behavior suitable for didactic purposes."

Of course, Kernighan says that almost verbatim in the text. He also says "It's a nicer example than Yet Another Fibonacci Sequence Generator."

So I ported it to Haskell. I don't promise it's necessarily great Haskell, and I wouldn't consider it entirely beautiful, but it appears to function.


2007-08-21 08:26:15
Little bug: by putting the empty list match before the '$' match, "Foo$" doesn't match "Foo".
2007-08-21 08:40:12
It appears the code has some bugs. For instance,

1) match "oo" "foo" returns False

2) match "foo$" "foo" returns False

3) match ".*" "foo" returns False

4) match "a*a" "aaa" returns False

Another complaint I would have is that the names "text" and "regexp" make the original C code much easier to understand. When reading the Haskell code I often got confused between what the x's and y's were supposed to represent.


2007-08-21 08:57:45
Tweaked version with bug fixes and more tests:


2007-08-21 09:31:47
It looks like this test fails:
checkMatch ".*" "Foo";
it evaluates as
match ".*" "Foo"
matchhere ('.':'*':[]) "Foo"
matchstar '.' [] ('F':"oo")

This is because you did not use guards on the first definition of 'matchstar'. Here is what may be a corrected version:

matchstar :: Char -> String -> String -> Bool
matchstar '.' xs (_:ys) = matchstar '.' xs ys
matchstar c xs (y:ys) | c==y = matchstar c xs ys
| otherwise = matchhere xs (y:ys)

2007-08-21 09:54:35
"Hey, that's a highly recursive program with complex behavior suitable for didactic purposes."

Do you really have thoughts like that? : )

2007-08-21 13:20:57
Updated version of ptolomys that correctly (I think?) handles ^.


2007-08-22 11:24:47
chromatic, also see http://www.nondot.org/~sabre/Projects/HaskellLexer/index.html
2008-01-02 02:03:20
I am reading the same book. I noticed that match ("a*bcd", "xyzbcd")=1, is it acceptable?
From my understanding to regular expression, it's not correct.
2008-01-07 13:18:36

Your example looks correct to me. The star metacharacter matches zero or more occurrences of the previous character, and there are indeed zero a characters in the string to match.