Free Regular Expressions

by Tony Stubblebine

Related link: http://www.regexlib.com/



The fine coders at RegExLib have
developed a collection of incomprehensible code snippets - and they're giving
them away for free. That's right, free regular expressions.



RegExLib is a library of 900+ regular expressions contributed by a
community of regex loyalists. The expressions are
organized by function (like URI or Email) and rated by visitors.



Your job isn't just cut-n-paste; you'll likely have to choose from several
similar regexes. Each regex comes with example matches and non-matches so
that you can see the author's intentions, see how liberal the match is, and
see which edge cases are or aren't covered. These example matches should help
you find a regex that's close to your needs.



The example matches aren't exhaustive, so you should definitely test the regex
against your own data. You'll be aided by the site's helpful testing feature.
Each regex links off to a testing page where you can run the regex against
your own test matches.
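
If you would rather keep that kind of check next to your own code, here is a
minimal sketch in Python of the same idea: a list of strings that should match
and a list that should not. The pattern, the sample strings, and the check()
helper are all made up for illustration; none of them come from RegExLib.

```python
import re

# Hypothetical candidate pattern: a loose US ZIP code check
# (five digits, optionally followed by a hyphen and four more digits).
CANDIDATE = re.compile(r"\d{5}(?:-\d{4})?")

# Illustrative test data; replace these with samples from your own data.
SHOULD_MATCH = ["12345", "12345-6789"]
SHOULD_NOT_MATCH = ["1234", "123456", "12345-678", "abcde"]


def check(pattern, should_match, should_not_match):
    """Return (text, expected, actual) for every case that behaves wrongly."""
    cases = [(t, True) for t in should_match]
    cases += [(t, False) for t in should_not_match]
    failures = []
    for text, expected in cases:
        # fullmatch() requires the whole string to match, not just a prefix.
        actual = pattern.fullmatch(text) is not None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


failures = check(CANDIDATE, SHOULD_MATCH, SHOULD_NOT_MATCH)
for text, expected, actual in failures:
    print(f"{text!r}: expected match={expected}, got match={actual}")
```

An empty failures list only says the regex agrees with the cases you thought
of, so the lists are worth growing every time a new edge case turns up.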



Most of the contributors are also enthusiastic regex bloggers; you can read
them at http://blogs.regexadvice.com.

Two more things to note. First, the regular expressions don't have an
explicit license, only this statement of free use:
http://blogs.regexadvice.com/dneimke/archive/2004/12/07/1971.aspx.
Second, there's a .NET bias to the site, but since .NET uses
a Perl-compatible regular expression syntax you should be able to reuse the
code in any other Perl-compatible implementation.
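
Reuse can still take a small touch-up when the target engine is only mostly
compatible. As one hedged illustration, .NET writes named groups as
(?<name>...), while Python's re module expects (?P<name>...). The sketch below
uses a made-up pattern and a hypothetical helper, to_python_named_groups(); it
is a naive rewrite, not a general converter.

```python
import re

# Made-up .NET-style pattern with named groups, for illustration only.
DOTNET_STYLE = r"(?<year>\d{4})-(?<month>\d{2})"


def to_python_named_groups(pattern):
    """Naively rewrite (?<name>...) as (?P<name>...).

    Lookbehinds such as (?<=...) and (?<!...) are left alone because the
    character after '<' must start an identifier. Escaped parentheses and
    character classes are not handled.
    """
    return re.sub(r"\(\?<([A-Za-z_]\w*)>", r"(?P<\1>", pattern)


# Python's re module rejects the .NET spelling of named groups.
try:
    re.compile(DOTNET_STYLE)
    compiles_unchanged = True
except re.error:
    compiles_unchanged = False

converted = re.compile(to_python_named_groups(DOTNET_STYLE))
match = converted.fullmatch("2005-03")
print(compiles_unchanged, match.group("year"), match.group("month"))
```

After any rewrite like this, rerun the regex against your own matches and
non-matches rather than assuming the behavior carried over.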



5 Comments

merlyn
2005-03-08 21:00:26
Discovered this site last week - it's JUNK
I've subscribed to the RSS feed just so I can be the first to comment about how broken each regex is.


If anything, do the same, but don't use anything from the site. They're mostly broken, or don't work as advertised.

tonystubblebine
2005-03-08 22:01:36
JUNK like practically every other regex written.
I'd say that's true for most regex.


There's a couple things I really like about the site: listing of matches and non-matches, and encouragement for testing.


I see a lot of code like this:



if ($data =~ m/.../) { do_something(); }


There's no indication of what the regex is supposed to match, let alone why or what edge cases are important. That coupled with the number of broken regexes in the wild is killer.


As for testing, I'd trust an average coder to write 10 lines of simple code that ran correctly with minimal testing. I wouldn't trust a 10 character regex no matter who wrote it unless the testing was pretty thorough.


Perl programmers have a better regex library on CPAN, but most other programmers are better off with some inspiration. Plus I think a lot of our readers could make good contributions to the site.

bazzargh
2005-03-09 04:58:23
For the UK...
I had a look and a couple of pages in I spotted some for UK specific things, like postcodes, NI numbers, etc. For anyone passing through here who needs those, you should be looking at the regexps in the official schemas on the govtalk website:


http://www.govtalk.gov.uk/schemasstandards/schemalibrary.asp


also free.

uzziel
2005-03-14 17:31:39
Good, but . . .
When I saw this, I slapped my forehead and said "Duh! Why didn't I think of that?" It makes sense to maintain a regex dictionary just like you'd have a data dictionary for your application.


The problem is that regexes are so incredibly brittle, they're not extremely portable. What works in your .NET application will probably not work as a command line argument to sed.


I think it would make a lot of sense to maintain your own local library of regexes grouped by processor (grep, awk, Perl, etc.) and with several well-documented tests for each one.

merlyn
2005-03-30 18:14:09
JUNK like practically every other regex written.
No, that's true of most bad regex. This site is full of "what not to do". In the past few weeks, I've marked "1" on "1 to 5" for nearly every single regex that has come up as newly added.


For example, parsing HTML should not be done with a regex. Validating an email address should not be done with a regex. Determining a valid date should not be done with a regex!


I don't know why the posters to this site continue to post amazingly bad regexen (broken) for things that really need to be done with some other technology. It's as if you were given a screwdriver, and told "go knock down that tree over there", and you don't just throw your hands up and say "no go", but instead sit there poking and poking at the tree.


Regexes are not the only tool in the toolbox. Nearly every task on this site is better done with some other tool.