What is a DSL?

by Jim Freeze

I recently started writing an article about Domain Specific Languages (DSLs) and Ruby. While I think I understand conceptually what a DSL is, a specific definition is elusive. After scrutinizing several articles and Wikipedia, I can propose a limited definition of a DSL as a custom language designed to solve a specific problem.

Some examples of DSLs listed are Unix mini (or little) languages, such as sed, awk, troff, m4 or make. The list is quite long and could actually go on forever, since almost any language, computer based or not, could be considered a DSL. But for now, I'll limit this entry to languages used in the computer domain.

From my work, I see DSLs used for two main purposes. One is as a friendly way to provide data (configuration or otherwise) to a program. The other is as a friendly way to let users write business rules for a particular task. This is usually motivated by the desire to let the end user write code without realizing they are actually coding.

Except for very simple command lists, you see this second type less often because the complexity usually requires one to build a real mini language. If one instead starts from a traditional general purpose language (GPL), the derived DSL is usually more cryptic or complex than just using the GPL itself.

When designing a DSL, the programmer has to weigh the effort of building a full-featured language (using the likes of YACC or Bison) against making a simpler language that can usually be parsed with a hand-built parser. It's the choice between 'simple and now' or 'full-featured and later'.

But the danger is that 'simple and now' languages, if successful, tend to grow into ugly and complex later.

Consider make. I'm not intimately familiar with the origin of this language, but here is my wild guess as to how it came about:

Programmer: Hmm, I'm tired of repeating these build steps over and over. I need to make a control file to do this for me. I also want to be able to take this with me to other platforms, so I'll need to use a portable language. Hmm, I know C, I'll use it.

Now let's see, I don't want to go to the effort of writing a real language, I just need some simple features, so I'll write my own parser.

I need a simple way to define dependencies and a target, something like:

target : dependencies

Yeah, the ':' is good. You don't see colons used much in filenames.
Now I need to define a list of actions. Hmm, how about:

target : dependencies

Wow, this is hard. These blocks are killing me. Hey, wait a minute, I can get rid of these blocks if I just make the user use a tab as the first character on an action line, kind of like Fortran, but more sinister since you can't see the tab character (woohaahaaha). This way I can do a simple character test in C and don't have to do any complex parsing. The user shouldn't mind too much.

Having a critical syntax that depends upon an invisible character is just a horrible design -- for the end user. For the programmer, it was pragmatic and reasonable.
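
The character test imagined above can be sketched in a few lines of Ruby (the language used later in this entry). This is only an illustration of the idea, not how make is actually implemented; `parse_rules` and the sample input are made up for the example.

```ruby
# Hypothetical sketch: classify each line by its first character,
# the way the imagined author of make might have.
def parse_rules(text)
  rules = {}
  current = nil
  text.each_line do |line|
    line = line.chomp
    next if line.strip.empty?
    if line.start_with?("\t")
      # A leading tab marks an action line for the most recent target.
      raise "action before any target: #{line.inspect}" unless current
      rules[current][:actions] << line.strip
    else
      # Anything else must look like "target : dependencies".
      target, deps = line.split(":", 2)
      raise "missing ':' in #{line.inspect}" unless deps
      current = target.strip
      rules[current] = { deps: deps.split, actions: [] }
    end
  end
  rules
end

MAKEFILE = "prog : main.o util.o\n\tcc -o prog main.o util.o\n"
rules = parse_rules(MAKEFILE)
```

Indent the action with spaces instead of a tab and the line falls into the `else` branch, where it fails the "missing ':'" check. The two inputs look identical on screen, which is the end user's problem in a nutshell.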

Instead of writing a homegrown parser, another alternative is to create a grammar and use a tool like YACC to generate a parser. This definitely falls on the complex side of the scale. For someone who doesn't do this every day, even simple tasks take a huge amount of brain power and one ends up focusing more on the minutiae of the DSL, and not on higher level usability issues.

Several years ago I needed to write a description file of a geometrical stack. Several vendors had their own formats, which were mostly line based, but a couple supported scoping inside a block. None, however, supported variables or constants. Their files were basically glorified configuration files.

I started to write my own using Racc. It was a great learning experience for me. We chose to write our own parser because we wanted to limit what could be done in the file (why, I don't know). I spent about three weeks on the project and things were progressing nicely. It almost looked like Ruby. But it was tedious, and other things got prioritized over the project before I could finish.

Later I revisited the project. This time I thought, hey, why should I write my own parser, XML/XSLT and xmlproc will do this for me. So, within a couple of days, I had done what took me three weeks previously. I thought it looked readable at the time and was able to partially convince a colleague that it was readable. About a week later, when I came back and revisited the file, I realized, XML is not readable. Sure, if your brain is in XML mode, then it can filter out the syntax noise. But when one is concentrating on getting a particular job accomplished and one's brain is forced to task switch between the problem domain and mentally parsing XML, overloaded synapses are a certainty.

The third time around, after the Ruby DSL hype had been going around for a while, I decided to use Ruby. This time, I was able to create the DSL in about five minutes. It was readable, and I was able to focus on the end user's frame of reference.
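
To give a feel for why this can go so quickly, here is a minimal sketch of an internal Ruby DSL for describing a stack of layers. It is not the DSL from the project; the class `Stack`, the `layer` method and the `thickness:` keyword are all invented for illustration.

```ruby
# Hypothetical internal DSL: every name here is an assumption made
# for the example, not taken from the original project.
class Stack
  Layer = Struct.new(:name, :thickness)

  attr_reader :layers

  def initialize
    @layers = []
  end

  # Evaluate the user's block with the Stack as self, so bare calls
  # to `layer` inside the block reach the method defined below.
  def self.describe(&block)
    stack = new
    stack.instance_eval(&block)
    stack
  end

  def layer(name, thickness:)
    @layers << Layer.new(name, thickness)
  end

  def total_thickness
    @layers.sum(&:thickness)
  end
end

stack = Stack.describe do
  thin = 0.5                      # variables and arithmetic come for free
  layer :base,   thickness: 10.0
  layer :spacer, thickness: thin * 4
  layer :cap,    thickness: thin
end
```

The description block is ordinary Ruby, so the variables and constants that the vendor formats lacked need no extra work, and there is no parser to write at all.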

The moral of this story is, don't write a mini language if you don't have to. And don't settle for a simple DSL when a full-featured one is needed. Consider extending a GPL into a DSL, particularly an expressive language that is good at creating a readable DSL -- like Ruby.

Back to the original question of what exactly a DSL is. One can either write a DSL from scratch or use a GPL with a few added functions to create a DSL. But if any GPL can be made into a DSL, doesn't that make all languages DSLs?


Jeff Blaine
2005-12-28 15:32:49
A DSL: Functionality for a distinct set of related tasks.
A GPL: Functionality for a very wide range of unrelated tasks.

For personal work, making a new DSL doesn't always make sense.

If you're writing a DSL to be used by 10,000 people as part of your commercial product (say, a scientific computation DSL), it may make perfect sense to do it as its own DSL and not Ruby/Perl/Python + extension modules.

The difference is in the user interaction. If you're not writing it for others, then uh... just use Ruby/Python/Perl.

Case in point: Go write Java code to build a bunch of C files and assemble a binary (NOTE: The previous text does not say "Go build a DSL in Java to build a bunch of C files and assemble a binary."). Should be a real hoot :)

Jim Freeze
2005-12-28 15:56:03
Thanks Jeff for the definitions. Maybe it's just me, but I think that distinct and 'very wide range' can be subjective and relative terms. Some would say that Japanese is a Domain Specific language -- it is designed for people in Japan to communicate with each other. Others would say it is a general purpose language.

Yeah, I know it's a little on the edge, but it's a fuzzy edge. :)

I like your comment about the 10,000 people and a scientific DSL and agree with what you say. But, Ruby on Rails is really just a DSL for the web and I bet it will have more than 10,000 users soon.

2005-12-28 19:15:56
This is all interesting, but rather vague. Would it be possible for you to put up some specific details of the DSL you implemented in XML and then in Ruby?
Alex Miller
2005-12-28 19:42:40
If you haven't seen it already, I'd strongly recommend Jim Weirich's talk on DSLs from RubyConf. I think his triangle diagram describing different kinds of DSLs was the best summary description I've seen.



Jim Freeze
2005-12-28 20:52:18

You will see a specific example of this in the upcoming article on DSLs that I am writing. But basically we used an existing XML parser (like REXML) to do the parsing for us. Once the data was inside Ruby in a data structure, we worked with it as usual.

Using XML as a DSL is a common practice for data based DSLs since the parser is already written and hierarchical data can be described. This saves quite a bit of time if one doesn't mind the readability issue, which kind of goes contrary to the original reason to create a DSL in the first place.
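
As a rough illustration of that approach (not the actual project code), the sketch below lets REXML do the parsing and then copies the result into plain Ruby hashes. The element and attribute names are invented for the example.

```ruby
require "rexml/document"

# Hypothetical XML description; <stack>, <layer>, name and thickness
# are assumptions made for this example.
XML_SOURCE = <<~XML
  <stack>
    <layer name="base" thickness="10.0"/>
    <layer name="cap" thickness="0.5"/>
  </stack>
XML

doc = REXML::Document.new(XML_SOURCE)

# Pull the hierarchy into ordinary Ruby data, then work with that.
layers = []
doc.elements.each("stack/layer") do |el|
  layers << { name: el.attributes["name"],
              thickness: el.attributes["thickness"].to_f }
end
```

The parser comes for free, but the person writing the description still has to read and type the angle brackets, which is the readability cost mentioned above.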

Norbert Ehreke
2006-02-12 01:19:35
Jim, I readily believe that Ruby is the language of choice when it comes to DSLs. Extending your question, what I am interested in is whether or not a Ruby DSL can be well integrated into development environments so that we will not encounter a system break that makes debugging very hard. Is this a question worth exploring?
Jim Freeze
2006-02-12 07:56:03

I think the answer is yes. Also, I assume you are talking about
debugging for the end-users. I know for myself that my head used
to spin around when I had a problem with rake or rails.

Since I have written a few DSLs, rake is not so much of a problem
for me anymore, but rails can really be confusing, especially when it
spews a large stack trace. But my trick for fixing the problem
(but not understanding it) is to know that it was my code that
caused the breakage. So, I just gradually erase what I typed
until the problem goes away. Once I learn what caused the
problem, I try to understand why from a conceptual point
of view. I rarely debug into someone else's DSL code to find
out why, and I don't think I should have to.

But, if the parser is a mini language instead of a DSL,
debugging further could be even more difficult. So, I
don't think writing a DSL in a GPL makes the end-user
problem more difficult.

For the DSLs that I am currently writing, I have Ruby report
all syntax errors in the user's code. This makes me feel much
better about handing the user a DSL, but I know it is not a
perfect solution.
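
A minimal sketch of that idea, assuming the DSL source is evaluated with instance_eval: passing a file name lets Ruby's own errors point at the user's file and line. `RuleSet`, `run_dsl` and the file name "rules.dsl" are hypothetical names for the example.

```ruby
# Hypothetical sketch: names are invented, and a real DSL would
# likely want to handle more error classes than these two.
class RuleSet
  attr_reader :rules

  def initialize
    @rules = []
  end

  def rule(name)
    @rules << name
  end
end

def run_dsl(source, filename = "rules.dsl")
  set = RuleSet.new
  # The filename and starting line show up in Ruby's error locations.
  set.instance_eval(source, filename, 1)
  [:ok, set.rules]
rescue SyntaxError => e
  # SyntaxError is not a StandardError, so it must be rescued by name.
  [:error, "Syntax error: #{e.message}"]
rescue NameError => e
  # Covers misspelled DSL words (NoMethodError is a NameError).
  frame = e.backtrace.find { |f| f.start_with?("#{filename}:") }
  line = frame ? frame.split(":")[1] : "?"
  [:error, "#{filename}:#{line}: unknown word '#{e.name}'"]
end
```

With this in place, a misspelling such as `rul :b` on the second line of the input comes back as a one-line message naming the user's file and line instead of a stack trace through the DSL internals.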

Is this the kind of thing you are talking about or did I totally miss the point?

2006-04-07 04:51:11

Sorry that I only came back to this after such a long time. I guess I did not expect an answer so quickly. Ah, the Ruby community *is* nice! :)

Anyway, I am not sure I understand your explanation, so let me rephrase what I am interested in. You have probably read Fowler and maybe some texts by Dmitriev of IntelliJ fame. These are the two I have tried to follow so far. Fowler refers to the symbolic barrier that is created when we use external DSLs versus internal DSLs. Ruby offers us a powerful way to create internal DSLs, which is great, but I wonder about the following scenario.

We have a number of DSLs out in production and they run in the same process. So far so good. Now, somewhere in our system we have a bug and we need to trace it. How will that be possible with all the DSLs involved? Will we be able to debug our code with a plain old Ruby debugger, possibly in a nice IDE, or will we be forced to rely on debug output to stderr?

I am enthusiastic about DSLs because they offer a concise way to describe a problem in the domain it occurs in. But I am afraid that the easier the construction of DSLs becomes, the harder it will prove to develop and maintain large systems that contain a number of them.