Trivial Scripting with Ruby

by Gregory Brown

I spend most of my time building relatively large applications with Ruby, and this makes me forget how easy the quick and dirty hacks are. In less time than it'd take me to google the right UNIX tool for escaping HTML, I can throw together a tiny script like the one below, which I use for things like blog entries and mass spam emails.


#!/usr/bin/env ruby

require "cgi"
puts CGI.escapeHTML(ARGF.read)


Mmm... sweet simplicity. If you haven't worked with the CGI library before, there are probably other goodies in there, so have a look at the API docs.
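To illustrate those other goodies, here is a small sketch using a few more methods the CGI library provides: URL encoding with CGI.escape and CGI.unescape, and the inverse of the script above with CGI.unescapeHTML. The sample strings are made up for illustration.

```ruby
require "cgi"

# URL-encode a query-string value: spaces become "+", and reserved
# characters such as "&" and "=" become %XX escapes.
encoded = CGI.escape("a b&c=d")
puts encoded                 # prints a+b%26c%3Dd

# CGI.unescape reverses the URL encoding.
puts CGI.unescape(encoded)   # prints a b&c=d

# CGI.unescapeHTML reverses CGI.escapeHTML.
html = CGI.escapeHTML('<a href="x">Tom & Jerry</a>')
puts html                    # prints &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
puts CGI.unescapeHTML(html)  # prints the original markup again
```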

UPDATE: In the comments, Sam Aaron does a good job of explaining what this script actually does.

11 Comments

Danno
2007-04-13 20:49:57
Now I just wanna know what the Unix tool is.
Sam Aaron
2007-04-14 02:55:05

Cool. It’s nice to see small, simple examples of Ruby goodness held up for all to see.


However, I sense that this might be a nice little opportunity to explain some of the simpler features of Ruby to newcomers.


So, if people are interested, here are the CGI docs:


http://www.ruby-doc.org/stdlib/libdoc/cgi/rdoc/index.html


where there’s an example usage of the method CGI.escapeHTML:


Escape special characters in HTML, namely &"<>


CGI::escapeHTML('Usage: foo "bar" <baz>')
# => "Usage: foo &quot;bar&quot; &lt;baz&gt;"


It might also be worth pointing out that ARGF is slightly different from ARGV. ARGV is the array of command-line arguments, whereas ARGF is a special file-like object that reads, in sequence, all the input files named on the command line (or standard input if there are none).
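A minimal sketch of the difference, using a hypothetical helper named describe_input. The parameters default to ARGV and ARGF, but any array and IO-like object can be passed in. One subtlety worth showing: ARGF removes file names from ARGV as it opens them, so ARGV has to be copied before reading.

```ruby
# Hypothetical helper: report what ARGV and ARGF each hold.
# ARGF shifts file names off ARGV as it opens them, so copy ARGV first.
def describe_input(argv = ARGV, argf = ARGF)
  arguments = argv.dup   # the strings typed after the script name
  contents  = argf.read  # the concatenated contents of those files, or stdin
  { arguments: arguments, contents: contents }
end
```

Placed in a script with p describe_input at the bottom and run as ruby describe.rb notes.txt (both file names hypothetical), it would show ["notes.txt"] under :arguments and that file's text under :contents.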


So, in Ruby, a simple, no-frills implementation of the Unix cat command would be:


#!/usr/bin/env ruby
puts ARGF.read


which prints the contents of the files you specify (or standard input, if you specify none) to standard output.
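One possible refinement, sketched here rather than taken from the comment: copy the input line by line with ARGF.each_line instead of reading everything into one string first. The method name cat and its parameters are illustrative; they default to ARGF and standard output.

```ruby
# A streaming take on the ARGF-based cat (a sketch, not the commenter's code):
# copy each line as it arrives instead of reading all input into one string.
def cat(input = ARGF, output = $stdout)
  input.each_line { |line| output.print(line) }  # print adds no newline
  output
end
```

Calling cat at the bottom of a script behaves much like puts ARGF.read, with one small difference: puts appends a newline when the input doesn't end with one, while print passes the text through unchanged.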


Hope this helps :-)


Amr
2007-04-14 03:03:43

cat input.html | sed -e 's/&/\&amp;/g' -e 's/"/\&quot;/g' -e 's/>/\&gt;/g' -e 's/</\&lt;/g'


Note: make sure you escape the ampersands and the semicolons in the command line above. I had to pass that snippet through itself to get the escaped version, except that I was too lazy to escape the backslashes themselves just to post them here.


Now, if you had to HTML-unescape, this could get hairy, so the Ruby solution would scale better.
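To make the "hairy" part concrete, here is a sketch (not from the thread) of single-pass escaping and unescaping in plain Ruby, using String#gsub with a Hash of replacements. It handles only the four characters in the sed command above, so unlike CGI.escapeHTML it leaves apostrophes alone. The names ESCAPES, html_escape and html_unescape are made up for this example.

```ruby
# Single-pass HTML escaping and unescaping with String#gsub and a Hash.
# Only the four characters from the sed command are handled here.
ESCAPES   = { "&" => "&amp;", '"' => "&quot;", "<" => "&lt;", ">" => "&gt;" }.freeze
UNESCAPES = ESCAPES.invert.freeze

def html_escape(text)
  # Each character is replaced exactly once, so "&" needs no special ordering.
  text.gsub(/[&"<>]/, ESCAPES)
end

def html_unescape(text)
  # Matching whole entities in one pass turns "&amp;lt;" into "&lt;", not "<",
  # which a chain of sequential substitutions can easily get wrong.
  text.gsub(/&(?:amp|quot|lt|gt);/, UNESCAPES)
end
```

With sed, each -e expression rewrites the output of the previous one, which is why the ampersand has to go first when escaping; the single-pass version avoids that ordering problem in both directions.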

Amr
2007-04-14 03:05:30
I see that it ate my double quotes as well. Meh... you get the idea.
Gregory
2007-04-14 04:01:19
Sam:


Nice explanations. You're right that these things are often a good chance to explain stuff to newcomers; I sometimes post things like that under the NubyGems series. However, I'm also a fan of the 'mostly code' blog post that's not really meant to be a tutorial but just shows something I thought was cool. :)


I'll add a link to the API docs in the article though.


Amr,


Yeah, that's fine, but in that case you could just use ruby -e with the script above, or perl -e :)


It'd be good to know whether there's a fairly standard Unix command for escaping and unescaping HTML.
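Until such a tool turns up, one Ruby script can cover both directions. This is a sketch built around a hypothetical -d flag and a made-up method name, html_filter. It also shows ARGV and ARGF working together: the flag has to be removed from ARGV before ARGF is read, or ARGF would try to open a file named "-d".

```ruby
require "cgi"

# Hypothetical "-d" flag selects unescaping; without it, the input is escaped.
# Remove the flag from ARGV *before* reading ARGF, so ARGF doesn't treat it
# as a file name.
def html_filter(argv = ARGV, input = ARGF, output = $stdout)
  decode = argv.delete("-d")  # returns "-d" if present, nil otherwise
  text   = input.read
  output.print(decode ? CGI.unescapeHTML(text) : CGI.escapeHTML(text))
  output
end
```

With html_filter called at the bottom of a script, running it as ruby html_filter.rb page.html would escape the file, and adding -d would unescape it (script and file names hypothetical).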

Sam Aaron
2007-04-14 06:00:06

Gregory:


Yes, I thought so. I think that posting short, sweet snippets of snazzy code is a great idea. I also think that you're right to keep explanations from polluting the succinctness (and therefore digestibility) of the post. Happily for those who are interested, comments are also a good place to put further explanation. Thanks for explicitly mentioning me in the post itself :-)


I'm looking forward to more snippets of Ruby goodness...

Amr
2007-04-15 06:15:21
@Gregory: That's true; I just put that there so someone looking for an equivalent won't have to futz around with sed :) For simple tasks like this, sed is always there, albeit a little more painful.
Gregory Brown
2007-04-15 18:35:49
Amr:


Full ack. I'm just a recovering perl monger turned Ruby curmudgeon :)


2007-04-16 11:36:57
This is great, please post more stuff like this.
Johnny P
2007-04-17 17:42:15
I've always liked gets for reading both files named on the command line and piped data, much like Perl's <>


echo "hello
World" | ruby -rcgi -e 'while l=gets do puts CGI.escapeHTML(l) end'
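Kernel#gets reads from ARGF, which is why this one-liner handles both piped data and file arguments. Here is a sketch of the same line-at-a-time loop wrapped in a method (the name escape_lines is made up), so that the source and destination can be any IO-like objects.

```ruby
require "cgi"

# Line-at-a-time HTML escaping, modelled on the one-liner above.
# With the defaults it reads from ARGF, just as a bare gets would.
def escape_lines(input = ARGF, output = $stdout)
  while (line = input.gets)
    output.print(CGI.escapeHTML(line))  # each line keeps its own "\n"
  end
  output
end
```

The one-liner's puts works just as well, since puts adds no second newline to a line that already ends with one.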

Johnny P
2007-04-17 17:44:40
My example was supposed to have <br> between hello and World in the argument to echo, oops.