NubyGems: Symbolic Starvation

by Gregory Brown

Now, be forewarned, my gentle Nuby friend: I am not going to be explaining symbols here. The question is asked on a weekly basis on RubyTalk and other places, and the general consensus is "They're named numbers... they're handy to work with, and they aren't the same thing as Strings, so just use 'em when it feels right". Instead, I am going to point out a simple little pitfall that you might want to be careful with.

If you want some primer reading on Symbols, there are already a few on this blog alone, and many more discussions in the RubyTalk archives. In fact, most of the top links from a Google search for "Ruby Symbols" should get you on the right track.

So today's topic is memory management. Though Ruby's garbage collection makes it pretty easy for us to not even think of this topic day to day, occasionally the topic must rear its ugly head. If for nothing else, it is to remind us why we took that stupid "Intermediate C Programming" course or its equivalent, where seemingly innocuous things such as dynamic arrays were potential sources of memory leaks.

In Ruby, you see a lot of Hashes which use Symbols for their keys. It's mostly because


{ :my => :super, :duper => :hash, :looks => :cooler, :this => :way }
{ "than" => "it", "does" => "this", "way" => "..." }


and because symbols are really fast little deallies to be working with.

If your hash is going to be used internally, you never need to worry about anyone except programmers indexing your data, and if they want a persons[:phone_number] , they can just ask using Symbols. But what if you were using a Hash to store some data accessible via a web form, or a command line application? Then you'd have to convert the Strings to Symbols.

Of course, that is very, very easy. "hello".to_sym will happily generate a Symbol, :hello, for you.
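To see why the conversion is needed at all, here is a minimal sketch. The person hash, its keys, and the from_form variable are made up for illustration and stand in for whatever your application really stores.

```ruby
# Hypothetical record, keyed by Symbols as described above.
person = { :name => "Greg", :phone_number => "555-0100" }

# Pretend this String arrived from a web form or the command line.
from_form = "phone_number"

# A String key does not match a Symbol key, so this lookup finds nothing.
by_string = person[from_form]

# After conversion the key matches and the value is found.
by_symbol = person[from_form.to_sym]

puts by_string.inspect
puts by_symbol.inspect
```

The first lookup comes back nil and the second returns the stored phone number.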

So it's very tempting to put something like this in your code:


do_something_funky_with(my_hash[some_string.to_sym])


High fives all around and you can happily use Symbols internally and let people type into text boxes or scribble bits of Cuneiform which get translated to strings which then are converted to symbols for indexing and you're getting at your data as easy as pie. Ruby's garbage collection will happily do away with those converted strings, won't it?

The unfortunate answer is no. Symbols are designed so that once they spring into existence, they never die. Not of a natural death, anyway. With sufficient validation and sparse use of these immortal little constructs, no problem will ever arise. However, leave the floodgates open and the flood will come.

Take a look at the memory usage in a simple little irb session I was running:

At start:


sandal@harmonix:~$ pmap 24683 | grep total
total 5980K


I then run this bit of code.


>> a = "a"
=> "a"
>> 10000.times { a.succ! }
=> 10000
>> a
=> "ntq"


You wouldn't expect substantial memory growth here and you don't find any at all:


sandal@harmonix:~$ pmap 24683 | grep total
total 5980K


A minor change is made. I convert the values to symbols after I iterate to the next letter. But I don't store the value of this anywhere, so you'd expect it to just disappear peacefully.


>> a = "a"
=> "a"
>> 10000.times { a.succ!.to_sym }
=> 10000
>> a
=> "ntq"


But alas, it is not so.


sandal@harmonix:~$ pmap 24683 | grep total
total 6244K


We've grown by 264K! Now this may seem tiny, but imagine this on the end of a long-running, high-volume server process that accepted user input... even potentially that of spambots and malicious Skr1ptk1ddz trying to crack in to set up the l33t35t w4r3z site.

Now we've got ourselves a memory leak, and that is generally considered A Bad Thing.

This memory doesn't get released, either. I ran these tests right before I started blogging, and with irb still running and unchanged, running pmap still shows me at exactly 6244K.
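If you would rather watch the symbol table from inside Ruby than from pmap, Symbol.all_symbols returns every symbol the interpreter currently knows about. What follows is only a sketch: the nubygem_demo_ prefix is a made-up name chosen so that these symbols are unlikely to exist already, and the exact counts will differ from machine to machine. One caveat: Ruby 2.2 and later can garbage-collect symbols created at runtime, so the sketch holds on to them in an array to keep them alive while counting.

```ruby
# Hypothetical prefix, chosen only so these names are unlikely to exist yet.
prefix = "nubygem_demo_"

before = Symbol.all_symbols.size

# Build 1000 brand new symbols from Strings. Keeping them in an array
# means newer Rubies cannot collect them while we are still counting.
created = (1..1000).map { |i| "#{prefix}#{i}".to_sym }

after = Symbol.all_symbols.size

puts "symbols before: #{before}"
puts "symbols after:  #{after}"
```

The second number should come out roughly 1000 higher than the first.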

If you already have a grasp of symbols, what they are, and how they should be used, it's not very hard to see that this isn't really a problem so much as something you need to be careful about. It's also important to note that symbols get mapped one to one to unique values, and if you use the same symbol again, it's not going to suck up more memory. This makes them completely safe to use if you just protect yourself a bit.
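The one to one mapping is easy to check for yourself. In this small sketch (the variable names are just for illustration), two separate Strings with the same contents convert to the very same Symbol object:

```ruby
# Two separate String objects that happen to hold the same characters.
first  = "phone_number"
second = "phone" + "_number"

# equal? compares object identity, not contents.
same_strings = first.equal?(second)                # two distinct Strings
same_symbols = first.to_sym.equal?(second.to_sym)  # one shared Symbol

puts "strings identical? #{same_strings}"
puts "symbols identical? #{same_symbols}"
```

The Strings are distinct objects, but both conversions hand back the single existing Symbol, which is why repeating a known name costs nothing extra.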

So that's your NubyGem for the day... If you're dealing with user input, it's probably better not to convert it to symbols. But if you do, be sure to validate before your conversion, to avoid creating something like :some_really_l33t_symbols_that_will_eventually_starve_this_process.
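One way to validate before converting is to never call to_sym on the input at all, and instead look the String up in a table of keys you already trust. This is a sketch under assumptions, not the one true way: the person hash, the known_keys table, and the lookup lambda are all hypothetical names invented for this example.

```ruby
# Hypothetical data: the hash and its keys are made up for this example.
person = { :name => "Greg", :phone_number => "555-0100" }

# Map each permitted String to its already existing Symbol, once, up front.
known_keys = {}
person.keys.each { |key| known_keys[key.to_s] = key }

# Returns the stored value, or nil when the input is not a known key.
# Note that to_sym is never called on the user's input.
lookup = lambda do |user_input|
  key = known_keys[user_input]
  key && person[key]
end

puts lookup.call("phone_number").inspect
puts lookup.call("some_really_l33t_symbol").inspect
```

Unknown input simply comes back nil, and no new symbol is ever created from it.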

Next time... we'll get immediate with values. Anyone with a specific question or experience to share might want to email me, because I'm trying to find a good example for this upcoming article.