The Ruby VALUE

by Caleb Tennis

As you are probably aware, the Ruby interpreter and some of the core libraries are written in C. Over the next few weeks I plan to share a look at some of the internals of Ruby and how it achieves some of the things it does from the C side of things.

The first point of interest is the VALUE - Ruby's internal representation of its objects. In the general sense, a VALUE is just a C pointer to a Ruby object data type. We use VALUEs in the C code like we would use objects in the Ruby code.

VALUE some_function(VALUE arg_object);

One would expect the VALUE to be just a typedef for a C pointer, with a lookup table recording which object it represents, and this would be partially correct. However, there's also some trickery involved.

Instead of implementing the VALUE as a pointer, Ruby implements it as an unsigned long. It just so happens that sizeof(void *) == sizeof(long) - at least on the platforms I'm familiar with. After all, what is a pointer? It's just an n-byte integer that represents a memory address.
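If you want to check the size claim on your own machine without writing any C, here is a minimal sketch in Ruby. It assumes a reasonably recent Ruby where Array#pack understands the "J" directive (pointer-width unsigned integer) and "L!" (native unsigned long); the two method names are made up for this example.

```ruby
# Width in bytes of a C pointer-sized integer (uintptr_t), via Array#pack.
def pointer_width_bytes
  [0].pack("J").bytesize
end

# Width in bytes of the platform's native unsigned long.
def long_width_bytes
  [0].pack("L!").bytesize
end

puts "pointer:       #{pointer_width_bytes} bytes"
puts "unsigned long: #{long_width_bytes} bytes"
puts "same width?    #{pointer_width_bytes == long_width_bytes}"
```

The result depends on the platform and compiler, so treat the last line as something to observe rather than something guaranteed.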

Because of this, there are some tricks Ruby can perform.

First, for performance purposes, Ruby doesn't use the VALUE as a pointer in every instance. For Fixnums, Ruby stores the number value directly in the VALUE itself. That keeps us from having to keep a lookup table of every possible Fixnum in the system.

The trick lies in the fact that objects in memory are aligned on 4-byte boundaries (8 bytes on 64-bit systems). For example, if there were an object stored at 0x0000F000, then the next possible address would be 0x0000F004. This jump from 00 to 04 in the lowest byte is important. Expanded out as bits, the two values are 00000000 and 00000100. This means that if we use the VALUE as a pointer, the lowest two bits will always be 0s.
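The alignment argument is easy to check with plain integers standing in for addresses. This is only a sketch: the addresses are made up, and `low_two_bits` is a helper invented for the example, not anything from the interpreter.

```ruby
# Hypothetical addresses, 4 bytes apart, standing in for aligned object pointers.
SAMPLE_ADDRESSES = [0x0000F000, 0x0000F004, 0x0000F008, 0x0000F00C]

# Keep only the lowest two bits of an address.
def low_two_bits(address)
  address & 0b11
end

SAMPLE_ADDRESSES.each do |address|
  printf("0x%08X  low byte: %08b  low two bits: %02b\n",
         address, address & 0xFF, low_two_bits(address))
end
```

Every aligned address prints 00 for its low two bits, which is exactly the spare room Ruby takes advantage of.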

Ruby uses this to its advantage. It will tuck a 1 in the lowest bit, and then use the rest of the space (31 bits on a 32-bit system) to store a Fixnum. One of those bits is used for the sign, which leaves 30 bits for the magnitude, so a Fixnum can range from -(2 ** 30) to 2 ** 30 - 1.

irb(main):021:0> (2 ** 30).class
=> Bignum
irb(main):022:0> (2 ** 30 - 1).class
=> Fixnum
irb(main):024:0> (-(2 ** 30)).class
=> Fixnum
irb(main):025:0> (-(2 ** 30)-1).class
=> Bignum
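The tagging itself can be modeled with ordinary integer arithmetic. The sketch below assumes a 32-bit VALUE, as in the irb session above. The names `int_to_fix`, `fix_to_int`, `fits_fixnum?` and `fixnum_value?` are illustrative only; they loosely mirror what the interpreter does in C and are not part of Ruby's API.

```ruby
FIXNUM_FLAG = 0x01                        # lowest bit set marks an immediate integer
WORD_BITS   = 32                          # assumption: a 32-bit VALUE
FIXNUM_MAX  = 2**(WORD_BITS - 2) - 1      #  2 ** 30 - 1
FIXNUM_MIN  = -(2**(WORD_BITS - 2))       # -(2 ** 30)

# Does n fit in the bits left over after the tag and the sign?
def fits_fixnum?(n)
  n >= FIXNUM_MIN && n <= FIXNUM_MAX
end

# Shift the number left by one bit and set the tag bit.
def int_to_fix(n)
  raise RangeError, "#{n} would need a Bignum" unless fits_fixnum?(n)
  (n << 1) | FIXNUM_FLAG
end

# An arithmetic shift right drops the tag and keeps the sign.
def fix_to_int(value)
  value >> 1
end

# A VALUE with the lowest bit set is treated as a Fixnum, not a pointer.
def fixnum_value?(value)
  (value & FIXNUM_FLAG) == FIXNUM_FLAG
end

p [1, 2, 3].map { |n| int_to_fix(n) }     # => [3, 5, 7]
p fix_to_int(int_to_fix(-42))             # => -42
p fits_fixnum?(2**30)                     # => false
```

The boundaries line up with the session above: 2 ** 30 - 1 and -(2 ** 30) fit, while one step further in either direction does not.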

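Ruby uses the other low bit to help distinguish other common values, like false, true, and nil, which are represented by small reserved constants rather than by addresses. Symbols are also stored with this bit on, with the symbol's ID packed into the upper bits, so Ruby recognizes the VALUE as a special instance and interprets it accordingly.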
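Here is a rough model of how an interpreter could sort a VALUE into these categories by looking at its low bits. The constants follow the 1.8-era ruby.h (false is 0, true is 2, nil is 4, and symbols carry 0x0e in their low byte) and should be treated as assumptions, since they differ between Ruby versions. `classify_value` is a made-up helper, and the interpreter reserves a few more special values that are left out here.

```ruby
Q_FALSE     = 0x00   # assumption: 1.8-era constant for false
Q_TRUE      = 0x02   # assumption: 1.8-era constant for true
Q_NIL       = 0x04   # assumption: 1.8-era constant for nil
SYMBOL_FLAG = 0x0e   # assumption: low byte of a Symbol; the ID sits in the bits above it

# Decide what kind of thing a raw VALUE (modeled as an Integer) represents.
def classify_value(value)
  return "Fixnum" if (value & 0x01) == 0x01        # lowest bit set
  return "false"  if value == Q_FALSE
  return "true"   if value == Q_TRUE
  return "nil"    if value == Q_NIL
  return "Symbol" if (value & 0xff) == SYMBOL_FLAG # second-lowest bit is on here
  "pointer to an object"                           # low two bits are 00
end

hypothetical_symbol = (42 << 8) | SYMBOL_FLAG      # a symbol whose ID is 42

[3, Q_FALSE, Q_TRUE, Q_NIL, hypothetical_symbol, 0x0000F004].each do |value|
  printf("0x%08X  %s\n", value, classify_value(value))
end
```

The order of the checks matters: the Fixnum test comes first because any odd VALUE is a number, and the pointer case is simply whatever is left over.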

The rest of the time a VALUE is a good old fashioned memory address, which points to an object structure in memory.


So there you have it. I hope this little snippet was of some VALUE to you.


2006-01-25 16:57:12
Is this why the object_id of nil is 4? Does that mean object_id is just an address? What would the object_id of a Fixnum, true and false be in that case?
2006-01-25 17:44:04
It's posts like these that make me wish for a way of "tipping" the author so that in addition to a comment about how useful the post was, the author could derive some economic benefit from it and feel an incentive to write another along the same lines. Lacking that ability, I'll have to settle for offering praise at a great topic that there's not enough content on the web about. On the other hand, you appear to live in the Chicago area, so perhaps I can bribe you with beer...
2006-01-25 18:41:56
This helps clear up some issues with understanding symbol usage and the benefits. Thanks for the insight!
Caleb Tennis
2006-01-26 04:55:45
Justin: you're on to something, and I think I'll probably write about it in my next post.
Caleb Tennis
2006-01-26 05:10:17
Thanks for the praise and I appreciate the virtual tip. I plan to keep writing this kind of stuff as long as I have the material and people find it interesting. In the meantime, you can always tell the O'Reilly folks what you think about the entries - the Contact Us link at the bottom of this page has some e-mails of people who I'm sure would like to hear your feedback.

I'd take you up on the beer, but I'm actually located in the South-Central Indiana area. But I will be in Chicago for the Rails workshop!

2006-01-26 09:01:19
jperkins: Click on a banner ad - it's easy enough and does provide a little tip (tho in this case probably to O'Reilly rather than Caleb).
Juho Snellman
2006-01-27 02:09:05
>> It just so happens that sizeof(void *) == sizeof(long) - at least on the platforms I'm familiar with. <<

There are indeed platforms where this isn't true. As an example from the modern times, 64-bit Windows defines long and int as 32-bit values and pointers as 64-bit values.

2006-01-29 07:41:07
Thanks for the post, I've been looking for this kind of stuff about Ruby. I've heard there's a book that focuses on the internals of Ruby in (sigh, what else) Japanese, but my Japanese is non-existent, so that's out of my reach.

That must be the reason for this pattern too:

irb(main):001:0> 1.object_id
=> 3
irb(main):002:0> 2.object_id
=> 5
irb(main):003:0> 3.object_id
=> 7
George Moschovitis
2006-01-30 06:18:21
Very informative, keep up the great work ;-)