OSCON 4.4: Inside Ponie, the Bridge from Perl 5 to Perl 6

by Geoff Broadwell

Related link: http://conferences.oreillynet.com/cs/os2005/view/e_sess/6811

I suspect I'm not alone in watching the slow march of Ponie milestones and wondering "What the heck is going on under there?" There's clearly a pile of work getting done, but I had no idea why it was such a hard problem (other than a vague feeling that perl5's internals were rather . . . hairy, shall we say). In this talk, Nicholas Clark made the problems all too frighteningly clear.

The transition from Perl 5 to Perl 6 is fundamentally different from previous transitions:

  • The gap between major releases is at least three times as long as any previous gap.

  • The existing Perl 6 compilers are all clean reimplementations, instead of iterative improvements on a previous working version.

  • Unlike previous upgrades, Perl 6 is not automatically backward compatible with Perl 5; some kind of wedge is needed.

  • CPAN exists, and is one of the primary contributors to Perl 5's success; we cannot break it.

  • XS code exists, both in CPAN and in the wild; we really don't want to break that either.

These are big issues, and there are others (the desire to have less hackish Unicode and threads support, for example). To get us to the final happy place, several parallel projects exist (apologies if I have any of the below slightly wrong):

  • Pugs, a "top-down" Perl 6 compiler in Haskell, allowing us to experiment with and refine the Perl 6 language specification

  • Parrot, the new virtual machine in which the official Perl 6 compiler (and hopefully, compilers for many other languages) will live

  • Perl 6-on-Parrot, a "bottom-up" Perl 6 compiler written directly for the Parrot VM

  • P5toP6, Larry's automated translation tool, which reaches deep into perl5 to convert Perl 5 source to equivalent Perl 6 (yes, with comments and formatting retained as much as possible!)

  • Ponie, a port of the Perl 5 VM to run on top of the Parrot VM, allowing both pure Perl 5 and Perl 5 + XS code to run natively on Parrot

Each of these projects addresses a different part of the problem, and thankfully they seem to work well together, often pushing each other's feature set.

The focus of this talk was Ponie, which as you can see above is the only project of the bunch working on XS compatibility. XS compatibility is a big issue. Not only is there a ton of XS code on CPAN that we want to be able to use automagically, but companies worldwide have tons of XS code floating around that we don't want to break, even as we convince them to migrate to Parrot and Perl 6.

It turns out that XS code uses the very same APIs that the Perl 5 VM uses internally. That's a problem, because those APIs have many issues:

  • The Perl 5 VM is full of bugs and quirks, and has no formal specification; in effect, the "right" (read: relied on by existing XS code) Perl 5 behavior is defined as whatever perl5 does.

  • Many "polymorphism" cases in Perl 5 are actually handled by pages of hardcoded if trees and other checks, spread throughout the perl5 source tree.

  • Tying and overloading don't mix properly in Perl 5, because perl5 implements tying as magic, which is limited and is checked for inconsistently in numerous places.

  • The API is very thin, including lvalue macros that expand to direct pointer manipulations.

  • Unicode and threads were hacked onto the API instead of being fundamental parts of the design.

  • Numerous bugs exist in the perl5 core code that have never been understood before, but which get in the way of larger fixes and refactoring.

  • And the list goes on . . . .
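
To make the "thin API" point a little more concrete, here is a minimal sketch in C. Everything in it (toy_sv, toy_body, TOY_IVX, toy_set_iv, toy_get_iv, toy_demo) is a hypothetical name invented for illustration; it is not the real perl5 API or its actual memory layout. It only contrasts an lvalue-style macro that expands to a direct pointer dereference with a function-style setter that the VM could intercept.

```c
#include <assert.h>

/* Hypothetical toy types for illustration only; NOT the real perl5 layout. */
typedef struct toy_body {
    long iv;                 /* integer payload */
} toy_body;

typedef struct toy_sv {
    toy_body *body;          /* pointer the macro dereferences directly */
    int       set_calls;     /* counts writes that went through toy_set_iv */
} toy_sv;

/* Thin, lvalue-style accessor: expands to a raw field access, so callers
 * can assign through it and the "VM" never sees the write. */
#define TOY_IVX(sv) ((sv)->body->iv)

/* Function-style accessor: every write passes through one place that
 * could be hooked (here it just counts calls). */
void toy_set_iv(toy_sv *sv, long value)
{
    sv->set_calls++;
    sv->body->iv = value;
}

long toy_get_iv(const toy_sv *sv)
{
    return sv->body->iv;
}

/* Performs one macro write and one function write, then returns how many
 * writes the function-style API actually observed. */
int toy_demo(void)
{
    toy_body body = { 0 };
    toy_sv   sv   = { &body, 0 };

    TOY_IVX(&sv) = 42;               /* direct pointer manipulation, unobserved */
    assert(toy_get_iv(&sv) == 42);   /* the value did change */

    toy_set_iv(&sv, 7);              /* observed by the setter */

    return sv.set_calls;
}
```

In this sketch toy_demo() returns 1: only the write made through toy_set_iv passed through code the VM controls. Extension code that assigns through macros like TOY_IVX effectively depends on the struct layout itself, which is presumably part of why such an API is hard to re-host on a different VM.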

Just to add a little more trouble to the pile, the Parrot VM made many fundamental design decisions that are exactly opposite to the way the Perl 5 VM works. For the most part, this is a good thing, as Parrot was built with the lessons of perl5 in mind and many bad design decisions have been fixed. But that doesn't make Nicholas's job any easier.

The obvious question is, aren't Inline.pm or the Pugs <-> Perl 5 symmetric embedding good enough? Do we really need to go through all this trouble? Sadly, the answer is that both of these partial solutions are fundamentally broken. The Pugs <-> Perl 5 linkage leaks memory like a sieve, and there's not much anyone can do about it without doing most of the Ponie work anyway. Even an Inline.pm port can't do much better (even if it were to magically fix the memory management issues inherent in cross-VM object handling), because exports (especially non-trivial import magic) are really hard to get right -- tied symbol tables are yet another thing that just doesn't work right in perl5. And in both cases, there's a lot of wrapping going on, so performance would be less than stellar.

That leaves the Ponie project, and Nicholas with a lot of work to do. I wish him luck!

How much XS code have you written that remains outside of CPAN today?