Open Source Thoughts: Parrot and Multicore

by Kevin Farnham

I am working on finding a way to enable developers working in a wide variety of languages to directly access computationally intensive libraries written in C++, C, and Fortran. The libraries will have been multithreaded using Threading Building Blocks (TBB), the open source project for which I'm "community manager." TBB is a C++ template library (like the STL). I don't expect to have much of a problem calling C and Fortran libraries from C++/TBB code. But what's the best path to enable someone writing in Perl or Python or Ruby or -- whatever -- to call these multithreaded libraries?

This search has led me to reinvestigate some techniques I've looked at in the past -- for example, Perl's XS -- but the idea of having to create an interface for each individual calling language is unappealing. I looked at, and did some experimenting with, SWIG (Simplified Wrapper and Interface Generator). But before I got very far, Parrot was suggested to me by some people on the #tbb IRC channel (on FreeNode.net).

During my initial investigation of Parrot, I wrote a blog post about my research. Parrot looked promising to me:

Hence, if we can wrap C++ libraries threaded using TBB, then the Parrot NCI should make it possible for all the languages that have Parrot support to call those libraries. Then, high level scripting languages such as Ruby, Python, and Perl will have convenient access to computationally-intensive libraries that have been threaded for optimal performance on multicore processors.

This post elicited an interesting response on another site: "Will Parrot Ever Truly Deliver?" The author acknowledges that "Parrot does sound like an interesting piece of technology," but wonders, "will it ever be a platform suitable for serious, production usage?" The author's concerns include the length of time Parrot has been in development (quite a long time), the instability of the code base (lots of significant changes), and the incomplete state of support for other languages.

Does multicore change the Parrot equation?

Sometimes a technology is invented before its time is right: the immediate need for solutions that apply the technology is nearly non-existent, even though many people readily admit it's a "wonderful" technology. I wonder if this applied, to a certain extent, to Parrot prior to the age of many-core computing.

In a few years, inexpensive PCs will have 8, 16, or more processing cores. Some people doubt that the average home or office user will have any use for all these cores. I think that's like saying "no one will ever need more than 640K of RAM." Once it's possible for the average home or office user to run image analysis, video processing, and stock market simulations that were previously available only on high-end workstations in data centers, you cannot tell me they won't want to do so.

It's going to take programming techniques like Threading Building Blocks and OpenMP, and perhaps new languages such as Erlang, or transactional memory as applied in Haskell, to multithread these computationally intensive libraries. I doubt that applying conventional low-level threads is going to be an efficient way to accomplish this in terms of programming time (I've worked at that level for a long time).

But on the other side: no one is going to want to rewrite, in C++ or C, the mass of existing software platforms and applications that could potentially apply these computation libraries. A convenient means of enabling a broad spectrum of languages to call multithreaded C++, C, and Fortran libraries is going to be needed. Otherwise we again face enormous software development inefficiency, as a separate interface has to be constructed for each library for each calling language. That's not a solution that is going to fly, in my opinion.

It seems to me that Parrot is an excellent candidate for addressing this problem. If so, the Parrot team may soon find itself receiving increasing support from independent developers, and possibly from companies that recognize the need for this capability in their own applications.

I don't think this need was really there when PC performance could be improved simply through ever-increasing clock speeds. Single-threaded software that did a few simple calculations was fine then. Multicore, however, changes everything. As highly-scalable multithreaded computation / simulation libraries become available, and people realize they want them, and developers realize they need to be able to call these libraries from every language platform, Parrot's time may arrive.


7 Comments

Thomas Fee
2007-09-26 20:00:46
Referring to the opening paragraph's implication ... My question is: How would Intel's TBB help Ruby, Python, etc? It is necessary for these dynamic languages to be completely cross-platform. TBB requires an Intel processor, according to the TBB spec. AMD servers are not supported.
Kevin Farnham
2007-09-26 20:11:30
Thomas -- TBB is cross-platform. It does not require an Intel processor. I run it under Gentoo Linux on an AMD Athlon-64 system. I think the page of supported processors you're looking at defines the processors for which Intel's commercial TBB customer service will help you if you run into a problem. But TBB, now that it's an open source project, is intended to be akin to the C++ Standard Template Library. It should run on any processor that can run C++ applications. Hence, TBB can help Ruby, Python, etc., by allowing them to access highly efficient computational libraries that take full advantage of multicore systems.
Patrick Logan
2007-09-30 19:46:39
Erlang has at least one advantage over these other approaches. Consider process A communicating with process B. (Erlang has shared-nothing processes.) In these other systems, thread A communicates with thread B in a shared address space.

The programming model for Erlang looks the same whether process A is local or remote with respect to process B. With these other systems, if A and B are threads, they "communicate" by manipulating shared data. Making A and B remote from each other requires a different programming model or an expensive non-uniform shared memory mechanism.

The future of many cores will also be many nodes, and so will benefit more from starting with the Erlang style today. Better shared memory mechanisms are fine for low-level systems programming. Most application-level programming, except in rare situations, should use shared-nothing message passing.

Mike Hoskins
2007-09-30 21:37:01
"Multicore, however, changes everything." Agree 100%. I'm not sure how much of the mountains of legacy C/C++ (not to mention Fortran) code will be "ported" to TBB -- sometimes much rewriting is involved, since it was written for the much simpler single-threaded world. We at DataRush wrestle with this issue all the time. Having developed a 100% Java framework enabling highly parallel data-intensive applications on multicore systems (built from scratch in DataRush, or reusing existing Java code), we often wish we could find a way to exploit all the great high-performance C/Fortran code already written. But as you well know, that's not so easy (maybe it's better after all, as you suggest, to use new languages/techniques/libraries to conquer multicore?).
Kevin Farnham
2007-10-01 22:19:14
Patrick: the Erlang capability you mention has come up in discussions among developers who are using Threading Building Blocks. What's difficult to foresee at this point is what a very high-powered data center is going to look like in, say, 10 years. Will it have thousands of individual computers all working in sync (as is the case today)? Or will it consist of a handful of computers, each with a motherboard carrying hundreds of processing cores spread among many processors? If the latter, the advantage of being able to run a single application across a network of computers will become less significant over time.

But that's not to say that what you point out regarding Erlang isn't important. As I said, Threading Building Blocks users have discussed this, wondering if TBB itself should be enhanced in some way to permit running on multiple machines. For TBB, I think this would have to happen through an added module that coordinates operations across the networked machines. Erlang, by contrast, has that functionality built in by design.

Multicore
2008-04-14 12:43:24
Referring to the opening paragraph's implication ... My question is: How would Intel's TBB help Ruby, Python, etc? It is necessary for these dynamic languages to be completely cross-platform. TBB requires an Intel processor, according to the TBB spec. AMD servers are not supported.
Kevin Farnham
2008-04-14 12:53:15
Hi Multicore: TBB works on AMD processors as well as Intel processors. I have run TBB on my AMD Athlon-X2 64-bit system running Gentoo Linux ever since the TBB project became open source.

The confusion comes from the fact that Intel talks about fully supporting TBB on Intel processors if you purchase the commercial edition. That is, if there's a problem, they'll invest their own resources to investigate it and correct the problem.

TBB is a C++ template library, akin in many ways to the Standard Template Library (STL). It runs on any platform that supports the GCC compiler or Intel's C++ compiler.