[TBray:Push||Pull] On Atom, APP, XMPP, RFC 4661, OpenSearch, and The Other Side Of Push (AKA, LLUP/Blip Messaging) -- Oh, and Smalltalk/Squeak, Too
by M. David Peterson
ongoing · Practical Transparency
Need For Speed · Even a nice clean well-known feed doesn't quite solve the whole problem. Your typical feed-reader is set up to poll every half-hour or even less often, and there are those in the financial community who are not going to be happy with a potential half-hour's latency in getting the news.
I can think of one simple brute-force way to approach the problem, and another that's a little more sophisticated. The simple solution is, assume that everyone who really cares will want to poll that material-news feed every few seconds. So, you stage Jonathan's feed, not on the ordinary blogging infrastructure, but on a hyper-fast cache that can take that kind of transaction load; there might even be a business opportunity here for some infrastructure player to offer this kind of special-purpose staging.
If you want to get fancy, you could use something like the proposed new Atom-over-XMPP trick. The idea is that people who want ultra-low-latency feeds don't poll, but set up a persistent connection to the provider's server, which pushes entries down the wire the moment they become available. This is elegant in theory; in practice I'd bet on the brute-force polling approach, at least off the top.
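To put rough numbers on the brute-force option, here is a minimal sketch of the polling arithmetic. The subscriber count of 10,000 is purely an assumption for illustration, and `polling_load` is a hypothetical helper, not anything from the quoted post.

```python
def polling_load(subscribers, interval_seconds):
    """Estimate what a polled feed costs the host and the reader.

    Assumes every subscriber polls at the same fixed interval and that
    polls are spread evenly over time (both are simplifications).
    """
    return {
        # Average request rate the staging cache has to absorb.
        "requests_per_second": subscribers / interval_seconds,
        # An entry published just after a poll waits a full interval.
        "worst_case_latency_seconds": interval_seconds,
    }


# Hypothetical audience of 10,000 readers.
half_hourly = polling_load(10_000, 30 * 60)  # roughly 5.6 requests per second
every_5_sec = polling_load(10_000, 5)        # 2,000 requests per second
```

Cutting the worst-case latency from half an hour to five seconds multiplies the request rate by 360, which is why the quoted post talks about a hyper-fast cache rather than ordinary blogging infrastructure.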
And if you want it to really work well, take Atom, APP, XMPP, RFC 4661, OpenSearch, and LLUP (Blip Messaging), and that's it -- problem solved.
From my recent post to the LLUP mailing list:
I have been thinking about subscription for a while. I recently posted a blog entry referring to some older attempts to achieve this, and generally dissing XMPP's JEP-0060 and its lack of discussion about service guarantees.
I listed the following principles that I think need to be incorporated into an internet-scale subscription mechanism:
Summarisation. This is the organised discarding of information within an update stream to ensure that slow clients receive as much information as their connection characteristics permit. This avoids overall system overload due to infinite buffering.
Differential flow control. A slow client should not prevent fast ones from getting updates, nor cause them to receive the slow client's summarised stream.
Localised resynchronisation. A client need not reach back to the origin server for the current resource status if its immediate server is already handling the subscription.
Patch updates. For large resources (especially lists), the ability to deliver a message that indicates the change from last time, only. Not the whole state.
Security Measures. Pub/Sub can be a source of denial of service attacks. The subscription mechanism must be able to detect when its notifications are being treated as spam and end the subscription.
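The first two principles above can be sketched together. What follows is one possible reading, not a mechanism from the post: each subscriber gets its own buffer (so a slow client never holds back a fast one), and a buffer keeps only the newest pending state per resource (so it cannot grow without bound). The names `SummarisingQueue`, `Publisher`, and `max_pending` are hypothetical.

```python
from collections import OrderedDict


class SummarisingQueue:
    """Per-subscriber buffer holding only the newest pending update per resource."""

    def __init__(self, max_pending=100):
        self.max_pending = max_pending
        self._pending = OrderedDict()  # resource id -> latest state

    def push(self, resource_id, state):
        # Summarisation: a newer update replaces any older pending one.
        self._pending.pop(resource_id, None)
        self._pending[resource_id] = state
        # Hard bound: drop the oldest pending resource when over the limit.
        # (A client that loses an update this way would need to resynchronise.)
        while len(self._pending) > self.max_pending:
            self._pending.popitem(last=False)

    def drain(self, limit):
        """Return up to `limit` pending (resource id, state) pairs, oldest first."""
        out = []
        while self._pending and len(out) < limit:
            out.append(self._pending.popitem(last=False))
        return out


class Publisher:
    """Differential flow control: one independent queue per subscriber."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name, max_pending=100):
        self.subscribers[name] = SummarisingQueue(max_pending)

    def publish(self, resource_id, state):
        for queue in self.subscribers.values():
            queue.push(resource_id, state)


pub = Publisher()
pub.subscribe("fast")
pub.subscribe("slow")

fast_seen = []
for price in (10, 11, 12):
    pub.publish("ticker", price)
    fast_seen += pub.subscribers["fast"].drain(limit=10)  # reads after every update

slow_seen = pub.subscribers["slow"].drain(limit=10)  # reads once, at the end
# fast_seen holds all three updates; slow_seen holds only ("ticker", 12)
```

The fast client sees every intermediate value, while the slow client sees only the latest one and is never handed a backlog it cannot consume.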
M. David Peterson
I agree with you wholeheartedly... These are all important aspects of a system such as this. In fact, this is something we've been working on in various forms for just under 3 years, and as such we have gained some pretty eye-opening insight into the areas of such a system that present interesting challenges, ones I would never have considered had it not been for actually playing with the various scenarios, configurations, etc. to see where all of the holes exist and how, if at all possible, to patch them.
I think we've come to a pretty solid foundation, and as various members of our development team finish up various book projects and such over the next month or so, hopefully we will be able to get the initial specification finalized and moving forward.
Nonetheless, I will definitely take a look at the links you've provided. Thanks! :)
Interesting comments.. :D
Basically, you're talking about message queueing when you talk push or pull.
For AFP or AP, you'd need to be able to scale to a few million subscribers.
What XMPP really needs is a mirrorable flag on Pub/Sub. If you have that, then you solve the scaling and DoS problems.
The way this works: a client subscribes to a queue. The server creates a local queue and subscribes the client to it. It lies to the client, telling it that this is the subscription it asked for. No matter how many clients request the same remote subscription, they are all redirected to the local queue.
The server then establishes a single subscription on the remote queue. When a message is received, it pushes it onto the local queue, which in turn pushes it to the clients.
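The mirroring behaviour described above can be sketched in a few lines. This is an in-process illustration only, not XMPP Pub/Sub code: `Origin` stands in for the remote queue, `MirroringServer` for the client's own server, and the `deliveries` counters are hypothetical instrumentation added to show who does the work.

```python
from collections import defaultdict


class Origin:
    """Stands in for the content provider's remote queue."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic -> subscriber callbacks
        self.deliveries = 0

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subs[topic]:
            self.deliveries += 1
            callback(message)


class MirroringServer:
    """Gives clients a local queue and holds one upstream subscription per topic."""

    def __init__(self, upstream):
        self.upstream = upstream
        self._local = {}  # topic -> local client callbacks
        self.deliveries = 0

    def subscribe(self, topic, callback):
        if topic not in self._local:
            # First request for this topic: create the local queue and
            # establish the single subscription on the remote queue.
            self._local[topic] = []
            self.upstream.subscribe(
                topic, lambda msg, t=topic: self._fan_out(t, msg)
            )
        # Every client is attached to the local queue, never the remote one.
        self._local[topic].append(callback)

    def _fan_out(self, topic, message):
        for callback in self._local[topic]:
            self.deliveries += 1
            callback(message)


origin = Origin()
edge = MirroringServer(origin)
inbox_a, inbox_b = [], []
edge.subscribe("news", inbox_a.append)
edge.subscribe("news", inbox_b.append)
origin.publish("news", "entry-1")
# origin.deliveries is 1 and edge.deliveries is 2: both clients got the entry,
# but the origin sent it only once.
```

Because `MirroringServer` exposes the same `subscribe(topic, callback)` shape as `Origin`, one mirroring server can sit upstream of another, which is how the federation case below would be layered in this sketch.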
In a federation, the servers in the federation establish their subscriptions on a subscription server within the federation, and that subscription server establishes the single subscription to the content provider.
The idea here is that the root nodes, closest to the content provider, do the least work. They get the least data. They generate the fewest messages.
The leaves, closest to the destination users, do the most work. They send the most data, and generate the most messages.
A denial of service can only be conducted against a leaf -- you can't do it against the trunk, the federation as a whole, or the network as a whole. That avenue of attack is completely cut off, with no loss in functionality.
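The root-versus-leaf load described above is easy to check with a little arithmetic. The sketch below assumes a uniform tree and made-up fan-outs (one subscription server, 20 federated servers, 50,000 clients per server); `messages_per_tier` is a hypothetical helper.

```python
def messages_per_tier(fan_outs):
    """Messages sent per published entry at each tier of a uniform tree.

    fan_outs[i] is the number of downstream subscribers served by each
    node at tier i, starting with the content provider at tier 0.
    """
    nodes = 1  # a single content provider at the root
    tiers = []
    for fan_out in fan_outs:
        tiers.append({
            "nodes": nodes,
            "sent_per_node": fan_out,
            "sent_total": nodes * fan_out,
        })
        nodes *= fan_out  # every subscriber is a node in the next tier
    return tiers


# Provider -> 1 subscription server -> 20 federated servers -> 50,000 clients each
for tier in messages_per_tier([1, 20, 50_000]):
    print(tier)
```

Under these assumptions the provider sends one message per entry, the subscription server sends 20, and each leaf server sends 50,000, for a million deliveries in total. A flood of subscription requests therefore lands on a leaf server rather than on the provider.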