A few months ago, I wrote about pubsub.com. The guy behind pubsub.com is Bob Wyman, who has been involved in the discussions on the direction of the pubsub JEP.
Recently, pubsub has been getting more attention again, and at one point Bob sent the list a link to PubSub.com's use of XMPP PubSub, a document describing experimental support for pushing Atom snippets over Jabber using the pubsub JEP. It is an interesting read, I must say. The comments I made in my previous post still stand, so I'll focus on what is new here.
First of all, pubsub.com focusses on content-based subscriptions. You subscribe to the results of a certain query, matched against content as it becomes available. This is an interesting concept, which is also used in the Location Linked Information (LLI) project. In a recent mail from Bob, sent while I was already working on this blog entry, he talks about some of the problems he encountered while trying to implement such a system. He proposes modifying JEP 0060 to introduce the concept of subnodes. For LLI, however, I recommended an approach which might also be applicable to pubsub.com. It uses existing Jabber protocols without alteration.
A user requests a Data Form via Jabber Search, in which he can specify the query he wants to subscribe to, and submits it back to the service. If that particular query hasn't been used before, a new pubsub node is created to which the query results are published. The requesting user is automatically subscribed, and if someone else later submits the same query, he is simply subscribed to the same node, without a new one being created. The node is then reported back in the result of the Jabber Search.
<iq type="get" from="ralphm@ik.nu/pubsub.com_client" to="pubsub.example.com" id="search1">
<query xmlns="jabber:iq:search"/>
</iq>
<iq type="result" from="pubsub.example.com" to="ralphm@ik.nu/pubsub.com_client" id="search1">
<query xmlns="jabber:iq:search">
<x xmlns="jabber:x:data" type="form">
<title>PubSub.com query</title>
<instructions>Fill in the query to search in the pubsub.com feeds</instructions>
<field type="hidden" var="FORM_TYPE">
<value>jabber:iq:search</value>
</field>
<field type="text-single" label="Query" var="query"/>
</x>
</query>
</iq>
<iq type="set" from="ralphm@ik.nu/pubsub.com_client" to="pubsub.example.com" id="search2">
<query xmlns="jabber:iq:search">
<x xmlns="jabber:x:data" type="submit">
<field type="hidden" var="FORM_TYPE">
<value>jabber:iq:search</value>
</field>
<field var="query">
<value>(SOURCE:pubsub.com AND "RSS")</value>
</field>
</x>
</query>
</iq>
<iq type="result" from="pubsub.example.com" to="ralphm@ik.nu/pubsub.com_client" id="search2">
<query xmlns="jabber:iq:search">
<x xmlns="jabber:x:data" type="result">
<field type="hidden" var="FORM_TYPE">
<value>jabber:iq:search</value>
</field>
<field var="host">
<value>pubsub.example.com</value>
</field>
<field var="node">
<value>e09d3ead0285a6d20d211916a783b4a9</value>
</field>
</x>
</query>
</iq>
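On the client side, all that is needed from the search result is the host and node fields. A minimal Python sketch using the standard library's ElementTree, assuming the result form carries the `host` and `node` fields exactly as in the example above (`node_from_search_result` is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

SEARCH_NS = "jabber:iq:search"
DATA_NS = "jabber:x:data"


def node_from_search_result(stanza: str) -> tuple[str, str]:
    """Return (host, node) from a jabber:iq:search result carrying a data form."""
    iq = ET.fromstring(stanza)
    if iq.get("type") != "result":
        raise ValueError("expected an iq of type 'result'")
    # The form is <x xmlns='jabber:x:data'> inside <query xmlns='jabber:iq:search'>.
    form = iq.find(f"{{{SEARCH_NS}}}query/{{{DATA_NS}}}x")
    if form is None:
        raise ValueError("no data form in the search result")
    values = {}
    for field in form.findall(f"{{{DATA_NS}}}field"):
        values[field.get("var")] = field.findtext(f"{{{DATA_NS}}}value")
    return values["host"], values["node"]


RESULT = """<iq type="result" from="pubsub.example.com" to="ralphm@ik.nu/pubsub.com_client" id="search2">
<query xmlns="jabber:iq:search">
<x xmlns="jabber:x:data" type="result">
<field type="hidden" var="FORM_TYPE"><value>jabber:iq:search</value></field>
<field var="host"><value>pubsub.example.com</value></field>
<field var="node"><value>e09d3ead0285a6d20d211916a783b4a9</value></field>
</x>
</query>
</iq>"""

host, node = node_from_search_result(RESULT)
```

The client can then use that host and node pair with the regular JEP 0060 operations.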
The user should be allowed to request unsubscription and view the current subscriptions via the protocols specified in JEP 0060. The node configuration form can be used to let the user look up the specifics of that node, such as the original query.
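The service side of this scheme boils down to mapping each distinct query to one node. The sketch below is an in-memory illustration only: the class name is hypothetical, and deriving the node id from an MD5 hash of the whitespace-normalised query is purely an assumption (the 32-hex-digit node id in the example merely looks like one). Unsubscribing, listing subscriptions and looking up the original query mirror the operations mentioned above.

```python
import hashlib


class QueryNodeRegistry:
    """Hypothetical in-memory map from content queries to pubsub nodes."""

    def __init__(self) -> None:
        self.query_for_node: dict[str, str] = {}    # node id -> original query
        self.subscribers: dict[str, set[str]] = {}  # node id -> subscribed JIDs

    @staticmethod
    def node_id(query: str) -> str:
        # Assumption: a stable id from the query with whitespace collapsed.
        normalised = " ".join(query.split())
        return hashlib.md5(normalised.encode("utf-8")).hexdigest()

    def subscribe(self, jid: str, query: str) -> tuple[str, bool]:
        """Subscribe jid to the node for query; return (node, created)."""
        node = self.node_id(query)
        created = node not in self.query_for_node
        if created:
            self.query_for_node[node] = query
            self.subscribers[node] = set()
        self.subscribers[node].add(jid)
        return node, created

    def unsubscribe(self, jid: str, node: str) -> None:
        self.subscribers.get(node, set()).discard(jid)

    def subscriptions(self, jid: str) -> list[str]:
        return sorted(n for n, jids in self.subscribers.items() if jid in jids)

    def original_query(self, node: str) -> str:
        # What a node configuration form could report back to the user.
        return self.query_for_node[node]


registry = QueryNodeRegistry()
query = '(SOURCE:pubsub.com AND "RSS")'
node, created = registry.subscribe("ralphm@ik.nu", query)
same_node, created_again = registry.subscribe("someone@example.org", query)
```

The second subscriber lands on the same node, and no new node is created.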
The above covers the dynamic subscriptions. Pubsub.com also has a lot of topic-based nodes, which I think can be retained. Content providers publish their news to their own node, and the pubsub.com backend then matches those items against the user-defined queries and republishes them to the respective dynamic nodes described above. Of course, this can be an internal action that does not involve the Jabber protocol at all; the user just receives notifications from the nodes they subscribed to.
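The internal matching step could look something like the sketch below. Parsing pubsub.com's real query syntax is out of scope, so `matches_all` is a hypothetical stand-in that only handles a source plus a set of required terms, roughly like `(SOURCE:pubsub.com AND "RSS")`; the `Entry` and `Matcher` names are likewise made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entry:
    entry_id: str
    source: str
    text: str


Predicate = Callable[[Entry], bool]


def matches_all(source: str, *terms: str) -> Predicate:
    """Hypothetical stand-in for a parsed query like (SOURCE:x AND "term")."""
    wanted = [term.lower() for term in terms]

    def predicate(entry: Entry) -> bool:
        text = entry.text.lower()
        return entry.source == source and all(term in text for term in wanted)

    return predicate


class Matcher:
    """Matches entries published on topic nodes against per-node queries."""

    def __init__(self) -> None:
        self.queries: dict[str, Predicate] = {}       # dynamic node id -> query
        self.published: dict[str, list[Entry]] = {}   # dynamic node id -> entries

    def add_query(self, node: str, predicate: Predicate) -> None:
        self.queries[node] = predicate
        self.published.setdefault(node, [])

    def on_topic_publish(self, entry: Entry) -> list[str]:
        """Republish entry to every dynamic node whose query matches it."""
        hits = [node for node, predicate in self.queries.items() if predicate(entry)]
        for node in hits:
            # Internal republish; no Jabber round trip is needed for this step.
            self.published[node].append(entry)
        return hits


matcher = Matcher()
matcher.add_query("rss-node", matches_all("pubsub.com", "RSS"))
matcher.add_query("atom-node", matches_all("pubsub.com", "Atom"))
both = matcher.on_topic_publish(
    Entry("tag:example.org,2004:1", "pubsub.com", "RSS and Atom side by side"))
```

Note that a single entry can land on several dynamic nodes at once.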
Bob also briefly mentions the situation where a user might receive items more than once, because they match multiple queries. I'm not sure you would have to solve that server side. Sure, it generates a bit more traffic, but it is immediately clear which query yielded a particular result. A client application could always filter the data on the unique identifier that is embedded in the payload, and leave out duplicates.
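A client-side filter of that kind might look like this. It assumes the payload's root is an Atom `entry` in the Atom 0.3 namespace (`http://purl.org/atom/ns#`) with an `id` child; the class name is hypothetical. It also remembers which nodes (and therefore which queries) delivered each entry, so that information is not lost when a duplicate is dropped.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"  # Atom 0.3; assumed payload namespace


class DuplicateFilter:
    """Drops entries already seen, but remembers every node that delivered them."""

    def __init__(self) -> None:
        self.matched_by: dict[str, list[str]] = {}  # entry id -> node ids

    def accept(self, node: str, payload: str) -> bool:
        """Return True if this notification should be shown to the user."""
        entry = ET.fromstring(payload)
        entry_id = entry.findtext(f"{{{ATOM_NS}}}id")
        if entry_id is None:
            return True  # nothing to deduplicate on, so show it
        nodes = self.matched_by.setdefault(entry_id, [])
        nodes.append(node)
        return len(nodes) == 1


PAYLOAD = ('<entry xmlns="http://purl.org/atom/ns#">'
           '<id>tag:example.org,2004:1</id><title>Hello</title></entry>')

dedup = DuplicateFilter()
first = dedup.accept("rss-node", PAYLOAD)
second = dedup.accept("atom-node", PAYLOAD)
```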
Secondly, I like the concept of using Atom for holding the news entries. Basically, you get mini feeds with just one entry. I would propose using the entry element as the root, instead of feed. This is legal XML, and shouldn't really pose a problem. I would keep the ps:source-feed idea, but have it contain only the title and link elements (using the Atom namespace). The author element is not needed, as it can be put in the entry payload, if not already present. The modified element holds no real value for the end user, as we are not really dealing with files anymore.
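To make the proposed shape concrete, here is a sketch that builds such a payload with ElementTree: an Atom `entry` as root, with a `ps:source-feed` child holding only an Atom `title` and `link`. The `ps` namespace URI below is a placeholder (the real one is not given here), placing `ps:source-feed` inside the entry is an assumption, and the Atom 0.3 namespace is assumed.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"  # Atom 0.3 namespace (assumed)
PS_NS = "http://example.com/ns/ps"    # placeholder for the real ps namespace

ET.register_namespace("", ATOM_NS)
ET.register_namespace("ps", PS_NS)


def atom(tag: str) -> str:
    return f"{{{ATOM_NS}}}{tag}"


def build_item_payload(entry_id: str, title: str, link: str,
                       feed_title: str, feed_link: str) -> ET.Element:
    """Build an item payload with <entry> as root instead of <feed>."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = entry_id
    ET.SubElement(entry, atom("title")).text = title
    ET.SubElement(entry, atom("link"), rel="alternate", type="text/html", href=link)
    # Only title and link describe the source feed, both in the Atom namespace.
    source = ET.SubElement(entry, f"{{{PS_NS}}}source-feed")
    ET.SubElement(source, atom("title")).text = feed_title
    ET.SubElement(source, atom("link"), rel="alternate", type="text/html", href=feed_link)
    return entry


payload = build_item_payload(
    "tag:example.org,2004:1", "Hello", "http://example.org/2004/hello",
    "Example weblog", "http://example.org/")
xml_text = ET.tostring(payload, encoding="unicode")
```

No author or modified element is emitted at the source-feed level, in line with the reasoning above.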
It may be desirable to be informed of changes in the feed's metadata. For normal topic-based subscriptions, I'd create one specifically named node item (e.g. feed-info) that holds the data normally found in the header of an Atom file. It is only updated when something in that data changes (I'm not sure about the modified element there). For content-based subscriptions, I would propose having a link element there that points to the topic node containing all entries, using the XMPP URI Format. That allows the client to request the feed's metadata via Jabber, instead of having to refer to the original feed via HTTP. If it is also desirable to be kept informed of changes to the metadata of feeds covered by a particular query, there could also be a link element pointing to a sister node of the feed's node that only gets the feed's metadata published.
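Two pieces of this are easy to sketch: deciding when the `feed-info` item needs republishing, and building the `xmpp:` URI for a link element. The URI shape `xmpp:service?pubsub;action=subscribe;node=...` follows examples in later revisions of the JEP 0060 specification and is an assumption here; the URI draft current at the time may differ. Ignoring `modified` when comparing metadata is just one possible answer to the open question above, exposed as a parameter. All names are hypothetical.

```python
from urllib.parse import quote


def xmpp_node_uri(service: str, node: str, action: str = "subscribe") -> str:
    """Build an xmpp: URI pointing at a pubsub node (URI shape is an assumption)."""
    return f"xmpp:{service}?pubsub;action={action};node={quote(node, safe='')}"


class FeedInfoTracker:
    """Decides when the specially named 'feed-info' item must be republished."""

    ITEM_ID = "feed-info"

    def __init__(self, ignored_keys=frozenset({"modified"})) -> None:
        self.ignored_keys = frozenset(ignored_keys)
        self.last: dict[str, dict[str, str]] = {}  # topic node -> last header data

    def needs_update(self, node: str, header: dict[str, str]) -> bool:
        """Return True (and remember header) if the relevant header data changed."""
        relevant = {k: v for k, v in header.items() if k not in self.ignored_keys}
        if self.last.get(node) == relevant:
            return False
        self.last[node] = relevant
        return True


tracker = FeedInfoTracker()
first = tracker.needs_update(
    "feed-node", {"title": "Example weblog", "modified": "2004-06-01T12:00:00Z"})
only_modified = tracker.needs_update(
    "feed-node", {"title": "Example weblog", "modified": "2004-06-02T12:00:00Z"})
retitled = tracker.needs_update(
    "feed-node", {"title": "Renamed weblog", "modified": "2004-06-02T12:00:00Z"})
link_href = xmpp_node_uri("pubsub.example.com", "feeds/example-weblog")
```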
For a while I have had the idea of setting up a Planet for Jabber. Originally created for the GNOME project as Planet GNOME, a Planet is an aggregation of the weblogs of people in a particular community.
Planets are a great way to keep track of things that are going on in a community, all in one place. People who read weblogs regularly have a large number of blogs on their blogroll, but a Planet also lets you see entries by people whose blogs you don't normally read, or just haven't discovered yet.
As I said, I've had this idea for the Jabber developer community for a while now, and prompted by the start of Planet IM, where this blog is also aggregated (thanks Christian!), I have now set up Planet Jabber using Planet, a feed aggregator written in Python.
Why a Planet just for Jabber? Although I think Planet IM is a very good idea, Jabber is much more than just IM. There'll be some overlap in the content, but that's OK. I hope many people will start blogging (more), so we can read about all the cool things being built with Jabber. If you want to appear on Planet Jabber, let me know.
Idavoll needs some loving. Apart from minor updates, it hasn't really moved forward. Luckily for me, pubsub has not really gained any (visible) traction, apart from being advanced to Draft status last October.
Recently, however, I got kicked by temas and pgmillard to fix some small bugs in Idavoll, which they are hooking up to the jabber.org server. They are playing with it a bit now, with some serious plans for the near future.
In the meantime, I have been wanting to redo the implementation of Idavoll in Twisted, an event-driven networking framework written in Python. I've been talking to dizzyd, who wrote most of the Jabber protocol code in Twisted, for some guidance on how to go about this. After looking at Proxy65 and the nice step-by-step tutorial for implementing a full-featured, scalable and extensible finger service, Twisted from Scratch, or the Evolution of Finger, I have finally started on the rewrite of Idavoll using Twisted.
The idea is to make the service support different, pluggable backends and also, in the philosophy of Twisted, make it possible to hook up other protocols for interfacing with the same backend. This hopefully means that it should become rather easy to hook up a web-based interface to help administer the component. Let's see how that evolves.
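The pluggable-backend idea can be sketched in plain Python: one abstract backend interface, one in-memory implementation, and two independent frontends that only talk to that interface. Twisted code typically declares interfaces like this with zope.interface; the standard library's `abc` is used here to keep the example self-contained, and all names are hypothetical rather than Idavoll's actual classes.

```python
import html
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface between protocol frontends and node storage."""

    @abstractmethod
    def create_node(self, node: str) -> None: ...

    @abstractmethod
    def subscribe(self, node: str, jid: str) -> None: ...

    @abstractmethod
    def publish(self, node: str, item: str) -> list[str]:
        """Store item and return the JIDs that should be notified."""

    @abstractmethod
    def list_nodes(self) -> list[str]: ...


class MemoryBackend(Backend):
    """One pluggable implementation; a database-backed one could replace it."""

    def __init__(self) -> None:
        self.items: dict[str, list[str]] = {}
        self.subscribers: dict[str, set[str]] = {}

    def create_node(self, node: str) -> None:
        if node in self.items:
            raise ValueError(f"node {node!r} already exists")
        self.items[node] = []
        self.subscribers[node] = set()

    def subscribe(self, node: str, jid: str) -> None:
        self.subscribers[node].add(jid)

    def publish(self, node: str, item: str) -> list[str]:
        self.items[node].append(item)
        return sorted(self.subscribers[node])

    def list_nodes(self) -> list[str]:
        return sorted(self.items)


class JabberFrontend:
    """Turns (already parsed) pubsub requests into backend calls."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def handle_publish(self, node: str, item: str) -> list[tuple[str, str]]:
        # Returns (recipient, payload) pairs for the notifications to send.
        return [(jid, item) for jid in self.backend.publish(node, item)]


class WebAdminFrontend:
    """A second protocol frontend sharing the very same backend."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def render_node_list(self) -> str:
        return "\n".join(f"<li>{html.escape(n)}</li>" for n in self.backend.list_nodes())


backend = MemoryBackend()
jabber, web = JabberFrontend(backend), WebAdminFrontend(backend)
backend.create_node("news")
backend.subscribe("news", "ralphm@ik.nu")
notifications = jabber.handle_publish("news", "<entry/>")
```

Swapping `MemoryBackend` for another implementation would leave both frontends untouched.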
One of the bugs I needed to fix in Idavoll was to have the Jabber component return error stanzas signalling that unknown queries are not implemented. I did a quick fix on the current implementation, and then started to wonder how to do that using the Twisted framework. This proved to be impossible to do nicely. In Twisted, you can hook up handlers to XPath-like queries that match incoming XML elements. However, these handlers have no ordering, so although you can make a catch-all handler and let other handlers signal whether or not they handled a certain stanza, the catch-all can easily be called before the more specific handler.
For now, I hacked xish, which does the XPath matching and calling of the observers, so that XPathQuery objects contain a priority attribute, and implemented the __cmp__() and __hash__() methods so that the objects can be sorted. After modifying the methods for adding observers, you can now give a priority to each XPath query, much like template matching in XSLT. This works nicely. I just give the catch-all observer a priority of -1, with 0 being the default. This way, existing code is unaffected.
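The priority idea is independent of xish itself, so here is a standalone sketch of it: observers are kept sorted by priority (highest first, registration order breaking ties), and dispatch stops at the first one that reports it handled the stanza. Plain predicate functions stand in for XPath queries, stopping at the first handler is a choice of this sketch, and none of the names are xish's actual API. The catch-all at priority -1 answers with a `feature-not-implemented` stanza error.

```python
import xml.etree.ElementTree as ET
from typing import Callable

STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"

Predicate = Callable[[ET.Element], bool]
Handler = Callable[[ET.Element], bool]  # returns True if it handled the stanza


class PriorityDispatcher:
    """Offers each stanza to matching observers, highest priority first.

    A sketch of the idea only; this is not the xish API.
    """

    def __init__(self) -> None:
        self._observers: list[tuple[int, int, Predicate, Handler]] = []
        self._registered = 0  # registration counter, used to break priority ties

    def add_observer(self, predicate: Predicate, handler: Handler,
                     priority: int = 0) -> None:
        self._observers.append((priority, self._registered, predicate, handler))
        self._registered += 1
        # Highest priority first; equal priorities keep registration order.
        self._observers.sort(key=lambda obs: (-obs[0], obs[1]))

    def dispatch(self, stanza: ET.Element) -> bool:
        for _priority, _order, predicate, handler in self._observers:
            if predicate(stanza) and handler(stanza):
                return True  # stop at the first observer that claims the stanza
        return False


def error_reply(iq: ET.Element) -> ET.Element:
    """Build a feature-not-implemented error reply for an unhandled iq."""
    reply = ET.Element("iq", {"type": "error", "id": iq.get("id", "")})
    if iq.get("from"):
        reply.set("to", iq.get("from"))
    if iq.get("to"):
        reply.set("from", iq.get("to"))
    error = ET.SubElement(reply, "error", {"type": "cancel"})
    ET.SubElement(error, f"{{{STANZAS_NS}}}feature-not-implemented")
    return reply


sent: list[ET.Element] = []


def is_request(stanza: ET.Element) -> bool:
    return stanza.tag == "iq" and stanza.get("type") in ("get", "set")


def is_version_query(stanza: ET.Element) -> bool:
    return is_request(stanza) and stanza.find("{jabber:iq:version}query") is not None


def answer_version(stanza: ET.Element) -> bool:
    sent.append(ET.Element("iq", {"type": "result", "id": stanza.get("id", "")}))
    return True


def not_implemented(stanza: ET.Element) -> bool:
    sent.append(error_reply(stanza))
    return True


dispatcher = PriorityDispatcher()
# Registered first, but priority -1 makes the catch-all run after everything else.
dispatcher.add_observer(is_request, not_implemented, priority=-1)
dispatcher.add_observer(is_version_query, answer_version)

dispatcher.dispatch(ET.fromstring(
    '<iq type="get" id="v1"><query xmlns="jabber:iq:version"/></iq>'))
dispatcher.dispatch(ET.fromstring(
    '<iq type="get" id="u1" from="ralphm@ik.nu/client" to="pubsub.example.com">'
    '<query xmlns="urn:example:unknown"/></iq>'))
```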
Now, on to the rest of the implementation. If only to prove boyd wrong.
Today, one day later than most years, it is my birthday again. For the 28th time in my life. Most normal people start to count at 1, though, so let me say it like this: I am now 27 years of age. Time sure flies.