ralphm.net

ralphm's blog

Sunday, 13 June 2004

pubsub.com and XMPP

Getting there...

A few months ago, I wrote about pubsub.com. The guy behind pubsub.com is Bob Wyman, who has been involved in the discussions on the direction of the pubsub JEP.

Recently, pubsub has been getting more attention again, and at one point Bob sent a link to the list about PubSub.com's use of XMPP PubSub. This is a document describing the experimental support of pushing Atom snippets using the pubsub JEP over Jabber. It is an interesting read, I must say. The comments I made in my previous post still stand, and I'll focus on the new stuff in the JEP.

First of all, pubsub.com focuses on content-based subscriptions. You subscribe to the results of a certain query based on content that becomes available. This is an interesting concept, which is also used in the Location Linked Information (LLI) project. In a recent mail, sent while I was already working on this blog entry, Bob talks about some of the problems he encountered while trying to implement such a system. He proposes some modifications to JEP 0060 to introduce the concept of subnodes. For LLI, however, I recommended an approach which might also be applicable to pubsub.com. It uses existing Jabber protocols, without alterations.

A user asks for a Data Form, using Jabber Search, in which he can specify the query he wants to subscribe to. He fills in the form and submits it to the service. If that particular query hasn't been used before, a new pubsub node is created to which the query results are published. The requesting user is automatically subscribed, and if someone else later submits the same query, he can simply be subscribed to the same node, without creating a new one. The node is then reported back in the result of the Jabber Search.

<iq type="get" from="ralphm@ik.nu/pubsub.com_client" to="pubsub.example.com" id="search1">
  <query xmlns="jabber:iq:search"/>
</iq>
Client requests form
<iq type="result" from="pubsub.example.com" to="ralphm@ik.nu/pubsub.com_client" id="search1">
  <query xmlns="jabber:iq:search">
    <x xmlns="jabber:x:data" type="form">
      <title>PubSub.com query</title>
      <instructions>Fill in the query to search in the pubsub.com feeds</instructions>
      <field type="hidden" var="FORM_TYPE">
        <value>jabber:iq:search</value>
      </field>
      <field type="text-single" label="Query" var="query"/>
    </x>
  </query>
</iq>
Service returns form
<iq type="set" from="ralphm@ik.nu/pubsub.com_client" to="pubsub.example.com" id="search2">
  <query xmlns="jabber:iq:search">
    <x xmlns="jabber:x:data" type="submit">
      <field type="hidden" var="FORM_TYPE">
        <value>jabber:iq:search</value>
      </field>
      <field var="query">
        <value>(SOURCE:pubsub.com AND "RSS")</value>
      </field>
    </x>
  </query>
</iq>
Client submits form
<iq type="result" from="pubsub.example.com" to="ralphm@ik.nu/pubsub.com_client" id="search2">
  <query xmlns="jabber:iq:search">
    <x xmlns="jabber:x:data" type="result">
      <field type="hidden" var="FORM_TYPE">
        <value>jabber:iq:search</value>
      </field>
      <field var="host">
        <value>pubsub.example.com</value>
      </field>
      <field var="node">
        <value>e09d3ead0285a6d20d211916a783b4a9</value>
      </field>
    </x>
  </query>
</iq>
Service returns result
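
The service-side bookkeeping behind this exchange could look roughly like the sketch below. It is only an illustration: the class and method names are made up, and deriving the node name from an MD5 hash of the query is an assumption on my part (the node identifier in the example merely has that shape). A real service would also have to decide when two differently written queries count as the same.

```python
import hashlib


class QueryNodeRegistry:
    """Hypothetical registry: one pubsub node per distinct query."""

    def __init__(self):
        self.queries = {}      # node id -> original query text
        self.subscribers = {}  # node id -> set of subscriber JIDs

    @staticmethod
    def node_id(query):
        # Assumption: only surrounding whitespace is ignored when comparing
        # queries. MD5 serves purely as a naming scheme, not for security.
        return hashlib.md5(query.strip().encode("utf-8")).hexdigest()

    def search(self, jid, query):
        """Handle a submitted search form and return the node to report."""
        node = self.node_id(query)
        if node not in self.queries:
            # First time this query is seen: create the node.
            self.queries[node] = query
            self.subscribers[node] = set()
        # The requesting user is subscribed automatically.
        self.subscribers[node].add(jid)
        return node


registry = QueryNodeRegistry()
first = registry.search("ralphm@ik.nu", '(SOURCE:pubsub.com AND "RSS")')
second = registry.search("bob@example.com", '(SOURCE:pubsub.com AND "RSS")')
```

Both calls return the same node name, and only one node is created. Keeping the original query text next to the node is what would later allow the node configuration form to show it.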

The user should be allowed to request unsubscription and view the current subscriptions via the protocols specified in JEP 0060. The node configuration form can be used to let the user check back on the specifics of that node, like the original query.

The above is for the dynamic subscriptions. Pubsub.com still has a lot of topic-based nodes, which I think can be retained. The content providers publish the news to their node, and the pubsub.com backend then matches those items against the user-defined queries and republishes the data to the respective dynamic nodes described above. Of course, this can be an internal action that does not actually involve the Jabber protocol. The user just receives the notifications from the nodes they subscribed to.
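
The matching step could be sketched as follows. This is not how pubsub.com works internally, which I don't know. The queries are represented by plain Python predicates instead of a parsed query language, and the node names, the `republish` function and the item fields are all hypothetical.

```python
from collections import defaultdict


def republish(item, node_predicates, outboxes):
    """Append item to the outbox of every dynamic node whose query matches.

    Returns the list of node ids the item was republished to.
    """
    matched = []
    for node, predicate in node_predicates.items():
        if predicate(item):
            outboxes[node].append(item)
            matched.append(node)
    return matched


# Hypothetical stand-ins for parsed user queries.
node_predicates = {
    "node-rss": lambda item: "RSS" in item["text"],
    "node-jabber": lambda item: "Jabber" in item["text"],
}
outboxes = defaultdict(list)

item = {"id": "entry-1", "source": "pubsub.com", "text": "RSS over Jabber"}
matched = republish(item, node_predicates, outboxes)
```

Here the item matches both queries, so it ends up in both dynamic nodes. A subscriber to both would receive it twice.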

Bob also briefly mentions the situation where a user might receive items more than once, because they match multiple queries. I'm not sure if you would have to solve that on the server side. Sure, it generates a bit more traffic, but it is immediately clear which query yielded a particular result. A client application could always filter the data on the unique identifier that is embedded in the payload, and leave out duplicates.
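
Such a client-side filter takes very little code. A minimal sketch, assuming the client has already extracted the unique identifier from each payload (the class name and the identifiers are invented for the example):

```python
class DuplicateFilter:
    """Hypothetical client-side filter keyed on the entry's unique id."""

    def __init__(self):
        self.seen = {}  # entry id -> list of nodes that delivered it

    def accept(self, node, entry_id):
        """Return True the first time entry_id arrives, False afterwards."""
        first_time = entry_id not in self.seen
        # Remember every node that delivered the entry, so the client
        # can still tell which queries yielded it.
        self.seen.setdefault(entry_id, []).append(node)
        return first_time


dedup = DuplicateFilter()
shown = [
    dedup.accept("node-rss", "entry-1"),
    dedup.accept("node-jabber", "entry-1"),
    dedup.accept("node-jabber", "entry-2"),
]
```

The second delivery of `entry-1` is suppressed, while the filter still records that both nodes produced it.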

Secondly, I like the concept of using Atom for holding the news entries. Basically, you get mini feeds with just one entry. I would propose using the entry element as the root, instead of feed. This is legal XML, and shouldn't really pose a problem. I would want to keep the ps:source-feed idea, but only have it contain the title and link elements (but using the Atom namespace). The author element is not needed there, as it can be put in the entry payload, if not already present. The modified element holds no real value for the end user, as we are not really dealing with files anymore.
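
To make the proposed payload shape concrete, here is a rough sketch that builds one. Several things in it are assumptions: the Atom namespace is the one I believe Atom 0.3 uses, the ps namespace URI is a placeholder (the real one is defined by pubsub.com), placing source-feed directly inside entry is my guess, and the identifiers and URLs are invented.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"  # assumption: Atom 0.3 namespace
PS_NS = "http://example.com/ns/ps"    # hypothetical placeholder namespace


def build_payload(entry_id, title, feed_title, feed_link):
    """Build a payload with entry as the root and a trimmed source-feed."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title

    # source-feed carries only the feed's title and link, both in the
    # Atom namespace; author and modified are left out.
    source = ET.SubElement(entry, f"{{{PS_NS}}}source-feed")
    ET.SubElement(source, f"{{{ATOM_NS}}}title").text = feed_title
    link = ET.SubElement(source, f"{{{ATOM_NS}}}link")
    link.set("rel", "alternate")
    link.set("type", "text/html")
    link.set("href", feed_link)
    return entry


payload = build_payload(
    "tag:example.com,2004:entry-1",
    "pubsub.com and XMPP",
    "Example feed",
    "http://example.com/",
)
xml_text = ET.tostring(payload, encoding="unicode")
```

The resulting document has entry as its root element and no feed wrapper.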

Maybe it is desirable to be informed of changes in the feed's metadata. For normal topic-based subscriptions, I'd create one specifically named node item (e.g. feed-info) that holds the data normally found in the header of an Atom file. It is only updated when something in that data changes (not sure about the modified element there). For content-based subscriptions, I would propose having a link element there that points to the topic node that contains all entries, using the XMPP URI Format. That allows the client to request the feed's metadata via Jabber, instead of having to refer to the original feed via HTTP. If it is also desirable to stay informed of changes to the metadata of feeds covered by a particular query, there could also be a link element pointing to a sister node of the feed's node, to which only the feed's metadata is published.