Friday, 29 April 2005
Two weeks ago, I upgraded Perl and all dependent packages on the
machine that runs Mimír, which
broke Mimír's news aggregator. This
gave me the excuse to finally rewrite the aggregator,
something that I've been putting off for a while now. A bit of
Mimír is fed by publish-subscribe
notifications via Jabber. The idea is that any news source has a pubsub
node to which new news items are published. All subscribers of that
node get an instant notification of this. One of the subscribers is
Mimír's news bot, which uses the notifications as input for each news
channel in the system, and redistributes the new items among the
channel subscribers (if they are online) or marks the item as unread for
the subscriber. Unread items can be read at a later time, via the
subscriber's personal news page.
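
The bot's redistribution step can be pictured roughly as follows. This is only a minimal sketch in Python (the real bot is written in Perl), and the names `Channel`, `dispatch`, `online` and `unread` are hypothetical, chosen for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """A news channel with its subscribers (hypothetical structure)."""
    subscribers: set = field(default_factory=set)


def dispatch(item, channel, online, unread):
    """Deliver `item` to online subscribers; queue it as unread for the rest.

    `online` is a set of JIDs that are currently available, `unread` maps a
    JID to its list of unread items. Returns the JIDs served immediately.
    """
    delivered = []
    for jid in sorted(channel.subscribers):
        if jid in online:
            delivered.append(jid)  # a real bot would send a message here
        else:
            unread.setdefault(jid, []).append(item)
    return delivered
```

A subscriber who was offline would later find the queued items in `unread`, which stands in for the personal news page.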
Unfortunately, not many news sources publish their news via Jabber
publish-subscribe. In comes the Mimír aggregator, a component that
polls the news from legacy news sources. It keeps a list of news feeds,
and periodically fetches these in search of new items. Once a new
item has been discovered, it publishes the item on behalf of the news
source to a pubsub node.
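
A polling round like this could look as follows. It is a sketch under assumptions, not the aggregator's actual code: items are taken to be dicts with an `id` key, and `fetch` and `publish` are hypothetical callables supplied by the caller.

```python
def find_new_items(entries, seen):
    """Return entries whose id has not been seen before, and remember them.

    `entries` is a list of dicts with at least an "id" key; `seen` is a set
    of ids that is updated in place.
    """
    new = []
    for entry in entries:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            new.append(entry)
    return new


def poll_once(feeds, fetch, publish, seen_by_feed):
    """Fetch every feed once and publish each newly discovered item.

    `feeds` maps a feed URL to a pubsub node name. `fetch(url)` returns a
    list of entries and `publish(node, entry)` sends one item to a node;
    both are hypothetical and stand in for the network code.
    """
    published = 0
    for url, node in feeds.items():
        seen = seen_by_feed.setdefault(url, set())
        for entry in find_new_items(fetch(url), seen):
            publish(node, entry)
            published += 1
    return published
```

Calling `poll_once` periodically with the same `seen_by_feed` dict publishes each item only once, the first time it shows up in a feed.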
Until now, Mimír's aggregator was based on Janchor, which I had modified
to send pubsub publish requests instead of the normal chat messages.
Janchor is written in Perl, and since the upgrade I couldn't easily get
it going again. Didn't really want to, either, because of my long-time
wish to replace it.
I wanted to have a well-behaved
aggregator that uses the
ETag headers in HTTP, handles broken feeds, accepts
compressed encodings and strips out nasty HTML. The result is an
aggregator written in Python, based on Twisted and the really great
Universal Feed Parser by
Instead of the built-in, synchronous, feed fetching support in Feed
Parser, I wrote a fetcher using Twisted Web for fetching the pages
asynchronously and handling the HTTP result codes. I ended up writing a
custom class that mimics the interface of Feed Parser's built-in
fetcher because I needed to inject the received headers into the feed
parsing code. For example, Feed Parser can rewrite relative links in
the HTML found in the feed's items. Among other pointers, it uses the
received HTTP headers for this. Objects of this custom class are
created from the retrieved feed and headers.
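
To give an idea of what such a wrapper might look like, here is a minimal sketch that uses only the standard library. It is not the class from the aggregator: the name `FetchedFeed`, the helper `conditional_headers`, and the choice to mimic urllib's response methods (`read`, `info`, `geturl`) are assumptions for illustration, as is using `Content-Location` as the base for relative links.

```python
import io
from email.message import Message
from urllib.parse import urljoin


def conditional_headers(etag=None, modified=None):
    """Build request headers for a conditional, compression-friendly GET."""
    headers = {"Accept-Encoding": "gzip"}
    if etag:
        headers["If-None-Match"] = etag
    if modified:
        headers["If-Modified-Since"] = modified
    return headers


class FetchedFeed:
    """File-like wrapper around an already fetched body and its HTTP headers.

    The method names mimic urllib response objects; which of them a given
    parser actually calls is an assumption.
    """

    def __init__(self, url, body, headers):
        self.url = url
        self._stream = io.BytesIO(body)
        self._headers = Message()
        for name, value in headers.items():
            self._headers[name] = value

    def read(self, size=-1):
        return self._stream.read(size)

    def info(self):
        # Header lookups on a Message are case-insensitive.
        return self._headers

    def geturl(self):
        return self.url

    def base_url(self):
        """Base for relative links: Content-Location if present, else the URL."""
        location = self._headers.get("Content-Location")
        return urljoin(self.url, location) if location else self.url
```

The asynchronous fetcher would send `conditional_headers(...)` with each request, treat a 304 response as "nothing new", and otherwise wrap the body and headers in a `FetchedFeed` before handing it to the parsing code.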
All in all, the new aggregator works like a charm and I am really
happy with the result. As a side-effect of using Feed Parser, Atom
feeds are now supported, too.
On a side note: although it needs a rewrite too, the Mimír news bot
(written in Perl) got an update as well. I discovered an entity
encoding bug in
Jabber::NodeFactory today that
caused embedded SGML/XML/HTML tags to be unescaped one time too
Thursday, 21 April 2005
Where do I get stuff from?
Recently, there has been more and more discussion on the
jdev mailing lists about Publish-Subscribe,
in part because of updates to the User Avatar
JEP. One discussion is about keeping track of your pubsub
subscriptions. For clarity, the context is the use of pubsub in regular
Jabber IM deployments. Pubsub is also useful in other contexts, but the
IM context carries some inherent assumptions. I make three observations
here, and expand on this further below.
- No central pubsub service
What we will most probably see is that entities will
have subscriptions on a multitude of pubsub services, scattered
throughout the Jabber universe.
- No subscription notification
When you send an initial subscription request to a pubsub
node, you get a reply stating if you were indeed subscribed,
or that the subscription is pending and waiting for approval
of the node owner. Also, when your subscription is cancelled,
you have no way of knowing what happened.
- Subscriptions are bound to a JID.
This seems obvious, but what I want to point out here
is that if you use a bare JID (without a resource part) to
subscribe, notifications will go to the resource (client) with
the highest priority. If you subscribe using a full JID (with a
resource part), notifications will only go to the specified
A pubsub service usually allows you to query your existing
subscriptions, using the affiliations element. This
is similar to fetching your roster using
jabber:iq:roster. However, my first observation
above causes this to be troublesome. Since my subscriptions can be on
any pubsub service, I'd have to know which services I have
Unlike the normal roster, pubsub does not have a way to relay
changes in your subscriptions, as explained above. This means that if
my subscription changed since I last queried for my subscriptions, I
have no way of knowing. Sure, if I suddenly get notifications from a
node to which my subscription was previously pending, I could refetch
the list of subscriptions. But that amounts to polling, which is
cumbersome. It would be much nicer to be notified of such changes.
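
To make the polling approach concrete, here is a rough sketch in Python. It builds the affiliations query for one service and compares two snapshots of the answers. The function names and the `{node: state}` snapshot format are hypothetical, and parsing the service's reply is left out.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def affiliations_query(service, stanza_id):
    """Build an <iq/> asking `service` for the requester's affiliations."""
    iq = ET.Element("iq", {"type": "get", "to": service, "id": stanza_id})
    pubsub = ET.SubElement(iq, "{%s}pubsub" % PUBSUB_NS)
    ET.SubElement(pubsub, "{%s}affiliations" % PUBSUB_NS)
    return iq


def diff_subscriptions(old, new):
    """Compare two {node: state} snapshots and list what changed.

    Returns {node: (state before, state after)}; None means "not listed".
    """
    changes = {}
    for node in old.keys() | new.keys():
        before, after = old.get(node), new.get(node)
        if before != after:
            changes[node] = (before, after)
    return changes
```

A client would have to send one such query to every service it knows about, and repeat that regularly, just to notice that a pending subscription was approved.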
To counter these two problems, one could register with the pubsub
services, like with transports, and have the services in your roster.
The client could then simply look in the roster for the services to
query using the affiliations element. Like with
rosters, as soon as the affiliations were first fetched after getting
presence from the entity, the service could send out notifications to
the entity that represent changes in affiliations (not only
subscriptions) with that service. This could be done by sending
notifications from the root node, having an
affiliations element in the item body, or by
allowing such an element to be sent as a direct child of the
event element, similar to the notification of
deleted nodes. After sending unavailable presence, the notifications
My third observation above says that it makes no sense to subscribe
using your bare JID when running concurrent clients, because
notifications will only go to one of the connected resources. For
things like avatars, that is most likely not desirable. Adding
subscriptions for each resource (depending on what the client supports)
is one alternative. Another solution is to invent a new (boolean)
node configuration option that, when set, makes the pubsub service
check which of the resources support the namespace of the
payload of the node's items, and send out notifications to all of
those. This requires the service to subscribe to the presence of the
resources in question. Registering with the pubsub service, again,
solves this. The service could then use Entity
capabilities and Service
Discovery for checking the namespace support.
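
The difference between these delivery rules can be sketched in a few lines of Python. This is an illustration only: the `presences` structure and the `deliver_to_all` flag (standing in for the proposed node option) are hypothetical, and a real service would learn priorities and supported namespaces from presence, Entity Capabilities and Service Discovery.

```python
def split_jid(jid):
    """Split a JID into (bare JID, resource); resource is None for a bare JID."""
    bare, _, resource = jid.partition("/")
    return bare, resource or None


def recipients(subscriber, payload_ns, presences, deliver_to_all=False):
    """Return the full JIDs that should receive a notification.

    `presences` maps a bare JID to {resource: (priority, supported namespaces)}.
    `deliver_to_all` stands in for the proposed boolean node option.
    """
    bare, resource = split_jid(subscriber)
    available = presences.get(bare, {})
    if resource is not None:
        # Full JID: only the named resource (assumed here: only while online).
        return [subscriber] if resource in available else []
    if deliver_to_all:
        # Proposed behaviour: every resource that supports the payload namespace.
        return sorted(
            f"{bare}/{name}"
            for name, (_, namespaces) in available.items()
            if payload_ns in namespaces
        )
    if not available:
        return []
    # Default for a bare JID: the resource with the highest priority.
    best = max(available, key=lambda name: available[name][0])
    return [f"{bare}/{best}"]
```

With the option set, a bare JID subscription reaches all capable clients instead of only the one with the highest priority, which may not even support the payload.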
Wednesday, 6 April 2005
I was a little confused while reading stpeter's thoughts on
pubsub into Jabber servers. What he wrote is that every
Jabber user's JID would represent a pubsub node. But that's not what he
meant. He meant each user having its own, albeit virtual, pubsub
service as part of his Jabber server, holding
different nodes for moods, avatars, etc. Now
that's a nice idea.
Reading the discussion by stpeter
I can't help wondering if we aren't re-inventing XML all over again. So, I
wanted to throw the following into the discussion:
<x xmlns="jabber:x:data" xmlns:xdata="jabber:x:data" type="form">
  <author xmlns="http://www.loc.gov" xdata:type="text-single">Shakespeare</author>
</x>
As can be seen, I annotate the child elements in the form with the
xdata:type attribute. This is an
extension to Data
Forms. The following example is equivalent to the previous
one, but uses a different annotation with respect to namespaces:
<xdata:x xmlns:xdata="jabber:x:data" type="form">
  <author xmlns="http://www.loc.gov" xdata:type="text-single">Shakespeare</author>
</xdata:x>
XML attributes without a prefix don't belong to any
namespace, but depend on the containing element for providing their
context. So, although the default namespace in the first version is
jabber:x:data, this isn't
inherited by attributes.
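
This is easy to check with a namespace-aware parser. The sketch below uses Python's `xml.etree.ElementTree`, which reports names as `{namespace}local`. Both documents are written out in full, with the first one completed using the same content as the second, since the two are meant to be equivalent. The helper `describe` is just a name made up for this example.

```python
import xml.etree.ElementTree as ET

first = (
    '<x xmlns="jabber:x:data" xmlns:xdata="jabber:x:data" type="form">'
    '<author xmlns="http://www.loc.gov" xdata:type="text-single">Shakespeare</author>'
    '</x>'
)
second = (
    '<xdata:x xmlns:xdata="jabber:x:data" type="form">'
    '<author xmlns="http://www.loc.gov" xdata:type="text-single">Shakespeare</author>'
    '</xdata:x>'
)


def describe(document):
    """Return (tag, attributes) pairs for the root element and its first child."""
    root = ET.fromstring(document)
    child = root[0]
    return [(root.tag, dict(root.attrib)), (child.tag, dict(child.attrib))]


# Both serializations parse to the same names.
assert describe(first) == describe(second)
```

In both cases the root element is `{jabber:x:data}x` while its `type` attribute stays plain `type`, without a namespace. The prefixed attribute on `author` comes out as `{jabber:x:data}type`.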
Monday, 4 April 2005
Getting the most out of it...
I regularly get requests from people using Mimír, my Jabber-powered news
reader service with web interface, for features that make no sense to
me. That is, until I discover that they are not aware of all features
that are already there.
To remedy this, I have created the Mimír User Manual, a complete
guide to all features, with a detailed description of the preference
settings, and several screenshots. This should help people get the most
out of the service.
The manual is publicly available, so it can also be used to get an
idea of what Mimír has to offer! For those who don't use it yet: if you
get the idea that I copied Bloglines, note that Mimír predates it
by almost a year. My first commit to the CVS repository was on 4
Sunday, 3 April 2005
So very tiny...
Today, my sister-in-law Rian gave birth to Jeroen Schut, the brother of
Maarten and son of Ben.
My wife, Irma, is his godmother. Jeroen is a healthy boy of about 3 kg.
I'm so very proud!