Feed bot spec proposal

November 8, 2019 by Lucian Mogosanu

This article specifies feed bots[1], namely: a. data; b. user inputs; c. feed inputs; and d. functions.

a: data:

  • user identifier: is provided by the communication protocol upon which the feed bot is built. In the particular case of IRC, users are identified by their nicknames.
  • feed identifier: feeds are uniquely identified by their URIs[2].
  • feed: a record containing, minimally: i. a feed identifier; ii. a string denoting the feed's title.
  • feed item: also called a feed entry; a record containing, minimally: i. a string uniquely identifying the entry within the feed; ii. the entry title (string); iii. a link (URI) pointing to the entry.
  • feed database: a set (schema[3]) of relations between feeds, feed items and user identifiers; and data organized under this schema.
  • notification: a message containing, minimally, i. the URI of an entry; ii. the title of the associated feed; iii. the entry title[4].
  • message queue: a set of pending notifications.
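
As a rough illustration of the records above, here is a minimal Python sketch. All class and field names are hypothetical, the feed database is reduced to three in-memory containers (the spec leaves the schema open on purpose), and the FIFO queue holding (user, notification) pairs is an assumption rather than something this spec mandates.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feed:
    uri: str    # feed identifier
    title: str


@dataclass(frozen=True)
class FeedItem:
    guid: str   # uniquely identifies the entry within its feed
    title: str
    link: str   # URI pointing to the entry


@dataclass(frozen=True)
class Notification:
    entry_link: str
    feed_title: str
    entry_title: str


@dataclass
class FeedDatabase:
    feeds: dict = field(default_factory=dict)        # feed URI -> Feed
    items: dict = field(default_factory=dict)        # feed URI -> {guid: FeedItem}
    subscriptions: set = field(default_factory=set)  # {(user identifier, feed URI)}


# Pending (user identifier, Notification) pairs; FIFO order is an assumption.
message_queue = deque()
```

Keeping subscriptions as a set of pairs makes the "unique user identifier-feed association" requirement fall out of the container's semantics.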

b: user inputs:

incoming messages from users, more particularly prefixed commands, as described in the manual.

  • subscribe: receives one parameter, a feed identifier; used to subscribe a user to a feed[5], i.e. to add a unique user identifier-feed association to the feed database.
  • unsubscribe: receives one parameter, a feed identifier; used to remove a user's subscription to a feed.
  • list: used to display the user's current[6] set of subscriptions.
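
The three commands can be sketched as a small dispatcher. The "!" prefix, the function name and the reply strings are all hypothetical; the real prefix and wording belong to the bot's manual. Subscriptions are assumed to be a set of (user identifier, feed identifier) pairs.

```python
PREFIX = "!"  # hypothetical; the actual prefix is whatever the manual defines


def handle_command(subscriptions, user, message):
    """Apply a prefixed command; return a reply string, or None if not a command."""
    if not message.startswith(PREFIX):
        return None
    parts = message[len(PREFIX):].split()
    if not parts:
        return None
    command, args = parts[0], parts[1:]
    if command == "subscribe" and len(args) == 1:
        # Set semantics keep each user-feed association unique.
        subscriptions.add((user, args[0]))
        return f"subscribed to {args[0]}"
    if command == "unsubscribe" and len(args) == 1:
        subscriptions.discard((user, args[0]))
        return f"unsubscribed from {args[0]}"
    if command == "list" and not args:
        feeds = sorted(uri for u, uri in subscriptions if u == user)
        return " ".join(feeds) if feeds else "no subscriptions"
    return "usage: subscribe <uri> | unsubscribe <uri> | list"
```

Note that the user identifier never appears in the command text: it comes from the transport, which is what makes it the implicit second parameter.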

c: feed inputs:

RSS[7] files pointed to by feed identifiers. Feed inputs must contain, minimally, the following:

  • channel elements[8]: i. title; ii. link.
  • item elements: i. guid; ii. title; iii. link.

Furthermore, RSS channels must not contain textInput (or any other input) elements.
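
The minimal requirements above could be checked along the following lines, using Python's standard xml.etree.ElementTree. This is only a sketch: the function name and error messages are made up, only textInput is rejected (the spec's "any other input" is not enumerated here), and malformed XML simply propagates the parser's own exception.

```python
import xml.etree.ElementTree as ET


def parse_feed(xml_text):
    """Return (channel_title, channel_link, items) or raise ValueError.

    items is a list of (guid, title, link) tuples, in document order.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("missing channel element")
    if channel.find("textInput") is not None:
        raise ValueError("textInput elements are not allowed")

    def required(parent, tag):
        # A required element must exist and carry non-empty text.
        text = parent.findtext(tag)
        if text is None or not text.strip():
            raise ValueError(f"missing {tag} in {parent.tag}")
        return text.strip()

    title = required(channel, "title")
    link = required(channel, "link")
    items = [
        (required(item, "guid"), required(item, "title"), required(item, "link"))
        for item in channel.findall("item")
    ]
    return title, link, items
```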

d: functions:

any feed bot must minimally:

  • interact with users, as described in section b.
  • for each feed, maintain a local set of associated feed items.
  • periodically, check the feed for updates, i.e.:
    1. download the feed;
    2. update feed metadata, e.g. its title;
    3. identify and add the set of feed items that are new[9], i.e. previously missing from the feed;
    4. remove feed items that are no longer present in the fresh copy; and
    5. for each user identifier associated to the feed and each new feed item, add a pending notification to the message queue.
  • periodically, notify users, i.e.:
    1. pop a notification from the message queue; and
    2. send the notification to the associated user identifier[10].
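
Steps 3 to 5 of the update check and the notification loop can be sketched as plain set differences over item identifiers. Everything here is an assumption made for illustration: the function names, the guid -> (title, link) dictionaries, the tuple used as a notification and the FIFO queue. Downloading the feed and refreshing its metadata (steps 1 and 2) are left out.

```python
from collections import deque


def check_feed(known, fresh, subscribers, feed_title, queue):
    """Reconcile the local item set with a fresh copy of the feed.

    known and fresh map guid -> (title, link); known and queue are mutated.
    """
    new_guids = [g for g in fresh if g not in known]
    gone_guids = [g for g in known if g not in fresh]
    for guid in gone_guids:
        del known[guid]                  # step 4: no longer in the fresh copy
    for guid in new_guids:
        known[guid] = fresh[guid]        # step 3: add the new item
        title, link = fresh[guid]
        for user in subscribers:         # step 5: one notification per subscriber
            queue.append((user, (link, feed_title, title)))


def notify_one(queue, send):
    """Pop one pending notification and hand it to send(user, notification)."""
    if not queue:
        return False
    user, notification = queue.popleft()
    send(user, notification)
    return True
```

The send callable stands in for whatever the transport provides, e.g. a function that writes a message to the user over IRC.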

This specification is currently a draft. Questions, comments and revision proposals are thus more than welcome.


  1. In plural because the current document does not refer to a particular implementation, despite the fact that there is one I'm particularly interested in. 

  2. In their most naive implementation, URI objects are simply strings with some particular properties, e.g. beginning with http://. Note that while this is probably the simplest implementation, it's not necessarily the correct one, e.g. http://abc/feed and http://abc/feed/ usually point to the same resource, but they are not the same string. 
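
One possible way around the string-equality problem is to normalize identifiers before comparing them. The sketch below, built on Python's urllib.parse, lowercases the scheme and host and drops trailing slashes from the path. Treating the two spellings as equal is a convention chosen here for illustration, not something URIs guarantee, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_feed_uri(uri):
    """Return a canonical spelling of uri for use as a feed identifier."""
    parts = urlsplit(uri.strip())
    # Assumption: a trailing slash does not change which feed is meant.
    path = parts.path.rstrip("/")
    # Lowercasing the whole netloc also lowercases any userinfo; acceptable
    # for a sketch, but not necessarily for a real implementation.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))
```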

  3. This schema is left unspecced on purpose, since it might look different based on the data model used, e.g. SQL versus S-expressions. In particular re. Feedbot, there is already some discussion as to the design choices. 

  4. See for example

  5. This makes the user identifier an implicit second parameter then. 

  6. Yes, this makes time an implicit parameter to this function, among others. 

  7. The current Feedbot implementation also supports Atom. Perhaps a separate document could discuss in more detail the differences between the two syndication formats, since from a superficial look it would seem Atom feeds don't come with the same retardation RSS feeds do. 

  8. In RSS terminology, a "channel" is equivalent to a feed containing, among others, a set of "items", analogous to the "feed item" described in section a. 

  9. This is currently a central point of debate and discussion, and what led to this first spec proposal. Note also that it's the simplest way to specify (although not necessarily the simplest to implement) feed item management, which is why it was chosen for the current Feedbot implementation.

    The problem with this approach is that it's frail under structural feed modifications, e.g. changing the number of items displayed by a feed will lead to spurious notifications -- and who knows what other monsters lurk in there, really. Sure, structural feed modifications are rare, which is why it's not that unreasonable to believe they'd rather be solved on the (feed or feed bot) operator side. However, these corner cases are bound to wreak some havoc when they occur, which is why it's worth considering alternatives; here are a few of the known ones:

    • Mandating the use of "publication date" fields: given the issue with timezone arbitrariness and system time misconfigurations, this solution seems to actually move the problem into the neighbour's court rather than solving it. Mandating a Bitcoin block count RSS field may be nice, only this ties both feed bots and feed operators to Bitcoin nodes, which currently lies very much in the realm of ideals. For example Feedbot runs on shit hardware, for the very good reason that it's inexpensive both to keep running and to redeploy on need.
    • Mandating another monotonically increasing feed item field.
    • Mandating feed item ordering: this currently seems like the simplest approach, given that known existing feed-generating software (read: MP-WP) is already compliant. Feed bots would then look only at the first item(s) in the list in order to determine whether there's any news.

    Readers are encouraged to propose other ways to skin this cat. 
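
To make the ordering alternative concrete, here is a minimal sketch. It assumes the feed lists items newest first and that the bot remembers the identifier of the last item it has seen; both the function name and that bookkeeping are hypothetical.

```python
def new_items_by_order(fresh_guids, last_seen):
    """Return the guids listed before last_seen in a newest-first feed."""
    new = []
    for guid in fresh_guids:
        if guid == last_seen:
            break               # everything from here on is already known
        new.append(guid)
    return new
```

Under this scheme a feed that suddenly displays more items yields no spurious news, since older entries sit after the last seen one. The weak spot moves elsewhere: if last_seen is None or has dropped off the feed, every listed item is reported as new.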

  10. In particular as far as IRC is concerned, this introduces a degree of complexity. When a user disconnects from some server on the network, its nickname also magically goes away, regardless of whether it's registered or not. Thus it is possible to miss notifications by sending them to "non-existing" users; note that, fortunately, this isn't a problem for channels.

    The solution comes in the form of the ISON command. The current Feedbot implementation delivers private notifications in an asynchronous manner, by sending an ISON, then waiting for an RPL_ISON, which responds with a list of online nicks. Notifications are then popped from those nicks' message queues and sent via PRIVMSG.

    Note that this implementation is not correct, nor is a correct implementation possible using IRC as a basis. That is, there is an unavoidable race condition between RPL_ISONs and user QUITs, which in principle can lead to failure to deliver notifications. Hopefully this issue will be fixed once the fabled gossipd manages to replace the extant crap. 
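
The ISON round trip described in this note can be sketched as follows. This is not Feedbot's code: the function names and the per-nick queues are assumptions, nick comparison is done byte for byte (IRC nicks are case-insensitive, which a real bot would have to handle), and the race between RPL_ISON and QUIT remains exactly as described above.

```python
from collections import deque


def ison_command(nicks):
    """Build the ISON line asking which of the given nicks are online."""
    return "ISON " + " ".join(nicks) + "\r\n"


def parse_rpl_ison(line):
    """Return the online nicks from a 303 (RPL_ISON) line, else None."""
    line = line.rstrip("\r\n")
    if line.startswith(":"):
        _, _, line = line.partition(" ")    # drop the server prefix
    head, _, trailing = line.partition(" :")
    fields = head.split()
    if not fields or fields[0] != "303":
        return None
    return trailing.split()


def deliver(online, queues, send_line):
    """Flush pending messages for every online nick via PRIVMSG."""
    for nick in online:
        pending = queues.get(nick)
        while pending:
            send_line(f"PRIVMSG {nick} :{pending.popleft()}\r\n")
```

A bot would call ison_command for the nicks with non-empty queues, feed each incoming line to parse_rpl_ison, and pass any resulting list to deliver together with a function that writes to the socket.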

Filed under: computing.