A feed parser for Common Lisp programs

087 March 6, 2019 -- (tech tmsr)

This is the second part of a series on building blocks for Feedbot.

Now that we have an XML parser on hand, we can use it to obtain RSS and Atom feeds in a structured format. Once again, as far as heathendom goes we are lucky -- one Kyle Isom has provided us with a feed parser, cl-feedparse, that is under three hundred lines of neat Lisp code, and quite importantly, it meets the spec, as can be seen below.

The code is available as a patch on the S-XML V tree.

The remainder of this post a. describes the structure and functionality of feedparse; and b. provides some usage examples.

(cl-)feedparse provides the following simple structure for (RSS/Atom) feeds. A feed can be decomposed into: its title, kind (RSS or Atom), URL and list of feed items. A feed item has the following elements: an ID, a title, a (publication) date, a link to the actual item and a body/description.
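To make this concrete, the decomposition above can be pictured as a pair of structures, roughly as follows. This is a sketch for illustration only; the actual definitions in feedparse.lisp may differ in names and details.

```lisp
;; Illustrative sketch of the feed structure described above; the
;; actual definitions in feedparse.lisp may differ.
(defstruct feed
  title   ; feed title, a string
  kind    ; :rss or :atom
  url     ; the feed's URL
  items)  ; a list of feed-item structures

(defstruct feed-item
  id      ; unique identifier
  title   ; item title
  date    ; publication date
  link    ; link to the actual item
  body)   ; body/description
```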

In order to obtain a feed, feedparse: a. performs an HTTP request to grab the feed's XML; b. transforms the XML into an S-expression using S-XML; and c. parses that S-expression into the structure described above, dispatching on the feed kind.

To this, I have added bolt-on functionality (http-request-with-timeout)1 which performs the HTTP request on a separate thread, so that, for all the obvious reasons, the operation can be aborted after a user-configured timeout period expires.
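The bolt-on can be sketched as below, assuming bordeaux-threads for the threading machinery. The function name matches the one above, but the body, parameters and defaults are my assumptions for illustration, not the exact code in the vpatch.

```lisp
;; Sketch of a threaded HTTP request with a timeout, using
;; bordeaux-threads. Illustrative only: the body and defaults are
;; assumptions, not the actual feedparse code.
(defun http-request-with-timeout (url &key (timeout 30))
  (let ((lock (bt:make-lock))
        (done (bt:make-condition-variable))
        (result nil))
    ;; Perform the request on a separate thread.
    (bt:make-thread
     (lambda ()
       (let ((body (drakma:http-request url)))
         (bt:with-lock-held (lock)
           (setf result body)
           (bt:condition-notify done)))))
    ;; Wait for the worker, giving up after TIMEOUT seconds; returns
    ;; NIL if the timeout expired first.
    (bt:with-lock-held (lock)
      (unless result
        (bt:condition-wait done lock :timeout timeout))
      result)))
```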

The code, in all its depth, is very easy to understand; the reader is encouraged to peruse the linked V items, in particular s-xml-feedparse.vpatch, and within it, feedparse.lisp.

Now, for the usage bit. After we've pressed to s-xml-feedparse:

$ vk.pl p s-xml s-xml-feedparse.vpatch
$ cd s-xml

and imported all its dependencies2, we can now run our parser:

... fire up CL-tron, load the dependencies and feedparse, e.g.
> (asdf:load-system :feedparse)
> (defvar *ttp-feed*
           (feedparse:parse-feed "http://thetarpit.org/rss.xml"))
> (feedparse:feed-title *ttp-feed*)
"The Tar Pit"
> (defvar *ttp-latest*
           (car (feedparse:feed-items *ttp-feed*)))
> (feedparse:item-title *ttp-latest*)
"A feed parser for Common Lisp programs"
> (feedparse:item-link *ttp-latest*)

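From here on, working with a feed is plain list manipulation on the accessors shown above; for instance, printing the title and link of every item in the feed we just fetched:

```lisp
;; Print "title -- link" for each item in *ttp-feed*, using the
;; feedparse accessors shown in the session above.
(dolist (item (feedparse:feed-items *ttp-feed*))
  (format t "~a -- ~a~%"
          (feedparse:item-title item)
          (feedparse:item-link item)))
```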
In the next episode of this series, we will use feedparse to write a program that automatically checks a user-defined set of feeds for new items and populates a feed database.

  1. In a normal world, where everything is genesized, properly seated in its own place and performing its own duties, this function would be part of the Drakma curl library. As things stand, however, I've just implemented it directly in feedparse -- this will have to eventually be addressed, so that feed parsing code will lie with the feed parser, HTTP request code with the curl code and so on and so forth. See the following footnote for a few gory details.

  2. Unfortunately, feedparse depends on Lisp code which is not yet V-ified, namely Drakma and flexi-streams, which on their own depend on other packages. The full set of dependencies is: usocket(1), chipz(2), flexi-streams(3), trivial-gray-streams(4), chunga(5), cl-base64(6), cl-puri(7) and drakma(8):

    1. A so-called "portability layer" for TCP and UDP sockets, over various OS and CL implementations. Required by Drakma.
    2. A gzip library for Common Lisp. Required by Drakma, because apparently you can't have HTTP without compression nowadays.
    3. The implementation of a binary "stream" data structure. Required by both Drakma and cl-feedparse, because HTTP and arbitrary binary data.
    4. A so-called "thin compatibility layer" for gray streams. Required by flexi-streams, because compatibility layer on top of compatibility layer.
    5. "Chunked streams". Streams on top of streams on top of streams -- it's streams all the way down! Required by Drakma, because... well, some HTTP fuckery or another, I won't bother the reader with details.
    6. Base64 implementation for Common Lisp.
    7. URL parser for Common Lisp.
    8. Curl implementation for Common Lisp. Full of usefuls, but bloated with crap such as SSLisms, which, by the by, can be "disabled" (see how SSL is -- fortunately! -- not part of this dependency list). All that mess is still there though.

    All this just to grab a piece of XML serialized into text, transform it into an S-expression and further parse that S-expression into the structure described in this article. Now, far be it from me to debate the usefulness of all this stuff, but remember, whenever you run, e.g.:

    > (ql:quickload :feedparse)

    you are importing it, whether you see it or not and whether you like it or not. All this burden, you can't just wave it away.