Hunchentoot: further architectural notes; and usage examples

097 July 26, 2019 -- (tech tmsr)

This post is part of a series on Common Lisp WWWism, more specifically a continuation of ongoing work to understand the web server known as Hunchentoot and, as a result, produce a signed genesis to be used by members of the TMSR WoT.

In this episode we'll do another illustration of the Hunchentoot architecture; and we'll have fun documenting a running web server instance, thus exploring for now a few of the things that it can do[1].

First, the architectural diagram, with bells, whistles and clickable stuff:

No, really, I'm not kidding: if you have a browser that implements SVG, clicking on the text should direct you to defgeneric pieces of code[2]. Anyway, the big squares are Hunchentoot components and, more specifically, the names of their CL classes, while the small squares found inside the big ones represent methods specializing on a given class. The green boxes are user-actionable or user-definable methods, so this is where you should start looking; while the arrows denote the "X calls Y" relation, with the exception of the dashed green arrow, which tells us that header-out is in fact a setf-able accessor used from somewhere within the context of acceptor-dispatch-request (e.g. from a request handler) to read and modify the header of a reply.
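For readers who haven't met setf-able accessors before, here is a minimal sketch of the idea in plain Common Lisp. Everything named toy-... is made up for this illustration and is not part of Hunchentoot; the only thing the sketch assumes about header-out is what the paragraph above says, namely that the same name can be used both to read a header and, through setf, to modify it.

```lisp
;; A made-up reply object holding headers as an alist of (name . value).
(defclass toy-reply ()
  ((headers :initform '() :accessor toy-headers)))

(defvar *toy-reply* (make-instance 'toy-reply))

;; Reader: look a header up by name, ignoring case.
(defun toy-header-out (name &optional (reply *toy-reply*))
  (cdr (assoc name (toy-headers reply) :test #'string-equal)))

;; Writer: defining (setf toy-header-out) is what makes the reader setf-able.
(defun (setf toy-header-out) (value name &optional (reply *toy-reply*))
  (let ((entry (assoc name (toy-headers reply) :test #'string-equal)))
    (if entry
        (setf (cdr entry) value)
        (push (cons name value) (toy-headers reply)))
    value))

;; Modify, then read back.
(setf (toy-header-out "Content-Type") "text/plain")
(toy-header-out "content-type")   ; "text/plain"
```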

Now from this airplane view, Hunchentoot's organization looks quite digestible, which should give us a very good idea of how to start using it. So let's take a look at that, shall we?

Assuming we've loaded[3] Hunchentoot into our CLtron of choice, we can now create an acceptor instance and start it:

> (defvar *myaccept*
    (make-instance 'hunchentoot:acceptor :port 8052))
> (hunchentoot:start *myaccept*)

... and now what? Say, for now, that we want to serve a static site -- I'm using The Tar Pit as my playground, but you can use whatever you fancy. Looking at acceptor-dispatch-request, we notice that it calls handle-static-file with the document-root as an argument. So let's set that, and additionally the error template directory, to our site:

(setf (hunchentoot:acceptor-document-root *myaccept*)
      "/home/spyked/thetarpit/site/"
      (hunchentoot:acceptor-error-template-directory *myaccept*)
      "/home/spyked/thetarpit/site/")

and now curl http://localhost:8052 should serve its contents.

But let's say we want to go one step further and serve some content (server-side) dynamically. The original Hunchentoot documentation actually provides a neat minimal example, which I'm going to steal, but not before explaining what we're going to do here.

Besides serving files off the disk, a web server can do other useful stuff, such as, in Apache's case, sending the file to a preprocessing engine (PHP or whatever), or as we're going to show, executing some other predefined action that depends on the request parameters (URL, cookies, HTTP method, variables and so on). For now, let's say that we want our server to respond to the URL "/yo" (where "/" is the site root) with the plain-text message "Hey!". Furthermore, let's say that we want to optionally parameterize requests to this URL by the variable "name", in which case the response will include the name: for example, if we do a GET request to "/yo?name=spyked", we want the server to respond with "Hey, spyked!".

We have a few possible ways of doing this. We could for example edit the current implementation of acceptor-dispatch-request, which is also the ugliest possible approach. On the other hand, Hunchentoot is built on the Common Lisp Object System (CLOS), which allows us to subclass the acceptor and specialize the method above on our own class. Let's try this out:

(defclass myacceptor (hunchentoot:acceptor)
  ())
(change-class *myaccept* 'myacceptor)

The change-class thing isn't something that we'd normally do, but if you've been following along, you'll notice that this didn't break our code, because well, Common Lisp is cool. Now for the dispatcher method:

(defmethod hunchentoot:acceptor-dispatch-request
    ((acceptor myacceptor) request)
  (cond
    ;; Handle "/yo" ourselves, with an optional "name" GET parameter.
    ((string= (hunchentoot:script-name request) "/yo")
     (let ((name (cdr (assoc "name" (hunchentoot:get-parameters request)
                             :test #'string=))))
       (setf (hunchentoot:content-type*) "text/plain")
       (format nil "Hey~@[, ~A~]!" name)))
    ;; Everything else goes to the next most specific method.
    (t (call-next-method))))

In human words: this is an implementation of acceptor-dispatch-request specialized on myacceptor, that, upon encountering the URL (script-name) "/yo", takes the value of the GET parameter known as "name" and returns a response string (possibly containing this "name") as plain text. Otherwise it transfers control to the "next most specific method"[4], implicitly passing to it the existing arguments.
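The same control flow can be tried out without a running server. What follows is a toy model in plain Common Lisp: the toy-... classes and the toy-dispatch generic function are hypothetical stand-ins (not Hunchentoot API), and a "request" is reduced to a script name plus an alist of GET parameters. Only cond, assoc, format, change-class and call-next-method are the real thing here.

```lisp
;; Hypothetical stand-ins for hunchentoot:acceptor and myacceptor.
(defclass toy-acceptor () ())
(defclass toy-myacceptor (toy-acceptor) ())

(defgeneric toy-dispatch (acceptor script-name get-parameters))

;; Default method: stands in for "serve a static file".
(defmethod toy-dispatch ((acceptor toy-acceptor) script-name get-parameters)
  (declare (ignore get-parameters))
  (format nil "static file for ~A" script-name))

;; Specialized method: answer "/yo", defer everything else.
(defmethod toy-dispatch ((acceptor toy-myacceptor) script-name get-parameters)
  (cond
    ((string= script-name "/yo")
     (let ((name (cdr (assoc "name" get-parameters :test #'string=))))
       ;; ~@[ ... ~] prints its body only when the argument is non-NIL.
       (format nil "Hey~@[, ~A~]!" name)))
    (t (call-next-method))))

;; Create a plain instance, then switch its class while it is live.
(defvar *toy* (make-instance 'toy-acceptor))
(change-class *toy* 'toy-myacceptor)

(toy-dispatch *toy* "/yo" '())                     ; "Hey!"
(toy-dispatch *toy* "/yo" '(("name" . "spyked")))  ; "Hey, spyked!"
(toy-dispatch *toy* "/index.html" '())             ; "static file for /index.html"
```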

We could stop here, but we won't, as there's a short discussion to be had, mainly related to the extensibility of our approach, i.e. what happens when we add other custom URLs to this recipe? The naive result will look ugly and will be a pain to maintain and debug; while the more elaborate approach, involving putting every "/yo" into its own function, will initially fill our implementation with cond/case conditions, eventually leading to a more civilized dispatch mechanism, in the form of a lookup table from URLs to handler functions.
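To give an idea of what such a lookup table might look like, here is a minimal sketch in plain Common Lisp. The names (*toy-handlers*, register-toy-handler, dispatch-toy-url) are invented for illustration, the "/bye" handler is a made-up second URL, and handlers are assumed to take just the optional name as their argument.

```lisp
;; Hypothetical table: URL string -> handler function of one argument.
(defvar *toy-handlers* (make-hash-table :test #'equal))

(defun register-toy-handler (url function)
  "Associate FUNCTION with URL, replacing any previous handler."
  (setf (gethash url *toy-handlers*) function))

(defun dispatch-toy-url (url &optional name)
  "Call the handler registered for URL, or return :NOT-FOUND."
  (let ((handler (gethash url *toy-handlers*)))
    (if handler
        (funcall handler name)
        :not-found)))

;; Each custom URL gets its own function instead of its own cond clause.
(register-toy-handler "/yo"
                      (lambda (name) (format nil "Hey~@[, ~A~]!" name)))
(register-toy-handler "/bye"
                      (lambda (name) (format nil "Bye~@[, ~A~]!" name)))

(dispatch-toy-url "/yo" "spyked")  ; "Hey, spyked!"
(dispatch-toy-url "/bye")          ; "Bye!"
(dispatch-toy-url "/nope")         ; :NOT-FOUND
```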

Well, it so happens that Hunchentoot already has an implementation for this type of thing, going under the name of easy-acceptor. easy-acceptor defines a dispatch table whose only dispatcher is (initially) the dispatch-easy-handlers function, which looks up handlers for URLs in a global handler list, *easy-handler-alist*. As things usually go with these domain-specific languages, most of the handler maintenance work is piled up in the define-easy-handler macro.
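As a rough mental model of a dispatch table made of dispatchers, consider the sketch below. It is not Hunchentoot's code: all toy-... names are hypothetical, dispatchers here receive a bare URL string rather than a request object, and the fall-back is represented by a keyword. The point is only the shape of the mechanism: each dispatcher either returns a handler or declines with NIL, and the first handler found gets called.

```lisp
;; Hypothetical dispatch table: a list of functions URL -> handler or NIL.
(defvar *toy-dispatch-table* '())

(defun make-toy-exact-dispatcher (url handler)
  "Return a dispatcher that yields HANDLER only for exactly URL."
  (lambda (requested-url)
    (when (string= requested-url url)
      handler)))

(defun make-toy-prefix-dispatcher (prefix handler)
  "Return a dispatcher that yields HANDLER for URLs starting with PREFIX."
  (lambda (requested-url)
    ;; MISMATCH returns NIL for equal strings, otherwise the index in
    ;; PREFIX where the two first differ; a full prefix match leaves
    ;; that index equal to the length of PREFIX.
    (let ((pos (mismatch prefix requested-url)))
      (when (or (null pos) (= pos (length prefix)))
        handler))))

(defun toy-dispatch-request (url)
  "Try each dispatcher in order and call the first handler found."
  (dolist (dispatcher *toy-dispatch-table* :fall-back-to-static-files)
    (let ((handler (funcall dispatcher url)))
      (when handler
        (return (funcall handler url))))))

(setf *toy-dispatch-table*
      (list (make-toy-exact-dispatcher
             "/yo" (lambda (url) (declare (ignore url)) "Hey!"))
            (make-toy-prefix-dispatcher
             "/blog/" (lambda (url)
                        (format nil "post: ~A" (subseq url (length "/blog/")))))))

(toy-dispatch-request "/yo")          ; "Hey!"
(toy-dispatch-request "/blog/hello")  ; "post: hello"
(toy-dispatch-request "/nope")        ; :FALL-BACK-TO-STATIC-FILES
```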

So, in order to illustrate this easy-stuff, first let's undo some of our previous work and redo the very basics:

(hunchentoot:stop *myaccept*)
(setq *myaccept* (make-instance 'hunchentoot:easy-acceptor
                                :port 8052
                                :document-root "/home/spyked/thetarpit/site/"))
(hunchentoot:start *myaccept*)

Notice that we're now instantiating easy-acceptor. With that in place, we can define an equivalent "easy handler" for our previous "/yo" work:

(hunchentoot:define-easy-handler (say-yo :uri "/yo") (name)
  (setf (hunchentoot:content-type*) "text/plain")
  (format nil "Hey~@[, ~A~]!" name))

which about sums up our exercise. Initially I had wanted to show an example doing some fancy prefix/"smart" URL lookup à la MP-WP, but by now this post is so large[5] that it can't be eaten in one sitting. Alas, I will have to leave all my fancy examples for another episode. Thus, until next time...

Update, July 27: comments on this post are listed in footnote 6.

  1. Contrary to popular beliefs and expectations, the things that some particular X can do that are known to (some particular) me are not to be confused with the total set of things that said X can possibly do, nor with the set of things that it can't do. Take for example X = your average pointeristic slash buffer-overflowistic C barfola: you can identify some particular uses for it, sure, but meanwhile the average douchebag will exercise code paths that may make it usable for things you've never imagined, such as stealing your keys, wiping your disk and murdering your dog... and many other things, short of making you some french fries, which is something that e.g. a web server can't do.

    In other words, nobody gives a fuck about popular beliefs and expectations; and by the time I publish a signed genesis for this Hunchentoot thing -- good, bad, with or without warts or however we have it -- I will be entirely able to say what it does and doesn't do, which is exactly what I'm working on here and now.

    And now to be an asshole and leave this otherwise properly rounded footnote hanging: what about, say, usocket? and then what about SBCL or some other working CLtron? and what about Linux and its userland? This unfortunately is the curse of our postmodern times: our ability to run computing machines rests, for the time being, upon the promise of some shitheads.

  2. I don't write HTML and CSS for a living, so I might as well use this footnote to document the pain required to generate this, for later reference.

    Specifying the diagram in GraphViz is fairly straightforward: one simply has to list the clusters, the nodes and the edges within them in a text file -- see for example the final .dot file used for generating the illustration above. Adding links and colours and all that is also easy, as previously shown. The problem, however, with this GraphViz thing is that graph generation involves an automated step, i.e. node layout generation and edge routing, that can easily prove to be a pain in the ass for the user: not only do I want this diagram generated, but I also want it to be arranged like so, and not like that, because I want the viewer to be able to look at the components of the graph in some particular order.

    To add insult to injury, this automated step is almost entirely opaque to the user: in order to have that square near that one, I need to frantically shuffle nodes and edges about until I find the magic ordering that generates something close to what I want -- that is, the relationship between said ordering and the output is purely coincidental, and I'm stuck guessing based on the vague hints found in the spec. Anyway, this is the best diagram layout we've got here at The Tar Pit, sorry... do make sure to write in if I'm in the wrong.

    Now that I have a representation, I need to embed it in the blog post. One would expect that's also straightforward, wouldn't one? Well, no! You see, I got the idea that placing clickable links in generated SVG files is cool, only this doesn't work in the slightest when inserting the file through an "<img>" tag, because completely counter-intuitively for an SVG, the browser displays an image, not a DOM sub-tree. So then I look at how Phf did it with his patch viewer, and it looks like he's inserting an HTML image-map in the HTML document, which kinda defeats the purpose of having links in the SVG in the first place. I really, really don't want to copy-paste the whole diagram into the post, so what the fuck am I gonna do, use <object> tags?!

    So if by now you were curious enough to look at the page source, you'll notice that what I did was to insert an inline <svg> that then imports the content of my .svg file using the <use> tag, which works exactly the way I want it. And no, you won't find this anywhere on Google either, because Google doesn't fucking work.

    To sum this up: IMHO the result looks pretty cool, with the mention that I'm most likely going to write the SVG diagram "by hand" next time I'm doing anything non-trivial. At least then no magic tool will lie to me that it saves hours of my work, when it instead adds to it.

  3. Since I'm trying out the practice of documenting things, let's also put this here; although now that I think about it, I'm pretty sure I've dumped this somewhere else before.

    The preferred method of loading large programs among "Common Lisp enthusiasts" is Quicklisp, which is a sort of apt-get for CL, with centralized repositories and all that jazz. I've never used it, incidentally; and it's not that I'm denying its quickness or usefulness, but that process of automatically fetching dependencies from some arbitrary site obscures my understanding of the programs that I'm running and their real mass. Instead, I prefer going through the laborious job of writing down the entire dependency tree, then grabbing a copy of each dependency from the author's site*, putting them all in a directory and defining the path to that in my CLtron instance. Here's how this looks for Hunchentoot:

    (defvar *ext-dep-base* "/home/spyked/lisp-stolen/")
    (defvar *ext-deps* '("chunga/" "trivial-gray-streams/" "cl-base64/"
                         "cl-fad/" "bordeaux-threads/" "alexandria/"
                         "cl-ppcre/" "flexi-streams/" "md5/" "rfc2388/"
                         "trivial-backtrace/" "usocket/"))

    then I'll define a variable holding the path to my work-in-progress Hunchentoot code base:

    (defvar *hunchentoot-path* "/home/spyked/tmsr/hunchentoot/b/hunchentoot/")

    then I'm making sure I get rid of some useless dependencies, e.g. SSL:

    (pushnew :drakma-no-ssl *features*)
    (pushnew :hunchentoot-no-ssl *features*)

    and now I have to instruct ASDF to look for "systems", i.e. Common Lisp programs, in each of the directories in the paths above. Apparently we're not quite at the point where we can get rid of this particular piece, so:

    (loop for path in *ext-deps* do
         (pushnew (concatenate 'string *ext-dep-base* path)
                  asdf:*central-registry*
                  :test #'string=))
    (pushnew *hunchentoot-path* asdf:*central-registry* :test #'string=)

    Oh, and by the way:

    > (length *ext-deps*)
    12

    which are all the dependencies needed to run Hunchentoot given a Linux-and-SBCL installation. At this point we can tell ASDF to load our Hunchentoot:

    (asdf:load-system :hunchentoot)

    And after a second or so, we should be all prepped and ready to start our web server.

    *: Not that this makes much of a difference, mind you. By now I already have most dependencies commonly found in CL programs on the disk, so I'm e.g. using whatever version of usocket that I got whenever I got it from wherever. So as per the end of the first footnote: since I'm already using that shit although I haven't actually read the code, why haven't I published it already? The man makes a good point, I am using it. So how do I address the gray area of "I've been using this piece of code for a while because my program requires it, but I don't trust it enough to sign it just yet"?

  4. My CLOS-fu is somewhat lacking, but this "next most specific method" refers in principle to the method implementation of what other languages call "the direct superclass", i.e. in our case the acceptor class. This means that if our call to "/yo" doesn't match, the server will fall back to the default mechanism of serving static files from the document root.

  5. Around 2300 words to be more precise, current footnote excluded; of which the post body weighs a bit over one thousand, while the footnotes contain a bit over one thousand and two hundred. And look, footnotes 1 and 2, which grew organically out of elaboration and documentation requirements, could have been posted on their own, as a separate article each.

    On the other hand this is why I call them notes, so that I don't spend more time moving stuff around than I do writing. The reader will just have to live with my peculiar way of organizing thoughts.

  6. Comments so far:

    Comment #1: Mircea Popescu writes (and answers go inline):

    I have two fundamental objections : why is the class called "myacceptor",

    The name was chosen on a whim, merely for the sake of illustrating the acceptor mechanism at work. One also wouldn't usually name things like this, but I knew beforehand that I'm going to throw this example away when moving to the easy-acceptor thing.

    and why are you showing "thing isn't something that we'd normally do" ?

    Common Lisp boasts this ability to patch systems with zero downtime, so I thought it fun to show how to extend the web server instance while it's running. The downside to this is that (at least in my experience) it can easily get messy with the number of variables in the system, and one has to really know what they're doing, lest they nuke their Lisp instance.

    In any case, I don't think it hurts to at least know about the existence of this approach, if only for putting on paper this other weird thing that CL can do.

    I also have my reservations about a "define-easy-acceptor" macro etc, but we'll leave that for later. I'm very curious to see current MP-WP url schema interpretation in lisp tho.

    Interestingly enough (and to give a hint of what's next), the URL interpretation item doesn't use the define-easy-handler macro, although it is based on the same dispatch mechanism. The server requires some means of attaching request handlers to particular URLs, whether it's through some fancy macro or otherwise.

    Very nice schema thing tho!

    Thank you!

    Re "I've been using this piece of code for a while because my program requires it, but I don't trust it enough to sign it just yet" : for the default example, make a S-spyked-UBWS nick with its own key (standing for "Shit Spyked Uses But Won't Sign") and... sign it with that key. And rate it, from spyked, "-5 this asshole signs shitty code I use" or w/e. Then it's fucking clear, very specifically and most precisely WoT-wise : everyone's doing exactly their job, you're negrating the asshole, A+++ spyked would trade again ; and the asshole's being an asshole, A+++ asshole, would trade again also.

    In other words, there's no papering over the tulpas. You got multiple personality disorder when you're using and disawoving, your wot tree is stuck reflecting that. Can't just pretend it away. Besides, writing it down helps immensely in dealing with it, as any psychiatrist + lobbes can attest.

    This sounds all right, but let's take a more practical example: say I genesize Hunchentoot, signing it using Spyked's key, and Phf wants to give it a spin. However, when he attempts to run it as per the instructions in the genesis, he finds out that it doesn't play well with one of the 12 (ungenesized) pieces of coad it depends on. Then he complains and I naturally offer to give him my own version of shitty-yet-unread-code.

    The question then is, why would Phf trust S-spyked-UBWS' coad over a tarball, a ksum and a signature of the hash, denoting "that's the same piece of shit I've been using, although it comes with no guarantees"? I haven't read the entire Gutenberg archive either, and yet I did publish it that way.

    Which brings us back to the "what about SBCL, Linux and all those other programs we're using" problem mentioned in the post. If signing genesis with an alt-key is the correct approach, then we should be doing this like yesterday or so.

    PS. Holy shit, PPerlCRE ?!

    Aha, "Portable Perl-Compatible Regular Expressions"! I suspect it's used by Hunchentoot to implement its very own mod_rewrite.

    Comment #2: Mircea Popescu writes:

    mp_en_viaje: spyked, my two "objections", as i'm sure you noticed, tend towards a broad "omfg, bad habits" sorta something.

    mp_en_viaje: as far as your practical example : do not genesis parts ; include the 12 things in yoru genersis if they are needed.

    mp_en_viaje: and yes we should be doing this like yesterday or so. we should have been doing it ike yesterday or so, hence eg or etc
    a111: Logged on 2018-11-01 17:49 mircea_popescu: asciilifeform well, maybe your thread. my thread was re "are we fucking idiots ?! we have a foundation that wants to publish statements of the nothing as its only output, we have a bunch of smart people not helping our own industry avoid pitfals, and in this vein forever"
    a111: Logged on 2018-09-29 23:35 mircea_popescu: but as far as the foundation is concerned -- if all it does (ALL IT DOES!!!) is stand up to tell me "oh, we can't follow the keccak because reasons" ima put an end to it in short order.

    mp_en_viaje: but no, the querstion is not nor ever could be "why does lord prefer republican process over non-republican process", because of the very definitions of terms. if he doesn't prefer republican process he isn't a lord, irrespective what any text anywhere might be misleadingly stating ; and in any case any text anywhere will be brought into allignment with this theory sooner or later.