Notes on Hunchentoot architecture

096 July 20, 2019 -- (tech tmsr)

This post is part of a series on Common Lisp WWWism. Before continuing, I heartily recommend a review of prior communications on the subject.

As previously mentioned, Hunchentoot is quite big, that is, circa 6000-7000 LoC. However, despite its weight, our CL web server of choice is not just an amorphous pile of code dropped from the thought-prepuce of its original author; for one, it carries along with it a well-written documentation page which describes each of the pieces; and for the other, judging by the technical documentation, we can at least hope that the code is to some degree written by a sane mind.

Thusly proceeding from these two artifacts (documentation and code), the first step towards producing a genesis of Hunchentoot will be to write down my own notes describing its organization -- I will point at some coad, but I'm not diving into the functional details just yet; instead, I am looking at the abstractions and the interaction between components.

First, some HTTP basics: let's say we have an ideal item H representing a HTTP server. Our server H is a program that serves pages to us; more concretely, it a. binds to an address A; b. waits for incoming requests Rq on A; c. processes each Rq; and d. responds with a reply Rp for each Rq.

This model is all nice and dandy, isn't it? Except it doesn't lead us anywhere by itself. The problem is that HTTP imports the notion of "connection", and implicitly TCP, into its spec. This makes it very inconvenient for us, because in our new model H: a. binds to a (TCP) port P; b. listens on P for incoming connections C; then c. each C needs to be confirmed on H's end, i.e. it must be accepted by the server; then d. for each C, H waits for one or more incoming Rqs; then it e. processes each Rq; and finally, it f. responds to each Rq with a Rp [1]. In addition to the increased complexity of this model, one other problem is that the server has no way to tell when the requesting entity has finished sending Rqs, so who's going to end C then? And either way, H will now find itself in the position of having to manage Cs, which have nothing to do whatsoever with the notion of a HTTP request.
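To make steps (a) to (f) concrete, here is a minimal sketch in Python using plain sockets. It is not how Hunchentoot does it; the function names (open_listener, serve_one_connection, demo) and the "you sent: ..." reply body are made up for illustration, and the only way H knows when to end C here is a "Connection: close" header or the client hanging up.

```python
import socket
import threading


def open_listener():
    """Steps (a) and (b): bind to a port P and start listening on it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", 0))  # (a) port 0 asks the OS for any free P
    s.listen()                # (b)
    return s


def serve_one_connection(listener):
    """Steps (c) to (f) for one connection C; returns the number of Rqs served."""
    conn, _peer = listener.accept()  # (c) accept C
    served = 0
    buf = b""
    with conn:
        while True:
            # (d) wait until a full request head (ended by an empty line) arrives
            while b"\r\n\r\n" not in buf:
                chunk = conn.recv(4096)
                if not chunk:  # the client closed C, so there are no more Rqs
                    return served
                buf += chunk
            head, buf = buf.split(b"\r\n\r\n", 1)
            request_line = head.split(b"\r\n", 1)[0].decode("ascii")
            # (e) process Rq: this toy just echoes the request line
            body = ("you sent: " + request_line + "\n").encode("ascii")
            # (f) respond with Rp
            conn.sendall(
                b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body
            )
            served += 1
            if b"connection: close" in head.lower():
                return served  # H ends C because the client asked for it


def demo():
    """Send two Rqs over a single C and collect what comes back."""
    listener = open_listener()
    port = listener.getsockname()[1]
    result = []
    server = threading.Thread(
        target=lambda: result.append(serve_one_connection(listener)), daemon=True
    )
    server.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET /a HTTP/1.1\r\nHost: x\r\n\r\n")
        client.sendall(b"GET /b HTTP/1.1\r\nHost: x\r\nConnection: close\r\n\r\n")
        data = b""
        while True:
            chunk = client.recv(4096)
            if not chunk:  # the server closed C
                break
            data += chunk
    server.join()
    listener.close()
    return result[0], data
```

Calling demo() pushes two requests through one connection, which is exactly the part of the model that forces H to manage Cs on top of Rqs.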

Now that we have the basics in place, let's take a look at the abstractions exposed by our particular H, which for historical reasons we've decided to christen Hunchentoot.

In Hunchentoot, the entity that makes a HTTP server (bound to a port, etc.) come to life is called an acceptor. Acceptors encapsulate the port, IP address, listening socket, etc. plus some state and some basic server configuration data, such as the document root for serving static files and paths to logfiles -- in other words, all the data needed to perform at least (a), (b) and (c) above. Moreover, the user can extend acceptor functionality to define custom handlers for URLs, as illustrated by the easy-acceptor subclass.

However, acceptors don't have any say in when connections (the Cs above) and requests (the Rqs above) are handled, i.e. how tasks are distributed among workers, and whether there are any dedicated worker threads at all. Work management is done through the taskmaster abstraction. A very broad sketch of how this works: after listening (i.e. (b)) is complete, the acceptor calls the taskmaster via execute-acceptor, in order to establish on which thread connections are accepted (i.e. (c)) and where and when requests are handled (i.e. (d) to (f)). When the taskmaster is ready for (c), it calls the acceptor's accept-connections, which performs the accept and gives back control to the taskmaster (handle-incoming-connection), which at some point calls back into the acceptor (process-connection) to let it perform (d), (e) and (f).
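The back-and-forth is easier to follow as a trace. Below is a toy Python model of the call sequence; the method names mirror the generic functions named above, but the classes, the "pending" list standing in for the listening socket, and the trace strings are assumptions made for illustration, not Hunchentoot's code. In particular, the toy passes the acceptor around explicitly, which is a simplification.

```python
class SingleThreadedTaskmaster:
    """Toy taskmaster that does everything on the calling thread."""

    def execute_acceptor(self, acceptor):
        # The taskmaster decides where accepting happens: here, right away.
        acceptor.accept_connections()

    def handle_incoming_connection(self, acceptor, conn):
        # The taskmaster decides where processing happens: here, inline.
        acceptor.process_connection(conn)


class Acceptor:
    """Toy acceptor; 'pending' stands in for connections waiting to be accepted."""

    def __init__(self, taskmaster, pending):
        self.taskmaster = taskmaster
        self.pending = list(pending)
        self.trace = []

    def start(self):
        self.trace.append("listen")                # (a) and (b)
        self.taskmaster.execute_acceptor(self)     # hand control to the taskmaster

    def accept_connections(self):
        for conn in self.pending:
            self.trace.append("accept " + conn)    # (c)
            self.taskmaster.handle_incoming_connection(self, conn)

    def process_connection(self, conn):
        self.trace.append("process " + conn)       # (d), (e) and (f)


def run_trace(connections):
    acceptor = Acceptor(SingleThreadedTaskmaster(), connections)
    acceptor.start()
    return acceptor.trace
```

run_trace(["C1", "C2"]) yields the sequence listen, accept C1, process C1, accept C2, process C2: every hop between "accept" and "process" passes through the taskmaster.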

The keen reader will by now wonder what's the point to all this dancing around between taskmasters and acceptors. For one, each acceptor has a taskmaster and the other way around; for another, all this "execute, then accept, then handle, then process" seems arbitrarily assigned to either the acceptor or the taskmaster, so really, what the fuck?

The main reasoning behind this acceptor-taskmaster separation is the following: acceptors do useful work, which is mainly accepting connections and handling the requests sent via the former; meanwhile, taskmasters are hooked immediately before this useful work occurs, so that they obey a decision made a priori by the user whether said work will be scheduled on a new thread or performed on the same one. In other words, we're given flexibility at the cost of extra lines of code. Given my lack of direct experience with Hunchentoot, I'm not sure yet whether this cost is worth it or not, but if it proves to be more trouble than it's worth, I will personally carve the thing out.
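What the flexibility buys can be shown with two interchangeable strategies. This is a hedged Python sketch, not Hunchentoot code: the class names loosely echo the single-threaded-taskmaster and one-thread-per-connection-taskmaster classes that Hunchentoot ships, while serve, shutdown and the bookkeeping are hypothetical.

```python
import threading


class SameThreadTaskmaster:
    """Runs the work on whichever thread hands it over."""

    def handle_incoming_connection(self, process, conn):
        process(conn)

    def shutdown(self):
        pass  # nothing to wait for


class ThreadPerConnectionTaskmaster:
    """Spawns one worker thread per connection."""

    def __init__(self):
        self.workers = []

    def handle_incoming_connection(self, process, conn):
        worker = threading.Thread(target=process, args=(conn,))
        self.workers.append(worker)
        worker.start()

    def shutdown(self):
        for worker in self.workers:
            worker.join()


def serve(taskmaster, connections):
    """Record which thread ended up doing the 'useful work' for each connection."""
    seen = {}

    def process_connection(conn):
        seen[conn] = threading.current_thread()

    for conn in connections:
        taskmaster.handle_incoming_connection(process_connection, conn)
    taskmaster.shutdown()
    return seen
```

The useful work (process_connection) is identical in both runs; only the taskmaster passed to serve decides whether it lands on the caller's thread or on a fresh one.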

Moving on to other abstractions, the next on the list are request and reply objects. These, as the names suggest, encapsulate HTTP request/reply data, such as the URL, headers, cookies, return codes and so on. To continue on the previous thread: once the acceptor starts processing connections (i.e. (d)), it will create request objects and process each of them -- process-request will call handle-request (i.e. (e)), which will call acceptor-dispatch-request, which the user can specialize via defmethod to do the actual job of processing requests; after that the reply is sent back, i.e. step (f).
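The chain from process-request down to a user-supplied dispatch method can be modelled roughly as follows. This is a Python analogy under stated assumptions: Request, Reply, send_reply and HelloAcceptor are invented for the sketch, subclass overriding stands in for defmethod specialization, and in Hunchentoot itself the generic functions are split differently between the request and acceptor objects.

```python
from dataclasses import dataclass, field
from http import HTTPStatus


@dataclass
class Request:
    method: str
    uri: str
    headers: dict = field(default_factory=dict)


@dataclass
class Reply:
    return_code: int = 200
    headers: dict = field(default_factory=dict)
    body: str = ""


class Acceptor:
    def process_request(self, request):
        reply = self.handle_request(request)   # (e)
        return self.send_reply(reply)          # (f)

    def handle_request(self, request):
        return self.acceptor_dispatch_request(request)

    def acceptor_dispatch_request(self, request):
        # Default behaviour of this toy: nothing is served.
        return Reply(return_code=404, body="Not Found")

    def send_reply(self, reply):
        # Serialize the reply into the bytes that would go out on the wire.
        payload = reply.body.encode("utf-8")
        headers = dict(reply.headers)
        headers["Content-Length"] = str(len(payload))
        lines = ["HTTP/1.1 %d %s" % (reply.return_code,
                                     HTTPStatus(reply.return_code).phrase)]
        lines += ["%s: %s" % (name, value) for name, value in headers.items()]
        return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + payload


class HelloAcceptor(Acceptor):
    """The user's customization point: override the dispatch step only."""

    def acceptor_dispatch_request(self, request):
        if request.method == "GET" and request.uri == "/hello":
            return Reply(body="hello\n", headers={"Content-Type": "text/plain"})
        return super().acceptor_dispatch_request(request)
```

HelloAcceptor().process_request(Request("GET", "/hello")) returns a complete 200 reply, while any other URI falls through to the default 404; the user never touches the accept or connection-handling code.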

I will gloss over session objects for the moment, as they are less relevant to the overall architecture. It's sufficient to say that they serve as an abstraction for "stateful shit over this stateless protocol", which is something I'd be happy to see die a gruesome death.

I was going to make a diagram and show some examples of Hunchentoot at work, but I am well over the one thousand word limit, so I will stop this episode here. We can now put the next couple of weeks in perspective, though:

  1. Quick likbez on how Unix and TCP make the whole thing work:

    First off, readers will notice that the "A" in the simplified model turns into a "P" in the second one, and for a good reason: while the IP protocol specifies addresses for hosts, transport layer protocols (e.g. TCP and UDP) usually specify addresses for applications running on a given host, and that's precisely what our P is: an address used by a client (say, a web browser) to identify a server application (e.g. a web server) running on some host. And it's not only the server that binds to a port, the client also gets one, only this client-side port allocation is usually performed by the operating system.

    Second, from the Unix side, both the server's (passive) connection and each individual connection with a client get an object called a socket. That is, in (a), the operating system binds a socket S owned by H to P, i.e. it keeps a note somewhere that subsequent connections to P will be assigned to S and that H may accept said connections. In (b), H signals its availability to receive connections by performing the listen system call; this puts S into the "listening" state, as specified in RFC 793.

    At this point, whenever a TCP client wants to initiate a connection, it needs to go through the SYN-SYN+ACK-ACK three-way handshake hula hoop; so the client sends his SYN, and then in order for communication to move forward, the server must send his SYN+ACK. Conceptually this belongs to the accept phase, i.e. (c), although on the usual Unix kernels the handshake is completed by the operating system as soon as S is listening, and accept merely hands H a connection that is already established. Either way, the accept function returns a new socket S' for the new connection, and only then can actual HTTP communication start.

    I won't even go into the details of why this is retarded, it's been beaten to death in the logs. Either way, there's no way around this pile of shit for computers talking to the heathen WWW... actually, if there is one, I'd very much like to hear about it.

  2. I remember reading the same words somewhere else, and I even know where. I'm not the first, nor even the second in line to look at large open sores coads, you see.
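The socket mechanics from note 1 can be poked at directly. The following is a small Python sketch, assuming a loopback interface is available; the function name socket_demo and the keys of the returned dictionary are made up for illustration.

```python
import socket


def socket_demo():
    """Bind, listen and accept on loopback, then report who got which port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))      # (a) the OS binds S to a free port P
    s.listen(1)                   # (b) S is now in the listening state
    port = s.getsockname()[1]

    # Client side: the OS picks the client's own port without being asked.
    client = socket.create_connection(("127.0.0.1", port))

    # (c) the pending connection waits in S's backlog until accept picks it up;
    # accept returns a new socket S' while S keeps listening.
    s_prime, peer = s.accept()

    info = {
        "server_port": port,
        "s_prime_local_port": s_prime.getsockname()[1],
        "client_port": client.getsockname()[1],
        "peer_port_seen_by_server": peer[1],
        "distinct_sockets": s.fileno() != s_prime.fileno(),
    }
    for sock in (client, s_prime, s):
        sock.close()
    return info
```

In the result, S' shares the server's port P with S but is a separate socket object, and the port the server sees for its peer is the one the operating system assigned to the client.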