Adding a new socket option to SBCL; or, Common Lisp is the death of me

August 30, 2020 by Lucian Mogosanu

After leaving my previous writing on a dire note, I've decided to try out rolling some cleaner shit for future reference; or at the very least I've been attempting to make some baby steps towards that. Here are the results.

Perhaps you're wondering: given the sad state of "IRC libraries", how hard could it be to write some code in pure SBCL, that connects to a server and generates and parses some text-based messages? For example, Stan wrote his thing in Python in a week or so; heck, even C or Ada shouldn't take too long, despite all the string handling troubles they cast upon the programmer. So Common Lisp should be a breeze then, no?

Well, no.

SBCL 1.4.5, the one I based Hunchentoot on, is, I will admit, organized pretty neatly. The documentation lies in doc, for example, while the core code can be found in src. Moreover, there's a contrib folder, containing all sorts of useful code; some of that code implements an interface for e.g. POSIX functionality; while another module provides a means to manipulate sockets. And since I'm attempting to achieve the great feat of implementing my own IRC client, I'm interested in this latter piece first and foremost.

Unfortunately, as far as "interfacing with POSIX and sockets" goes, this is where the neat part ends. On one hand, Common Lisp implementations in general, and SBCL in particular interfaces with the outside world through a so-called "foreign interface", which allows the user to map data representation from outside, i.e. "aliens", to internal Lisp structures; on the other, it provides an interface to a language where any cast to a void pointer is a legit operation. So in order to support some of the more peculiar system calls, such as the {get,set}sockopt under question here, developers have to have it provide separate glue for all possible parameter types. This is absolutely insane, so of course nobody's done it exhaustively -- which is fine, except a. some of the common options in the manpage, e.g. SO_RCVTIMEO, are notably missing; and b. there's very little documentation on how to add new setsockopt options to SBCL, most of it being scattered throughout the code.

Fortunately for the reader, I have conquered this particular coad; so here I am, ready to save you the trouble of having to go through this yourself -- more than twenty hours' worth of research, to be more precise, not counting the time spent writing the article.

According to the manual, SBCL's gateway to the world of Unix socket options is implemented in the sb-bsd-sockets contrib; more precisely, sockopt.lisp provides the following macro:

(defmacro define-socket-option
    (lisp-name documentation
     level number buffer-type mangle-arg mangle-return mangle-setf-buffer
     &optional features info)
  (let ((find-level
         (if (numberp (eval level))
             level
             `(get-protocol-by-name ,(string-downcase (symbol-name level)))))
        (supportedp (or (null features) (sb-int:featurep features))))
    `(progn
       (export ',lisp-name)
       (defun ,lisp-name (socket)
         ,@(when documentation (list (concatenate 'string documentation " " info)))
         ,@(if supportedp
               `((sb-alien:with-alien ((size sb-alien:int)
                                       (buffer ,buffer-type))
                   (setf size (sb-alien:alien-size ,buffer-type :bytes))
                   (socket-error-case
                       ("getsockopt"
                        (sockint::getsockopt (socket-file-descriptor socket)
                                             ,find-level ,number
                                             (sb-alien:addr buffer)
                                             #+win32 size
                                             #-win32 (sb-alien:addr size)))
                       (,mangle-return buffer size))))
               `((declare (ignore socket))
                 (unsupported-socket-option ',lisp-name))))
       (defun (setf ,lisp-name) (new-value socket)
         ,@(if supportedp
               `((sb-alien:with-alien ((buffer ,buffer-type))
                   (setf buffer ,(if mangle-arg
                                     `(,mangle-arg new-value)
                                     `new-value))
                   (socket-error-case
                       ("setsockopt"
                        (sockint::setsockopt
                         (socket-file-descriptor socket)
                         ,find-level ,number
                         (,mangle-setf-buffer buffer)
                         ,(if (eql buffer-type 'sb-alien:c-string)
                              `(length new-value)
                              `(sb-alien:alien-size ,buffer-type :bytes))))))
                 new-value)
               `((declare (ignore new-value socket))
                 (unsupported-socket-option ',lisp-name)))))))

Long story short, define-socket-option defines a function whose name is conveniently given by the user -- e.g. sockopt-receive-buffer, or the missing sockopt-tcp-quickack -- which calls getsockopt on level and number and receives a parameter of the type buffer-type, which is further processed by mangle-return. It also defines the co-function, a setf whose effect is a setsockopt on the same, only with different parameter processing functions. I'll let the reader examine the macro in all its details.

Assume we want to extend the array of {set,get}sockopt calls with our own. And for the sake of easing our journey into this codepile, let's take an easy one: the aforementioned TCP_QUICKACK.

I'm guessing that the SBCL developers tried to be helpful towards naïve students such as yours truly, so they provided a few wrappers for define-socket-option for the more commonly-used types. For integers:

(defun foreign-int-to-integer (buffer size)
  (assert (= size (sb-alien:alien-size sb-alien:int :bytes)))
  buffer)

(defmacro define-socket-option-int (name level number &optional features (info ""))
  `(define-socket-option ,name nil ,level ,number
     sb-alien:int nil foreign-int-to-integer sb-alien:addr ,features ,info))

and for booleans:

(defun foreign-int-to-bool (x size)
  (if (zerop (foreign-int-to-integer x size))
      nil
      t))

(defun bool-to-foreign-int (val)
  (if val 1 0))

(defmacro define-socket-option-bool (name level c-name &optional features (info ""))
  `(define-socket-option ,name
    ,(format nil "~@<Return the value of the ~A socket option for SOCKET. ~
                 This can also be updated with SETF.~:@>"
             (symbol-name c-name))
    ,level ,c-name
    sb-alien:int bool-to-foreign-int foreign-int-to-bool sb-alien:addr
    ,features ,info))

which are ugly on their own, but allow us to do something simple such as:

(define-socket-option-bool sockopt-tcp-quickack :tcp sockint::tcp-quickack)

This change alone won't get our code to compile, though. The compiler doesn't know who sockint::tcp-quickack is, and there's no obvious way to add it. Fortunately, we have e.g. sockint::tcp-nodelay1 to serve us as reference, and grepping for it in contrib/sb-bsd-sockets will reveal to us, among others, a file called constants.lisp. The file begins with the following comment:

;;; This isn't really lisp, but it's definitely a source file.  we
;;; name it thus to avoid having to mess with the clc lpn translations

I'm mostly illiterate when it comes to SBCL internals, so I have no idea what a clc or a lpn is, but what I do understand from my attempt to comprehend the code there is that there's some artifice at work that allows us to define mappings between C and CL types, which will generate all the glue code behind the scenes and will allow us to address the foreign type as a Lisp type. Neat, huh?

So since we have, say:

 (:integer tcp-nodelay "TCP_NODELAY")

for the same money we can add:

 (:integer tcp-quickack "TCP_QUICKACK")

to the same place, which will then be exposed as sockint::tcp-quickack to sockopt.lisp. And this alone is all that's needed to import the TCP_QUICKACK socket option into the SBCL world.

Now comes the hard part, though. Say we want to get or pass a structure (or rather, a pointer to it) to the kernel in our sockopt call; say we want to implement the aforementioned SO_RCVTIMEO:

Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received. The default for this option is zero, which indicates that a receive operation shall not time out. This option takes a timeval structure. Note that not all implementations allow this option to be set.

where "a timeval structure" is defined in Linux as:

struct timeval {
    time_t      tv_sec;     /* seconds */
    suseconds_t tv_usec;    /* microseconds */
};

where time_t and suseconds_t are integers of certain sizes, I'll spare the reader the details. Say what we will, we absolutely need to redefine this structure in the domain-specific language in constants.lisp; in fact, it would be really weird if SBCL's CFFI didn't already contain an implementation for this, since it provides its CL wrappers over e.g. gettimeofday. As it happens, contrib/sb-posix has its own constants.lisp, which contains precisely what we need:

 #-win32
 (:structure alien-timeval
             ("struct timeval"
              (time-t sec "time_t" "tv_sec")
              (suseconds-t usec "suseconds_t" "tv_usec")))

where time-t and suseconds-t are defined, respectively, as:

 (:type time-t "time_t")
 ...
  #-win32
 (:type suseconds-t ; OAOOM warning: similar kludge in tools-for-build
        #+os-provides-suseconds-t "suseconds_t"
        #-os-provides-suseconds-t "long")

This doesn't look too pretty to me... if only the problem where confined to aesthetics. I found no way of accessing these structures from outside of sb-posix, so without thinking twice I copy-pasted the code above to the constants.lisp in contrib/sb-bsd-sockets. And well, this solves the first part of the problem for us.

The second part is nastier, but I'll admit that studying it was somewhat interesting, since it allowed me to understand how "alien types" work -- sort of, I'm still not touching the machinery behind that constants.lisp with a ten foot pole. In principle we need to provide a define-socket-option to our SO_RCVTIMEO, so let's review this:

(defmacro define-socket-option
    (lisp-name documentation
     level number buffer-type mangle-arg mangle-return mangle-setf-buffer
     &optional features info)
  ...
)

where:

  • lisp-name: pretty self-explanatorily, it's the name we'll use to reference the socket option in the sb-bsd-sockets namespace.
  • documentation: a docstring.
  • level: the level argument in e.g. setsockopt.
  • number: the optname argument in {set,get}sockopt.
  • buffer-type: a Lisp type, the (alien) type of data being passed (by either kernel or users) through the sockopt interface.
  • mangle-arg: a function taking one parameter of an arbitrary type t1 and returning a value of the type t2; used by setf to sweeten the deal for the user, so that we can e.g. pass an integer value to the setf and have it converted to struct timeval/alien-timeval before calling in to the kernel.
  • mangle-return: a function taking two parameters2: a foreign pointer to the buffer containing getsockopt data; and the size of said data; the other side of the deal sweetener, it's what we'd use to e.g. convert the returned alien-timeval value into an integer.
  • mangle-setf-buffer: a function taking as parameter the buffer type (t2), that must return a value of type (* buffer-type) (alien pointer to buffer-type), which will be then passed directly to the system call; used to perform additional processing on input data, its purpose will become clearer later.

This all sounds pretty complicated, so let's try to piece it together, resulting in a tentative sockopt-rcv-timeout:

(define-socket-option sockopt-rcv-timeout nil sockint::sol-socket
                      sockint::so-rcvtimeo sockint::timeval
                      ? ?? ???)

where sockint::sol-socket is already defined in constants.lisp:

 (:integer sol-socket "SOL_SOCKET")

and sockint::so-rcvtimeo and sockint::timeval we define as per above:

 (:integer so-rcvtimeo "SO_RCVTIMEO")

 ...

 ;; sb-posix is not accessible here, so we redefine timeval
 (:type time-t "time_t")

 #-win32
 (:type suseconds-t ; OAOOM warning: similar kludge in tools-for-build
        #+os-provides-suseconds-t "suseconds_t"
        #-os-provides-suseconds-t "long")

 #-win32
 (:structure timeval
             ("struct timeval"
              (time-t sec "time_t" "tv_sec")
              (suseconds-t usec "suseconds_t" "tv_usec")))

The question marks are there to denote confusion: what am I supposed to do with those functions? The least effort one could do would be to replace each of ?, ?? and ??? with the identity-1 function defined in sockopt.lisp, which would allow the user to call getsockopt:

> (sb-bsd-sockets:sockopt-rcv-timeout *socket*)
#<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP #X7FFFF5227FE0
 :TYPE (SB-ALIEN:STRUCT SB-BSD-SOCKETS-INTERNAL::TIMEVAL
         (SB-BSD-SOCKETS-INTERNAL::SEC
           (SB-ALIEN:SIGNED 64) :OFFSET 0)
         (SB-BSD-SOCKETS-INTERNAL::USEC
           (SB-ALIEN:SIGNED 64) :OFFSET 64)
         (SB-BSD-SOCKETS-INTERNAL::PADDING-0-16
           (ARRAY (SB-ALIEN:SIGNED 8) 0) :OFFSET 128))>

and I'll omit the setsockopt for now. Instead, let's recall that we'd like to be able to do something more along the lines of:

> (sb-bsd-sockets:sockopt-rcv-timeout *socket*)
0
> (setf (sb-bsd-sockets:sockopt-rcv-timeout *socket*) 42.5)
42.5
> (sb-bsd-sockets:sockopt-rcv-timeout *socket*)
42.5

To achieve this level of sweetness, first we must get some basic level of understanding of how alien types work. Section "9.2. Foreign Types" of the SBCL manual gives all the details, but we'll discuss here a few: as observed above, CL's "alien typing" mechanism allows us to give a Lisp representation to any C data structure. This sounds sexy enough on the surface, except now SBCL also needs to implement a bunch of operations specific to those types, e.g. pointer dereferencing, or more depressingly for the Lisp programmer, variable allocation semantics. Not sure where one may begin here other than the manual, but I can sum up my research in the following:

  • local scoping is provided using "local foreign variables" i.e. a variable allocated on the C stack and referenced from Lisp; while
  • C heap allocation is possible via make-alien.

However, we're in Lisp, so what does "allocating on the C stack" even mean?

This form is analogous to defining a local variable in C: additional storage is allocated, and the initial value is copied. This form is less analogous to LET-allocated Lisp variables, since the variables can’t be captured in closures: they live only for the dynamic extent of the body, and referring to them outside is a gruesome error.

Referencing locals outside of scope is undefined behaviour, yes, but how do we work with it? The manual tells us nothing about the return value of (the expansion of) with-alien, and yet again I'll spare you the details, here's how it works for socking::timeval objects:

> (sb-alien:with-alien ((tv sockint::timeval))
    tv)
#<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP #X7FFFF5227FE8 :TYPE
  (SB-ALIEN:STRUCT
    SB-BSD-SOCKETS-INTERNAL::TIMEVAL
    (SB-BSD-SOCKETS-INTERNAL::SEC
      (SB-ALIEN:SIGNED 64) :OFFSET 0)
    (SB-BSD-SOCKETS-INTERNAL::USEC
      (SB-ALIEN:SIGNED 64) :OFFSET 64)
    (SB-BSD-SOCKETS-INTERNAL::PADDING-0-16
      (ARRAY
        (SB-ALIEN:SIGNED 8) 0) :OFFSET 128))>

In other words, it works exactly like a progn, only it's bad juju to return a pointer to a variable allocated within this scope. However, constants.lisp automagically provides accessors for (* sockint::timeval) objects, that is, alien pointers to sockint::timeval, which requires us to get the address (sb-alien::addr) of the tv we allocate:

> (sb-alien:with-alien ((tv sockint::timeval))
   (let ((tvp (sb-alien:addr tv)))
     (setf (sockint::timeval-sec tvp) 42
           (sockint::timeval-usec tvp) 1)
     (format t "~s ~s~%"
             (sockint::timeval-sec tvp)
             (sockint::timeval-usec tvp))
     tv))
42 1
... ; sockint::timeval data

This example then provides us with a basis for handling SBCL "alien" data, which allows us to provide the user with sockopt calls in a digestible manner. In particular, for setf we need to convert CL numbers (expressed in seconds, and/or fractions thereof) to sockint::timeval objects. So if we truncate, for example, 42.1:

> (truncate 42.1)
42
0.099998474

the integer part (42) will be the value of timeval-usec, while the fractional part converted to microseconds will represent the value of timeval-usec. This gives us:

(defun inttime-to-timeval (time)
  (multiple-value-bind (sec usec/mil)
      (truncate time)
    (sb-alien:with-alien ((tv sockint::timeval))
      (let ((usec (truncate (* usec/mil 1000000)))
            (tvp (sb-alien:addr tv)))
        (setf (sockint::timeval-sec tvp) sec
              (sockint::timeval-usec tvp) usec))
      tv)))

which reduces the boilerplate to:

(define-socket-option sockopt-rcv-timeout nil sockint::sol-socket
                      sockint::so-rcvtimeo sockint::timeval
                      inttime-to-timeval ?? ???)

There's a very nasty (but subtle to the untrained eye) problem with this code: inttime-to-timeval returns sockint::timeval objects, while the alien call to setsockopt expects (* sockint::timeval).

This is also the only practical purpose I found for the previously-described mangle-setf-buffer parameter: when applied to locals, sb-alien:addr only works within the sb-alien:with-alien scope of those variables, so we need to get the address of our sockint::timeval object right before calling into setsockopt. It's either this or dynamically (via sb-alien:make-alien) allocate the structure and free it somehow after the system call, but there seems to be no support for this, so... anyway, after many hours of fiddling around with the options available, I arrived to the following solution:

(define-socket-option sockopt-rcv-timeout nil sockint::sol-socket
                      sockint::so-rcvtimeo sockint::timeval
                      inttime-to-timeval ?? sb-alien:addr)

This leaves us with the "get" part of sockopt. Obviously, mangle-return should help us with that, except there's depth in there as well. In short, mangle-return should, for us: a. receive a foreign (void) pointer value x and a(n integer) size; b. use x and size to decode a sockint::timeval, or better yet, a (* sockint::timeval) tvp; then c. convert the data in tvp to a number (of seconds) and d. return that number. For the first part, we mentioned x is an untyped pointer, so we need to cast it to our type, which gives us something along the lines of:

(assert (= size (sb-alien:alien-size sockint::timeval :bytes)))
(let ((tvp (sb-alien:cast x (* sockint::timeval))))
  ... ; read data referenced by tvp etc.

The assertion is there for good measure, it being triggered would most likely signal a bug in SBCL. The rest is easy now, which leads us to:

(defun timeval-to-inttime (x size)
  (assert (= size (sb-alien:alien-size sockint::timeval :bytes)))
  (let* ((tvp (sb-alien:cast x (* sockint::timeval)))
         (sec (sockint::timeval-sec tvp))
         (usec (sockint::timeval-usec tvp)))
    (+ sec (/ usec 1000000))))

and, in final:

(define-socket-option sockopt-rcv-timeout nil sockint::sol-socket
                      sockint::so-rcvtimeo sockint::timeval
                      inttime-to-timeval timeval-to-inttime sb-alien:addr)

This, as you might imagine, works, and I'll gladly take any suggestions regarding the code. Perhaps I could even make a patch on top of a genesis -- recall that I've already added SBCL on top of Hunchentoot last year. Actually, one year later, this brings me back to precisely the same discussion, namely of what the hell I'm doing here.

My more recent adventures began, as they did then, with something else, only this time the "something else" was something different: I wanted to fix the TCP connection maintenance code of an IRC library that I was using; upon cursory inspection, I decided that it'd be simply easier to roll my own, on top of my battle-tested CL implementation, SBCL. So I thought: hey, how hard could it be to configure a socket using SBCL... meanwhile that blew up into a lengthy research into SBCL internals, meaning that now I'm to either maintain SBCL or... take another stab at rolling out my own? I guess I'm already doing one or the other, only this imposes a significant burden upon my brain; or as you can see, I'm 3369 words in and still not done.

For one, let's admit that SBCL is a luxury item and thus actively maintaining it is a luxury of sorts. There's absolutely no value and conversely, there's tremendous pain driven by the amplification of e.g. that old bootstrapping problem: building SBCL using itself would be absolutely no issue were we on a CL machine, except I don't have any Lisp machines around. I have plenty of ARM/Linux and x86-64/Linux machines, and in particular the idea was to be able to run superior Lisp bots, and perhaps other CL stuff on just about any hardware I could get my hands on. Sadly, this is not possible today, and I'm neither smart, nor resourceful nor crazy enough to dive into a megalith every time one of these "minor" issues appear.

As for the other, I'm pretty confident that the so-called "C machine" can only be reliably used by actually taking out the C altogether. More generally, writing your Lisp in C or Ada isn't any different from writing it in Lisp; either way, you'll eventually stumble upon having to import yet another "necessary evil" into your world, so since everything you own runs Linux and Linux is built using GCC, you can either have that or less. Not more, that "more" is always bound to be contaminated by problems with existing shit. Thusly, what remains among the readily-available-yet-so-far-away options is writing a Lisp compiler in (some) assembly language, why not one that can bootstrap SBCL while we're there3. So what remains is stupid computers eating me ever again.

Sure, diving into all this sea of code is nice and besides, it's generated a reference to sizeable chunks of sb-alien and sb-bsd-sockets in the process. Still, the current issue has been occupying my brain for a good part of this month and time ain't growing on trees. Judging by past experience, I'm not so sure this entire endeavour is worth the trouble at all, but I won't pass that judgment yet. Instead, I will leave this open:


  1. The sockint stands for sb-bsd-sockets-internal, which contains some of the constants required to interface to Unix's socket option interface.

    "Some", because I'm pretty sure it doesn't contain all of them. 

  2. To my knowledge this piece isn't given a proper description anywhere in the SBCL documentation; but hey, go learn some CL macros and then maybe you'll get it! 

  3. I am told by the internets that Picolisp shares many of the characteristics of the item I'm looking for. I haven't looked into it yet, so hey, who knows. 

Filed under: computing, lisp.
RSS 2.0 feed. Comment. Send trackback.

10 Responses to “Adding a new socket option to SBCL; or, Common Lisp is the death of me”

  1. #1:
    Michael Trinque says:

    Regarding the symbol you couldn't use outside some package, did you use `::` to reference it?

    I've been concerned about SBCL ever since a core maintainer ridiculed the idea of signing releases with a GPG key years ago. I wouldn't be surprised if SBCL is as haphazard as any other open source volunteer-supported turd. I'm curious whether a spelunk into the Clozure CL internals would leave you with the same bad taste in your mouth. It appears to be written by a consulting firm, i.e. folks that actually have to make money by being productive on the damned thing.

    Lately I trust nothing that doesn't have good money flowing through it.

  2. #2:
    spyked says:

    > Regarding the symbol you couldn't use outside some package, did you use `::` to reference it?

    Huh, it seems that did the trick, thanks! So now the only fiddling one needs to do with constants.lisp is adding the actual socket options.

    > I'm curious whether a spelunk into the Clozure CL internals would leave you with the same bad taste in your mouth.

    That sounds like a great idea, I'll take a look. In fact since I'm there, I might as well look into the other ones, at the very least to get an idea how the networking interface changes across CL implementations; though I'm not sure that's the issue. To my eye, the problem is that the system call interface should be a first-class citizen in any CL running on Unix, and preferably not through C library calls. That may be too much of me to ask, but even a sane 1-to-1 mapping between implementation-specific CL constructs ("socket streams" or whatever) and the actual POSIX interface would be a great feature.

  3. @spyked/#2 :

    > To my eye, the problem is that the system call interface should be a first-class citizen in any CL running on Unix, and preferably not through C library calls.

    The annoying bit here is that the kernel expects platform-specific structs and magicnumbers for many syscall parameters. See e.g. my UDP lib for a concrete example.

    It is certainly possible to construct these on the fly and without the use of the C stdlib, or for that matter GCC per se (see e.g. example in my "M" experiment) but then must determine the magicnumbers manually, per platform.

  4. #4:
    spyked says:

    "M" was precisely what I had in mind when I alluded to assembly, I think it's a fine example of clean interfacing to the kernel.

    I think what I'm trying to go towards is that reimplementing the socket interface at all the possible levels (POSIX, CL, usocket etc.) half-assedly like that is a bad idea, and that the (Common?) Lisp implementation shouldn't burden the user with supporting yet another such layer. I understand they wanted to make it easier, but in the process they've somehow managed to achieve the opposite, I'd much rather work with naked pointers than with what SBCL has now.

  5. @spyked/#4:

    AFAIK there's no obstacle to adding raw syscall support to SBCL and passing in raw octet arrays for params. Could then handle it a la "M". But will still have to get the constants and struct layouts somewhere, per platform.

  6. #6:
    magicmike says:

    The amount of cluelessness on display here defies belief. No wonder nothing you idiots touched or worked on went anywhere.

  7. #7:
    spyked says:

    Ahahahaha, welcome, @magicmike! Your comment has about the same substance as that other dude's, so as a consequence, "your" opinion about what went were doesn't amount to much... huh?

  8. #8:
    spyked says:

    For the record, the magic mike arrived here most likely from this comment on MP's piece.

    One's "excursion in «intelligent» impudence" is another's striving to understand the internals of some system or another. Hate is especially cheap when it's gratuitous, what can I say.

  9. [...] been almost two years since my last article in the Lisp category and although I gave it a good beating last time I approached the subject, [...]

  10. [...] This article is part of a series on eating the Unix world using Lisp. If you're not genuinely interested in the subject, then you're more than welcome to fuck off. [...]

Leave a Reply