It's hard to pinpoint the exact moment when I started looking at this -- it's probably been around three years of fiddling with the subject on a more or less consistent basis, and of practicing in order to get some grounding. In any case, I think my current understanding is sufficient for me to begin putting my thoughts in written form. By the way, there are plenty of books on SuperCollider out there, so check them out if you're looking for some actual study material [1].
SuperCollider is at least a couple of things. For one, it encompasses a sound server, which deals with abstractions such as waveforms, signals, events, timing and so on and so forth. I very much suspect that some of its real-time functionality can be repurposed for other applications, but by and large, the point of this server is to interface at the lower layers of abstraction, on top of the system sound providers, e.g. ALSA or whatever. As for the other, it's the user-facing component: an interpreter which translates text-form structured input -- in other words, computer programs -- into commands for the server, delivered over a protocol called Open Sound Control, or OSC [2] for short.
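To make the split concrete, here's a minimal sketch -- illustrative rather than canonical -- of the interpreter booting the server process and then talking to it over OSC:
(
// boot the default server (scsynth); the interpreter (sclang) is a separate process
s = Server.default;
s.boot;
)

// once the server is up, instantiate the built-in "default" synth by sending a raw
// OSC message: /s_new <defName> <nodeID> <addAction> <targetID>
s.sendMsg("/s_new", "default", s.nextNodeID, 0, 1);

s.freeAll;  // silence everything on the server when you've had enough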
At its core, this looks very much like an evolved form of the C64 programming environment. SCLang is in fact a general-purpose multi-paradigm language, but fit for the particular scope of sound design and music composition, much like what R is for statistical programming or what MatLab is for mathematical modelling. This is a complete departure from the rigid form of traditional "industry standard" DAWs, at the cost of time spent building custom abstractions. So just as is the case for the C64 and SID, the distinction between code and data is thin -- functions are a first-class citizen, for example -- and it's up to the user to draw the lines as he sees fit.
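As a trivial illustration of that last point (the names f and g are arbitrary), a function is just a value that can be stored, passed around and evaluated on demand:
(
f = { |n| n * 2 };      // a function bound to an interpreter variable
f.value(21).postln;     // -> 42
g = f;                  // functions are passed around like any other object
g.value(100).postln;    // -> 200
)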
I'd even go as far as to say that SCLang is in fact made up of two almost distinct languages. Take for example the following (somewhat convoluted) snippet:
(
{
	x = SinOsc.kr(2, mul: pi);           // control rate sine at 2 Hz, scaled to +/- pi
	y = SinOsc.ar(440, x, mul: 0.2) ! 2; // audio rate sine at 440 Hz, phase-modulated by x, spread over two channels
	z = Decay.kr(Impulse.kr(1), 2);      // decaying envelope, retriggered once per second
	y * z                                // the return value: y, amplitude-modulated by z
}.play;
)
Let's unpack this from the top down:
- the parentheses mark a block of multiline code that can be evaluated; this is a stylistic gimmick: had I written the entire expression on a single line, they would have been unnecessary;
- the curly braces denote a function definition, and whatever goes between them is the function body;
- indeed, the user can "play" a function, i.e. instruct the server to play back the signals that were defined as a result of function evaluation;
- the function body contains four statements, separated by semicolons; this syntax has more or less the same meaning as in the C family of languages, i.e. the statements are evaluated in a sequential fashion, from the perspective of the interpreter;
- the last statement bears the special role of a return value; assuming that this return value is a UGen (see below), then the server uses it to generate an audio signal;
- a "unit generator", or UGen, is an object that describes a digital sound signal in the sense of a data flow, i.e. UGens are composable and it's the relationship between them that describes the sound generation and processing path, not the sequentiality of interpreter evaluation; notice for example that the variable
xis defined in the first statement and used in the second: this signal flow will be processed in real time when the sound is played back; SinOscandDecayare both UGens;- there is a clear distinction between "control" and "audio" rate processing3, as observed from the
krandarspecifiers; - the first UGen, labeled
x, is a control rate sine signal running at 2 Hz; the amplitude of this signal -- labeledmulonly because in the SuperCollider universe, amplitude modulation and multiplication are the same operation -- is set to pi; - the second UGen, labeled
y, is also a sine, but running at audio rate at the very arbitrary frequency of 440 Hz; the phase of this sine signal is modulated usingx; its amplitude is 0.24; the!is a sugary operator that distributes the same signal over multiple audio channels (in our case, two); - the third UGen,
z, is a decaying envelope running at control rate; this decay signal is excited using anImpulseUGen running at 1 Hz; the decay time (to 60 dB, as per the spec) is 2 seconds; - the
yandzsignals are multiplied, which in this case means thatyis modulated in amplitude byz; the result is returned; and finally, - when
playis called, the interpreter instructs the server to swallow a compiled form of the resulting UGen, and to feed it as a waveform to the computer's DSP; the resulting sound is an A4 note playing once per second, each note fading in 2 seconds.
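The promised aside, as a minimal sketch (the handle name x is arbitrary): play returns a Synth object, which can be used to stop that particular sound without silencing the whole server.
(
// keep a handle on the synth that .play creates behind the scenes
x = { SinOsc.ar(440, 0, 0.2) ! 2 }.play;
)

x.free;  // stop just this synth (Ctrl-. / Cmd-. stops everything)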
Compare and contrast to this more complex -- yet nowhere near the upper boundaries of the system -- example:
(
Task(
	func: {
		// the outer function's body is the inner function, looped indefinitely
		{
			var freq = 440.rand + 440;                  // random frequency in the 440-880 Hz range
			var synth = Synth(\default, [\freq, freq]); // fire a note on the default instrument
			0.5.yield;                                  // hand control back to the clock for half a beat
			synth.set(\gate, 0);                        // then release the note
		}.loop
	},
	clock: TempoClock.new(160 / 60)                     // 160 BPM, i.e. 160/60 beats per second
).play;
)
Similarly, let's try to understand this piece by piece:
- at the upper layer, we have something called a Task; in short, a Task is an object that can be used to schedule high-level "events" [5] in a timely fashion;
- the Task is instantiated with two arguments: a function and a clock;
- in this particular case, the clock is used to provide tempo control; a TempoClock object is instantiated, running at 160 BPM (160 / 60 beats per second);
- the interaction between the clock and the function is determined by this conversion from beats to seconds; within the function, time is expressed in beats, and its conversion to seconds depends on the tempo (see below);
- at the top level, the function contains an infinite loop, i.e. it wraps another function which is called using the loop method, thus the inner function will be called ad infinitum (or until playback is stopped);
- the body of the infinite loop is similar to the function in the previous example, in that it is used to communicate with the server; but this communication is achieved through higher-level constructs, more precisely through Synth objects;
- in the first statement, a random frequency is chosen in the 440-880 Hz interval, i.e. the A4-A5 octave;
- in the second statement, a Synth object is instantiated; as a result, a sound will be played by the server;
- the so-called "instrument" associated with this Synth instance is a UGen wrapped in a standard high-level object called a SynthDef, which provides a convenient way to reuse sounds and to control when they start/stop; for the sake of brevity, the default [6] SynthDef is used;
- the \freq parameter of the synth instance is set to the previously-defined random frequency [7]; it's expected that the frequency of the default instrument's oscillator is set to the same value when the sound is produced;
- in the third statement, the method yield is invoked on the number 0.5 [8]; this is to be interpreted as such: the invocation performs a coroutine-style control flow transfer for a duration of 0.5 beats; as this method is called, an internal conversion from beats to seconds is performed based on the tempo clock -- 0.5 * 60 / 160 is 0.1875 seconds -- and the function pauses for this period of time; then, control flow is transferred back to the function (a quick check of this conversion follows right after this list);
- in the fourth statement, the set method is called on the synth object; this sets the \gate parameter of the synth to 0, which causes the synth to be released; thus, to summarize:
- this Task schedules a background thread which fires a synth sound of a random pitch between 440 and 880 Hz, every half beat at 160 BPM.
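And here is the quick check promised above -- a throwaway sketch verifying the beats-to-seconds conversion:
(
// 0.5 beats at 160 BPM should come out to 0.1875 seconds
var clock = TempoClock.new(160 / 60);
(0.5 / clock.tempo).postln;  // -> 0.1875
)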
Both examples are self-contained, so paste them into the SuperCollider IDE and evaluate them to hear the result.
Anyway, with this second example we're no longer in the "dataflow programming" world; we've somehow crossed into asynchronous programming, which is quite important when one wants to program any kind of polyphony into a song. This goes further into loops, quantization and, finally, towards live coding techniques, which I won't explore here [9]. Eli Fieldsteel has published a bunch of recordings of his courses, which go into much more depth over the course of a university semester.
As for myself, I'm just a beginner who has (so far) composed a few simple songs. I'm not very familiar with either the domain-specific craft or the programming style. As usual in programming, everything seems possible if you pour enough work into building the right abstractions, but the jump in complexity from function A to functions A+B is only very rarely linear, assuming, say, the naïve lines-of-code measure. For example, once you near the modest 500 LoC mark, you really need to know what you're doing, otherwise debugging your musical programs is going to be one hell of a nightmare. But if you do know what you're doing, then the same level of information density that went into those beautiful C64 soundtracks can also be achieved here.
By the way, there's a public code repository available, go check it out.
1.
See for example A Gentle Introduction to SuperCollider or The SuperCollider Book; I think there are plenty of others, although I found the embedded documentation to be quite useful for reference and examples, as well as for the few step-by-step tutorials that I used for learning back when I started this. ↩
2.
Since we're here, it's worth mentioning that OSC isn't specific to SuperCollider. In fact it was created as an alternative to MIDI and it's used by many projects, including open source DAWs and some that are more similar in scope to SuperCollider.
By the way, SuperCollider isn't unique, nor is it (I think) the first of its kind. There are quite a few like it; this just happens to be my tool of choice. I don't have the time or space for an exhaustive review, but look e.g. at PureData or VCV Rack for starters. Or maybe check out Strudel -- hipster kids seem to love it. ↩
3.
"Control" rate usually refers to slowly evolving signals, such as a pulse signal controlling a clock that runs at 10 Hz, or a LFO that modulates the frequency cutoff of filter. "Audio" rate usually refers to signals that operate in the audible spectrum, i.e. 20 Hz-20 KHz, such as the operators of a FM synth.
This distinction has always existed in the field of electronic musicmaking, mainly for economic reasons. For analog devices, a control rate oscillator circuit could for example be built using lower-quality components. For digital devices, control rate signals have less stringent latency requirements than audio rate signals; the latter will run on the audio processing unit, while the former may run on the CPU, in the same environment where you run your browser, say. The 20 Hz cutoff has a definite meaning here as well: a latency of a 20th of a second, i.e. 50 miliseconds, is what your operating system can reasonably do, and really, the timer accuracy here should be, per Shannon-Nyquist, 25 miliseconds, so that signal sampling can occur accurately. Most operating systems nowadays may allow processes to run in "real-time" mode, i.e. guaranteeing a latency of, say 10 miliseconds. Which means that at best you can do control rate at 40 Hz, but it's safer to stop at 20.
In other words, it's not just the human ear that reacts differently to these frequencies, it's your computer as well. And contrary to popular belief, computing ain't cheap. ↩
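Spelled out as throwaway expressions, the arithmetic above looks like this:
(
(1 / 20).postln;        // 0.05  -> the period of a 20 Hz control signal is 50 ms
((1 / 20) / 2).postln;  // 0.025 -> per Shannon-Nyquist, sample it at least every 25 ms
)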
4.
The authors of SuperCollider took advantage of the blight that is floating point numbers and chose to normalize the values of many variables, including amplitude, to the unit interval. Panning, for example, is expressed as a value between -1 and 1, while amplitude sits in the 0-1 interval. Anyway, it's not arbitrary; it's just that the authors had the minds of mathematicians, not engineers -- in the "audio industry", amplitude is expressed in decibels, innit? ↩
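For what it's worth, the conversions between the two conventions are built in; a tiny illustrative sketch:
(
0.2.ampdb.postln;   // normalized amplitude to decibels -> roughly -14 dB
(-6).dbamp.postln;  // decibels to normalized amplitude -> roughly 0.5
)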
5.
Events, in this sense, are roughly analogous to DAW automation, including but not limited to stuff such as looping a track or inserting a marker for fading out the song. It's very likely -- in the sense that I haven't done a thorough experiment, but I've seen quite a few examples which confirm this thesis -- that all the "higher-level" functionality can also be achieved at the "low level" discussed in the first example, albeit with much more coding effort. So in a sense, yes, I believe Tasks and similar abstractions are just sugar on top of the base language, i.e. a "standard library", as it were. ↩
6.
SynthDefs -- I suspect mainly for the sake of performance -- have their own namespace, consisting of bindings of symbols to UGens. A symbol is an alphanumeric string prefixed with a backslash, e.g. as in \default. This default SynthDef is present on each freshly booted SuperCollider instance. To create a new SynthDef, one would do, e.g. as per the documentation:
(
SynthDef(\SimpleSine, { |freq = 440, out|
	Out.ar(out, SinOsc.ar(freq, 0, 0.2))
}).add;
x = Synth(\SimpleSine);
)
But this post is already getting huge, so I won't insist on all the details. ↩
7.
At this point we're veering a bit into convention. All the so-called "high-level" constructs in SuperCollider may operate on this \freq parameter, and furthermore, they can convert between, say, MIDI notes and frequencies, as well as between degrees of any sort of standard or custom scale -- folks in the so-called microtonal world make extensive use of this to model non-standard scales -- and frequencies. There are other conventions like this one, such as the \gate parameter in the fourth statement of the loop. ↩
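The note-to-frequency conversions mentioned here are one-liners, for instance (purely illustrative):
(
69.midicps.postln;   // MIDI note 69 -> 440.0 Hz (A4)
440.cpsmidi.postln;  // 440 Hz -> MIDI note 69.0
)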
8.
Methods are objects; problem? ↩
9.
To be honest, I'm not even particularly interested in live coding. I might dive further into it at some point, but... it's way more fun to hook up a controller to the PC and route it straight into SuperCollider. I tried this exercise at some point; I just lack any of the super-schmancy Akai controllers that everyone seems to use.
In principle you can connect any I/O device to SuperCollider. One of the examples in the documentation demonstrates how to use the mouse as a makeshift theremin. But I've seen all sorts of crazy stuff, such as sounds generated based on particular aspects of video feeds. I'm not particularly interested in that sort of stuff either at this point; but regardless, the level of experimentation that this platform allows is astounding.
If anything, I guess SuperCollider was created for these sorts of folks. It is certainly a nice tool for learning music, although I really do suggest learning to play at least one instrument as an orthogonal activity. It doesn't particularly matter which one, just pick one -- the exercise may at least offer some comparative insight into the interpretative nature of live coding. ↩
A side note for the programming language folks: SCLang, as can be observed from the examples in this article, is largely inspired by Smalltalk. It thus features a particular type of object-oriented system, in which function invocation is a form of message passing. As a consequence, the server, despite being a remote process, is just another object in the SCLang namespace. So, for example, configuring the sampling rate is just a matter of setting the right parameter, while playback is started by sending a play message, as illustrated above.
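As a rough sketch of what that looks like in practice (the 48000 Hz value is arbitrary):
(
// the server process is represented by an ordinary object; its configuration is just properties
s = Server.default;
s.options.sampleRate = 48000;  // takes effect the next time the server (re)boots
s.reboot;                      // restart scsynth with the new options
)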
This level of programmability is way beyond anything one can achieve in a traditional DAW.