How to make music in 44100 really easy steps

May 15, 2021 by Lucian Mogosanu

So... my previous article on the demise of music-makin' ended on an overly optimistic note: not even a week had passed before an anonymous reader made a point of informing me that no one reads code anymore.

That's their loss, I guess, but as a consequence of this realization, I am now even less inclined to show any of the stuff that I've worked on in the past year or so, despite the one or three good things keeping me busy. So next time I stop publishing useful stuff, don't blame me; blame marketing, "democracy", covid -- in short, blame anything you'd like but me, as I didn't put this blog here for any single purpose in the first place, nor do I intend to steer it that way. I'll just write what, when, if I feel like it and that's that.

For example, right now I feel like using the clickbait title above, which I hope you understand makes for great SEO, to walk the reader through a thought experiment consisting of a simplified, yet at the same time detailed account of Joe's problem. As I've said, "making music" is more than just "playing musical notes"; the precursor of music is sound, which itself is preceded by physical phenomena, which, if we leave the mathematics aside, aren't that big a deal for the empiricist.

Let's say that we have an object lying somewhere, made from a definite material with known physical properties (elasticity, density etc.); and let's say that at some point we decide to tap that object with our finger. The result will be a movement propagating through the air in the room and, in particular, into our ears, followed after a definite amount of time by the processing and interpretation which yield a definite abstract object called "a sound". For now let's observe that the properties of this object map directly to (or, as a mathematician would say, are isomorphic to) the underlying physical phenomena; furthermore, the distinguishing features of this particular object that we have observed with our mind's eye are directly influenced by the material, as well as by our finger, the room, the air in it and ourselves.

All that aside, the sound in question has a few interesting properties, such as a duration over which it will be heard, an amplitude, i.e. "loudness", or a pitch, i.e. "gravity" and its counterpart "acuteness". These so-called properties are partial, simplified views of the same phenomenon which, when placed in a frame of reference, lead, I believe, to a more refined precursor of music: the thing which English speakers call rhythm. So let us for a moment visualize the axis of time, with our tap at moment zero; and let's produce a second tap around the half-second mark: now we get two sounds with space between them, which thus become related in some way, while at the same time remaining distinct objects!

Now, for our next proto-musical measure, occurring precisely one second after our previous "moment zero", let's say that we double the number of taps to four, placed as follows: the first tap at zero; the second at the quarter-second mark; the third at the half-second mark; and finally, the fourth at three quarters of a second. This gives us yet another frame of reference, which has quite an interesting impact upon our internal interpretation. For example, we notice that by doubling the number of taps per second, we intuitively perceive the taps in the second measure to be "twice as short" as those in the first one, despite the sound itself being about the same length in all six instances we've mentioned. Also, very importantly, we observe an overlap both between the first taps of each measure and between taps 1.2 and 2.3, where the first number denotes the measure and the second the tap's position within it. In other words, measure 2 preserves the structure of measure 1, but at the same time adds some information to the initial rhythm, such as the quality of 2.2 and 2.4 as accents. From here one may wonder, for example, how a sequence composed of one half-note followed by two quarter-notes sounds; and from here on, the possibilities in terms of rhythm are just about endless, as we haven't even touched upon any other properties of sounds. There is a reason, for example, why in rock the bass drum goes first and the snare goes on the accent, but let's not bore ourselves with such details just yet.
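The overlap between the two measures is plain arithmetic on tap positions. Below is a minimal sketch, assuming each measure lasts one second and its taps are evenly spaced; the function name onsets is made up for illustration. Positions are kept as exact fractions of the measure, so "lands on the same spot" is an exact comparison rather than a floating-point guess.

```python
from fractions import Fraction

def onsets(taps_per_measure):
    """Hypothetical helper: tap positions within one measure, as exact fractions."""
    return [Fraction(i, taps_per_measure) for i in range(taps_per_measure)]

measure1 = onsets(2)  # taps 1.1, 1.2 at 0 and 1/2
measure2 = onsets(4)  # taps 2.1 to 2.4 at 0, 1/4, 1/2, 3/4

# Pairs of taps (measure 1, measure 2) that fall on the same position
shared = [(f"1.{i + 1}", f"2.{j + 1}")
          for i, a in enumerate(measure1)
          for j, b in enumerate(measure2) if a == b]

# Taps that exist only in measure 2: the new information, i.e. the accents
new = [f"2.{j + 1}" for j, b in enumerate(measure2) if b not in measure1]

print(shared)  # [('1.1', '2.1'), ('1.2', '2.3')]
print(new)     # ['2.2', '2.4']
```

The same helper also answers the half-note-plus-two-quarter-notes question in principle: concatenate the positions of a two-tap half and a four-tap half and compare them the same way.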

In order to get from "rhythm" to "music", all we need to do, in what may perhaps turn out to be a very unpopular opinion, is vary the frequency of our ideal taps by orders of magnitude. Anything up to twenty taps per second is too low, and even those twenty don't quite amount to what we would characterize as a "musical note". Somewhere between these twenty and two hundred and twenty, however, there lies in the human mind a soft threshold which switches our perception from that of an utterly boring sequence of taps to something which faintly resembles a musical note: at 220 taps per second, the Western note A3, to be more precise. Of course, a real finger can't normally tap at that frequency; but a vibrating string, a breath of air vibrating through a reed, or some other device can easily generate waves with a given amplitude and frequency. Since we're here, let's take a break to visualize this:

This is an ideal wave, i.e. a sine, oscillating at 8Hz. On the vertical axis we have the amplitude of our sound, normalized to the interval [-1, 1], while on the horizontal axis we have time. If you look carefully, there are eight peaks and eight lows, and the amplitude starts at zero at time zero and returns to zero after one second. It's... something. Let's take a look at how the same wave looks when we set the frequency to 220Hz:
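The plot above is just the function sin(2πft) with f = 8. Here is a small sketch that evaluates it on a dense time grid over one second and counts the peaks and lows, as a way of checking the description numerically. The grid size is an arbitrary choice of mine: a multiple of 32, so that every peak and low lands exactly on a grid point.

```python
import math

def sine(freq_hz, t):
    """Ideal sine wave, amplitude normalized to the interval [-1, 1]."""
    return math.sin(2 * math.pi * freq_hz * t)

# Evaluate the 8Hz sine on a dense grid covering exactly one second.
# n is a multiple of 32 so that each peak (t = 1/32 + k/8) and each
# low (t = 3/32 + k/8) falls exactly on a grid point.
n = 3200
ys = [sine(8, k / n) for k in range(n + 1)]

# Count strict local maxima ("peaks") and minima ("lows")
peaks = sum(1 for k in range(1, n) if ys[k - 1] < ys[k] > ys[k + 1])
lows = sum(1 for k in range(1, n) if ys[k - 1] > ys[k] < ys[k + 1])

print(peaks, lows)                       # 8 8
print(ys[0] == 0.0, abs(ys[-1]) < 1e-9)  # True True
```

Changing the 8 to 220 gives the A3 sine; the counts then become 220 and 220, provided the grid is made dense enough.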

This is what the A3 note, represented as a sine wave, looks like when you open it in a sound processing program. It also sounds pretty boring; in reality, sounds such as those produced by Joe's horse-gut string aren't ideal sine waves. In practice, a string plucked to A3 will vibrate at different frequencies in different portions and will thereby produce a packet composed of a wave oscillating at 220Hz, one oscillating at 440Hz (note A4), another one oscillating at 660Hz (about the same pitch as E5, i.e. the fifth above A4) and so on. In musical jargon, the first tone is the fundamental, the second is the first overtone, the third is the second overtone and so on; counting the fundamental as the first harmonic, these overtones are the second harmonic, the third harmonic and so on. For the sake of visual comfort, let's move back to our 8Hz basis and notice that this superposition of sine waves looks something along the lines of:
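The superposition is simply a sum of sines at whole-number multiples of the fundamental. Here is a minimal sketch. The function name harmonic_tone and the weights 1, 1/2, 1/4 are hypothetical, chosen only for illustration; a real string's overtone mix depends on the string, where it is plucked and so on. The sum is divided by the total weight so that it stays inside [-1, 1].

```python
import math

def harmonic_tone(f0_hz, t, amplitudes=(1.0, 0.5, 0.25)):
    """Hypothetical helper: sum of sines at f0, 2*f0, 3*f0, ...

    amplitudes[h] weights the (h + 1)-th harmonic; the weights are
    illustrative, not measured from any real instrument.
    """
    total = sum(amplitudes)
    return sum(a * math.sin(2 * math.pi * (h + 1) * f0_hz * t)
               for h, a in enumerate(amplitudes)) / total

# Frequencies of the first three harmonics of A3
print([(h + 1) * 220 for h in range(3)])  # [220, 440, 660]

# The 8Hz version, sampled densely over one second
n = 3200
ys = [harmonic_tone(8, k / n) for k in range(n + 1)]
print(max(abs(y) for y in ys) <= 1.0)     # True
```

Because every component completes a whole number of cycles in 1/8 of a second, the summed shape still repeats eight times per second. That is why the ear hears one pitch rather than three.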

Also, did you notice that A4 is twice the frequency of A3? Leaving aside the fractal shape of rhythm-turned-music-turned-octaves, beyond this point the whole craft explodes into oscillators, Fourier analysis and, well, in other words, more than just a mere thought experiment scribbled on a piece of paper. For example, and as a conclusion to this already too long article, in order to synthesize one of the waveforms above on a digital computer, it's not enough merely to know the mathematical functions that model the sound. An analog computer would have taken them quite easily, in fact, but a digital one requires a means to convert that lovely continuous shape above into numbers; and it so happens that this problem was solved not by computer people, but by information theorists.
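The octave relation can be made concrete. The article doesn't name a tuning system, so the sketch below assumes the common 12-tone equal temperament with A4 = 440Hz and MIDI note numbering (A4 = 69). Under those assumptions each semitone multiplies the frequency by 2^(1/12), so twelve semitones, one octave, exactly double it. This also reproduces the "about the same pitch as E5" figure from earlier and gives the upper end of the A3 to E8 range.

```python
def note_freq(midi_note, a4_hz=440.0):
    """Frequency of a note in 12-tone equal temperament (MIDI numbering, A4 = 69)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)

# MIDI note numbers (assumed convention: middle C = C4 = 60)
A3, A4, E5, E8 = 57, 69, 76, 112

print(note_freq(A3))            # 220.0
print(note_freq(A4))            # 440.0
print(round(note_freq(E5), 2))  # 659.26
print(round(note_freq(E8), 1))  # 5274.0
```

The tempered E5 at about 659.26Hz sits just below the 660Hz third harmonic of A3, which is the "about the same pitch" gap mentioned above.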

Take the initial 8Hz sine wave above, for example. The simplest way to represent it on a computer is as an array of numbers, each number representing the value of the amplitude at a given moment. But... at which moments? And how many of these "moments" do we need at a minimum, such that, when we feed our number array to a digital-to-analog converter, the result will be music? Taking four samples at equal intervals is obviously not going to be enough; and neither are eight samples since, remember, we have eight peaks and eight lows, and we risk sampling only the peaks, only the lows, or worse, only the silence. Referring to Shannon and Nyquist, let us spare ourselves the trial and error and recall that the sampling rate needs to be greater than double the highest frequency in the signal; so in our case we need more than sixteen samples per second in order to get any meaningful information at all out of the 8Hz sine above (at exactly sixteen, a sine starting at zero would be sampled only at its zero crossings, i.e. only the silence). In practice we'll need much more than that, since we've already established that the human musical ear is accustomed to a spectrum of frequencies (very) roughly situated between 0.1kHz and 10kHz. Today's average Joes would have you believe that 44100 samples per second are enough to represent anything that the human ear could possibly hear. I seriously doubt this, but my doubts aside, this is what the average sound processor on a PC will give you, and this space is where everyone so arduously tries to fit notes played at any pitch between, say, A3 and E8.
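Sampling is just evaluating the function at k / rate for k = 0, 1, 2, ... The sketch below uses a hypothetical helper, sample_sine. It shows why sampling an 8Hz sine at exactly 16 samples per second is a trap: every sample lands on a zero crossing. Going slightly above twice the frequency, 20 samples per second, already yields non-zero samples. The last line prints the Nyquist frequency for the 44100 rate, i.e. the highest frequency that rate can represent.

```python
import math

def sample_sine(freq_hz, rate_hz, seconds=1.0):
    """Hypothetical helper: sample an ideal sine at rate_hz samples per second."""
    count = int(rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(count)]

# Exactly twice the frequency: every sample falls on a zero crossing
at_16 = sample_sine(8, 16)
print(max(abs(s) for s in at_16) < 1e-9)  # True: "only the silence"

# A bit more than twice the frequency: the samples are no longer all zero
at_20 = sample_sine(8, 20)
print(round(max(at_20), 3))               # 0.951

# The highest frequency representable at 44100 samples per second
print(44100 / 2)                          # 22050.0
```

Note that the 20-sample case does not hit the true peak of 1.0. The theorem only promises that the original wave can be reconstructed from the samples, not that the samples themselves look like the wave.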

And that's about it, really. Unlike yours, my clickbaits actually deliver.

Filed under: asphalt.