Friday, December 30, 2016

Sounds Like F#

Last time, I thought that I was going to have to do some special research to account for multiple sources of sound. Reading this short physics snippet, I can see that handling multiple sources really is as simple as my initial thinking inclined me to believe. You can ignore all of the decibel stuff and focus on the statements about amplitude, because, of course, I am already working in terms of amplitude. What can I say? I let the mass of voices on the internet trouble my certainty unduly. Sort of.

There are two important issues to consider:
  1. Offsetting of the signals.
  2. Psychological issues regarding the perception of sound.
Regarding number 2, here's a quick hit I found: http://hyperphysics.phy-astr.gsu.edu/hbase/Sound/loud.html. It seems credible, but I'm not sufficiently interested to pursue it further. And so, my dear Perception, I salute you in passing. Good day, Sir!

Offsetting of the signals is something that is implicitly accounted for by naively adding the amplitudes, but there is a question of what the sineChord() function is trying to achieve. Do I want:
  1. The result of the signals beginning at exactly the same time (the piano player hits all his notes at exactly the same time—lucky dog; the guitar player, not so much. Hmm...)?
  2. The average case of sounds which may or may not be in phase?
Regarding case 2, Loring Chien suggests in a comment on Quora that the average case would be a 41% increase in volume for identical sounds not necessarily in phase. At first glance, I'm inclined to think it is a plausible statement. Seeming to confirm this suggestion is the summary about Adding amplitudes (and levels) found here, which also explains where the 41% increase comes from, namely, from \(\sqrt{2}\approx 1.414\dots\). Good enough for me.

All that being said, I will pass it by, as I am approaching this from a mechanical perspective. I'm neither interested in the psychological side of things, nor am I interested in getting average case volumes. I prefer the waves to interact freely. It would be neat, for example, to be able to manipulate the offset on purpose and get different volumes. The naive approach is a little more open to experimentation.
I would like to go at least one step further, though, and produce a sequence of notes and chords and play them without concern for fuzz or pops caused by clipping the tops off the amplitudes. We need two pieces: a) a limiter and b) a means of producing sequences of sounds in a musically helpful way.
A limiter is pretty straightforward and is the stuff of basic queries. I have the advantage of being able to determine the greatest amplitude prior to playing any of the sound. The function limiter() finds the largest absolute amplitude and scales the sound to fit.

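A minimal sketch of the idea, assuming wave forms are sequences of integer samples and that we only scale down when the peak exceeds the allowed maximum:

// limiter: find the largest absolute amplitude and, if it exceeds maxAmplitude,
// scale every sample down proportionally so the peak just fits.
let limiter (maxAmplitude : int) (samples : int seq) =
    let data = samples |> Seq.toArray                // we need the peak before playing
    let peak = data |> Array.map abs |> Array.max
    if peak <= maxAmplitude then data |> Array.toSeq
    else
        data
        |> Array.map (fun s -> int (float s * float maxAmplitude / float peak))
        |> Array.toSeq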

Here is an example usage:

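For instance, something along these lines, assuming the sineWave and PlayWave routines sketched in the last post (naively add two full-volume sine waves and let the limiter pull the sum back into the 16-bit range):

let a = sineWave 20000.0 440.0 2.0 |> Seq.toArray
let e = sineWave 20000.0 659.25 2.0 |> Seq.toArray
// Point-wise sum of the two waves; the peaks can reach 40000, well past Int16 range.
let summed = Seq.init (min a.Length e.Length) (fun i -> a.[i] + e.[i])
PlayWave 44100 (limiter 32000 summed)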

One concern here is that we may make some sounds that we care about too quiet by this approach. To address such concerns we would need a compressor that affects sounds from the bottom and raises them as well. The general idea of a sound compressor is to take sounds within a certain amplitude range and compress them into a smaller range. So, sounds outside your range (above or below) get eliminated and sounds within your range get squeezed into a tighter dynamic (volume) range. This might be worth exploring later, but I do not intend to go there in this post.

Last post I had a function called sineChord(), which I realize could be generalized. Any instrument (or group of identical instruments) could be combined using a single chord function that would take as a parameter a function that converts a frequency into a specific instrument sound. This would apply to any instruments where we are considering the notes of the chord as starting at exactly the same time (guitar would need to be handled differently). So, instead of sineChord(), let's define instrumentChord():

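A sketch of that generalization, assuming the instrument function maps a frequency to a wave form (a sequence of integer samples), with the chord being the point-wise sum of one wave form per note:

// instrumentChord: any frequency-to-wave-form function can be used to build a
// chord; the naive sum lets the waves interact freely, as discussed above.
let instrumentChord (instrument : float -> int seq) (frequencies : float list) =
    let waves = frequencies |> List.map (instrument >> Seq.toArray)
    let len = waves |> List.map Array.length |> List.min
    Seq.init len (fun i -> waves |> List.sumBy (fun w -> w.[i]))

With this in place, sineChord is just instrumentChord applied to a sine wave producer.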


Next we will create a simple class to hold a sequence of notes/chords and durations, from which we can then play our sounds.

Reference Chords and Values

Here are a few reference chord shapes and duration values. I define the chord shape and the caller supplies the root sound to get an actual chord. The durations are relative to the whole note and the chord shapes are semi-tone counts for several guitar chord shapes. The root of the shapes is set to 0 so that by adding the midi-note value to all items you end up with a chord using that midi note as the root. This is the job of chord().

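A sketch of these values, assuming E-form and A-form bar-chord shapes (major and minor) expressed as semi-tones above a root of 0, and durations expressed as fractions of a whole note:

// Durations relative to the whole note.
let whole   = 1.0
let half    = 0.5
let quarter = 0.25
let eighth  = 0.125

// Guitar chord shapes as semi-tone offsets from a root of 0.
let EShape  = [ 0; 7; 12; 16; 19; 24 ]   // E-form major bar chord
let EmShape = [ 0; 7; 12; 15; 19; 24 ]   // E-form minor
let AShape  = [ 0; 7; 12; 16; 19 ]       // A-form major
let AmShape = [ 0; 7; 12; 15; 19 ]       // A-form minor

// chord: move a shape onto a root by adding the midi-note value to every item.
let chord rootMidiNote shape = shape |> List.map ((+) rootMidiNote)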


Next, InstrumentSequence is an accumulator object that knows how to output a wave form and play it. For simplicity, it is limited to the classic equal temperament western scale. Note that it takes a function that has the task of turning a frequency into a wave form, which means that by creating a new function to generate more interesting wave forms, we can have them played by the same accumulator. If we only want it to produce the wave form and aggregate the wave forms into some other composition, we can access the wave form through the routine WaveForm.

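Here is a sketch of such a class. It assumes the instrument function takes a frequency in hertz and a duration in seconds and returns a wave form, and that the limiter above and the PlayWave routine from the last post are available; A440 at midi note 69 anchors the equal temperament scale.

// InstrumentSequence accumulates (notes, duration) pairs, renders them with the
// supplied instrument function, and either exposes the combined wave form or
// plays it. Durations are fractions of a whole note at the given tempo (bpm).
type InstrumentSequence(instrument : float -> float -> int seq, tempo : float) =
    let items = ResizeArray<int list * float>()
    // Equal temperament: midi note 69 = A = 440 Hz.
    let frequency midiNote = 440.0 * 2.0 ** (float (midiNote - 69) / 12.0)
    member this.Add(midiNotes : int list, duration : float) =
        items.Add(midiNotes, duration)
    member this.WaveForm =
        seq { for (notes, duration) in items do
                let seconds = duration * 240.0 / tempo
                let waves =
                    notes |> List.map (fun n -> instrument (frequency n) seconds |> Seq.toArray)
                let len = waves |> List.map Array.length |> List.min
                yield! Seq.init len (fun i -> waves |> List.sumBy (fun w -> w.[i])) }
    member this.Play() =
        this.WaveForm |> limiter 32000 |> PlayWave 44100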


Here is a sample use of the class, following a very basic E, A, B chord progression:

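Something along these lines, assuming a sineWave producer like the one sketched in the last post, partially applied to a fixed volume so that it maps a frequency and a duration to samples:

// Hypothetical E, A, B progression using the E-form and A-form shapes.
// Midi notes: 40 = E2, 45 = A2, 47 = B2.
let progression = InstrumentSequence(sineWave 8000.0, 120.0)
progression.Add(chord 40 EShape, whole)   // E major, E form
progression.Add(chord 45 AShape, whole)   // A major, A form
progression.Add(chord 47 AShape, whole)   // B major, A form moved up two frets
progression.Play()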


To get the same progression with minors, use the EmShape, etc., values. But, at the end of the day, we've still got sine waves, lots of them. And, it's mostly true that,

                           boring + boring = boring.

I will not attempt the proof. You can try some more examples and see if you can generate a counter-example.

There is also some unpleasant popping that I would like to understand a bit better—for now I'm going to guess it relates to sudden changes in the phases of the waves at the terminations of each section of sound. Not sure what the solution is but I'll guess that some kind of dampening of the amplitude at the end of a wave would help reduce this.

Thursday, December 29, 2016

The Sound of F#

I'm doing a bit of reading and dabbling in the world of sound with the hope that it will help me to tackle an interesting future project. One of the first steps to a successful project is to know more about what the project will actually entail. Right now I'm in exploratory mode. It's a bit like doing drills in sports or practicing simple songs from a book that are not really super interesting to you except for helping you learn the instrument you want to learn. I'm thinking about learning to "play" the signal processor, but it seemed like a good start to learn to produce signals first. And sound signals are what interest me at the moment.

Sound is pretty complicated, but we'll start with one of the most basic of sounds and what it takes to make it in Windows, writing in F#. Our goal in this post is to play sounds asynchronously in F# Interactive.

The first iteration of this comes from Shawn Kovac's response to his own question found here at Stack Overflow. Below is my port to F#. I have used the use keyword to let F# handle the IDisposable interfaces on the stream and writer and take care of all the closing issues. It's important to note that the memory stream must remain open while the sound is playing, which means this routine is stuck as synchronous.

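A sketch along the lines of that port, assuming 16-bit mono PCM at 44100 Hz with the WAV header written by hand into the MemoryStream:

open System
open System.IO
open System.Media

// PlaySound: build a WAV for a sine wave at the given frequency and length in a
// MemoryStream and play it. The use bindings dispose of everything afterwards,
// and PlaySync keeps the stream alive for the duration of the sound.
let PlaySound (frequency : float) (seconds : float) =
    let sampleRate = 44100
    let samples = int (float sampleRate * seconds)
    let dataSize = samples * 2                       // two bytes per 16-bit sample
    use stream = new MemoryStream()
    use writer = new BinaryWriter(stream)
    // RIFF/WAVE header for 16-bit mono PCM.
    writer.Write("RIFF"B)
    writer.Write(36 + dataSize)
    writer.Write("WAVE"B)
    writer.Write("fmt "B)
    writer.Write(16)
    writer.Write(1s)                                 // PCM
    writer.Write(1s)                                 // one channel
    writer.Write(sampleRate)
    writer.Write(sampleRate * 2)                     // byte rate
    writer.Write(2s)                                 // block align
    writer.Write(16s)                                // bits per sample
    writer.Write("data"B)
    writer.Write(dataSize)
    // The sound itself: a plain sine wave.
    for i in 0 .. samples - 1 do
        let t = float i / float sampleRate
        writer.Write(int16 (16000.0 * sin (2.0 * Math.PI * frequency * t)))
    writer.Flush()
    stream.Position <- 0L
    use player = new SoundPlayer(stream)
    player.PlaySync()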


To get an asynchronous sound we can shove the PlaySound function into a background thread and let it go. This looks like the following:

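For instance, assuming the PlaySound sketch above:

open System.Threading

// Fire and forget: run the synchronous PlaySound on a background thread so
// that F# Interactive gets its prompt back immediately.
let PlaySoundAsync (frequency : float) (seconds : float) =
    let thread = Thread(fun () -> PlaySound frequency seconds)
    thread.IsBackground <- true
    thread.Start()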


A limitation of the PlaySound() approach given above is that it limits your method of sound production. The details of the sound produced are buried inside the function that plays the sound. I don't want to have a bunch of separate routines: one that plays sine waves, one that plays saw tooth, one that plays chords with two notes, one that plays chords with three notes, one that includes an experimental distortion guitar—whatever. The sound player shouldn't care about the details, it should just play the sound, whatever it is. (Not a criticism against Kovac, he had a narrower purpose than I do; his choice was valid for his purpose.)

I want to decouple the details of the sound to be produced from the formatting and playing of the sound. Here we have the basic formatting and playing of the sound packaged up that will take any wave form sequence we give it (sequence of integer values):

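A sketch of that packaging, with the same hand-rolled WAV header as before; the caller supplies the sample rate and any sequence of integer samples, and PlayWave neither knows nor cares how they were produced:

// PlayWave: format an arbitrary wave form (integer samples) as 16-bit mono PCM
// and play it synchronously. The MemoryStream stays open for the duration.
let PlayWave (sampleRate : int) (samples : int seq) =
    let data = samples |> Seq.map int16 |> Seq.toArray
    let dataSize = data.Length * 2
    use stream = new MemoryStream()
    use writer = new BinaryWriter(stream)
    writer.Write("RIFF"B); writer.Write(36 + dataSize); writer.Write("WAVE"B)
    writer.Write("fmt "B); writer.Write(16)
    writer.Write(1s); writer.Write(1s)               // PCM, one channel
    writer.Write(sampleRate); writer.Write(sampleRate * 2)
    writer.Write(2s); writer.Write(16s)              // block align, bits per sample
    writer.Write("data"B); writer.Write(dataSize)
    data |> Array.iter (fun s -> writer.Write(s))
    writer.Flush()
    stream.Position <- 0L
    use player = new SoundPlayer(stream)
    player.PlaySync()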


I haven't decoupled this as far as we could. If we wanted to handle multiple tracks, for example, we would need to take this abstraction a little further.

Now, we also need some functions for generating wave forms. Sequences are probably a good choice for this because you will not need to put them into memory until you are ready to write them to the memory stream in the PlayWave function. What exactly happens to sequence elements after they get iterated over for the purpose of writing to the MemoryStream, I'm not quite clear on. I know memory is allocated by MemoryStream, but whether the sequence elements continue to have an independent life for some time after they are iterated over, I'm not sure about. The ideal thing, of course, would be for them to come into existence at the moment they are to be copied into the memory stream (which they will, due to lazy loading of sequences) and then fade immediately into the bit bucket—forgotten the moment after they came into being.

Here are a few routines that relate to making chords. They are not without an important caveat: the volume control is a bit tricky.

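Sketches of the kind of routines I mean, assuming wave forms are integer sample sequences at 44100 Hz; note the volume handling in sineChord, discussed below:

let sampleRate = 44100

// A pure sine wave at the given peak volume, frequency and length.
let sineWave (volume : float) (frequency : float) (seconds : float) =
    Seq.init (int (float sampleRate * seconds)) (fun i ->
        let t = float i / float sampleRate
        int (volume * sin (2.0 * System.Math.PI * frequency * t)))

// A chord built from sine waves. The volume is divided by the number of notes
// so that the summed amplitudes cannot exceed the requested volume.
let sineChord (volume : float) (seconds : float) (frequencies : float list) =
    let perNote = volume / float (List.length frequencies)
    let waves =
        frequencies |> List.map (fun f -> sineWave perNote f seconds |> Seq.toArray)
    let len = waves |> List.map Array.length |> List.min
    Seq.init len (fun i -> waves |> List.sumBy (fun w -> w.[i]))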


Here is a sample script to run in F# Interactive:

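Hypothetically, something like:

// An A chord voiced as the 5th-fret E-form bar chord: A2 E3 A3 C#4 E4 A4.
let aChord = [ 110.0; 164.81; 220.0; 277.18; 329.63; 440.0 ]
PlayWave 44100 (sineChord 30000.0 2.0 aChord)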


The above plays a version of an A chord (5th fret E-form bar chord on the guitar).

Note that I have divided the volume by the number of notes in the chord. If we don't watch the volume like this and we let it get away on us, it will change the sound fairly dramatically. At worst you get something like the "snow" from old analog TVs when you weren't getting a signal, but if you just go a little over, you will get a few pops like a slightly degraded vinyl recording.

We could include volume control in our wave form production routines. This wouldn't necessarily violate our decoupling ideals. The main concern I have is that I want the volume indicated to represent the volume of one note played and the other notes played should cause some increase in volume. I want the dynamic fidelity of the wave forms to be preserved with different sounds joining together. Even after that, you may want to run a limiter over the data to make sure the net volume of several sounds put together doesn't take you over your limit.

However, I need to do more research to achieve these ideals. For now, I will break it off there.

Friday, December 23, 2016

Unicode in LispWorks

Background

I have a really basic vocabulary quiz program that I have been dabbling with, partly as a means of learning to use LispWorks and partly as a slightly different way of reviewing vocabulary in my personal study of NT Greek. I obtained the basic vocabulary from Mounce's Flashworks program. There are some text files for vocabulary for a few different languages, including Greek and Hebrew. These files can be read as UTF-8 Unicode.

Unicode support in Common Lisp is non-uniform, owing to the fact that the language standard was established before Unicode arrived [Seibel]. So, what works in one implementation may not work in another. I have found this in working with LispWorks and Clozure Common Lisp. Since I wanted something easy, not something clever, I have not attempted to bridge these. I just want the "easy" answers for the basic problems of reading and writing Unicode in LispWorks.

Reading and Writing Unicode

If I want to read a basic tab delimited Unicode file, line by line, I use get-text-data (below). keepers allows me to take a subset of the columns in the file.

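A sketch of the shape of this, assuming LispWorks accepts :utf-8 as the :external-format and that fields are split on tab characters (split-tabs is a hypothetical helper; special-read is discussed below):

;; special-read: read a line, stripping a leading BOM if one is present.
(defun special-read (stream)
  (let ((line (read-line stream nil nil)))
    (when (and line
               (plusp (length line))
               (char= (char line 0) #\Zero-Width-No-Break-Space))
      (setf line (subseq line 1)))
    line))

;; split-tabs: break a line into fields on tab characters.
(defun split-tabs (line)
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Tab line :start start)
        collect (subseq line start pos)
        while pos))

;; get-text-data: read a tab-delimited UTF-8 file line by line, keeping only
;; the columns whose (zero-based) indices appear in keepers.
(defun get-text-data (filename keepers)
  (with-open-file (in filename :direction :input :external-format :utf-8)
    (loop for line = (special-read in) then (read-line in nil nil)
          while line
          collect (let ((fields (split-tabs line)))
                    (loop for k in keepers collect (nth k fields))))))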


The first awful thing you will notice here is the special-read function. This is not always necessary, but I did have a case where I had a leading Byte-Order-Mark (BOM: 0xFEFF) that I needed to remove from the file. This, somewhat oddly, but understandably, is called #\Zero-Width-No-Break-Space in LispWorks [WikiPedia]. If memory serves, putting the BOM in (which I did on purpose at one point) made reading the file work without specifying a format to read in. But the BOM is optional for UTF-8.

Writing out to a file is quite straightforward, but the question is whether it will be possible to read it in correctly. The answer is that it depends on how you will read it in later. If you are only going to read it into the same program or into programs that will assume UTF-8, then there's no issue. But if you want to be able to read it in without thinking about whether the source file is Unicode or not, you can deliberately output the BOM so that it will read successfully in LispWorks even if you did not specify the :external-format. Below is a sample of doing just this.

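A sketch of doing so; collection->plist is my own (unshown) serializer and save-collection is a hypothetical name:

(defun save-collection (collection filename)
  (with-open-file (out filename :direction :output
                                :if-exists :supersede
                                :external-format :utf-8)
    ;; Write the optional UTF-8 BOM first so a later read can detect the format.
    (write-char #\Zero-Width-No-Break-Space out)
    (prin1 (collection->plist collection) out)))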


The function collection->plist is my own function which turns my object into a plist that can be read by the built-in function read. (This is my way of textifying CLOS objects to write them to file—the details are not relevant to my topic.)

Now, I can read in the plist that I wrote out, without specifying the :external-format, as in:

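For instance (the file name is just a placeholder):

(with-open-file (in "vocab.dat" :direction :input)
  (read in))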


However, if I didn't manually output the BOM, I would need to write:

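Along these lines, again with a placeholder file name:

(with-open-file (in "vocab.dat" :direction :input :external-format :utf-8)
  (read in))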


The long and the short of it is that you can use :external-format all the time and use a consistent format for files that pertain to your program; that's probably the best practice. If you want to be sure another LispWorks program is going to be OK with your UTF-8 file, whether it is consistently using that format or not, putting a BOM at the start for good measure may be a good idea. On the other hand, maybe the reader will choke on the BOM because it doesn't filter it out when reading a file it already knows is UTF-8.

So, I didn't solve the problem, but if you're head butting a format issue in Unicode, maybe this says enough about it that you can find something to do with the particular data you can't seem to read right now—because, I feel your pain.

For more information on IO for Unicode in LispWorks, see the manual.

Tuesday, December 6, 2016

Packing Spheres into a Cube

In this post I compare three different alignment strategies for putting spheres into a cube. Our main variable will be \(n\), the ratio of cube side to sphere diameter. The question is, how many spheres can we get into the cube? We denote the number of spheres as \(S\).

Matrix Alignment

The simplest alignment of spheres to think about is a matrix alignment, or row, column, layer. If \(n=10\), then we can fit 1000 spheres into the cube this way. Or, in general, \(S=n^3\).
Fig. 1. Matrix alignment.
One way to summarize this alignment is that each sphere belongs to its own circumscribing cube and none of these circumscribing cubes overlap each other. For small values of \(n\), this arrangement is optimal.

Square Base Pyramid Alignment

For larger values of \(n\), an improvement to \(S\) is had by offsetting the alignment of subsequent layers. Start with a layer in a square arrangement (\(n\) by \(n\)). If you look down on this first layer, it's awfully tempting to nestle the next layer of spheres in the depressions between the spheres. We will make better use of the interior space, although we won't make as good a use of the space near the edges. But if we can get enough extra layers in to make up for the losses near the outside, that would carry the day.

Fig. 2. A few red dots show where we're thinking of putting the next layer of spheres. We won't fit as many on this layer, but the second layer isn't as far removed from the first.
In the matrix alignment, consecutive layers were all a distance of 1 unit between layers (center to center). With this current alignment, we need to know the distance between layers. We need to consider a group of four spheres as a base and another sphere nestled on top and find the vertical distance from the center of the four base spheres to the center of the top sphere. (Implicitly, the flat surface the four base spheres are resting on constitute horizontal. Vertical is perpendicular to that.)

Fig. 3. Top view of a square base pyramid packing arrangement.
We can see from Fig. 3 that line segment ab is of length \(\sqrt{2}\). We take a section parallel to this line segment, shown in Fig. 4.

Fig. 4. We recognize a 45°-45°-90° triangle, abc.
We can calculate the vertical distance between layers, line segment cd, as \(\sqrt{2}/2 \approx 0.7071\dots\). Note that the spheres in this second layer are in basically the same configuration as the bottom layer. We can see this will be the case because the top sphere in Fig. 3 has its outer circumference pass across the point where the lower spheres meet. So adjacent spheres in the second layer will be in contact. And, further, second layer spheres will follow the rectangular pattern of depressions in the bottom layer, which are spaced apart the same as the centers of the spheres in the bottom layer. So we will end up with the same organization of spheres in the second layer as the bottom layer, just one fewer row and one fewer column.
We can now calculate the number of spheres that will fit in a cube using this alignment as
$$S = n^2 \lambda_1 + (n-1)^2 \lambda_2,$$
where \(\lambda_1\) is the number of full layers and \(\lambda_2\) is the number of "nestled" layers. (The third layer, which is indeed "nestled" into the second layer, is also returned to the configuration of the first layer, and so I refer to it as a full layer instead. The cycle repeats.) To find \(\lambda_1\) and \(\lambda_2\), we first find the total layers, \(\lambda\), as
$$\lambda = \left \lfloor{\frac{n-1}{\sqrt{2}/2}}\right \rfloor+1$$
The rationale for this formula is: count how many layer spacings fit, then compensate for the difference between layers and spaces. Imagine a layer at the very bottom and a layer at the very top (regardless of configuration). The layer at the bottom we get for free; it is the 1 term in the formula. (The top layer may or may not be tight with the top when we're done, but hold it up there for now.) Because the sphere centers must stay half a unit away from both the floor and the ceiling, the distance from the center of the bottom layer to the center of the top layer is at most \(n-1\). The number of times we can fit our spacing interval completely into that distance is the number of additional layers we can fit in beyond the bottom one.
The breakdown between \(\lambda_1\) and \(\lambda_2\) will not be exactly even, with \(\lambda_1\) having one more if \(\lambda\) is odd. We can show this neatly using the ceiling operator:
$$\lambda_1 = \left \lceil{\lambda/2}\right \rceil$$
$$\lambda_2 = \lambda - \lambda_1$$
As early as \(n=4\) we get \(S=66\), which is better than the matrix arrangement by 2 spheres. I modeled this to make sure I didn't let an off by one error sneak into my formula (see Fig. 5.).
Fig. 5. Somewhat surprisingly, rearrangement of spheres (relative to matrix alignment) produces gains in volume coverage even at \(n=4\).
Here it is in Maxima:

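A sketch of what that function might look like, following the formulas above (sqp is a hypothetical name):

/* Spheres in a cube of side n using the square base pyramid alignment. */
sqp(n) := block([lam, lam1, lam2],
  lam  : floor((n - 1) / (sqrt(2) / 2)) + 1,
  lam1 : ceiling(lam / 2),
  lam2 : lam - lam1,
  n^2 * lam1 + (n - 1)^2 * lam2);

sqp(4);   /* 66, two better than the 64 of the matrix alignment */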


Tetrahedral Alignment

Another rearrangement is similar to a honeycomb. A honeycomb is an arrangement of hexagons that meet perfectly at their edges. Pictures of this are readily available online. This Wikipedia article suffices. Imagine putting a circle or a sphere in each of these honeycomb cells. This will produce a single layer which is more compact, although it will push the layers farther apart.

So, what do we need to know to make a formula for this one?
  1. Find the distance between successive layers.
  2. Formula for the number of layers.
  3. Formula for the number of spheres in each layer.
    1. This depends on knowing the range of layer configurations (we will only have two different layer configurations).
  4. Some assembly required.
This is the same procedure we followed already for the square based pyramid alignment, but we are being more explicit about the steps.

Fig. 6. We need to find the distance of line segment ad.
We recognize we have an equilateral triangle and so we can identify a 30°-60°-90° triangle on each half of altitude ab. We have drawn all of the angle bisectors for this triangle which meet at a common point (commonly known property for triangles). We can recognize/calculate the length of bd as \(\frac{1}{2\sqrt{3}}\). We can then calculate the length of ad as \(\frac{\sqrt{3}}{2} - \frac{1}{2\sqrt{3}} = 1/\sqrt{3}\). From the Pythagorean theorem, we have a length for cd of \(\sqrt{2/3}\).
The number of spheres in the first layer will require us to use the distance for ab. We now need to find the number of rows in the bottom layer. We have
$$L_1=n \rho_1 + (n-1) \rho_2,$$
where \(\rho_1\) and \(\rho_2\) are the number of each kind of row in the layer and \(L_1\) is the number of spheres in the first layer. Since rows are spaced by \(\sqrt{3}/2\), we have a number of rows, \(\rho\), of
$$\rho = \left \lfloor{\frac{n-1}{\sqrt{3}/2}}\right \rfloor+1$$
and \(\rho_1\) and \(\rho_2\) can be broken out similar to our \(\lambda\)'s earlier.
$$\rho_1 = \left \lceil{\frac{\rho}{2}}\right \rceil$$
$$\rho_2 = \rho - \rho_1.$$

Fig. 7. We're looking for the distance of line segment cd which can be found using ac and ad.
Now, there is a little matter of how nicely (or not) the second layer will behave for us. It is noteworthy that we are not able to put a sphere into every depression. We skip entire rows of depressions. (The Wikipedia article on sphere packing referred to these locations as the "sixth sphere".) These skipped locations may play mind games on you as you imagine going to the third layer. Nevertheless, the third layer can be made identical to the first, which is what I do here.

Fig. 8. The first layer is blue. The second layer is red. Are we going to have trouble with you Mr. Hedral?
There are a couple of things that happen on the second layer. First, instead of starting with a row of size \(n\), we start with a row of size \(n-1\). The second issue is that we may not get as many total rows in. We could apply the same formula for rows again, now accounting for the fact that we have lost an additional \(\frac{1}{2\sqrt{3}}\) before we start. However, we can reduce this to the question "do we lose a row or not?" The total distance covered by the rows, \(R\), for the bottom layer is
$$R = (\rho-1) \frac{\sqrt{3}}{2} + 1$$
If \(n-R\ge\frac{1}{2\sqrt{3}}\), then we have the same number of total rows. Otherwise, we have one less row. We let \(r_L\) represent the number of rows lost in the second layer, which will either be 0 or 1. Noting that the order of the rows is now \(n-1\) followed by \(n\), we can give an expression for the number of spheres, \(L_2\), in our second layer.
$$L_2 = (n-1)(\rho_1 - r_L) + n\rho_2$$
We have a very similar formula for the total number of spheres in the box.
$$S = L_1\lambda_1 + L_2\lambda_2,$$
where
$$\lambda_1 = \left \lceil{\frac{\lambda}{2}}\right \rceil$$
$$\lambda_2 = \lambda - \lambda_1,$$
where
$$\lambda = \left \lfloor{\frac{n-1}{\sqrt{2/3}}}\right \rfloor+1.$$
My Maxima function for this is

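In the same spirit, a sketch following the formulas above (tetra is a hypothetical name; the comparison against 1/(2*sqrt(3)) decides whether the second layer loses a row):

/* Spheres in a cube of side n using the tetrahedral (honeycomb layer) alignment. */
tetra(n) := block([rho, rho1, rho2, L1, R, rL, L2, lam, lam1, lam2],
  rho  : floor((n - 1) / (sqrt(3) / 2)) + 1,
  rho1 : ceiling(rho / 2),
  rho2 : rho - rho1,
  L1   : n * rho1 + (n - 1) * rho2,
  R    : (rho - 1) * sqrt(3) / 2 + 1,
  rL   : if float(n - R) >= float(1 / (2 * sqrt(3))) then 0 else 1,
  L2   : (n - 1) * (rho1 - rL) + n * rho2,
  lam  : floor((n - 1) / sqrt(2 / 3)) + 1,
  lam1 : ceiling(lam / 2),
  lam2 : lam - lam1,
  L1 * lam1 + L2 * lam2);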

Comparison Between Methods

It isn't too hard to tell that one of the two later approaches is going to produce better results than the first after some value of \(n\) (Fig. 9). What is more surprising is that our latter two alignments stay neck and neck up into very high values of \(n\). If we divide the counts by the volume of the containing cube, both appear to be a roundabout way to arrive at the square root of 2! (Fig. 10)
Fig. 9. Both of the latter approaches give much better packing densities than the matrix alignment.
Fig. 10. Square base packing in blue, tetrahedral packing in red, both divided by total volume (# of spheres/unit volume). Unclear if there is a distinct winner and we seem to be getting closer to \(\sqrt{2}\).

Let's see which one wins more often for positive integer values of \(n\) in Maxima.

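For example, a quick tally over n from 2 to 100 (the range is arbitrary), assuming the sqp and tetra sketches above:

/* Count how often each alignment wins. */
wins_sqp : 0$  wins_tetra : 0$  ties : 0$
for n : 2 thru 100 do
  if sqp(n) > tetra(n) then wins_sqp : wins_sqp + 1
  elseif tetra(n) > sqp(n) then wins_tetra : wins_tetra + 1
  else ties : ties + 1$
[wins_sqp, wins_tetra, ties];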
So, there's no runaway winner, but if you're a betting man, bet on square base pyramid packing out of the available choices in this post. Regardless, it appears that both of these packing arrangements approach optimal packing (see Sphere packing). My density calculation (allowing the \(\sqrt{2}\) conjecture) for volume coverage comes out to
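$$\sqrt{2}\cdot\frac{4}{3}\pi\left(\tfrac{1}{2}\right)^{3} = \frac{\pi}{3\sqrt{2}} \approx 0.7405,$$
since each sphere of diameter 1 has volume \(\pi/6\) and we are conjecturing \(\sqrt{2}\) spheres per unit volume.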
Gauss, old boy, seems to have approved of this number.