Friday, May 24, 2019

Recursion and Accumulators

Working through Paradigms in Artificial Intelligence Programming, I have worked to wrap my head around accumulators and I have implemented one that worked surprisingly quickly. My example is a factorial design experiment. I have 4 reference implementations of a factorial design on an experiment with 7 factors, with each factor having 10 treatments in it. This is a bit of a ridiculous size, but it gave me clear separation of performances. I have split the code up into sections so you can focus on the parts of the program that matter you.

Supporting Code



Main Functions



Speed Testing

To make the testing as fair as possible, I started emacs fresh for each test run. Under these conditions, factorialize took 2.5 seconds, factorialize-2 took 3.7 seconds, factorialize-3 took 22 seconds, and factorialize-4 took 12 seconds.

A large amount of the time spent in these examples is spend in garbage collection. The factorialize-4 example could probably be improved by allocating a large chunk of memory in a 2-dimensional array and then the rest of the work involves a lot less memory allocation time. The land slide difference that marks factorialize off from the other methods appears to be the amount of shared memory it involves. Some reassignments of various locations of the resulting list, produce changes in other lists since there is shared cons cells among the lists. For my particular application this was proving to be a problem and so I have changed over to the factorialize-4 version of the function. Since my actual likely data is much smaller than that used in the speed tests of this post, I'm not overly concerned about the difference in speed.

An advantage of the method of factorialize-4, which uses an enumerator is that this method could be used to output data to a database or file in a way that didn't need to use very much run-time memory. Very likely in these cases, the time spent sending and saving data to persistent storage is much greater than the computational effort. The factorialize and factorialize-2 methods do not lend themselves to this as they do not generate complete tuples until all of the tuples are created. factorialize-4 creates complete tuples one at at time and these can be acted on immediately without waiting for all of the output to be generated.

Monday, May 6, 2019

Common Lisp Pipes→Enumerators

I've been plodding along through Peter Norvig's Paradigm's in Artificial Intelligence Programming and finding some helpful tips along the way. I especially enjoyed the discussion of pipes in chapter 9. I was intrigued by the pipes implementation of the sieve or Eratosthenes and wondered what else I could do with pipes.

Something that I didn't like about pipes is that they keep around a list of the results, when maybe you are actually done with those results. I was taking combinations of factors to produce a factorial experimental design and I decided to do so using something I have called multiple-radix-counting. Multiple-radix, meaning the radix (or base) for each digit is different. Imagine an odometer where each spinner has a different number of marks on it. Instead of always going from 0 to 9, some of the spinners might be 0 to 1 or 0 to 5. The sequencing is the same. When you get the end of the right most spinner, you wrap around on the first and advance the next spinner. Here's some sample output the shows the pattern:



I have taken the definitions from PIAP and modified them to avoid hanging on to the part of the pipe that has gone past. I have renamed certain functions/macros and made minor changes to them to produce a result that behaves differently. Instead of make-pipe, I have make-enumerator, although the macro is identical. It's important that I have a new name for this macro that suits the different usage in case I decide to implement enumerators differently in the future. The function get-current and get-next are the replacements for head and tail, respectively.  The representation is very similar to that used for pipes. I use a cons cell where the car is the current value and the cdr is a function call.



The purpose of the extract-by-multiple-radix is to take a selection out of source (an array of arrays) based on the indices that were specified. If you were doing multiplication of multiple term factors, you would take the terms by index from each factor and multiply them. To get the factors based on index, you would use extract-by-multiple-radix.