Thursday, September 6, 2012

Get Your Text Data Into Maxima

Comma Separated Value (CSV) files are often a convenient way to move your text data between a variety of software products. The makeup of a CSV file is straightforward and can be observed by opening up a CSV file in a text editor. Excel provides the ability to save a worksheet to a CSV file.

Once your data is in a CSV file, you can read it into Maxima—with a little bit of code.

But, first, some house-keeping. In order to have convenient access to routines that you or someone else has written, you'll want to put the files that contain this code on Maxima's search path. The variables file_search_maxima and file_search_lisp, indicate the paths that Maxima will look through to find code to load. You can affect these values by modifying these during start-up (or any other run-time). Place a file called maxima-init.mac in the location indicated by the environment variable MAXIMA_USERDIR and place some code to the effect of the following in it:

file_search_maxima: 
    append(file_search_maxima, 
           ["fullpathto/Useful/Maxima/Functions/$$$.{mac,mc}"]);
file_search_lisp: 
    append(file_search_lisp, 
           ["fullpathto/Useful/Maxima/Functions/$$$.{o,lisp,lsp}"]);

(See here for more settings and see chapter 32. Runtime Environment.)

To see more about how to interoperate between Lisp and Maxima, see chapter 3 of the Maxima manual. The downloadable sample code shows how to use some of these techniques.

Download the following two files and place them in the directory you have added to your search path.
  1. CSV.lisp
  2. CSV.mac
Start up Maxima and then type and execute:

load(csv);

You will now have access to 4 functions to help you get your text data into Maxima.


Function:
CSVread(filename)
CSVread(filename,linesToSkip)

CSVread will read in the data in the file and turn it into a list of lists of string data. Each line of data becomes a separate list. Optionally, you may specify a positive integer value indicating how many rows to discard, which is handy if your data file contains (possibly multiple) header rows.


Function:
CSVNumberColumns(filename, skips, columnsToKeep)

CSVNumberColumns reads data in from the file specified by filename and skips skips rows of data (which may be 0). Additionally, it will excerpt the file by only retaining the columns requested in the list columnsToKeep. To keep all of the columns, pass in an empty list ([]). If you want the first and second column only, pass in [1,2]. All data in the columns which are to be retained is expected to be a valid Maxima expression in string form. Numeric or algebraic expressions qualify. Specifically, the data items must be something which the Maxima function parse_string can interpret.


Function:
MakeMatrix(d)

MakeMatrix creates a matrix with the data d, which is a list of lists. This is handy when you need your data in matrix form to pass along to a least squares function in Maxima, such as lsquares_estimates. Note that this function does not create a "deep" copy of the data in d. It is really just a separate "view" of the same data. As a note, I found that the display of the matrix results would sometimes be transposed. Only seemed to happen in output cells in Maxima and actual data format was as expected anyway (display glitch in my version of Maxima but not a consistent one).



Example:

load(csv)$ /* only needed once per Maxima start-up */
d: CSVNumberColumns("D:/path/circledata.csv",1,[2,3]);
m: MakeMatrix(d);

(%o9) [[1007.265919151515,1000.932075151515],[1008.902086,1002.038759],[1010.969034,1002.554592],[1013.1443,1002.268315],[1015.598881,1000.804232],[
1016.837804,998.9016388],[1017.35833,996.4303561],[1016.548809,993.5742165],[1014.687232,991.6508387],[1012.069706,990.6322456],[1009.539441,990.933599],[
1007.03337,992.4675972],[1005.545131,995.07334],[1005.520783,997.8698301]]
(%o10) matrix([1007.265919151515,1000.932075151515],[1008.902086,1002.038759],[1010.969034,1002.554592],[1013.1443,1002.268315],[1015.598881,1000.804232],[1016.837804,998.9016388],[1017.35833,996.4303561],[1016.548809,993.5742165],[1014.687232,991.6508387],[1012.069706,990.6322456],[1009.539441,990.933599],[1007.03337,992.4675972],[1005.545131,995.07334],[1005.520783,997.8698301])

Local Variables, Function Parameters, and Blocks In Maxima

One of the things that was a real hang-up for me when I started programming in Maxima was its seeming lack of variable scoping rules.  It made some irritating and non-intuitive things happen.  But, because I was always in “get-it-done” mode, I didn’t stop to figure out if there was a better way.  I would resort to adding letters to the name that I wanted to use.  If my functions were going to use a, I would instead name those aa, so that my variables in the “true” global realm wouldn’t be overwritten after the next function call.  My convention would then be, “nice names in the global realm, ugly names in the local realm.”  This is far from ideal.  The fact is, Maxima defaults to global variables, but makes it easy to declare local variables.

Maxima’s primary scoping mechanism is the “block.”  A block starts with block( and ends with ).  Each statement within a block is separated by a comma (,).  The first statement in your block should normally be [v_1, v_2,…,v_n], where v_1, v_2, etc., are variables that you wish to be local (don’t literally type v_1, etc.).  If you don’t want any local variables, then omit the local statement.

What Maxima actually does is not quite the same as in most programming languages, although the apparent runtime behaviour will be the same, if the [v_1, v_2,…,v_n] statement is used as above.  Maxima saves the current values of the variables in the [v_1, v_2,…,v_n] statement and essentially “zeros out” (removes their properties) those variables.  They become blank slates.  When Maxima exits the block() in which the variable was declared as local, its current properties are removed and the saved values/properties are restored.  As far as I know, Maxima does not support multi-threading.  This is probably a good thing, as temporarily preventing a variable from “existing” (as itself, that is; a variable with the same name is not the same variable itself, philosophically) is problematic if that variable could be accessed at any time by a separate thread.

This behaviour is different from most other programming languages which use “dynamic scoping” in that there are no save and restore steps of the kind we are talking about here (in most other languages).  In Maxima, variables have names at run-time and are referenced according to their name.  In most languages, the name of a variable is solely a convenience for programmers.  At run-time, values will move to various parts of the program memory space and the memory addresses will be determined on the fly (e.g., "placed on the stack").  Just because the same name is used in the source code for two variable references, doesn’t mean the compiled code will have references to the same memory location.  For example, t declared in function1() has nothing to do with t declared in function2().  These memory locations will be on the stack and exist only while each function is being executed.  This, again, is what is true in most programming languages, not in Maxima.

When it comes to function parameters, however, Maxima`s behaviour is not unlike most other languages.  That is, the parameters are formal.  They are not considered to constitute a variable declaration.  When I write

f(x) := x^2;

x is not declared as a variable.
Even if I write

RotationalEquivalenceSet(n, bits) := block(
    local(res, size),
    res: {n},
    size: 2^bits,
    if n >= size then return({}),
    n: 2*n,
    if n >= size then n: n - size + 1,
    do (
        if elementp(n, res) then return(res),
        res: union(res, {n}),
        n: 2 * n,
        if  n >= size then n: n - size + 1
    )   
);


n is not declared as a variable.  This is a very helpful thing.  If n was assigned some value before the function call, nothing that I do inside this function call (which uses n as the name of a parameter) affects that value.  The value of n will remain the same after this function has been invoked.  Here is an input/output example of the behaviour:

/* [wxMaxima: input   start ] */
n: x^2;
RotationalEquivalenceSet(3,5);
n;
RotationalEquivalenceSet(5,5);
n;
/* [wxMaxima: input   end   ] */

(%o21) x^2
(%o22) {3,6,12,17,24}
(%o23) x^2
(%o24) {5,9,10,18,20}
(%o25) x^2


Note that I am using := to declare these functions.  You may end up with different (and undesired) behaviour if you don’t use the “delayed evaluation” assignment statement.  If you want the usual programming paradigm with your function declaration, use := as per my sample code above.