ΑΡΙΘΜΟΣ: 2020

Thursday, December 24, 2020

CLSQL in LispWorks: Varchar Fields

I was attempting to use CLSQL in LispWorks and had trouble reading in strings. I could even output the strings using insert, but couldn't read them in properly. I had two main issues that I needed to make changes for (running LispWorks 7.1.2 64-bit running on Windows 10, update, now running version 8.0.1 and this is working).

First, LispWorks didn't like a return type of :unsigned-long-long in "...\quicklisp\software\clsql-20160208-git\uffi\clsql-uffi.lisp". Replace these references to :unsigned-long-long with '(:unsigned :long :long) and then it will compile properly. I can't speak for whether this 100% right and I don't think that any of my current code depends on getting this right. But, it does let me compile.

The next problem was in "...\quicklisp\software\uffi-20180228-git\src\strings.lisp", the function fast-native-to-string. There's two such definitions which are split across different versions of lisp. LispWorks 7.1.2 is in the first such definition and there is a reference to base-char that is specifically used for LispWorks.

Edit: Based on comment below from a reader, the changes to make here are:

line 500: #+(or (and allegro (not ics)) lispworks4)
line 513: #+(or (and allegro ics) (and lispworks (not lispworks4)))

You will end up with:

I made a simple table with data that contains both varchar(50) and nvarchar(50) to test with, defined below. I put (mostly) the same data in each of Name and Name1 to see if there was any difference.

Using clsql to get all of the tuples that were entered is as simple as (clsql:query "select Id, Name, Name1 from TableWithStrings").

I'm not able to find who to contact regarding this as I see that the website for the project may have been abandoned (last change on the change list was in Jan 2016, specifically for LispWorks support). I note however, that someone more recently made a change for uffi as the folder name has the date in it suggesting Feb 2018.

Saturday, May 30, 2020

BMI Table with Undue Precision

The undue precision will be on my part because this post isn't really about your health, it's about doing a little mathematics. In a spreadsheet, because sometimes it's more fun there.

I have a little spreadsheet with a few examples of interesting formulas in Excel. The thing that is a bit pitiful about it is that it isn't an inline formula but a dependency which makes it hard to copy and paste. I did a piece-meal copy and paste job from here (NIH) into a table.

If you just want a spreadsheet that does this stuff for you and don't care about my blathering, download it here.

The thing that makes this table a beast is that the general usage requires you to first choose only one row of the table based on your height and then find which two weights you fall between and look up to find the BMIs you are between. And since this is a post about interpolation, you do straight interpolation to get your BMI. So, if I'm 5' - 10 1/4", and I'm dead sure to rights I'm not a 1/4" shorter, how do I proceed to get a more accurate value.

Of course, it is eye-roll worthy to care about such precision when the BMI number is a ballpark notion, anyway. But, from the stand point of data and formulas, can we do it anyway? (And, what might we learn if we do?) Here is what I have right now in all of its glorious ridiculousness:

Fig. 1 - Interpolation on steroids. Or maybe, interpolation without boundaries. Well, ok, interpolation without reason.

The simplest way of combining linear interpolations I could think of was to determine boundaries and look up the corresponding values. So we start with finding the low and high lines, in this case the lines for 70 and 71. Looking up the BMI weight included array formulas on the right hand side of the table that reference the given weight.

On the far right of the table I have formulas like the following in columns AL to AO:

=MAX(IF(B4:AK4<=B$25,B4:AK4, 0))

find the largest value in B4:AK4 that is still smaller than the weight in B25

=MIN(IF(B4:AK4>=B$25,B4:AK4, 100000))

find the smallest value in B4:AK4 that is still bigger than the weight in B25

=LOOKUP(AL4, $B4:$AK4, $B$2:$AK$2)
=LOOKUP(AM4, $B4:$AK4, $B$2:$AK$2)

It is important to use the <ctrl> + <shift> + <enter> key combination to make the first two of these formulas work as this is how you create array formulas out of formulas that are not inherently array based. (Formulas that are expecting ranges by default are inherently array based.) Taking the first formula as an example, the meaning of the syntax is something like this: "For every value in the list B4:AK4, apply this formula." The formula being applied is the IF. Anywhere you see B4:AK4 you can read, "any x that is in B4:AK4." If you C#'d this, you have something like:

(B4:AK4).select(x => if(x < B25, x, 0)).max()

LOOKUP is listed as available for backward compatibility. Sounds like it is not really desired and probably the VLookup and HLookup functions are mostly preferred, but LOOKUP has a leg up on them in one respect. I don't have to modify the layout of my table to accommodate my present purpose. I need the search value to be the current row and the return value to be the top value of a column. If I use HLookup, I need the searched value to be the top row (as far as I can see, at least). If I didn't have LOOKUP, I still have a way, of course. Copy the BMI to the bottom and return the bottom row. To my mind, the design of LOOKUP communicates purpose much more effectively: "Here is what I'm searching--shown first, Here is what I want returned from, respectively to the matching item in the search space."

The rest is just repeated application of linear interpolation. Do linear interpolation for 70 inches (in my case) and then for 71 inches. Then interpolate between those resulting BMIs to get a final BMI. In some cases I omit denominators and/or calculations where I know the result will give me 1 due to the way the spreadsheet and data set is constructed and so these get simplified out of the formulas.

Saturday, February 8, 2020

Cluster / Hierarchical Hashing in Common Lisp

I had some application ideas where I wanted to group data by several keys. Sometimes I care about the hierarchical relationship and sometimes I don't. As an example, let's take my FitNotes exercise data. Here's a few records from way back in the day (I can hardly believe the numbers, ugh...a few years and many pounds ago):

Date	Exercise	Category	Weight (lbs)	Reps
2015-03-10	Deadlift	Back	130	10
2015-03-10	Deadlift	Back	130	10
2015-03-10	Deadlift	Back	130	10
2015-03-10	Chin Up	Back	205	4
2015-03-10	One-Arm Standing Dumbbell Press	Shoulders	25	12
2015-03-10	One-Arm Standing Dumbbell Press	Shoulders	25	12
2015-03-10	One-Arm Standing Dumbbell Press	Shoulders	25	12
2015-03-10	Flat Dumbbell Fly	Chest	30	10
2015-03-10	Flat Dumbbell Fly	Chest	30	10
2015-03-14	Deadlift	Back	135	10

If I wanted to show a history of data and provide the user with the option of seeing a graph of their progress (similar to what the FitNotes application does) or perhaps a list of dates to select from to see the details on that date, then I would group first by Exercise and then by date. Depending on the purpose of the grouping, I might want to group by Category first. But let's go with a graphing application for a particular exercise. I want to group by Exercise and then by date.

I create hash tables at each level except the bottom level where I create a queue. The main thing that interested me here was the setf function, since the syntax bamboozled me the first time I saw in Practical Common Lisp (it's not hard, just different). I ended up not needing it for my current application, but may want it another day. For the kind of data I'm working with you need to use the equal test in order to get your hash tables to work with strings. If you compare (setf get-cluster-hash) with add-to-cluster-hash, you'll notice they are very similar. I was tempted to make add-to-cluster-hash out of the setf and get-cluster-hash functions but realized that that entails evaluating the hashes twice for adding the first record to a cluster-key. There might be a way to eliminate the code duplication, but I gave in to copy-paste-modify. (Please don't hate me.)

(defun make-queue ()
  (let ((li (cons nil nil)))
    (setf (car li) (cons nil nil))
    (setf (cdr li) (car li))
    li))

;; we add to the end and maintain the pointer
(defun add-queue (queue item)
  (setf (cddr queue) (cons item nil))
  (setf (cdr queue) (cddr queue))
  queue)

(defun queue->list (queue)
  (cdar queue))

(defun (setf get-cluster-hash) (value keys hash)
  (let* ((f (first keys))
         (r (rest keys)))
    (cond
     ((null f)
      nil)
     ((null r) 
      (setf (gethash f hash) value))
     (t
      (let* ((f-hash (gethash f hash)))
        (unless f-hash
          (setf f-hash (make-hash-table :test #'equal))
          (setf (gethash f hash) f-hash))
        (setf (get-cluster-hash r f-hash) value))))))

(defun get-cluster-hash (keys hash)
  (loop for k in keys
        for k-hash = (gethash k hash) then (gethash k k-hash)
        while k-hash
        finally (return k-hash)))

(defun add-to-cluster-hash (keys hash value)
  (let* ((f (first keys))
         (r (rest keys)))
    (cond
     ((null f)
      nil)
     ((null r) 
      (let ((q (gethash f hash)))
        (if q
            (add-queue q value)
          (let ((nq (make-queue)))
            (add-queue nq value)
            (setf (gethash f hash) nq)))))
     (t
      (let* ((f-hash (gethash f hash)))
        (unless f-hash
          (setf f-hash (make-hash-table :test #'equal))
          (setf (gethash f hash) f-hash))
        (add-to-cluster-hash r f-hash value))))))

Here is a simple demonstration REPL session with some of these functions:

Since I have the data of interest in a CSV file exported from FitNotes, I can just read every line of the CSV file and use add-to-cluster-hash for each line, thus grouping my data without loosing the order of the items for each (exercise, date), and all without thinking much about the details of the data structure used to do the grouping.

Friday, January 3, 2020

Macro Expansion Time: Connection Strings

Connection strings bother me. I'm not saying that aren't a good solution or that they shouldn't be, but I don't like having to look up special values to fill in and so on. I think the reason connection strings exist is that the available options for drivers, servers, and several other settings is unknown. There is no definitive list of all the potential types of things that need to go into a connection string. There are, however, common patterns of connection strings and values that are normally required. There's a website devoted to connection strings (https://www.connectionstrings.com/) and as far as being a very generic resource for helping you with your connection string needs, this goes a long way.

But it would nice to be able to maintain a list of connection string values that apply to various connections or connection types that matter to me so that after I have learned the values I need for certain types of connections I can represent them with a keyword and if I use a bad keyword (e.g., typo), get a warning at compile time of that fact. In the process of starting a very basic macro framework for doing this, I ran into a bump in my understanding of the macro expansion process and I thought that made it worth sharing more so than the connection string macros themselves. I'm learning to use the clsql package (available with quicklisp), which comes into play with one of the macros.

Two notable features that these macros will demonstrate are:

the explicit expansion of macros within code that produces code
conditional code production based on the type of data supplied

In particular, variables versus literals versus keywords

The first thing I want is to have my own repository of special values for each type of information that I care about. For my present interests I am concerned with drivers and servers. You might add to this specific databases. We don't want to get stuck with having to use keywords only because this would require having every case we will ever care about in the repository or require the repository to be updated before you can call the macro, which largely defeats the intended convenience of the macros. We also don't want the look up to happen during run-time where we can avoid it so that our convenience function doesn't add operations to run-time that would have been unnecessary if we had used a less convenient literal connection string. (Sometimes we have to decide between clarity/convenience of expression on the one hand and efficiency on the other. In this case, we will try to have our cake and eat it too.)

The way we handle our repositories is with a plist and the macros for each category of thing could be similar. I show only the drivers one since the server version is not materially different. (That probably means I could take the abstraction another level to remove the repetition.)

If a literal keyword is supplied, we will get the magic string that it corresponds to. We assume that if the user has supplied a keyword that isn't in the list, that they have made a mistake and therefore issue an error. This error is raised during macro expansion time which happens prior to compilation. If it is not a keyword, then we don't try to determine anything about its validity—we let the caller take their chances. It might be a literal string or it might be a variable, but that isn't this macro's problem.

Our next macro has some more interesting features, although it is basically a glorified concatenation. One notable feature is that we will pass on requiring keyword arguments and simply use &rest and recognize what we can and let everything else pass through without a lot of scrutiny. This is a choice to avoid doing research to support cases for my abstraction that I may never use. As the maintainer of this code for myself, I can always update this macro to support additional cases in the cond statement when I come across a new requirement that I actually want to use. In general, we are expecting keywords followed by either values or keywords that apply to the keywords, but we can accept strings in place of any of the keywords. We take consecutive pairs of parameters and consider the first of a pair to be the left hand of the = and second of the pair to be on the right of the = in our connection string.

(defmacro make-connection-string (&rest rest)
  (destructuring-bind (bound-list unbound-list)
      (loop for pair on rest by #'cddr
            for a = (car pair)
            for b = (cadr pair)
            for (a-name b-value) =
            (cond
	     ((eql a :driver) (list "Driver" (macroexpand `(typical-drivers ,b))))
	     ((eql a :server) (list "Server" (macroexpand `(typical-servers ,b))))
	     ((eql a :trusted) (list "Trusted_Connection"
				     (if b "Yes" "No")))
	     (t (list (format nil "~a" a) b)))
            if (or (not (symbolp b-value))
                   (boundp b-value)) collect (cons a-name b-value) into bound
            else collect (cons a-name b-value) into unbound
            finally (return (list bound unbound)))
    (let ((bound-part
	   (apply #'concatenate 'string
		  (mapcar #'(lambda(u) (format nil "~a=~a;" (car u) (cdr u)))
			  bound-list))))
      (cond
       ((null unbound-list) bound-part)
       (t
        `(concatenate 'string
                      ,bound-part
                      (concatenate 'string
                                   (mapcar
                                    #'(lambda(u) (format nil "~a=~a" (car u) (cdr u)))
                                    ,unbound-list))))))))

Here's a sample usage:

(Note that you can use the function write-line to convince yourself about the rectitude of the escaped backslash.)

First note the explicit calls to macroexpand with the macro. The key to understanding why this is necessary it to think about the order of execution. Suppose instead of (list "Driver" (macroexpand `(typical-drivers ,b))) we wrote simply (list "Driver" (typical-drivers b)). When the macro is expanded, it expands the macro typical-drivers without having a specific value for b supplied and as such, it is an unbound variable called b which gets passed into typical-drivers. So, the code (typical-drivers b) gets expanded to be just b. Not very useful, is it? The macroexpand wrapper allows us to delay the execution of the expansion until b is bound to a value and the , notation puts the bound value of b in here and we get the desired result.

As we go through the list of pairs, we restructure them into an association list of (left side, right side) cons cells. We want to distinguish between two kinds of pairs. Bound and non-bound. The reason this distinction matters to us is that the bound items already have values and we can put them in place right away so that we don't have to wait until later to do the final concatenation for this part of the code. If there is no unbound part, then we will have a simple, complete connection string that will in our compiled code. If there is an unbound part, then we need to return code that concatenates the pieces. It is the responsibility of the caller to provide a way for the supplied variables to be bound to a value before entering into the code that is thus generated. A wrinkle in my terminology is that bound is something that applies to symbols. If I pass in something that is not a symbol (which I will do often), then I am going to assume, for simplicity, that it is a concrete and directly usable value and hence "bound" in the sense of this macro.

Finally, my last macro is the usage of this connection string stuff to connect to a database using clsql, where my own original exercise began:

ΑΡΙΘΜΟΣ

Pages