Saturday, February 8, 2020

Cluster / Hierarchical Hashing in Common Lisp

I had some application ideas where I wanted to group data by several keys. Sometimes I care about the hierarchical relationship and sometimes I don't. As an example, let's take my FitNotes exercise data. Here are a few records from way back in the day (I can hardly believe the numbers, ugh...a few years and many pounds ago):

Date        Exercise                         Category   Weight (lbs)  Reps
2015-03-10  Deadlift                         Back       130           10
2015-03-10  Deadlift                         Back       130           10
2015-03-10  Deadlift                         Back       130           10
2015-03-10  Chin Up                          Back       205           4
2015-03-10  One-Arm Standing Dumbbell Press  Shoulders  25            12
2015-03-10  One-Arm Standing Dumbbell Press  Shoulders  25            12
2015-03-10  One-Arm Standing Dumbbell Press  Shoulders  25            12
2015-03-10  Flat Dumbbell Fly                Chest      30            10
2015-03-10  Flat Dumbbell Fly                Chest      30            10
2015-03-14  Deadlift                         Back       135           10

If I wanted to show a history of the data and let the user see a graph of their progress (similar to what the FitNotes application does), or pick from a list of dates to see the details for that date, then I would group first by Exercise and then by Date. Depending on the purpose of the grouping, I might want to group by Category first. But let's go with a graphing application for a particular exercise: group by Exercise, then by Date.

I create a hash table at each level except the bottom level, where I create a queue. The main thing that interested me here was the setf function, since the syntax bamboozled me the first time I saw it in Practical Common Lisp (it's not hard, just different). I ended up not needing it for my current application, but I may want it another day. For the kind of data I'm working with, the hash tables need the equal test in order to work with string keys (the default eql test would treat two separately read copies of the same string as different keys). If you compare (setf get-cluster-hash) with add-to-cluster-hash, you'll notice they are very similar. I was tempted to build add-to-cluster-hash out of the setf and get-cluster-hash functions, but realized that doing so would mean evaluating the hashes twice when adding the first record for a cluster key. There might be a way to eliminate the code duplication, but I gave in to copy-paste-modify. (Please don't hate me.)
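As a rough sketch of the structure just described (not the original listing), the three functions might look something like the following. The function names come from the text above, but the argument order, the cons-based queue, and the helpers make-queue, enqueue, and queue-items are all assumptions made for illustration. The sketch also assumes the key list is non-empty and that no key path runs through an existing queue.

```lisp
;; Hypothetical sketch: signatures and the queue representation are assumed.

(defun make-queue ()
  "A minimal FIFO queue: a cons of (item-list . last-cons-of-item-list)."
  (cons nil nil))

(defun enqueue (item queue)
  "Append ITEM to the end of QUEUE in constant time and return QUEUE."
  (let ((cell (list item)))
    (if (car queue)
        (setf (cdr (cdr queue)) cell)   ; link after the current last cons
        (setf (car queue) cell))        ; first item: start the list
    (setf (cdr queue) cell)             ; remember the new last cons
    queue))

(defun queue-items (queue)
  "Return the items of QUEUE in insertion order."
  (car queue))

(defun get-cluster-hash (table keys)
  "Follow KEYS down through nested hash tables; return what is found, or NIL."
  (let ((node table))
    (dolist (key keys node)
      (unless (hash-table-p node)
        (return nil))
      (setf node (gethash key node)))))

(defun (setf get-cluster-hash) (value table keys)
  "Store VALUE under KEYS, creating intermediate EQUAL hash tables as needed."
  (let ((node table))
    (loop for (key . more) on keys
          do (if more
                 ;; Not the last key: descend, creating the level if missing.
                 (setf node (or (gethash key node)
                                (setf (gethash key node)
                                      (make-hash-table :test #'equal))))
                 ;; Last key: store the value itself.
                 (setf (gethash key node) value)))
    value))

(defun add-to-cluster-hash (item table keys)
  "Append ITEM to the queue under KEYS, creating tables and the queue as needed."
  (let ((node table))
    (loop for (key . more) on keys
          do (setf node (or (gethash key node)
                            (setf (gethash key node)
                                  (if more
                                      (make-hash-table :test #'equal)
                                      (make-queue))))))
    (enqueue item node)))

;; Usage with a few of the records above, keyed by exercise and then date.
(defparameter *workouts* (make-hash-table :test #'equal))
(add-to-cluster-hash '(130 10) *workouts* '("Deadlift" "2015-03-10"))
(add-to-cluster-hash '(130 10) *workouts* '("Deadlift" "2015-03-10"))
(add-to-cluster-hash '(135 10) *workouts* '("Deadlift" "2015-03-14"))
(queue-items (get-cluster-hash *workouts* '("Deadlift" "2015-03-10")))
;; => ((130 10) (130 10))
```

Defining a function named (setf get-cluster-hash) is what lets (setf (get-cluster-hash table keys) value) work; by convention the new value is the first parameter. In this sketch add-to-cluster-hash walks the key path once, whereas composing it from the getter and the setf function would walk the path once to look for the queue and again to store a new one.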



Here is a simple demonstration REPL session with some of these functions:



Since I have the data of interest in a CSV file exported from FitNotes, I can just read every line of the CSV file and call add-to-cluster-hash for each one, thus grouping my data without losing the order of the items for each (exercise, date) pair, and all without thinking much about the details of the data structure used to do the grouping.
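To make the line-by-line idea concrete, here is a small self-contained sketch. It assumes the five columns shown in the sample above, a header line, and no quoted fields containing commas; a real FitNotes export may differ. The names split-on-comma and group-rows are hypothetical, and group-rows is a compact two-level stand-in for add-to-cluster-hash that uses push followed by nreverse instead of a queue to keep file order.

```lisp
;; Hypothetical sketch: column layout and helper names are assumptions.

(defun split-on-comma (line)
  "Split LINE at every comma. Does not handle quoted fields."
  (loop with start = 0
        for pos = (position #\, line :start start)
        collect (subseq line start pos)
        while pos
        do (setf start (1+ pos))))

(defun group-rows (stream)
  "Read CSV lines from STREAM, skipping the header; group by exercise, then date."
  (let ((top (make-hash-table :test #'equal)))
    (read-line stream nil)                      ; skip the header line
    (loop for line = (read-line stream nil)
          while line
          unless (zerop (length line))          ; ignore blank lines
            do (destructuring-bind (date exercise &rest fields)
                   (split-on-comma line)
                 (let ((by-date (or (gethash exercise top)
                                    (setf (gethash exercise top)
                                          (make-hash-table :test #'equal)))))
                   ;; PUSH collects each group newest-first.
                   (push fields (gethash date by-date)))))
    ;; Reverse every group once so items come back in file order.
    (loop for by-date being the hash-values of top
          do (loop for date being the hash-keys of by-date
                   do (setf (gethash date by-date)
                            (nreverse (gethash date by-date)))))
    top))

;; Usage: the string stream stands in for (with-open-file (s "export.csv") ...).
(defparameter *grouped*
  (with-input-from-string
      (s (format nil "Date,Exercise,Category,Weight (lbs),Reps~%~
                      2015-03-10,Deadlift,Back,130,10~%~
                      2015-03-10,Chin Up,Back,205,4~%~
                      2015-03-14,Deadlift,Back,135,10~%"))
    (group-rows s)))

(gethash "2015-03-14" (gethash "Deadlift" *grouped*))
;; => (("Back" "135" "10")), T
```

Fields stay as strings here; parsing the weight and reps into numbers (for example with parse-integer) would be a natural next step before graphing.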