ΑΡΙΘΜΟΣ: February 2020

I had some application ideas where I wanted to group data by several keys. Sometimes I care about the hierarchical relationship and sometimes I don't. As an example, let's take my FitNotes exercise data. Here are a few records from way back in the day (I can hardly believe the numbers, ugh...a few years and many pounds ago):

Date	Exercise	Category	Weight (lbs)	Reps
2015-03-10	Deadlift	Back	130	10
2015-03-10	Deadlift	Back	130	10
2015-03-10	Deadlift	Back	130	10
2015-03-10	Chin Up	Back	205	4
2015-03-10	One-Arm Standing Dumbbell Press	Shoulders	25	12
2015-03-10	One-Arm Standing Dumbbell Press	Shoulders	25	12
2015-03-10	One-Arm Standing Dumbbell Press	Shoulders	25	12
2015-03-10	Flat Dumbbell Fly	Chest	30	10
2015-03-10	Flat Dumbbell Fly	Chest	30	10
2015-03-14	Deadlift	Back	135	10

If I wanted to show a history of data and provide the user with the option of seeing a graph of their progress (similar to what the FitNotes application does) or perhaps a list of dates to select from to see the details on that date, then I would group first by Exercise and then by date. Depending on the purpose of the grouping, I might want to group by Category first. But let's go with a graphing application for a particular exercise. I want to group by Exercise and then by date.

I create hash tables at each level except the bottom level, where I create a queue. The main thing that interested me here was defining a setf function, since the syntax bamboozled me the first time I saw it in Practical Common Lisp (it's not hard, just different). I ended up not needing it for my current application, but I may want it another day. For the kind of data I'm working with, the hash tables need the equal test in order to work with string keys. If you compare (setf get-cluster-hash) with add-to-cluster-hash, you'll notice they are very similar. I was tempted to build add-to-cluster-hash out of get-cluster-hash and its setf function, but realized that doing so means walking the chain of hash tables twice when adding the first record for a given list of keys. There might be a way to eliminate the code duplication, but I gave in to copy-paste-modify. (Please don't hate me.)
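Two points from the paragraph above can be tried in isolation before reading the full listing. This is only a minimal sketch: the names *eql-table*, *equal-table*, and table-ref are made up for illustration and are not part of the code in this post.

```lisp
;; 1. String keys need :test #'equal. The default test is EQL, which does
;;    not compare string contents, so a fresh copy of the key is not found.
(defparameter *eql-table* (make-hash-table))              ; default test: EQL
(defparameter *equal-table* (make-hash-table :test #'equal))

(setf (gethash "Deadlift" *eql-table*) 130)
(setf (gethash "Deadlift" *equal-table*) 130)

(gethash (copy-seq "Deadlift") *eql-table*)    ; => NIL, NIL
(gethash (copy-seq "Deadlift") *equal-table*)  ; => 130, T

;; 2. A setf function is named by the list (SETF NAME). The new value is
;;    always the first parameter, followed by the reader's parameters.
;;    TABLE-REF is a hypothetical reader used only for this illustration.
(defun table-ref (key table)
  (gethash key table))

(defun (setf table-ref) (new-value key table)
  (setf (gethash key table) new-value))

(setf (table-ref "Chin Up" *equal-table*) 205)
(table-ref "Chin Up" *equal-table*)            ; => 205, T
```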

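;; A queue is a cons: its car is a dummy header cell and its cdr points
;; at the last cell, so appending an item takes constant time.
(defun make-queue ()
  (let ((li (cons nil nil)))
    (setf (car li) (cons nil nil))
    (setf (cdr li) (car li))
    li))

;; we add to the end and maintain the pointer
(defun add-queue (queue item)
  (setf (cddr queue) (cons item nil))
  (setf (cdr queue) (cddr queue))
  queue)

;; the items are everything after the dummy header cell
(defun queue->list (queue)
  (cdar queue))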

;; store value under the path of keys, creating intermediate tables as needed
(defun (setf get-cluster-hash) (value keys hash)
  (let* ((f (first keys))
         (r (rest keys)))
    (cond
     ((null f)
      nil)
     ((null r)
      (setf (gethash f hash) value))
     (t
      (let* ((f-hash (gethash f hash)))
        (unless f-hash
          (setf f-hash (make-hash-table :test #'equal))
          (setf (gethash f hash) f-hash))
        (setf (get-cluster-hash r f-hash) value))))))

;; follow keys down through the nested tables; nil if any key is missing
(defun get-cluster-hash (keys hash)
  (loop for k in keys
        for k-hash = (gethash k hash) then (gethash k k-hash)
        while k-hash
        finally (return k-hash)))

;; append value to the queue stored under the path of keys, creating the
;; intermediate tables and the queue itself on first use
(defun add-to-cluster-hash (keys hash value)
  (let* ((f (first keys))
         (r (rest keys)))
    (cond
     ((null f)
      nil)
     ((null r)
      (let ((q (gethash f hash)))
        (if q
            (add-queue q value)
          (let ((nq (make-queue)))
            (add-queue nq value)
            (setf (gethash f hash) nq)))))
     (t
      (let* ((f-hash (gethash f hash)))
        (unless f-hash
          (setf f-hash (make-hash-table :test #'equal))
          (setf (gethash f hash) f-hash))
        (add-to-cluster-hash r f-hash value))))))
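On the question of eliminating the duplication, one possible approach (a sketch only, not what this post uses) is to put the "walk down, creating tables as needed" step in a helper and pass the leaf update in as a function. The names ensure-leaf-table and update-cluster-hash are hypothetical. The demo counts records per key path so that the block loads on its own.

```lisp
;; Hypothetical helper: walk KEYS through nested tables, creating missing
;; intermediate tables, and return the innermost table and the last key.
(defun ensure-leaf-table (keys hash)
  (loop for (k . more) on keys
        while more
        do (setf hash
                 (or (gethash k hash)
                     (setf (gethash k hash)
                           (make-hash-table :test #'equal))))
        finally (return (values hash k))))

;; Hypothetical: call FN on the current leaf value (NIL when absent) and
;; store whatever it returns. The key path is walked only once.
(defun update-cluster-hash (keys hash fn)
  (when keys
    (multiple-value-bind (leaf key) (ensure-leaf-table keys hash)
      (setf (gethash key leaf)
            (funcall fn (gethash key leaf))))))

;; Demo: count how many records fall under each (exercise, date) path.
(defparameter *counts* (make-hash-table :test #'equal))

(dolist (keys '(("Deadlift" "2015-03-10")
                ("Deadlift" "2015-03-10")
                ("Deadlift" "2015-03-14")))
  (update-cluster-hash keys *counts* (lambda (n) (1+ (or n 0)))))

(gethash "2015-03-10" (gethash "Deadlift" *counts*)) ; => 2, T
(gethash "2015-03-14" (gethash "Deadlift" *counts*)) ; => 1, T
```

Under this scheme the setf variant would pass (constantly value) as the updater, and the queue variant would pass (lambda (q) (add-queue (or q (make-queue)) value)), which works because add-queue returns the queue.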

Here is a simple demonstration REPL session with some of these functions:
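The original session is not reproduced here. As a stand-in, the following forms sketch what such a session might look like, assuming all the definitions above are already loaded. The variables *log* and *best* are made up for the example, and the records come from the sample table at the top of the post.

```lisp
;; Assumes make-queue, add-queue, queue->list, get-cluster-hash,
;; (setf get-cluster-hash) and add-to-cluster-hash are already loaded.
(defparameter *log* (make-hash-table :test #'equal))

;; Keys are (exercise date); the value is a (weight reps) pair.
(add-to-cluster-hash '("Deadlift" "2015-03-10") *log* '(130 10))
(add-to-cluster-hash '("Deadlift" "2015-03-10") *log* '(130 10))
(add-to-cluster-hash '("Deadlift" "2015-03-14") *log* '(135 10))
(add-to-cluster-hash '("Chin Up" "2015-03-10") *log* '(205 4))

(queue->list (get-cluster-hash '("Deadlift" "2015-03-10") *log*))
;; => ((130 10) (130 10))

(queue->list (get-cluster-hash '("Deadlift" "2015-03-14") *log*))
;; => ((135 10))

;; A missing key anywhere along the path gives NIL.
(get-cluster-hash '("Squat" "2015-03-10") *log*)
;; => NIL

;; A shorter key list returns the nested table itself.
(hash-table-count (get-cluster-hash '("Deadlift") *log*))
;; => 2

;; The setf function stores a plain value instead of a queue.
(defparameter *best* (make-hash-table :test #'equal))
(setf (get-cluster-hash '("Back" "Deadlift") *best*) 135)
(get-cluster-hash '("Back" "Deadlift") *best*)
;; => 135
```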

Since I have the data of interest in a CSV file exported from FitNotes, I can just read every line of the CSV file and call add-to-cluster-hash for each line, thus grouping my data without losing the order of the items for each (exercise, date), and all without thinking much about the details of the data structure used to do the grouping.
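The reading loop might look roughly like the sketch below. It rests on several assumptions that are not from the post: the five-column layout of the sample table above, integer weights and reps, a header row, and a naive comma split with no support for quoted fields. The names split-line, group-records, *csv*, and *by-exercise* are hypothetical. A string stream stands in for the file so that no particular path has to exist; with a real export, with-open-file would supply the stream. The definitions above are assumed to be loaded.

```lisp
;; Hypothetical helper: split LINE on DELIMITER (no quoting support).
(defun split-line (line &optional (delimiter #\,))
  (loop with start = 0
        for pos = (position delimiter line :start start)
        collect (subseq line start pos)
        while pos
        do (setf start (1+ pos))))

;; Hypothetical: read records from STREAM and group them in TABLE by
;; exercise, then date. Assumes the column order
;; Date, Exercise, Category, Weight, Reps and a header row.
(defun group-records (stream table)
  (read-line stream nil)                  ; skip the header row
  (loop for line = (read-line stream nil)
        while line
        do (destructuring-bind (date exercise category weight reps)
               (split-line line)
             (declare (ignore category))
             (add-to-cluster-hash (list exercise date) table
                                  (list (parse-integer weight)
                                        (parse-integer reps)))))
  table)

;; Stand-in for the exported file, using rows from the sample table.
(defparameter *csv*
  (format nil "~{~a~%~}"
          '("Date,Exercise,Category,Weight (lbs),Reps"
            "2015-03-10,Deadlift,Back,130,10"
            "2015-03-10,Deadlift,Back,130,10"
            "2015-03-10,Chin Up,Back,205,4"
            "2015-03-14,Deadlift,Back,135,10")))

(defparameter *by-exercise*
  (with-input-from-string (in *csv*)
    (group-records in (make-hash-table :test #'equal))))

(queue->list (get-cluster-hash '("Deadlift" "2015-03-10") *by-exercise*))
;; => ((130 10) (130 10))
```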

Saturday, February 8, 2020

Cluster / Hierarchical Hashing in Common Lisp