Friday, January 3, 2020

Macro Expansion Time: Connection Strings

Connection strings bother me. I'm not saying that they aren't a good solution or that they shouldn't exist, but I don't like having to look up special values to fill in and so on. I think the reason connection strings exist is that the available options for drivers, servers, and several other settings are open-ended. There is no definitive list of all the potential types of things that need to go into a connection string. There are, however, common patterns of connection strings and values that are normally required. There's a website devoted to connection strings (https://www.connectionstrings.com/), and as a very generic resource for helping you with your connection string needs, it goes a long way.

But it would be nice to maintain a list of connection string values for the connections or connection types that matter to me. After I have learned the values I need for certain types of connections, I can represent them with a keyword, and if I use a bad keyword (e.g., a typo), I get an error at compile time. In the process of starting a very basic macro framework for doing this, I ran into a bump in my understanding of the macro expansion process, and I thought that was more worth sharing than the connection string macros themselves. I'm learning to use the clsql package (available through Quicklisp), which comes into play with one of the macros.

Two notable features that these macros will demonstrate are:
  1. the explicit expansion of macros within code that produces code
  2. conditional code production based on the type of data supplied (in particular, variables versus literals versus keywords)

The first thing I want is my own repository of special values for each type of information that I care about. For my present interests, I am concerned with drivers and servers. You might add specific databases to this. We don't want to be restricted to keywords only, because that would require either having every case we will ever care about in the repository or updating the repository before we can call the macro, which largely defeats the intended convenience of the macros. We also don't want the lookup to happen at run time where we can avoid it, so that our convenience macro doesn't add run-time operations that would have been unnecessary with a less convenient literal connection string. (Sometimes we have to decide between clarity/convenience of expression on the one hand and efficiency on the other. In this case, we will try to have our cake and eat it too.)

We handle each repository with a plist, and the macros for each category of thing can be similar. I show only the drivers one since the server version is not materially different. (That probably means I could take the abstraction up another level to remove the repetition.)



If a literal keyword is supplied, we get the magic string that it corresponds to. We assume that if the user has supplied a keyword that isn't in the list, they have made a mistake, and we therefore signal an error. This error is signaled at macro expansion time, which happens before the expanded code is compiled. If the argument is not a keyword, then we don't try to determine anything about its validity; we let the caller take their chances. It might be a literal string or it might be a variable, but that isn't this macro's problem.
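
The behavior just described can be sketched roughly as follows. This is a minimal sketch, not the original listing: the plist name *typical-drivers* and the driver strings in it are assumptions chosen for illustration, so substitute the values that matter to you.

```lisp
;; The plist must exist at macro expansion time, hence the EVAL-WHEN.
;; The entries are illustrative assumptions, not values from the post.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defparameter *typical-drivers*
    '(:sql-server "{SQL Server}"
      :postgresql "{PostgreSQL Unicode}")))

(defmacro typical-drivers (driver)
  "Expand a literal keyword to its driver string; pass anything else through."
  (if (keywordp driver)
      ;; Keyword: look it up now, and complain now if it is unknown.
      (or (getf *typical-drivers* driver)
          (error "Unknown driver keyword: ~S" driver))
      ;; Not a keyword: a literal string or a variable, left untouched.
      driver))

;; (typical-drivers :sql-server)  expands to the string "{SQL Server}"
;; (typical-drivers my-driver)    expands to the symbol MY-DRIVER
;; (typical-drivers :sql-sever)   signals an error during expansion
```

Because the lookup happens inside the macro function, a keyword costs nothing at run time: the compiled code contains only the resulting string.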



Our next macro has some more interesting features, although it is basically a glorified concatenation. One notable feature is that we will forgo requiring keyword arguments and simply use &rest, recognize what we can, and let everything else pass through without a lot of scrutiny. This is a choice to avoid doing research to support cases for my abstraction that I may never use. As the maintainer of this code for myself, I can always update this macro to support additional cases in the cond form when I come across a new requirement that I actually want to use. In general, we are expecting keywords followed by either values or keywords that apply to those keywords, but we can accept strings in place of any of the keywords. We take consecutive pairs of parameters and consider the first of a pair to be the left-hand side of the = and the second to be the right-hand side of the = in our connection string.



Here's a sample usage:



(Note that you can use the function write-line to convince yourself about the rectitude of the escaped backslash.)
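
A quick way to run that check is sketched below. The server name is a placeholder I made up for illustration, not a value from the sample above.

```lisp
;; "myserver" and "sqlexpress" are placeholder names.
(defparameter *server* "myserver\\sqlexpress")

(write-line *server*)   ; prints myserver\sqlexpress (a single backslash)
(prin1 *server*)        ; prints "myserver\\sqlexpress" (reader syntax, escaped)
(terpri)
```

The doubled backslash exists only in the source text; the string itself contains one backslash character.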

First, note the explicit calls to macroexpand within the macro. The key to understanding why they are necessary is to think about the order of execution. Suppose instead of (list "Driver" (macroexpand `(typical-drivers ,b))) we wrote simply (list "Driver" (typical-drivers b)). The call to typical-drivers sits in the body of our macro, so it is expanded when our macro's definition is processed, before b has any specific value. What typical-drivers receives is the symbol b itself, which is not a keyword, so the code (typical-drivers b) expands to just b. Not very useful, is it? The macroexpand wrapper delays the expansion until our macro is actually being expanded and b is bound to the argument the caller supplied; the , notation puts that value of b into the form, and we get the desired result.
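
To see the difference in isolation, here is a small sketch. The two wrapper macros, naive-driver-pair and explicit-driver-pair, are hypothetical names invented for this comparison, and typical-drivers is repeated with an illustrative one-entry plist so that the block runs on its own.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defparameter *typical-drivers* '(:sql-server "{SQL Server}"))) ; illustrative

(defmacro typical-drivers (driver)
  (if (keywordp driver)
      (or (getf *typical-drivers* driver)
          (error "Unknown driver keyword: ~S" driver))
      driver))

;; Naive: (typical-drivers b) is expanded when this DEFMACRO is processed.
;; It sees the symbol B, not a keyword, so the call collapses to plain B.
(defmacro naive-driver-pair (b)
  (let ((driver (typical-drivers b)))
    `(list "Driver" ,driver)))

;; Explicit: the form is built and expanded while THIS macro is expanding,
;; when B holds whatever the caller actually wrote.
(defmacro explicit-driver-pair (b)
  (let ((driver (macroexpand `(typical-drivers ,b))))
    `(list "Driver" ,driver)))

;; (naive-driver-pair :sql-server)     evaluates to ("Driver" :SQL-SERVER)
;; (explicit-driver-pair :sql-server)  evaluates to ("Driver" "{SQL Server}")
```

In the naive version the keyword never reaches the lookup, so a typo would also go unnoticed.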

As we go through the list of pairs, we restructure them into an association list of (left side, right side) cons cells. We want to distinguish between two kinds of pairs: bound and unbound. This distinction matters because the bound items already have values, so we can put them in place right away instead of waiting until later to do the final concatenation for that part of the code. If there is no unbound part, then we will have a simple, complete connection string in our compiled code. If there is an unbound part, then we need to return code that concatenates the pieces. It is the responsibility of the caller to ensure that the supplied variables are bound to values before the generated code runs. A wrinkle in my terminology is that "bound" is something that applies to symbols. If I pass in something that is not a symbol (which I will do often), then I am going to assume, for simplicity, that it is a concrete and directly usable value and hence "bound" in the sense of this macro.
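
One way to realize this split is sketched below. It is a simplified sketch under my own assumptions, not the original macro: the names connection-string and merge-literals are hypothetical, it flattens the pairs into a list of pieces rather than keeping an association list, it treats keywords and non-symbols as literals, and it assumes any variable holds a string at run time.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defparameter *typical-drivers* '(:sql-server "{SQL Server}")) ; illustrative

  (defun merge-literals (pieces)
    "Join adjacent literal strings in PIECES; leave other forms in place."
    (let ((result '()))
      (dolist (piece pieces (nreverse result))
        (if (and (stringp piece) (stringp (first result)))
            (setf (first result) (concatenate 'string (first result) piece))
            (push piece result))))))

(defmacro typical-drivers (driver)
  (if (keywordp driver)
      (or (getf *typical-drivers* driver)
          (error "Unknown driver keyword: ~S" driver))
      driver))

(defmacro connection-string (&rest args)
  "Turn consecutive pairs of ARGS into Name=Value; segments."
  (let ((pieces '()))
    (loop for (left right) on args by #'cddr
          do (let ((name (cond ((keywordp left) (string-capitalize left))
                               (t left)))
                   (value (cond ((eq left :driver)
                                 (macroexpand `(typical-drivers ,right)))
                                (t right))))
               (dolist (part (list name "=" value ";"))
                 ;; Non-keyword symbols are variables, resolved at run time;
                 ;; everything else is rendered to a string right now.
                 (push (if (and (symbolp part) (not (keywordp part)))
                           part
                           (princ-to-string part))
                       pieces))))
    (let ((merged (merge-literals (nreverse pieces))))
      (if (and (= (length merged) 1) (stringp (first merged)))
          (first merged)                       ; fully literal: just a string
          `(concatenate 'string ,@merged)))))  ; otherwise: run-time join
```

With only literals, (connection-string :driver :sql-server :server "myserver\\sqlexpress") expands directly to one string. With a variable, as in (connection-string :driver :sql-server :database db), the expansion is a concatenate form in which the literal runs have already been joined around db.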

Finally, my last macro uses this connection string machinery to connect to a database using clsql, which is where my original exercise began: