## Saturday, September 19, 2015

### Circle Fitting in Maxima

In previous posts, I've considered the problem of fitting a circle to 2D data points and made some remarks about 3D cases. Fitting problems such as this require:
1. An equation form to fit to.
2. A function to quantify the error, which is to be minimized.
For a circle, the form of equation we want to fit to is
$r^2=(x-a)^2 + (y-b)^2.$
Maxima's lsquares package removes the burden of item number 2 for us. But for interest's sake, the error function we want to minimize can be given as
$\mathrm{r}\left( a,b\right) =\frac{\sum_{i=1}^{n}\sqrt{{\left( {x}_{i}-a\right) }^{2}+{\left( {y}_{i}-b\right) }^{2}}}{n}$
$\mathrm{SSE}\left( a,b\right) =\sum_{i=1}^{n}{\left( \mathrm{r}\left( a,b\right) -\sqrt{{\left( {x}_{i}-a\right) }^{2}+{\left( {y}_{i}-b\right) }^{2}}\right) }^{2}.$
Recalling that $$\sum_{i=1}^{n}(\bar{x}-x_i)^2 = \sum_{i=1}^{n}{x_i}^2 - (\sum_{i=1}^{n}{x_i})/n$$, we see that
$\mathrm{SSE}\left( a,b\right) =\sum_{i=1}^{n}\left({\left( {x}_{i}-a\right) }^{2}+{\left( {y}_{i}-b\right) }^{2}\right)-\frac{{\left(\sum_{i=1}^{n}\sqrt{{\left({x}_{i}-a\right) }^{2}+{\left( {y}_{i}-b\right) }^{2}}\right)}^{2}}{n}.$

### lsquares_estimates()

Given a set of points in matrix form (m), we need only to use the equation of a circle and indicate which are the data point variables and which are the "solve for" variables. In our problem,

lsquares_estimates(m, [x,y], (x-a)^2 + (y-b)^2 = r^2, [a,b,r]);

will suffice. The following function can be used to check the SSE.

sse(a, b, r, pts) := sum((r-sqrt((pts[i][1]-a)^2+(pts[i][2]-b)^2))^2,i,1,length(pts));

### Testing for Reasonableness

It is reasonable to ask whether least squares (or other data fitting methods) will reliably produce good results with a particular quality of data for a particular application. This is true for any problem. What if the math works fine, but the likelihood of being able to "math filter" the  likely errors in the data to reach an accurate answer is not so high? In other words, how much inaccuracy in the collected data can you tolerate and still have a reasonable expectation of getting to the right answer? I'm not going to attempt to deal with probabilities (per se) or confidence intervals, but look at a simulation approach that may be able to give you an idea of the reasonableness of the circle fit based on an (intuitively) estimated amount of random error in your data.

### Simulation of Circular Arc Data

We will consider the problem of data collected, using a total station, of a circular arc, with an "as-built" scenario particularly in view. As such, auto-generated sample points should simulate the expected behaviors of a rod/prism person stepping out regular sample locations from start to end of an ostensibly circular arc. This can be implemented as random normal perturbations of a circular arc.

One of the first pieces to a realistic perturbed circle is to be able to apply randomization that follows a normal distribution given an expected value and a standard deviation. Maxima has a function that behaves this way. random_normal() can be called with an expected value and standard deviation and optionally with a number of values to generate. Here is a sample use and output of the function, pictured using the histogram function:

load(distrib)$load(draw)$
histogram(
random_normal(50.0, 1.0, 1000),
nclasses = 10,
xrange = [40,60]);

 Fig. 1. Normally distributed data
The standard deviation should be chosen such that roughly two-thirds of the time (≈68%), the sample value will be within one standard deviation of the mean; that is, on the interval $$(\bar{x}-\sigma, \bar{x}+\sigma).$$

Where the radius is concerned, we have the relatively predictable variations of the prism person's actions and the relatively unpredictable variations of the ostensible circular arc, which could turn out to not be an arc at all. In the absence of any sort of meaningful solution to this latter variation, we could possibly just bump up the standard deviation. But I'm not clear on whether this is a meaningful thing to do in assessing the acceptability of the data collection precision. The error should be "small" if the thing we measured conforms well to a circle. In order to check for whether this is "small" you need to produce a standard deviation from the SSE: $\sigma_{SSE} = \sqrt{\frac{SSE}{n-1}}$ In the scenario we're dealing with here, someone would probably draw the computed arc in a CAD program and compare it to the measured data points and in an "arm-waving" sort of way decide whether it looks good or not.

Where the point sampling distances are concerned, we will assume that the rod person can step out distances (or angle sweeps) within some percentage of that intended. We will also make the pragmatic assumption that the prism person has decided a definite number of samples per given arc and will try to make them evenly spaced. The prism person will try to subdivide the remaining space at each sample point, and two thirds of the time, they will be within some percentage of the distance (or angle sweep) they intended.

Here is the perturbedArc() function with a sample usage:

 Fig. 2 - This is a perturbed circular arc. Doesn't look very perturbed does it? If we bumped up the standard deviation of the radius enough, we'd get something more clearly perturbed.
To produce the matrix form of the data I use a simple function I created (see here) called MakeMatrix(). Beyond that, it is a simple as

m: MakeMatrix(pts)\$
lsquares_estimates(m, [x,y], (x-a)^2 + (y-b)^2 = r^2, [a,b,r]);
subst(%[1],sse(a,b,abs(r),pts));