There's more than enough information freely available to enable you to make your own FFT. You can review the pseudo-code from Wikipedia to make it yourself. I have been raving about this bit of Mathematics for several posts now. I knew that the FFT (\(O(N \log {N})\)) was much faster than computing the DFT (\(O(N^2)\)) from the definition. But how much faster is it really when you compare the methods with realistic data sets? Sorting algorithms have this difference but with small data sets (even thousands of points) like most of us work with, it doesn't matter that much.
Whenever you go about to make something that is somewhat complex, you want a more than casual check to verify that you have the right answer. One way of doing this is to create a simple method that doesn't have any complexities related to speeding up the algorithm or computation in it and use it as a reference against the more complex method you want to test. You can also find the results of someone else's work and compare against that.
I made both a DFT and FFT (dit-fft2) method and tried them out. I noticed while testing these out that I get results that agree with each other, but they differ from the results I got in Maxima in two ways: 1) the overall values appear to be factored in the Maxima FFT and there was a difference in sign on the imaginary component. I suspect Maxima is normalizing by length and using positive roots of unity instead of negative roots of unity. (Signal processing people like negative roots of unity and some other people like positive roots of unity. Go figure.) To test for consistency I used create-test-data for a small data set of 32 points and applied both dft and dit-fft2 to it. They come out the same, so I'm reasonably certain that the calculation was done correctly.
Next, I tried a larger data set, \(2^{14}\) data points. With a sampling rate of 44100 Hz, this is about 0.37s worth of PCM data. The DFT function ran in 2:41 min and the dit-fft2 function ran in 0.125s. Even at 0.125s, that is a long time to process if you have much data to churn through. A brief review of the dit-fft2 function will show that there is a lot of data copying involved in this version and so methods which avoid the need to copy data around by using an indexing scheme are likely to increase performance significantly.
Enough already—code me!