Practical Fast Polynomial Multiplication

Efficient FFT Algorithm and Programming Tricks

Just a collection of notes on FFT technology papers.

Practical Fast Polynomial Multiplication

Efficient FFT Algorithm and Programming Tricks

Practical Fast Polynomial Multiplication

Efficient FFT Algorithm and Programming Tricks

Listening around on 6 metres this morning and hearing lots of signals all over the midwest.

Yesterday, thanks to my friend Eldon WA0UWH I had an opportunity to do a little welding both on steel and aluminum. I have not done any welding in over 40 years and have only done Oxy-Acetylene and stick arc welding previously. While I certainly would not hire out as a welder, I did ok for a first attempt. It was sure fun and appreciate Eldon offering the use of his equipment and expertise.

We then turned our attention to his non-functioning Bus Pirate and were able to determine that the CLK output from the PIC device on his board is in fact non-functional. The board additionally has firmware self-test code and it indicated a failure of the CLK output as well. The rest of the circuit checked out. Eldon is going to order up a new PIC chip from Mouser and I think he will be back in business.

The last project for the afternoon (other than lunch at a local Mexican restaurant that I paid less for both of us to eat lunch that I do for just myself normally) was to lower Eldon's long wire antenna that we put up back in January of 2012. The line we use to raise it had gotten stuck probably where it had cut into the hemlock tree's branches when we pulled it up in the first place. After loosening the loop of line to the top of the tree, we wrapped the line around a length of 2x6 and the two of us took an end of the board and walked away from the tree putting increasing pressure on the line until it either broke the line (unlikely) or brought down the branch. In fact, neither happened as the line finally broke free and we were able to lower the antenna normally.

Eldon then installed a length of plastic tubing over the line and pulled it to the top of the tree where it remained when he pulled the line back down, so hopefully it is providing some protection for both the tree and the line. It was a bit comical to watch us try to stuff 3/8" line through this 6-8 foot length of plastic tubing just slightly larger inside diameter than the rope outside diameter. We were able to stuff a wire thru the hose, but only after running water through the hose slick things up a bit. Pulling the wire back with the rope connected by tape resulted in them parting ways with the rope about 2 feet from the end. Blue tape hates water and refused to stick to the rope, and now the rope was a bit saturated. We found some string that we could tie to the rope and still have it all fit in the hose, but were now faced with the problem of how to get the string through the hose to pull the rope back thru. In the end, the air compressor came to the rescue. I tied a knot in the end of the string and used an air nozzle to blow it through the hose (after blowing out all the water).

Finally we had success and were able to get the line through the hose and pulled it up to the top of the tree. Raising the antenna back up, Eldon is back in business. We still need to do the same thing at the other end of the antenna, but that is a project for another day.

We then turned our attention to his non-functioning Bus Pirate and were able to determine that the CLK output from the PIC device on his board is in fact non-functional. The board additionally has firmware self-test code and it indicated a failure of the CLK output as well. The rest of the circuit checked out. Eldon is going to order up a new PIC chip from Mouser and I think he will be back in business.

The last project for the afternoon (other than lunch at a local Mexican restaurant that I paid less for both of us to eat lunch that I do for just myself normally) was to lower Eldon's long wire antenna that we put up back in January of 2012. The line we use to raise it had gotten stuck probably where it had cut into the hemlock tree's branches when we pulled it up in the first place. After loosening the loop of line to the top of the tree, we wrapped the line around a length of 2x6 and the two of us took an end of the board and walked away from the tree putting increasing pressure on the line until it either broke the line (unlikely) or brought down the branch. In fact, neither happened as the line finally broke free and we were able to lower the antenna normally.

Eldon then installed a length of plastic tubing over the line and pulled it to the top of the tree where it remained when he pulled the line back down, so hopefully it is providing some protection for both the tree and the line. It was a bit comical to watch us try to stuff 3/8" line through this 6-8 foot length of plastic tubing just slightly larger inside diameter than the rope outside diameter. We were able to stuff a wire thru the hose, but only after running water through the hose slick things up a bit. Pulling the wire back with the rope connected by tape resulted in them parting ways with the rope about 2 feet from the end. Blue tape hates water and refused to stick to the rope, and now the rope was a bit saturated. We found some string that we could tie to the rope and still have it all fit in the hose, but were now faced with the problem of how to get the string through the hose to pull the rope back thru. In the end, the air compressor came to the rescue. I tied a knot in the end of the string and used an air nozzle to blow it through the hose (after blowing out all the water).

Finally we had success and were able to get the line through the hose and pulled it up to the top of the tree. Raising the antenna back up, Eldon is back in business. We still need to do the same thing at the other end of the antenna, but that is a project for another day.

Here is my function for determining an appropriate set of divisors (HSDiv and N1) given the desired output frequency that keeps the Si570 DCO within it's 4.85 GHz to 5.67 GHz operating range.

#define eSuccess 0

#define eFail 1

#define fDCOMinkHz 4850000 // Minimum DCO frequency in kHz

#define fDCOMaxkHz 5670000 // Maximum DCO frequency in kHz

int findDivisors(uint32_t fout, uint16_t &hsdiv, uint16_t &n1)

{

const uint16_t HS_DIV[] = {11, 9, 7, 6, 5, 4};

int32_t fout_kHz = fout / 1000;

// Floor of the division

uint16_t maxDivider = fDCOMaxkHz / fout_kHz;

// Ceiling of the division

n1 = 1 + ((fDCOMinkHz - 1) / fout_kHz / 11);

if (n1 < 1 || n1 > 128)

return eFail;

while (n1 <= 128)

{

if (0 == n1 % 2 || 1 == n1)

{

// Try each divisor from largest to smallest order

// to minimize power

for (int i = 0; i < 6 ; ++i)

{

hsdiv = HS_DIV[i];

if (hsdiv * n1 <= maxDivider)

return eSuccess;

}

}

n1++;

}

return eFail;

}

Plugging in my reset frequency of 56320000 we correctly calculate HSDiv of 11 and N1 of 8. Spot checking other fOut values with the results from SiLabs tools so far checks out. I will create a comprehensive test for this later, but pressing on...

So, considering the following test code:

uint32_t fOut = 56320000; // My Si570 reset frequency

uint32_t fxtal = 114252366;

uint16_t HSDiv;

uint16_t N1;

uint64_t RFREQ;

if (findDivisors(fOut, HSDiv, N1) == eSuccess)

{

uint64_t tmp = (uint64_t) fOut * HSDiv * N1;

tmp = (tmp << 28) / (uint64_t) fxtal;

RFREQ = tmp + ((tmp & 1<<(28-1))<<1); // Round result

}

So, with the input parameters specified, the RFREQ value I calculate is 0x2B6109F04 as compared to the reset value of 0x2B6109F08 which is within 0.00000001490116119384765625 of the reset value. This amounts to an output frequency of 56.31999998 instead of 56.32 which is within 0.02 Hz. I don't think I will worry about it.

As always, comments welcome. Straighten me out if I am not thinking clearly about this.

#define eSuccess 0

#define eFail 1

#define fDCOMinkHz 4850000 // Minimum DCO frequency in kHz

#define fDCOMaxkHz 5670000 // Maximum DCO frequency in kHz

int findDivisors(uint32_t fout, uint16_t &hsdiv, uint16_t &n1)

{

const uint16_t HS_DIV[] = {11, 9, 7, 6, 5, 4};

int32_t fout_kHz = fout / 1000;

// Floor of the division

uint16_t maxDivider = fDCOMaxkHz / fout_kHz;

// Ceiling of the division

n1 = 1 + ((fDCOMinkHz - 1) / fout_kHz / 11);

if (n1 < 1 || n1 > 128)

return eFail;

while (n1 <= 128)

{

if (0 == n1 % 2 || 1 == n1)

{

// Try each divisor from largest to smallest order

// to minimize power

for (int i = 0; i < 6 ; ++i)

{

hsdiv = HS_DIV[i];

if (hsdiv * n1 <= maxDivider)

return eSuccess;

}

}

n1++;

}

return eFail;

}

Plugging in my reset frequency of 56320000 we correctly calculate HSDiv of 11 and N1 of 8. Spot checking other fOut values with the results from SiLabs tools so far checks out. I will create a comprehensive test for this later, but pressing on...

So, considering the following test code:

uint32_t fOut = 56320000; // My Si570 reset frequency

uint32_t fxtal = 114252366;

uint16_t HSDiv;

uint16_t N1;

uint64_t RFREQ;

if (findDivisors(fOut, HSDiv, N1) == eSuccess)

{

uint64_t tmp = (uint64_t) fOut * HSDiv * N1;

tmp = (tmp << 28) / (uint64_t) fxtal;

RFREQ = tmp + ((tmp & 1<<(28-1))<<1); // Round result

}

So, with the input parameters specified, the RFREQ value I calculate is 0x2B6109F04 as compared to the reset value of 0x2B6109F08 which is within 0.00000001490116119384765625 of the reset value. This amounts to an output frequency of 56.31999998 instead of 56.32 which is within 0.02 Hz. I don't think I will worry about it.

As always, comments welcome. Straighten me out if I am not thinking clearly about this.

So, if I have learned nothing else, it is that I am no mathematician and am really slow at figuring out this stuff. Here is where I am at with regard to using integer math instead of floating point math for calculating the Si570 frequency setting elements.

Consider the following code segment:

uint32_t fxtal;

uint32_t fOut = 56320000; // My Si570 reset frequency

uint64_t RFREQ = 0x2B6109f08; // My Si570 RFREQ after reset

uint16_t HSDiv = 11; // My Si570 HSDiv after reset

uint16_t N1 = 8; // My Si570 N1 after reset

uint64_t tmp;

// Calculate fXtal for my device

fxtal = (((uint64_t) fOut * HSDiv * N1) << 28) / RFREQ;

// Calculate fOut value

tmp = ((uint64_t) fxtal * RFREQ / (uint64_t) (HSDiv * N1));

tmp = tmp + ((tmp & 1<<(28-1))<<1); // Round result

fOut = tmp >> 28;

Ok, so that is a bunch of math weirdness... Let me try to illustrate a bit what I am doing.

The fxtal and fout values are 32 bit integers. RFREQ on the other hand is a 38 bit fractional value with 10 bits of integer and 28 bits of fraction stored in a 64 bit unsigned integer.

So, promoting fOut to 64 bits before multiplication and then shifting left by 28 bits will align the decimal point between this partial result and the implied decimal point in RFREQ. The division by RFREQ results in a 32 bit value with zero fractional bits. In the case of the code above, I get the value 114252365 Hz. I should probably round this result, but let's leave it as is for the moment.

Now that I know the crystal frequency, I should be able reverse the calculation to obtain the same fOut that I started with. Remember that fOut = fxtal * RFREQ / (HSDiv * N1).

fOut is a 32 bit unsigned integer with zero fractional bits. The formula contains an element (RFREQ) that has 28 fractional bits and 10 integer bits stored in a 64 bit integer type. Basically it is the number 11644477192 / 2^28 which is 43.3790579140186309814453125 out to more fractional digits than I can handle (or need).

So, to handle this multiplication I use the integer value representation of RFREQ as a fixed point number with an implied decimal point between the 28th and 29th bits. Multiplying RFREQ by fxtal which is a 32 bit integer, gives me a result that is 42 bits of integer (32+10) and 28 bits of fraction (28+0). The result size (in bits) is the sum of the number of bits in the fractional part of each number plus the sum of the number of bits in the integer part. Even though the result is 70 bits total (42+28), the upper 6 bits are unused, so the truncation to fit in a 64 bit unsigned integer results in no data loss.

Rounding the result is recommended, so we isolate at the most significant fractional bit and add it to the integer part. Now, we have fOut scaled by 2^28. Shifting right by 28 bits gives us the integer result we started with of 56320000.

Ok, now to calculate RFREQ, HSDiv and N1 given a desired fOut in the next posting.

More to come...

Consider the following code segment:

uint32_t fxtal;

uint32_t fOut = 56320000; // My Si570 reset frequency

uint64_t RFREQ = 0x2B6109f08; // My Si570 RFREQ after reset

uint16_t HSDiv = 11; // My Si570 HSDiv after reset

uint16_t N1 = 8; // My Si570 N1 after reset

uint64_t tmp;

// Calculate fXtal for my device

fxtal = (((uint64_t) fOut * HSDiv * N1) << 28) / RFREQ;

// Calculate fOut value

tmp = ((uint64_t) fxtal * RFREQ / (uint64_t) (HSDiv * N1));

tmp = tmp + ((tmp & 1<<(28-1))<<1); // Round result

fOut = tmp >> 28;

Ok, so that is a bunch of math weirdness... Let me try to illustrate a bit what I am doing.

The fxtal and fout values are 32 bit integers. RFREQ on the other hand is a 38 bit fractional value with 10 bits of integer and 28 bits of fraction stored in a 64 bit unsigned integer.

So, promoting fOut to 64 bits before multiplication and then shifting left by 28 bits will align the decimal point between this partial result and the implied decimal point in RFREQ. The division by RFREQ results in a 32 bit value with zero fractional bits. In the case of the code above, I get the value 114252365 Hz. I should probably round this result, but let's leave it as is for the moment.

Now that I know the crystal frequency, I should be able reverse the calculation to obtain the same fOut that I started with. Remember that fOut = fxtal * RFREQ / (HSDiv * N1).

fOut is a 32 bit unsigned integer with zero fractional bits. The formula contains an element (RFREQ) that has 28 fractional bits and 10 integer bits stored in a 64 bit integer type. Basically it is the number 11644477192 / 2^28 which is 43.3790579140186309814453125 out to more fractional digits than I can handle (or need).

So, to handle this multiplication I use the integer value representation of RFREQ as a fixed point number with an implied decimal point between the 28th and 29th bits. Multiplying RFREQ by fxtal which is a 32 bit integer, gives me a result that is 42 bits of integer (32+10) and 28 bits of fraction (28+0). The result size (in bits) is the sum of the number of bits in the fractional part of each number plus the sum of the number of bits in the integer part. Even though the result is 70 bits total (42+28), the upper 6 bits are unused, so the truncation to fit in a 64 bit unsigned integer results in no data loss.

Rounding the result is recommended, so we isolate at the most significant fractional bit and add it to the integer part. Now, we have fOut scaled by 2^28. Shifting right by 28 bits gives us the integer result we started with of 56320000.

Ok, now to calculate RFREQ, HSDiv and N1 given a desired fOut in the next posting.

More to come...

While I certainly didn't intend to write a novel on the topic of Si570 programming, it is a topic that at least for me is unclear from datasheet. What follows are my current findings.

Silicon Labs puts out an application program that will calculate the various settings for the chip given a desired frequency. It will also calculate a new frequency and the associated settings based on a ppm change.

What becomes apparent from this tool is that Silicon Labs considers the 3500 ppm frequency change to be based on the output frequency rather than the DCO frequency. My interpretation of the datasheet therefore has been incorrect as I had concluded that the 3500 ppm frequency excursion limit was calculated based on the DCO frequency.

Based on this, the following table shows the amount of frequency excursion that can be done taking into account the Minima transceiver IF frequency of 19.997000 Mhz.

Band + IF 3500 ppm (kHz)

1.8 21.797 76.2895

3.5 23.497 82.2395

5 24.997 87.4895

7 26.997 94.4895

14 33.997 118.9895

18 37.997 132.9895

21 40.997 143.4895

24 43.997 153.9895

28 47.997 167.9895

29 48.997 171.4895

30 49.997 174.9895

50 69.997 244.9895

I have also worked up a spreadsheet that illustrates the amount of change in RFREQ for a given output frequency increment of 0.01, 0.1, 1, 10, etc Hz.

** fOut (MHz) fOut RFREQ **

change (Hz) change

14.00000000 0 0

14.00000001 0.01 9

14.00000010 0.1 82

14.00000100 1 818

14.00001000 10 8177

14.00010000 100 81763

14.00100000 1000 817625

14.01000000 10000 8176246

14.10000000 100000 81762455

15.00000000 1000000 817624545

Based on this, it should be possible at these operating frequencies to set the frequency to within .01 Hz for ham bands up through and including the 6 metre band.

You would not however be able to keep the same divisors over more than the 3500 ppm range specified for the band of interest, contrary to my table above.

More to come...

Silicon Labs puts out an application program that will calculate the various settings for the chip given a desired frequency. It will also calculate a new frequency and the associated settings based on a ppm change.

What becomes apparent from this tool is that Silicon Labs considers the 3500 ppm frequency change to be based on the output frequency rather than the DCO frequency. My interpretation of the datasheet therefore has been incorrect as I had concluded that the 3500 ppm frequency excursion limit was calculated based on the DCO frequency.

Based on this, the following table shows the amount of frequency excursion that can be done taking into account the Minima transceiver IF frequency of 19.997000 Mhz.

Band + IF 3500 ppm (kHz)

1.8 21.797 76.2895

3.5 23.497 82.2395

5 24.997 87.4895

7 26.997 94.4895

14 33.997 118.9895

18 37.997 132.9895

21 40.997 143.4895

24 43.997 153.9895

28 47.997 167.9895

29 48.997 171.4895

30 49.997 174.9895

50 69.997 244.9895

I have also worked up a spreadsheet that illustrates the amount of change in RFREQ for a given output frequency increment of 0.01, 0.1, 1, 10, etc Hz.

change (Hz) change

14.00000000 0 0

14.00000001 0.01 9

14.00000010 0.1 82

14.00000100 1 818

14.00001000 10 8177

14.00010000 100 81763

14.00100000 1000 817625

14.01000000 10000 8176246

14.10000000 100000 81762455

15.00000000 1000000 817624545

Based on this, it should be possible at these operating frequencies to set the frequency to within .01 Hz for ham bands up through and including the 6 metre band.

You would not however be able to keep the same divisors over more than the 3500 ppm range specified for the band of interest, contrary to my table above.

More to come...

In thinking about alternative methods of calculating the necessary parameters for the Si570 frequency setting elements, other than double precision floating point math, I have considered a number of approaches.

- Fixed point math
- Table lookup

The application of fixed point math is certainly a viable approach and one that I plan to complete an implementation of for purposes of comparison of strengths and weaknesses. I wondered however about the need for such an approach when I consider a number of characteristics of the Si570.

- There is a fair amount of frequency excursion that is possible without changing the values of N1 and HSDiv.
- For any given pair of N1 and HSDiv, there is a minimum and maximum frequency that can be represented.
- For a desired frequency increment within the minimum and maximum range of the N1 and HSDiv values currently set, there is a value that must be added to the current RFREQ value to change the frequency by that increment.

One could envision an approach where a table of output frequency ranges is stored consisting of an RFREQ, N1 and HSDiv value for the start of the range. For the set of frequency increments desired (1Hz, 10Hz, 100Hz, 1kHz, etc) a RFREQ increment could be calculated in advance and stored with the rest of the table entries.

The frequency setting calculation for Si570 as we know is:

fOut = fXtal * RFREQ / (N1 * HSDiv)

A change of frequency without changing N1 or HSDiv might be represented as follows:

fOutNew = fXtal * (RFREQ + delta) / (N1 * HSDiv)

In practice, it might be tempting to calculate a single value for the smallest frequency change we wish to be able to apply and then just multiply that value by the number of increments of that amount we wish to make. For example, if we decide the smallest frequency increment is 1Hz and calculate a value for delta necessary to move the output frequency by 1Hz, then to move 10Hz, we would just multiply the value by 10 and add it to RFREQ. I think there is some danger in this approach that may or may not be sufficiently significant, depending on the application.

Any errors involved in the calculation of the stored values for RFREQ and delta will tend to accumulate. These errors may in fact be very small, but it should be considered. Calculating the values for several frequency increments in advance and always using the largest increment first where possible may help to minimize such errors at the expense of a little more table space.

It may also be desirable to store more than one value for RFREQ depending on the amount of error between the bottom and top of the range represented by the values of N1 and HSDiv. Perhaps storing a value for the middle of the range rather than either end of the range or maybe storing a value for the top and bottom of the range. The amount of error accumulation will need be analyzed and an appropriate scheme chosen to meet the accuracy needs of the application.

I believe it should be possible with a couple dozen table entries to represent the entire range of the CMOS version (10-160 MHz) with this approach with sufficient frequency setting resolution to implement a general purpose VFO or general coverage receiver.

Another item to consider for any table driven approach is how to calibrate the values stored. An overall calibration value is unlikely to be applicable across the entire Si570 frequency range. It may be that calibration values will need be determined and stored for each pair of N1 and HSDiv.

More to come...

Subscribe to:
Posts (Atom)