Monday, January 19, 2015

I know a lot of postings have been written about analogue to digital conversion rates in the 8 bit Arduino processors.  I decided to do a little poking around and performance timing to see for myself how well these little processors perform.  I will compare the performance of the 8 bit ATMega328 and ATMega2560 processors with the 32 bit Arduino Due processor.

The ADC clock is 16 MHz divided by a prescale factor.  The default setting is found in wiring.c:

// set a2d prescale factor to 128
// 16 MHz / 128 = 125 KHz, inside the desired 50-200 KHz range.
// XXX: this will not work properly for other clock speeds, and
// this code should use F_CPU to determine the prescale factor.
sbi(ADCSRA, ADPS2);
sbi(ADCSRA, ADPS1);
sbi(ADCSRA, ADPS0);

// enable a2d conversions
sbi(ADCSRA, ADEN);

Using the default setting of 128 for the prescale factor gives a conversion clock of 125 kHz.  Since ADC conversion requires 13 ADC clocks the effective sample rate at best is approximately 125 kHz / 13 = 9.615 kHz.

Using a prescale of 16 would give an ADC clock of 1 MHz and a sample rate of 76.923 kHz.  Increasing the ADC clock can affect ADC accuracy, however.  According to Atmel, the maximum ADC clock frequency is limited by the internal DAC in the conversion circuitry, and for full resolution it should not exceed 200 kHz.  Frequencies up to 1 MHz do not reduce the ADC resolution significantly, but operation above 1 MHz has not been characterized.
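The arithmetic for any prescale factor can be sketched as a pair of small helpers.  The function names are made up for illustration, the 13 clocks per conversion figure is taken from the text, and the result is a best case that ignores the overhead of the analogRead call itself.

```cpp
#include <stdint.h>

// Assumption: one conversion takes 13 ADC clock cycles, as stated in the text.
// (constexpr needs a C++11 compiler.)
constexpr uint32_t kAdcClocksPerConversion = 13;

// ADC clock in Hz for a given CPU clock and prescale factor.
constexpr uint32_t adcClockHz(uint32_t cpuHz, uint32_t prescale) {
    return cpuHz / prescale;
}

// Best-case sample rate in Hz (ignores analogRead call overhead).
constexpr double adcSampleRateHz(uint32_t cpuHz, uint32_t prescale) {
    return static_cast<double>(adcClockHz(cpuHz, prescale)) / kAdcClocksPerConversion;
}
```

With a 16 MHz CPU clock, a prescale of 128 gives a 125 kHz ADC clock and a prescale of 16 gives 1 MHz, matching the figures above.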

So to get a quick measure of the impact on performance, I wrote a quick and dirty sketch that times 1000 analogRead operations before and after speeding up the ADC clock to see how much performance gain there is.

// useful defines for setting and clearing register bits
#define cbi(sfr, bit) (_SFR_BYTE(sfr) &= ~_BV(bit))
#define sbi(sfr, bit) (_SFR_BYTE(sfr) |= _BV(bit))

void setup() {
    unsigned long start ;   // millis() returns unsigned long
    int i ;

    Serial.begin(115200) ;
    Serial.println("ADCTest at default 9.6 kHz sample rate") ;
    start = millis() ;
    for (i = 0 ; i < 1000 ; i++)
        analogRead(A0) ;    // analog pin is an assumption; any analog input works for timing
    Serial.print(millis() - start) ;
    Serial.println(" ms (1000 calls)") ;
    Serial.println();

    // set prescale to 16 (ADPS2:0 = 100)
    sbi(ADCSRA, ADPS2) ;
    cbi(ADCSRA, ADPS1) ;
    cbi(ADCSRA, ADPS0) ;

    Serial.println("ADCTest at 76.93 kHz sample rate") ;
    start = millis() ;
    for (i = 0 ; i < 1000 ; i++)
        analogRead(A0) ;
    Serial.print(millis() - start) ;
    Serial.println(" ms (1000 calls)") ;
}

void loop()
{

}

The results are about as you would expect: the faster ADC clock cuts the time for 1000 reads from 111 ms to 18 ms, roughly a sixfold improvement in ADC speed.

ADCTest at default 9.6 kHz sample rate
111 ms (1000 calls)

ADCTest at 76.93 kHz sample rate
18 ms (1000 calls)

Running the equivalent test on the Due gives the following result:

3 ms (1000 calls)

Here is the code used:

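void setup()
{
    unsigned long start ;   // millis() returns unsigned long
    int i ;

    Serial.begin(115200) ;

    start = millis() ;
    for (i = 0 ; i < 1000 ; i++)
        analogRead(A0) ;    // analog pin is an assumption; any analog input works for timing
    Serial.print(millis() - start) ;
    Serial.println(" ms (1000 calls)") ;
    Serial.println();

}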

void loop()
{

}
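To put the three measurements on the same footing, the elapsed times can be converted to effective analogRead calls per second.  This is only a sketch with a made-up helper name, and since millis() has a resolution of 1 ms, the 3 ms Due figure in particular is a coarse estimate.

```cpp
#include <stdint.h>

// Effective calls per second given the elapsed time for a number of calls.
constexpr double callsPerSecond(uint32_t elapsedMs, uint32_t calls) {
    return calls * 1000.0 / elapsedMs;
}
```

For the results above this works out to roughly 9,000 reads per second at the default prescale, about 55,600 with a prescale of 16, and on the order of 333,000 on the Due.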

Wednesday, January 14, 2015

Arduino Due Timers (Part 1)

My next foray into the wild and wonderful world of Arduino Due will be to take a close look at the Due notion of Timers.  Tighten up the seat belt as this world gets deep in a hurry.  I will endeavour to keep things as simple and practical as I can.

The Arduino Due timers, or Timer Counters (TC) as they are called, are implemented quite differently from those in the 8 bit Arduino devices.  There is a lot of functionality in the Due Timer Counter module and it is not a simple thing to describe fully, so I will likely break this into several postings.

The SAM3X8E CPU has 3 Timer Counters (TCs) named TC0, TC1, TC2.  Each TC includes three identical 32-bit channels.  Each channel can be independently programmed to perform a wide range of functions including frequency measurement, event counting, interval measurement, pulse generation, delay timing and pulse width modulation (PWM).

Each channel has three external clock inputs, five internal clock inputs and two multi-purpose input/output signals which can be configured by the user.  Each channel drives an internal interrupt signal which can be programmed to generate processor interrupts.

The TC embeds quadrature decoder logic connected in front of the 3 timers and driven by TIOA0, TIOB0 and TIOA1 inputs. When enabled, the quadrature decoder performs input line filtering and decoding of quadrature signals.  We will not be covering this feature in these postings.

The TC block has two global registers which act upon all three TC channels. The Block Control Register allows the three channels to be started simultaneously with the same instruction.  The Block Mode Register defines the external clock inputs for each channel, allowing them to be chained.

Clocks are assigned to Timer Counters as follows:
• TIMER_CLOCK1 - MCK/2
• TIMER_CLOCK2 - MCK/8
• TIMER_CLOCK3 - MCK/32
• TIMER_CLOCK4 - MCK/128
• TIMER_CLOCK5 - SLCK

MCK is the master clock (84 MHz) and SLCK is the slow clock (32 kHz).  It should be noted that it is possible to select the slow clock as the master clock, in which case the TIMER_CLOCK5 input is equivalent to the master clock.  As will be seen later, TCs can be chained together using TIOA0, TIOA1 and TIOA2 as external clock inputs for subsequent TCs, allowing further division of the clock frequency.  I may get into clock chaining in further detail in a separate post.

This rather daunting image is the Timer Counter block diagramme.  It is not as bad as it looks.

Channel signals seen above are as follows:

• XC0, XC1, XC2 - External Clock Inputs
• TIOA - Capture Mode: TC Input, Waveform Mode: TC Output
• TIOB - Capture Mode: TC Input, Waveform Mode: TC I/O
• INT - Interrupt Signal Output
• SYNC - Synchronization Input Signal

The three channels of TC are identical in operation except when Quadrature decoder is enabled.

Each channel is organized around a separate 32-bit counter. The value of the counter is incremented at each positive edge of the selected clock. When the counter has reached the value 0xFFFFFFFF and wraps around to 0x00000000, an overflow occurs and the COVFS bit in TC_SR (Status Register) is set.  The current value of the counter is accessible anytime by reading the Counter Value Register, TC_CV. The counter can also be reset by a trigger. In this case, the counter value resets to 0x00000000 on the next valid edge of the selected clock following the trigger event.
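To get a feel for how long a 32-bit counter runs before it overflows, here is a small sketch of the arithmetic.  The helper name is made up for illustration and it assumes a free-running counter that is never reset by a trigger.

```cpp
// Seconds until a free-running 32-bit counter wraps at the given clock.
constexpr double overflowSeconds(double timerClockHz) {
    return 4294967296.0 / timerClockHz;   // 2^32 counts per wrap
}
```

At 42 MHz (MCK/2) the counter wraps after a little over 102 seconds, and at 656.25 kHz (MCK/128) it takes roughly 6545 seconds, or about 109 minutes.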

At the block level, input clock signals of each channel can either be connected to the external inputs TCLK0, TCLK1 or TCLK2, or be connected to the internal I/O signals TIOA0, TIOA1 or TIOA2 for chaining by programming the TC_BMR (Block Mode) register.

Each channel can independently select an internal or external clock source for its counter via the TCCLKS bits in the TC Channel Mode register (TC_CMR).

• Internal clock signals: TIMER_CLOCK1, TIMER_CLOCK2, TIMER_CLOCK3, TIMER_CLOCK4, TIMER_CLOCK5
• External clock signals: XC0, XC1 or XC2

The selected clock can be inverted using the CLKI bit in TC_CMR. This allows counting on the opposite edges of the clock.  There is a burst function which allows the clock to be validated when an external signal is high. The BURST parameter in the Mode Register defines this signal (none, XC0, XC1, XC2).

Note that in all cases, if an external clock is used, the duration of each of its levels must be longer than the master clock period and the external clock frequency must be at least 2.5 times lower than the master clock.
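The frequency part of this rule is easy to check in code.  The sketch below uses a hypothetical helper name and only models the 2.5 times ratio, not the requirement on the duration of each clock level.

```cpp
// True if an external clock meets the "at least 2.5 times lower than MCK" rule.
constexpr bool externalClockOk(double extHz, double mckHz) {
    return extHz * 2.5 <= mckHz;
}
```

With an 84 MHz master clock this puts the upper limit for an external clock at 33.6 MHz.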

Here is a block diagramme of the clock selection logic:

We still have not covered clock control, operating modes, or triggers, but we will touch on these topics as we work through examples.

Ok, enough background about Due timers for now and on to the first practical example.  In this example we will define a function that allows the configuration of a TC to generate a square wave at a relatively low frequency of the caller's choice.

Firstly, let's think about clocking our timer.  We have a system clock speed of 84 MHz that can be divided by 4 different divisors (2, 8, 32 and 128), and we also have the slow clock.  So, the available timer clock speeds are:

• 42 MHz
• 10.5 MHz
• 2.625 MHz
• 656.25 kHz
• 32 kHz
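These values follow directly from dividing the master clock, for example 84 MHz / 32 = 2.625 MHz.  As a minimal sketch (the constant and function names are made up for illustration):

```cpp
#include <stdint.h>

constexpr uint32_t kMckHz = 84000000;   // Due master clock, per the text

// Frequency of TIMER_CLOCK1..4 for n = 1..4.  TIMER_CLOCK5 is the slow
// clock and is not derived from MCK, so any other n returns 0.
constexpr uint32_t timerClockHz(int n) {
    return n == 1 ? kMckHz / 2
         : n == 2 ? kMckHz / 8
         : n == 3 ? kMckHz / 32
         : n == 4 ? kMckHz / 128
         : 0;
}
```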

As previously mentioned, TCs can be chained to obtain other clock speeds, but that topic is beyond the scope of this posting.

To start up a timer, we need to deal with at least 4 different bits of information when doing simple operations with TCs.

• The Timer Counter (TC) you wish to use
• The channel in that TC you wish to use
• The IRQ if interrupts are used
• The frequency of the timer

The following table is useful when performing TC configuration as it shows the relationship between the TC, its channels, the IRQ to use, what the IRQ handler function must be called and the power management ID for that peripheral.  Looking at the first TC in the list (TC0) we can see that it has three channels (0, 1, 2).  The Nested Vectored Interrupt Controller (NVIC) IRQ values are TC0_IRQn, TC1_IRQn and TC2_IRQn respectively.  When using interrupts, the IRQ handler functions that are called are named TC0_Handler, TC1_Handler and TC2_Handler respectively.  Lastly, the power management controller IDs are ID_TC0, ID_TC1 and ID_TC2 respectively.  The remaining TCs follow the same pattern.
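The pattern can be summed up in one line of arithmetic: the number n that appears in TCn_IRQn, TCn_Handler and ID_TCn is three times the TC number plus the channel number.  A minimal sketch, with a made-up helper name:

```cpp
// Index n used in TCn_IRQn, TCn_Handler and ID_TCn for timer block
// tc (0-2) and channel (0-2).
constexpr int tcIrqIndex(int tc, int channel) {
    return tc * 3 + channel;
}
```

So TC1 channel 0 maps to index 3 (TC3_IRQn, TC3_Handler, ID_TC3) and TC2 channel 2 maps to index 8.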

So, we will create a function to encapsulate all this to get a simple timer running.  The timer will generate a square wave at the specified frequency.

There is a bit of housekeeping that needs to occur.

• We need to enable the ability to modify the power management controller's registers.
• We need to enable a specific peripheral clock specified by the IRQ.
• We need to set the TC configuration.

Power Management Controller calls look like this.  We need to turn off write protection and then enable the peripheral clock for TC1 Channel 0.

pmc_set_writeprotect(false);
pmc_enable_periph_clk(ID_TC3);

You could also pass TC3_IRQn rather than ID_TC3, as both have the same numeric value.  It is clearer to use the correct constant name, but as we will see shortly, it simplifies the implementation if we don't.

TC_Configure is used to configure a TC to operate in a given mode.  The timer is stopped after configuration and must be restarted with TC_Start().  All interrupts of the timer are also disabled.

We will select Waveform Mode and instruct the TC to count up with a reset on register C (RC) compare.  The following graphic depicts this mode, though it seems to imply the maximum counter value is 0xffff which is not true.  With 32 bits, the maximum counter value would be 0xffffffff.

TC configuration is accomplished with the following code.  We will use Timer Clock 4 (master clock / 128  = 656.25 kHz) as for this example we will be generating low frequency waveforms.  The function takes the TC and Channel as the first two parameters.  The last parameter sets bits to indicate the fact we are in Waveform mode, only counting up to the maximum value specified in Register C (RC) and which of the 5 clocks we will use.

TC_Configure(tc, channel, TC_CMR_WAVE | TC_CMR_WAVSEL_UP_RC |
TC_CMR_TCCLKS_TIMER_CLOCK4);

Now we need to set Register A (RA) to the clock count where our output (TIOA) goes high and Register C (RC) to the clock count where our output goes low.  See the graphic above.  We chose points that generate a symmetrical 50% duty cycle square wave.  Register C is set to the maximum count, which is the timer clock frequency divided by the desired frequency.

uint32_t rc = VARIANT_MCK / 128 / freq;
TC_SetRA(tc, channel, rc / 2); // 50% duty cycle
TC_SetRC(tc, channel, rc);
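To see what values actually land in the registers, the same arithmetic can be worked through outside the sketch.  The names below are made up for illustration, and kMckHz stands in for VARIANT_MCK, which I am assuming to be 84 MHz.

```cpp
#include <stdint.h>

constexpr uint32_t kMckHz = 84000000;              // stands in for VARIANT_MCK
constexpr uint32_t kTimerClock4Hz = kMckHz / 128;  // 656250 Hz

// RC value for the requested frequency (integer division truncates).
constexpr uint32_t rcForFrequency(uint32_t freqHz) {
    return kTimerClock4Hz / freqHz;
}

// RA value for a 50% duty cycle.
constexpr uint32_t raForHalfDuty(uint32_t rc) {
    return rc / 2;
}

// Frequency actually produced once RC has been truncated to an integer.
constexpr double actualFrequencyHz(uint32_t rc) {
    return static_cast<double>(kTimerClock4Hz) / rc;
}
```

For 1 Hz this gives RC = 656250 and RA = 328125.  For 1 kHz, RC truncates to 656, so the real output would be about 1000.4 Hz rather than exactly 1000 Hz.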

Now we enable the Register C (RC) compare interrupt.  This bit looks a little strange at first, because we have both an interrupt enable register (TC_IER) and an interrupt disable register (TC_IDR).  Both are write-only: writing a 1 to a bit in TC_IER enables that interrupt, writing a 1 to a bit in TC_IDR disables it, and 0 bits leave the current setting untouched.  This way individual interrupts can be switched on or off without having to remember which ones were enabled previously.  This code enables the RC compare interrupt and disables every other interrupt on the channel.

tc->TC_CHANNEL[channel].TC_IER =  TC_IER_CPCS;
tc->TC_CHANNEL[channel].TC_IDR = ~TC_IER_CPCS;
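As I understand the SAM3X datasheet, the enable and disable registers both act on a single interrupt mask: ones written to the enable register set mask bits, ones written to the disable register clear them, and zeros change nothing.  The following is only a model of that behaviour in plain C++, not hardware access, and the bit position of kCpcs is assumed to match TC_IER_CPCS.

```cpp
#include <stdint.h>

// Assumed to match TC_IER_CPCS (RC compare) in the SAM3X headers: bit 4.
constexpr uint32_t kCpcs = 1u << 4;

// Model of a write to the enable register: 1 bits set the mask bits.
constexpr uint32_t writeIER(uint32_t mask, uint32_t value) {
    return mask | value;
}

// Model of a write to the disable register: 1 bits clear the mask bits.
constexpr uint32_t writeIDR(uint32_t mask, uint32_t value) {
    return mask & ~value;
}
```

Whatever the mask held beforehand, writing kCpcs to the enable register and ~kCpcs to the disable register leaves only the RC compare interrupt enabled.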

Start the timer running again.

TC_Start(tc, channel);

And tell the Nested Interrupt Controller to enable our IRQ.

NVIC_EnableIRQ(irq);

Simple, eh?  Yeah...  Nothing to it...  Here is the entire function:

void TimerStart(Tc *tc, uint32_t channel, IRQn_Type irq, uint32_t freq)
{
pmc_set_writeprotect(false);
pmc_enable_periph_clk((uint32_t) irq);
TC_Configure(tc, channel, TC_CMR_WAVE | TC_CMR_WAVSEL_UP_RC |
TC_CMR_TCCLKS_TIMER_CLOCK4);
uint32_t rc = VARIANT_MCK / 128 / freq;
TC_SetRA(tc, channel, rc/2); // 50% duty cycle square wave
TC_SetRC(tc, channel, rc);
TC_Start(tc, channel);
tc->TC_CHANNEL[channel].TC_IER=TC_IER_CPCS;
tc->TC_CHANNEL[channel].TC_IDR=~TC_IER_CPCS;
NVIC_EnableIRQ(irq);
}

Whew...  Still with me?  Ok!

Now we will implement an ISR handler that just toggles the LED on digital pin 13 on and off every time the timer fires an interrupt.  It also has to read the status of the Timer Counter (TC) in order to allow the next interrupt.

volatile boolean ledOn;

void TC3_Handler()
{
TC_GetStatus(TC1, 0);
digitalWrite(13, ledOn = !ledOn);
}

Ok, so given that we are only interrupting when the Register C compare match occurs (see graphic above), and we are toggling pin 13 on every interrupt, we effectively divide the frequency that the LED blinks at by two.  If we want the frequency of the LED blinking to match the frequency of the timer, we will need to interrupt on the Register A compare match as well.  The code change to implement this is just to enable the interrupt on RA compare as well as RC compare.

tc->TC_CHANNEL[channel].TC_IER=  TC_IER_CPCS | TC_IER_CPAS;
tc->TC_CHANNEL[channel].TC_IDR=~(TC_IER_CPCS | TC_IER_CPAS);
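The relationship between the interrupt rate and the blink rate is simple enough to write down.  In this sketch (helper name made up for illustration) each interrupt toggles the LED once, and a full blink cycle needs two toggles.

```cpp
// interruptsPerPeriod: 1 when only RC compare fires,
//                      2 when both RA and RC compare fire.
constexpr double ledBlinkHz(double timerHz, int interruptsPerPeriod) {
    return timerHz * interruptsPerPeriod / 2.0;
}
```

With a 1 Hz timer, RC compare alone blinks the LED at 0.5 Hz, while RA and RC together blink it at the full 1 Hz.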

So, the only thing remaining is to implement the setup function and stand back...  We set the LED pin to output and initialize timer TC1, channel 0 using the IRQ TC3_IRQn (from the table above) with a frequency of 1 Hz.

void setup()
{
pinMode(13, OUTPUT);
TimerStart(TC1, 0, TC3_IRQn, 1);

}

So all of this is to blink a freaking LED at a 1 Hz rate.  Amazing flexibility (and the associated complexity) comes at the cost of a bit of a steep learning curve.  Here is the complete listing for your reference.

volatile boolean ledOn;

//TC1 ch 0
void TC3_Handler()
{
TC_GetStatus(TC1, 0);
digitalWrite(13, ledOn = !ledOn);
}

void TimerStart(Tc *tc, uint32_t channel, IRQn_Type irq, uint32_t freq)
{
pmc_set_writeprotect(false);
pmc_enable_periph_clk(irq);
TC_Configure(tc, channel, TC_CMR_WAVE | TC_CMR_WAVSEL_UP_RC |
TC_CMR_TCCLKS_TIMER_CLOCK4);
uint32_t rc = VARIANT_MCK / 128 / freq;
TC_SetRA(tc, channel, rc >> 1); // 50% duty cycle square wave
TC_SetRC(tc, channel, rc);
TC_Start(tc, channel);
tc->TC_CHANNEL[channel].TC_IER=  TC_IER_CPCS | TC_IER_CPAS;
tc->TC_CHANNEL[channel].TC_IDR=~(TC_IER_CPCS | TC_IER_CPAS);
NVIC_EnableIRQ(irq);
}

void setup()
{
pinMode(13, OUTPUT);
TimerStart(TC1, 0, TC3_IRQn, 1);
}

void loop()
{

}

More to come, but have fun with this if you are so inclined.  I am always willing to help out if you have questions.  Drop me a note at ko7m at arrl dot net or comment here and I will do my best.

Sunday, January 11, 2015

Arduino Due first project

With all the recent work I have done on Arduino 8 bit processors, I wanted to expand out a little more and have a play around with more capable devices that still maintain similar simplicity of hardware design and cost.  I settled on the Arduino Due for my next set of experiments and was able to obtain a board for less than the cost of eating lunch out due to some discounts I had in hand.  Amazon has them for around $30-$40 which appears to be pretty typical.

The Due is an interesting device with a lot more horsepower than the 8 bit Arduino versions.  Some features:

• A 32-bit core that allows operations on 4-byte-wide data within a single CPU clock.
• CPU clock at 84 MHz.
• 96 KBytes of SRAM.
• 512 KBytes of Flash memory for code.
• A DMA controller that can relieve the CPU from doing memory intensive tasks.
• 54 digital I/O pins.
• 12 bit true Digital to Analogue output.

I thought as a first venture into programming this beast, I would implement a simple sine wave generator to play with the 12 bit DAC.  Since I don't want to dive (yet) into the details of how to implement timers, I thought I would just take advantage of the faster processor and implement any timing delays inline.

So, to begin, I need a table of sine information that is small in size while still using 12 bit data.  I have posted plenty of code previously that illustrates how I generate this data so I will not repeat it here.  To summarize, I write a simple C, C++ or C# application that generates the data and writes it out as a data structure that I can just paste into my code.  I then graph the data in Excel to check it visually.  I chose to use 120 phase points with a 12 bit range of 0-4095.
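A minimal sketch of one way such a table could be generated follows.  The names are made up for illustration, and the 2047.5 scaling with truncation is an assumption, chosen because it reproduces the values in the table below.

```cpp
#include <math.h>
#include <stdint.h>

const int kPhasePoints = 120;                  // number of table entries, per the text
const double kPi = 3.14159265358979323846;

// One 12 bit sample (0-4095) for phase index i.
// Assumption: scale by 2047.5 and truncate toward zero.
uint16_t sineSample(int i) {
    double s = sin(2.0 * kPi * i / kPhasePoints);
    return (uint16_t)(2047.5 * (1.0 + s));
}

// Fill a caller-supplied array of kPhasePoints entries.
void buildSineTable(uint16_t *table) {
    for (int i = 0; i < kPhasePoints; i++) {
        table[i] = sineSample(i);
    }
}
```

Printing each entry in hex, ten to a line, would give something that can be pasted straight into a sketch as an array initializer.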

static int sineTable[] =
{
0x7ff, 0x86a, 0x8d5, 0x93f, 0x9a9, 0xa11, 0xa78, 0xadd, 0xb40, 0xba1,
0xbff, 0xc5a, 0xcb2, 0xd08, 0xd59, 0xda7, 0xdf1, 0xe36, 0xe77, 0xeb4,
0xeec, 0xf1f, 0xf4d, 0xf77, 0xf9a, 0xfb9, 0xfd2, 0xfe5, 0xff3, 0xffc,
0xfff, 0xffc, 0xff3, 0xfe5, 0xfd2, 0xfb9, 0xf9a, 0xf77, 0xf4d, 0xf1f,
0xeec, 0xeb4, 0xe77, 0xe36, 0xdf1, 0xda7, 0xd59, 0xd08, 0xcb2, 0xc5a,
0xbff, 0xba1, 0xb40, 0xadd, 0xa78, 0xa11, 0x9a9, 0x93f, 0x8d5, 0x86a,
0x7ff, 0x794, 0x729, 0x6bf, 0x655, 0x5ed, 0x586, 0x521, 0x4be, 0x45d,
0x3ff, 0x3a4, 0x34c, 0x2f6, 0x2a5, 0x257, 0x20d, 0x1c8, 0x187, 0x14a,
0x112, 0x0df, 0x0b1, 0x087, 0x064, 0x045, 0x02c, 0x019, 0x00b, 0x002,
0x000, 0x002, 0x00b, 0x019, 0x02c, 0x045, 0x064, 0x087, 0x0b1, 0x0df,
0x112, 0x14a, 0x187, 0x1c8, 0x20d, 0x257, 0x2a5, 0x2f6, 0x34c, 0x3a4,
0x3ff, 0x45d, 0x4be, 0x521, 0x586, 0x5ed, 0x655, 0x6bf, 0x729, 0x794
};

Here is the sine data as graphed in Excel:

The following variables define the size of the sine table, the phase index variable and a calculated microsecond delay between phase points.  I will probably regret choosing 120 phase points as the microsecond delay is really 8.33333 microseconds which will truncate to 8 microseconds.

const int cSine       = sizeof(sineTable) / sizeof(int);
const int OnekHzDelay = 8;
int iPhase = 0;
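The effect of that truncation can be estimated with a little arithmetic.  This sketch uses made-up helper names and assumes the per-sample delay is the only time spent in the loop, which is not true in practice.

```cpp
#include <stdint.h>

// Ideal per-sample delay in microseconds for a given tone and table size.
constexpr double idealDelayMicros(uint32_t toneHz, uint32_t points) {
    return 1000000.0 / (static_cast<double>(toneHz) * points);
}

// Tone frequency produced if each sample takes exactly `micros` microseconds.
constexpr double toneFrequencyHz(uint32_t points, double micros) {
    return 1000000.0 / (points * micros);
}
```

For 1 kHz and 120 points the ideal delay is 8.33 microseconds.  Truncating that to 8 would, on its own, push the tone up to about 1041.7 Hz.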

By default analogWrite uses 8 bit resolution.  I am going to override this so that writes to the DAC use the full 12 bit resolution.

void setup()
{
analogWriteResolution(12);
}

Now in the main loop, I just sequence through the sine table writing each phase point to the DAC and then delaying the requisite number of microseconds before continuing.  This number is going to be too big, but I don't yet know the timing of the main loop.

void loop()
{
analogWrite(DAC0, sineTable[iPhase++]);
iPhase %= cSine;
delayMicroseconds(OnekHzDelay);
}

Now, looking at the DAC0 output with a scope, we see the following output which is a nice clean analogue output without the need to integrate as is needed with PWM output.

CAUTION: Please do not make the mistake of hooking a low impedance speaker or other load directly to the DAC output.  A low impedance load on either of the DAC outputs will result in blowing the DAC output transistor.  You should use a buffer amplifier stage to protect your shiny new Due device.  Loads of 10K impedance or higher should be safe to directly connect to the DAC outputs.

As can be seen on the output trace, the frequency of the output waveform is somewhat lower than expected: the period ended up being 1.68 ms rather than the expected 1 ms.  So to check this out, I thought I would take a few timings.

The Due is pretty quick.  Just testing a digitalWrite to set a pin high and then low in a loop results in the following information:

• digitalWrite takes about 1.26 us to execute.
• An empty main loop takes about 4.4 us to execute.

So, just looking at these timings, I should be able to time the main loop and see how long it is taking just using a simple pin toggle and looking at it on the scope.

The main loop (ignoring the two digital write calls) is taking 16.4 us to execute.  Without the delayMicroseconds call the main loop takes 8.44 us.  By experimentation, I found that a delay of 3 us produced a tone of roughly 1 kHz (926 Hz specifically).
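A quick way to sanity check scope readings like these is to convert an observed tone or period back into the time spent on each of the 120 phase points.  The helpers below are a sketch with made-up names.

```cpp
// Time per phase point, in microseconds, for an observed tone frequency.
constexpr double perSampleMicrosFromHz(double toneHz, int points) {
    return 1000000.0 / (toneHz * points);
}

// Time per phase point, in microseconds, for an observed waveform period.
constexpr double perSampleMicrosFromPeriod(double periodMicros, int points) {
    return periodMicros / points;
}
```

A 926 Hz tone corresponds to about 9 us per phase point, and a 1.68 ms period corresponds to 14 us per phase point.  An exact 1 kHz tone would need 8.33 us.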

Ok, so a fun initial experiment.  Next I will figure out how timers work so I can generate timing events more accurately.