Monday, January 19, 2015

Arduino ADC conversion rate

I know a lot of postings have been written about analogue to digital conversion rates in the 8 bit Arduino processors.  I decided to do a little poking around and performance timing to see for myself how well these little processors perform.  I will compare the performance of the 8 bit ATMega328 and ATMega2560 processors with the 32 bit Arduino Due processor.

The ADC clock is 16 MHz divided by a prescale factor.  The default setting is found in wiring.c:

        // set a2d prescale factor to 128
        // 16 MHz / 128 = 125 KHz, inside the desired 50-200 KHz range.
        // XXX: this will not work properly for other clock speeds, and
        // this code should use F_CPU to determine the prescale factor.
        sbi(ADCSRA, ADPS2);
        sbi(ADCSRA, ADPS1);
        sbi(ADCSRA, ADPS0);

        // enable a2d conversions
        sbi(ADCSRA, ADEN);

Using the default setting of 128 for the prescale factor gives a conversion clock of 125 kHz.  Since ADC conversion requires 13 ADC clocks the effective sample rate at best is approximately 125 kHz / 13 = 9.615 kHz.

Using a prescale of 16 would give an ADC clock of 1 MHz and a sample rate of 76.923 kHz.  Increasing the ADC clock can affect ADC accuracy, however.  Atmel notes that the maximum ADC clock frequency is limited by the internal DAC in the conversion circuitry and recommends not exceeding 200 kHz.  However, frequencies up to 1 MHz do not reduce the ADC resolution significantly.  Operation above 1 MHz has not been characterized.
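The arithmetic above is easy to check on a PC.  Here is a minimal sketch in plain C++ (not Arduino code), assuming 13 ADC clocks per conversion as stated above.  The function names are made up for illustration.

```cpp
#include <cstdint>

// Assumption: one conversion takes 13 ADC clocks, as stated in the text.
constexpr uint32_t kAdcClocksPerConversion = 13;

// ADC clock in Hz for a given CPU clock and prescale factor.
constexpr double adcClockHz(uint32_t cpuHz, uint32_t prescale) {
  return static_cast<double>(cpuHz) / prescale;
}

// Best-case sample rate in Hz (back-to-back conversions, no software overhead).
constexpr double adcSampleRateHz(uint32_t cpuHz, uint32_t prescale) {
  return adcClockHz(cpuHz, prescale) / kAdcClocksPerConversion;
}

// Best-case time in milliseconds for n back-to-back conversions.
constexpr double adcBatchMs(uint32_t cpuHz, uint32_t prescale, uint32_t n) {
  return 1000.0 * n / adcSampleRateHz(cpuHz, prescale);
}
```

For a 16 MHz clock, adcBatchMs(16000000, 128, 1000) works out to 104 ms and adcBatchMs(16000000, 16, 1000) to 13 ms.  These are lower bounds only, since any per-call overhead in analogRead adds to them.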

So to test the impact on performance, I wrote a quick and dirty script to measure the time required to do 1000 analogRead operations before and after speeding up the ADC clock and see how much performance gain there is.

// useful defines for setting and clearing register bits
#define cbi(sfr, bit) (_SFR_BYTE(sfr) &= ~_BV(bit))
#define sbi(sfr, bit) (_SFR_BYTE(sfr) |= _BV(bit))

void setup() {
 unsigned long start ;  // millis() returns unsigned long
 int i ;

 Serial.begin(115200) ;
 Serial.println("ADCTest at default 9.6 kHz sample rate") ;
 start = millis() ;
 for (i = 0 ; i < 1000 ; i++)
   analogRead(0) ;
 Serial.print(millis() - start) ;
 Serial.println(" ms (1000 calls)") ;

 // set prescale to 16
 sbi(ADCSRA,ADPS2) ;
 cbi(ADCSRA,ADPS1) ;
 cbi(ADCSRA,ADPS0) ;

 Serial.println("ADCTest at 76.93 kHz sample rate") ;
 start = millis() ;
 for (i = 0 ; i < 1000 ; i++)
   analogRead(0) ;
 Serial.print(millis() - start) ;
 Serial.println(" ms (1000 calls)") ;
}

void loop() {
}


The results are about as you would expect, with roughly a sixfold improvement in ADC speed, a little short of the eightfold increase in the ADC clock.

ADCTest at default 9.6 kHz sample rate
111 ms (1000 calls)

ADCTest at 76.93 kHz sample rate

18 ms (1000 calls)

Testing the Due with the following code shows the following:

ADCTest on Due 

3 ms (1000 calls)

Here is the code used:

void setup() {
 unsigned long start ;  // millis() returns unsigned long
 int i ;

 Serial.begin(115200) ;

 Serial.println("ADCTest on Due ") ;
 start = millis() ;
 for (i = 0 ; i < 1000 ; i++)
   analogRead(0) ;
 Serial.print(millis() - start) ;
 Serial.println(" ms (1000 calls)") ;
}

void loop() {
}


Wednesday, January 14, 2015

Arduino Due Timers (Part 1)

My next foray into the wild and wonderful world of Arduino Due will be to take a close look at the Due notion of Timers.  Tighten up the seat belt as this world gets deep in a hurry.  I will endeavour to keep things as simple and practical as I can.

The Arduino Due timers, or Timer Counters (TC) as they are called, are implemented a bit differently from those in the 8 bit Arduino devices.  There is a lot of functionality in the Due Timer Counter module and it is not a simple thing to describe fully, so I will likely break this into several postings.

The SAM3X8E CPU has 3 Timer Counters (TCs) named TC0, TC1, TC2.  Each TC includes three identical 32-bit channels.  Each channel can be independently programmed to perform a wide range of functions including frequency measurement, event counting, interval measurement, pulse generation, delay timing and pulse width modulation (PWM).

Each channel has three external clock inputs, five internal clock inputs and two multi-purpose input/output signals which can be configured by the user.  Each channel drives an internal interrupt signal which can be programmed to generate processor interrupts.

The TC embeds quadrature decoder logic connected in front of the 3 timers and driven by TIOA0, TIOB0 and TIOA1 inputs. When enabled, the quadrature decoder performs input line filtering and decoding of quadrature signals.  We will not be covering this feature in these postings.

The TC block has two global registers which act upon all three TC channels. The Block Control Register allows the three channels to be started simultaneously with the same instruction.  The Block Mode Register defines the external clock inputs for each channel, allowing them to be chained.

Clocks are assigned to Timer Counters as follows:
  • TIMER_CLOCK1 - MCK/2
  • TIMER_CLOCK2 - MCK/8
  • TIMER_CLOCK3 - MCK/32
  • TIMER_CLOCK4 - MCK/128
  • TIMER_CLOCK5 - SLCK

MCK is the master clock (84 MHz) and SLCK is the slow clock (32 kHz).  It should be noted that it is possible to select the slow clock as the master clock, in which case the TIMER_CLOCK5 input is equivalent to the master clock.  As will be seen later, TCs can be chained together using TIOA0, TIOA1 and TIOA2 as external clock inputs for subsequent TCs, allowing further division of the clock frequency.  I may get into clock chaining in further detail in a separate post.

This rather daunting image is the Timer Counter block diagramme.  It is not as bad as it looks.

Channel signals seen above are as follows:

  • XC0, XC1, XC2 - External Clock Inputs
  • TIOA                - Capture Mode: TC Input, Waveform Mode: TC Output
  • TIOB                - Capture Mode: TC Input, Waveform Mode: TC I/O
  • INT                  - Interrupt Signal Output
  • SYNC               - Synchronization Input Signal

The three channels of TC are identical in operation except when Quadrature decoder is enabled.

Each channel is organized around a separate 32-bit counter. The value of the counter is incremented at each positive edge of the selected clock. When the counter has reached the value 0xFFFFFFFF and wraps around to 0x00000000, an overflow occurs and the COVFS bit in TC_SR (Status Register) is set.  The current value of the counter is accessible anytime by reading the Counter Value Register, TC_CV. The counter can also be reset by a trigger. In this case, the counter value resets to 0x00000000 on the next valid edge of the selected clock following the trigger event.

At the block level, input clock signals of each channel can either be connected to the external inputs TCLK0, TCLK1 or TCLK2, or be connected to the internal I/O signals TIOA0, TIOA1 or TIOA2 for chaining by programming the TC_BMR (Block Mode) register.

Each channel can independently select an internal or external clock source for its counter via the TCCLKS bits in the TC Channel Mode register (TC_CMR).

  • Internal clock signals: TIMER_CLOCK1, TIMER_CLOCK2, TIMER_CLOCK3, TIMER_CLOCK4 or TIMER_CLOCK5
  • External clock signals: XC0, XC1 or XC2

The selected clock can be inverted using the CLKI bit in TC_CMR. This allows counting on the opposite edges of the clock.  There is a burst function which allows the clock to be validated when an external signal is high. The BURST parameter in the Mode Register defines this signal (none, XC0, XC1, XC2).

Note that in all cases, if an external clock is used, the duration of each of its levels must be longer than the master clock period and the external clock frequency must be at least 2.5 times lower than the master clock.

Here is a block diagramme of the clock selection logic:

We still have not covered clock control, operating modes, or triggers, but we will touch on these topics as we work through examples.

Ok, enough background about Due timers for now and on to the first practical example.  In this example we will define a function that allows the configuration of a TC to generate a square wave at a relatively low frequency of the caller's choice.

Firstly, let's think about clocking our timer.  We have a system clock speed of 84 MHz that can be divided by 4 different divisors (2, 8, 32 and 128), plus the slow clock.  So, the available timer clock speeds are:

  • 42 MHz
  • 10.5 MHz
  • 2.625 MHz
  • 656.25 kHz
  • 32 kHz

As previously mentioned, TCs can be chained to obtain other clock speeds, but that topic is beyond the scope of this posting.
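As a quick cross-check of the list above, here is a small sketch in plain C++ (it runs on a PC, not on the Due).  The TIMER_CLOCK1 through TIMER_CLOCK5 numbering is taken from the SAM3X datasheet, the 32 kHz slow clock is the nominal figure used in this post, and the names in the code are made up for illustration.

```cpp
#include <cstdint>

constexpr uint32_t kMasterClockHz = 84000000;  // MCK on the Due
constexpr uint32_t kSlowClockHz   = 32000;     // SLCK, nominal value used in the text

// Divisors for TIMER_CLOCK1..TIMER_CLOCK4; TIMER_CLOCK5 is the slow clock.
constexpr uint32_t kDivisors[] = {2, 8, 32, 128};

// Frequency in Hz of TIMER_CLOCKn for n = 1..5, or 0 for any other n.
constexpr double timerClockHz(int n) {
  return (n >= 1 && n <= 4) ? static_cast<double>(kMasterClockHz) / kDivisors[n - 1]
       : (n == 5)           ? static_cast<double>(kSlowClockHz)
                            : 0.0;
}
```

Note that 84 MHz / 32 comes out to 2.625 MHz, and that TIMER_CLOCK4 at 656.25 kHz is the slowest clock derived from MCK.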

To start up a timer, we need to deal with at least 4 different bits of information when doing simple operations with TCs.

  • The Timer Counter (TC) you wish to use
  • The channel in that TC you wish to use
  • The IRQ if interrupts are used
  • The frequency of the timer

The following table is useful when performing TC configuration as it shows the relationship between the TC, its channels, the IRQ to use, what the IRQ handler function must be called and the power management ID for that peripheral.  Looking at the first TC in the list (TC0) we can see that it has three channels (0, 1, 2).  The Nested Vectored Interrupt Controller IRQ values are TC0_IRQn, TC1_IRQn and TC2_IRQn respectively.  When using interrupts, the IRQ handler functions that are called are named TC0_Handler, TC1_Handler and TC2_Handler respectively.  Lastly, the power management controller IDs are ID_TC0, ID_TC1 and ID_TC2 respectively.  The remaining TCs follow the same pattern, so TC1 channel 0 uses TC3_IRQn, TC3_Handler and ID_TC3.

So, we will create a function to encapsulate all this to get a simple timer running.  The timer will generate a square wave at the specified frequency.

There is a bit of housekeeping that needs to occur.  

  • We need to enable the ability to modify the power management controller's registers.
  • We need to enable a specific peripheral clock specified by the IRQ.
  • We need to set the TC configuration.

Power Management Controller calls look like this.  We need to turn off write protection and then enable the peripheral clock for TC1 Channel 0.

   pmc_set_writeprotect(false);
   pmc_enable_periph_clk(ID_TC3);


You could also use TC3_IRQn rather than ID_TC3 as they are both different names for the same constant value.  It is more clear to use the correct constant name, but as we will see shortly, it does simplify the implementation if we don't.

TC_Configure is used to configure a TC to operate in a given mode.  The timer is stopped after configuration and must be restarted with TC_Start().  All interrupts of the timer are also disabled.

We will select Waveform Mode and instruct the TC to count up with a reset on register C (RC) compare.  The following graphic depicts this mode, though it seems to imply the maximum counter value is 0xffff which is not true.  With 32 bits, the maximum counter value would be 0xffffffff.

TC configuration is accomplished with the following code.  We will use Timer Clock 4 (master clock / 128  = 656.25 kHz) as for this example we will be generating low frequency waveforms.  The function takes the TC and Channel as the first two parameters.  The last parameter sets bits to indicate the fact we are in Waveform mode, only counting up to the maximum value specified in Register C (RC) and which of the 5 clocks we will use.

   TC_Configure(tc, channel, TC_CMR_WAVE | TC_CMR_WAVSEL_UP_RC |
                             TC_CMR_TCCLKS_TIMER_CLOCK4);

Now we need to set Register A (RA) to be the clock count where our output (TIOA) goes high and Register C (RC) at the clock count where our output goes low.  See the graphic above.  We chose points that would generate a symmetrical 50% duty cycle square wave.  Register C is set to the maximum count  specified by the clock frequency divided by the desired frequency.

   uint32_t rc = VARIANT_MCK / 128 / freq;
   TC_SetRA(tc, channel, rc / 2); // 50% duty cycle
   TC_SetRC(tc, channel, rc);
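To get a feel for the numbers, here is a rough sketch of the same integer arithmetic in plain C++ for a PC.  The constant kVariantMck stands in for VARIANT_MCK (84 MHz on the Due) and the function names are made up for illustration.  A frequency of zero is not handled.

```cpp
#include <cstdint>

constexpr uint32_t kVariantMck = 84000000;  // stands in for VARIANT_MCK on the Due

// Register C value: timer ticks per output period using TIMER_CLOCK4 (MCK/128).
constexpr uint32_t rcForFrequency(uint32_t freqHz) {
  return kVariantMck / 128 / freqHz;
}

// Register A value: half of RC for a 50% duty cycle.
constexpr uint32_t raForFrequency(uint32_t freqHz) {
  return rcForFrequency(freqHz) / 2;
}

// Frequency actually produced once RC has been truncated to an integer.
constexpr double actualFrequencyHz(uint32_t freqHz) {
  return (kVariantMck / 128.0) / rcForFrequency(freqHz);
}
```

At 1 Hz this gives RC = 656250 and RA = 328125 with no rounding error.  At a requested 1000 Hz, RC truncates to 656 and the output would be about 1000.38 Hz, so the error grows as the requested frequency rises.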

Now we enable the Register C (RC) compare interrupt.  This bit is a little strange, because we have both an interrupt enable register and an interrupt disable register.  I suspect this is so that a complete set of interrupts that you might need can be set in the interrupt enable list and sub-sets turned off by modifying the list of disabled interrupts.  This way you don't have to remember which ones were enabled previously.  This code enables only the RC compare interrupt and disables everything except RC compare interrupt, or so I believe.

   tc->TC_CHANNEL[channel].TC_IER =  TC_IER_CPCS;
   tc->TC_CHANNEL[channel].TC_IDR = ~TC_IER_CPCS;

Start the timer running again.

   TC_Start(tc, channel);

And tell the Nested Vectored Interrupt Controller to enable our IRQ.

   NVIC_EnableIRQ(irq);


Simple, eh?  Yeah...  Nothing to it...  Here is the entire function:

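void TimerStart(Tc *tc, uint32_t channel, IRQn_Type irq, uint32_t freq)
{
   pmc_set_writeprotect(false);
   pmc_enable_periph_clk((uint32_t) irq);
   TC_Configure(tc, channel, TC_CMR_WAVE | TC_CMR_WAVSEL_UP_RC |
                             TC_CMR_TCCLKS_TIMER_CLOCK4);
   uint32_t rc = VARIANT_MCK / 128 / freq;
   TC_SetRA(tc, channel, rc/2); // 50% duty cycle square wave
   TC_SetRC(tc, channel, rc);
   TC_Start(tc, channel);
   tc->TC_CHANNEL[channel].TC_IER =  TC_IER_CPCS;
   tc->TC_CHANNEL[channel].TC_IDR = ~TC_IER_CPCS;
   NVIC_EnableIRQ(irq);
}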

Whew...  Still with me?  Ok!

Now we will implement an ISR handler that just toggles the LED on digital pin 13 on and off every time the timer fires an interrupt.  It also has to read the status of the Timer Counter (TC) in order to allow the next interrupt.

volatile boolean ledOn;

void TC3_Handler()
{
   TC_GetStatus(TC1, 0);  // reading the status allows the next interrupt
   digitalWrite(13, ledOn = !ledOn);
}

Ok, so given that we are only interrupting when Register C compare match occurs (see graphic above), and we are toggling pin 13 on every interrupt, we effectively divide the frequency that the LED blinks at by two.  If we want the frequency of the LED blinking to match the frequency of the timer, we will need to interrupt on Register A compare match as well.  The code change to implement this would be to enable the interrupt on RA compare as well as RC compare.

   tc->TC_CHANNEL[channel].TC_IER =  TC_IER_CPAS | TC_IER_CPCS;
   tc->TC_CHANNEL[channel].TC_IDR = ~(TC_IER_CPAS | TC_IER_CPCS);


So, the only thing remaining is to implement the setup function and stand back...  We set the LED pin to output and initialize timer TC1, channel 0 using the IRQ TC3_IRQn (from the table above) with a frequency of 1 Hz.

void setup()
{
  pinMode(13, OUTPUT);
  TimerStart(TC1, 0, TC3_IRQn, 1);
}


So all of this is to blink a freaking LED at a 1 Hz rate.  Amazing flexibility (and the associated complexity) comes at the cost of a bit of a steep learning curve.  Here is the complete listing for your reference.

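volatile boolean ledOn;

//TC1 ch 0
void TC3_Handler()
{
   TC_GetStatus(TC1, 0);  // reading the status allows the next interrupt
   digitalWrite(13, ledOn = !ledOn);
}

void TimerStart(Tc *tc, uint32_t channel, IRQn_Type irq, uint32_t freq)
{
   pmc_set_writeprotect(false);
   pmc_enable_periph_clk((uint32_t) irq);
   TC_Configure(tc, channel, TC_CMR_WAVE | TC_CMR_WAVSEL_UP_RC |
                             TC_CMR_TCCLKS_TIMER_CLOCK4);
   uint32_t rc = VARIANT_MCK / 128 / freq;
   TC_SetRA(tc, channel, rc >> 1); // 50% duty cycle square wave
   TC_SetRC(tc, channel, rc);
   TC_Start(tc, channel);
   tc->TC_CHANNEL[channel].TC_IER =  TC_IER_CPCS;
   tc->TC_CHANNEL[channel].TC_IDR = ~TC_IER_CPCS;
   NVIC_EnableIRQ(irq);
}

void setup()
{
  pinMode(13, OUTPUT);
  TimerStart(TC1, 0, TC3_IRQn, 1);
}

void loop()
{
}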


More to come, but have fun with this if you are so inclined.  I am always willing to help out if you have questions.  Drop me a note at ko7m at arrl dot net or comment here and I will do my best.

Sunday, January 11, 2015

Arduino Due first project

With all the recent work I have done on Arduino 8 bit processors, I wanted to expand out a little more and have a play around with more capable devices that still maintain similar simplicity of hardware design and cost.  I settled on the Arduino Due for my next set of experiments and was able to obtain a board for less than the cost of eating lunch out due to some discounts I had in hand.  Amazon has them for around $30-$40 which appears to be pretty typical.

The Due is an interesting device with a lot more horsepower than the 8 bit Arduino versions.  Some features:

  • A 32-bit core, that allows operations on 4 bytes wide data within a single CPU clock.
  • CPU Clock at 84Mhz.
  • 96 KBytes of SRAM.
  • 512 KBytes of Flash memory for code.
  • a DMA controller, that can relieve the CPU from doing memory intensive tasks.
  • 54 digital I/O pins
  • 12 bit true Digital to Analogue output

I thought as a first venture into programming this beast, I would implement a simple sine wave generator to play with the 12 bit DAC.  Since I don't want to dive (yet) into the details of how to implement timers, I thought I would just take advantage of the faster processor and implement any timing delays inline.

So, to begin, I need a table of sine information that is small in size while still using 12 bit data.  I have posted plenty of code previously that illustrates how I generate this data so I will not repeat it here.  To summarize, I write a simple C, C++ or C# application that generates the data and writes it as a data structure that I can just paste into my code.  I then graph the data using Excel to visualize it.  I chose to use 120 phase points with a 12 bit range of 0-4095.

static int sineTable[] = {
  0x7ff, 0x86a, 0x8d5, 0x93f, 0x9a9, 0xa11, 0xa78, 0xadd, 0xb40, 0xba1,
  0xbff, 0xc5a, 0xcb2, 0xd08, 0xd59, 0xda7, 0xdf1, 0xe36, 0xe77, 0xeb4,
  0xeec, 0xf1f, 0xf4d, 0xf77, 0xf9a, 0xfb9, 0xfd2, 0xfe5, 0xff3, 0xffc,
  0xfff, 0xffc, 0xff3, 0xfe5, 0xfd2, 0xfb9, 0xf9a, 0xf77, 0xf4d, 0xf1f,
  0xeec, 0xeb4, 0xe77, 0xe36, 0xdf1, 0xda7, 0xd59, 0xd08, 0xcb2, 0xc5a,
  0xbff, 0xba1, 0xb40, 0xadd, 0xa78, 0xa11, 0x9a9, 0x93f, 0x8d5, 0x86a,
  0x7ff, 0x794, 0x729, 0x6bf, 0x655, 0x5ed, 0x586, 0x521, 0x4be, 0x45d,
  0x3ff, 0x3a4, 0x34c, 0x2f6, 0x2a5, 0x257, 0x20d, 0x1c8, 0x187, 0x14a,
  0x112, 0x0df, 0x0b1, 0x087, 0x064, 0x045, 0x02c, 0x019, 0x00b, 0x002,
  0x000, 0x002, 0x00b, 0x019, 0x02c, 0x045, 0x064, 0x087, 0x0b1, 0x0df,
  0x112, 0x14a, 0x187, 0x1c8, 0x20d, 0x257, 0x2a5, 0x2f6, 0x34c, 0x3a4,
  0x3ff, 0x45d, 0x4be, 0x521, 0x586, 0x5ed, 0x655, 0x6bf, 0x729, 0x794
};
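For anyone who wants to regenerate a table like this, here is a minimal sketch of a generator in plain C++ for a PC.  It is not the original generator.  It assumes the sine is offset by half of full scale (2047.5 for 12 bits) and truncated to an integer, which matches the entries I spot-checked in the table above.  The function name makeSineTable is made up for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Build one cycle of an offset sine wave with the given number of phase
// points, scaled to the full range of a DAC with the given bit depth.
std::vector<uint16_t> makeSineTable(int points, int bits) {
  const double pi = 3.14159265358979323846;
  const double half = ((1 << bits) - 1) / 2.0;  // 2047.5 for 12 bits
  std::vector<uint16_t> table(points);
  for (int i = 0; i < points; ++i) {
    // Offset the sine so it swings between 0 and full scale, then truncate.
    table[i] = static_cast<uint16_t>(half + half * std::sin(2.0 * pi * i / points));
  }
  return table;
}
```

Calling makeSineTable(120, 12) and printing the entries in hex, ten per line, gives output that can be pasted into a sketch as an array initializer.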

Here is the sine data as graphed in Excel:

The following variables define the size of the sine table, the phase index variable and a calculated microsecond delay between phase points.  I will probably regret choosing 120 phase points as the microsecond delay is really 8.33333 microseconds which will truncate to 8 microseconds.

const int cSine       = sizeof(sineTable) / sizeof(int);
const int OnekHzDelay = 8;
int iPhase = 0;
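The delay arithmetic can be checked with a couple of lines of plain C++ on a PC.  This is only a sketch: the function names are made up for illustration and the second function takes the total time per phase point, which includes whatever overhead the main loop adds on top of the delay.

```cpp
#include <cstdint>

// Ideal time between phase points, in microseconds, for a given tone
// frequency and number of phase points per cycle.
constexpr double idealDelayUs(uint32_t toneHz, uint32_t points) {
  return 1.0e6 / (static_cast<double>(toneHz) * points);
}

// Tone frequency in Hz that results from spending usPerPoint microseconds
// on each phase point (delay plus any loop overhead).
constexpr double resultingToneHz(double usPerPoint, uint32_t points) {
  return 1.0e6 / (usPerPoint * points);
}
```

For 120 phase points at 1 kHz the ideal delay is 8.33 us.  Truncating to 8 us would give about 1041.7 Hz if the loop itself took no time at all, so any loop overhead pushes the tone lower than that.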

By default, analogRead uses 10 bit resolution and analogWrite uses 8 bits.  I am going to override this for analog read and write to use 12 bit resolution.

void setup()
{
  analogReadResolution(12);
  analogWriteResolution(12);
}

Now in the main loop, I just sequence through the sine table writing each phase point to the DAC and then delaying the requisite number of microseconds before continuing.  This number is going to be too big, but I don't yet know the timing of the main loop.

void loop()
{
  analogWrite(DAC0, sineTable[iPhase++]);
  iPhase %= cSine;
  delayMicroseconds(OnekHzDelay);
}

Now, looking at the DAC0 output with a scope, we see the following output which is a nice clean analogue output without the need to integrate as is needed with PWM output.

CAUTION: Please do not make the mistake of hooking a low impedance speaker or other load directly to the DAC output.  A low impedance load on either of the DAC outputs will result in blowing the DAC output transistor.  You should use a buffer amplifier stage to protect your shiny new Due device.  Loads of 10K impedance or higher should be safe to directly connect to the DAC outputs.

As can be seen on the output trace, the frequency of the output waveform is somewhat lower than expected, as the period ended up being 1.68 ms rather than the expected 1 ms.  So to check this out, I thought I would take a few timings.

The Due is pretty quick.  Just testing a digitalWrite to set a pin high and then low in a loop results in the following information:

  • digitalWrite takes about 1.26 us to execute.
  • An empty main loop takes about 4.4 us to execute.

So, just looking at these timings, I should be able to time the main loop and see how long it is taking just using a simple pin toggle and looking at it on the scope.

The main loop (ignoring the two digital write calls) is taking 16.4 us to execute.  Without the delayMicroseconds call the main loop takes 8.44 us.  By experimentation, I found that a delay of 3 us produced a tone close to 1 kHz (926 Hz specifically).

Ok, so a fun initial experiment.  Next I figure out how timers work so I can more accurately generate timing events.

Sunday, December 28, 2014


In a previous post I pondered briefly about BPSK (used in PSK31) as compared to FSK. Instead of traditional frequency-shift keying, in BPSK information is transmitted by patterns of polarity-reversals (sometimes called 180-degree phase shifts). One way to think about this would be to swap antenna terminals on each phase reversal. 

PSK31 uses a 180 degree phase shift when encoding a zero bit in a varicode character.  By way of review, BPSK in general uses a sinusoid of constant amplitude and a 180 degree phase shift to distinguish a binary 1 from a binary 0.

Now, the problem with this in radio circuits is the phase reversal if hard keyed will result in a lot of splatter and the accompanying bandwidth.  So, in radio circuits as discussed in my series about PSK31 on the arduino, we typically ramp the sinusoid level down to zero at the phase change points to eliminate this splatter and shift the phase reversal to the middle of the bit time.  PSK31 encodes a zero bit as a phase reversal and a one bit as no phase change.  In other aspects however PSK31 is BPSK.

In thinking about BPSK vs. FSK or more accurately, MSK (Minimum Shift Keying) I found myself pondering whether they are functionally equivalent.

MSK is basically FSK with the shift set to ½ the baud rate.  Realistically this is the smallest shift you can use without trading off transmission speed.  PSK31 uses a 31.25 baud transmission speed.  So, if we were going to encode a 31.25 baud data transmission using MSK, our keying shift would be 1/4 of the data rate either side of the transmit center frequency.

  31.25 baud / 2 = 15.625 Hz total shift
  15.625 Hz / 2 = 7.8125 Hz either side of the center frequency

If you think of the phase of a carrier that is 7.8125 Hz below the transmit center frequency, it will lag by 90 degrees after 32 ms, and at 7.8125 Hz above, it will lead by 90 degrees after 32 ms, for a difference of 180 degrees.

So MSK appears to be functionally equivalent to BPSK while using +90 and -90 degree shifts instead of 0 and 180 degree shifts.

An advantage to this approach is that the resultant signal has no amplitude modulated component and so non-linear amplification techniques may be utilized greatly simplifying transmitter design.  It does require the ability to do phase continuous frequency changes however.


Ok, so to give this concept a play around, I decided to use all the recent work I did on the Minima to implement 1 Hz tuning on the Si570 and see if I could implement a decode-able PSK31 varicode message using MSK techniques with the Si570.

The Si570 is technically capable of 0.42 Hz frequency precision, but for this experiment, I will use  8 Hz as the frequency shift above and below the transmit frequency instead of 7.8125 Hz. 

At 7.8125 Hz, 360 degrees of phase change takes 128 ms.  90 degrees of phase change requires 128/4 = 32 ms.

At 8 Hz, 360 degrees of phase change takes 125 ms.  90 degrees of phase change requires 125/4 = 31.25 ms.
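These phase figures are simple to verify.  Here is a minimal sketch in plain C++ for a PC, assuming the phase of an offset carrier advances linearly at 360 degrees per cycle of the offset frequency.  The function names are made up for illustration.

```cpp
// Phase change in degrees accumulated by a carrier that is offsetHz away
// from the center frequency, over a period of durationMs milliseconds.
constexpr double phaseShiftDegrees(double offsetHz, double durationMs) {
  return 360.0 * offsetHz * durationMs / 1000.0;
}

// Time in milliseconds needed to accumulate the given phase change.
constexpr double msForPhase(double offsetHz, double degrees) {
  return 1000.0 * degrees / (360.0 * offsetHz);
}
```

With a 7.8125 Hz offset, phaseShiftDegrees(7.8125, 32.0) is exactly 90 degrees.  With an 8 Hz offset held for a full 32 ms bit time, the shift is 92.16 degrees, so each bit would overshoot by a little over 2 degrees.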

I suspect that most PSK31 decoding applications will be able to deal with this difference.

Having modified my recent PSK31 code to drive the Si570 oscillator rather than generate audio, I have run into a number of problems.  I suspected that the i2c communications library that comes with the Arduino IDE would require interrupts to be enabled, which is true.  If you try to call the Si570 code from within a timer interrupt, it will hang the device.

Enabling interrupts around the Si570 call within the timer ISR cures the hanging problem, but fails to communicate with the Si570 to set the frequency.  I suspect that while the library is returning quickly, the i2c communications doesn't complete until sometime later.

Moving the setFrequency() call out of the ISR generates MSK, but it appears that there are bits being lost.  The ISR is completing in 1.040 us, but is not accurately generating all of the frequency shifts.  My suspicion is that the wire library is returning quickly, but that its own interrupts are not completing in a timely enough fashion to allow precise control over when frequency shifts happen.  I will have to do some measurements on the i2c communications and put some debugging pin toggles in the wire library to measure how long things are taking to complete.

More to come...

Saturday, December 27, 2014

Generating Audio PSK31 with an Arduino (Part 2)

I have completed my initial pass on PSK31 audio generation using an Arduino.  While I have not yet fixed the PSK31 character timing, fldigi is able to decode what I am generating, so there is a bit of character timing flexibility at least in the implementation of fldigi.  I have not tested with any other PSK decoder as it is still my intention to fix this part of the implementation.  However, I wanted to publish an update with what I have working regardless.

Here is the top part of the code listing where we describe the functionality and define the table that translates between 7 bit ASCII characters and the variable bit length PSK31 character set.

// PSK31 audio generation
// Jeff Whitlatch - ko7m

// We are going to generate a 1 kHz centre frequency tone.  
// Each 1 kHz cycle of the sinusoid will be generated from
// 32 eight bit amplitude samples.  The period of a 1 kHz tone
// is 1 ms.  Each of the 32 samples per cycle has a period
// of 31.25 us.  We will construct each sinusoid from a 32 byte
// per cycle lookup table of amplitude values ranging from
// 0x00 to 0xff where the zero crossing value is 0x80.

// The PSK31 character bit time is 32 ms (31.25 baud) constructed of 1024
// samples.  A binary zero is represented by a phase reversal
// while a binary 1 is represented by the lack of a phase reversal.

// Characters are encoded with a variable bit length code (varicode)
// where the length of each character is inversely
// proportional to the frequency of use in the English language 
// of that character.  Characters are encoded with a bit
// pattern where there are no sequential zero bits.  Two zero bits
// in a row signify the end of a character.

// Varicode lookup table
// This table defines the PSK31 varicode.  There are 128 entries,
// corresponding to ASCII characters 0-127 with two bytes for each entry.
// The bits for the varicode are to be shifted out LSB-first.
// More than one zero in sequence signifies the end of the character.
// For modulation, a 0 represents a phase reversal while a 1 
// represents a steady-state carrier.

uint16_t varicode[] = {
  0x0355,  // 0 NUL
  0x036d,  // 1 SOH
  0x02dd,  // 2 STX
  0x03bb,  // 3 ETX
  0x035d,  // 4 EOT
  0x03eb,  // 5 ENQ
  0x03dd,  // 6 ACK
  0x02fd,  // 7 BEL
  0x03fd,  // 8 BS
  0x00f7,  // 9 HT
  0x0017,  // 10 LF
  0x03db,  // 11 VT
  0x02ed,  // 12 FF
  0x001f,  // 13 CR
  0x02bb,  // 14 SO
  0x0357,  // 15 SI
  0x03bd,  // 16 DLE
  0x02bd,  // 17 DC1
  0x02d7,  // 18 DC2
  0x03d7,  // 19 DC3
  0x036b,  // 20 DC4
  0x035b,  // 21 NAK
  0x02db,  // 22 SYN
  0x03ab,  // 23 ETB
  0x037b,  // 24 CAN
  0x02fb,  // 25 EM
  0x03b7,  // 26 SUB
  0x02ab,  // 27 ESC
  0x02eb,  // 28 FS
  0x0377,  // 29 GS
  0x037d,  // 30 RS
  0x03fb,  // 31 US
  0x0001,  // 32 SP
  0x01ff,  // 33 !
  0x01f5,  // 34 "
  0x015f,  // 35 #
  0x01b7,  // 36 $
  0x02ad,  // 37 %
  0x0375,  // 38 &
  0x01fd,  // 39 '
  0x00df,  // 40 (
  0x00ef,  // 41 )
  0x01ed,  // 42 *
  0x01f7,  // 43 +
  0x0057,  // 44 ,
  0x002b,  // 45 -
  0x0075,  // 46 .
  0x01eb,  // 47 /
  0x00ed,  // 48 0
  0x00bd,  // 49 1
  0x00b7,  // 50 2
  0x00ff,  // 51 3
  0x01dd,  // 52 4
  0x01b5,  // 53 5
  0x01ad,  // 54 6
  0x016b,  // 55 7
  0x01ab,  // 56 8
  0x01db,  // 57 9
  0x00af,  // 58 :
  0x017b,  // 59 ;
  0x016f,  // 60 <
  0x0055,  // 61 =
  0x01d7,  // 62 >
  0x03d5,  // 63 ?
  0x02f5,  // 64 @
  0x005f,  // 65 A
  0x00d7,  // 66 B
  0x00b5,  // 67 C
  0x00ad,  // 68 D
  0x0077,  // 69 E
  0x00db,  // 70 F
  0x00bf,  // 71 G
  0x0155,  // 72 H
  0x007f,  // 73 I
  0x017f,  // 74 J
  0x017d,  // 75 K
  0x00eb,  // 76 L
  0x00dd,  // 77 M
  0x00bb,  // 78 N
  0x00d5,  // 79 O
  0x00ab,  // 80 P
  0x0177,  // 81 Q
  0x00f5,  // 82 R
  0x007b,  // 83 S
  0x005b,  // 84 T
  0x01d5,  // 85 U
  0x015b,  // 86 V
  0x0175,  // 87 W
  0x015d,  // 88 X
  0x01bd,  // 89 Y
  0x02d5,  // 90 Z
  0x01df,  // 91 [
  0x01ef,  // 92 \ (backslash)
  0x01bf,  // 93 ]
  0x03f5,  // 94 ^
  0x016d,  // 95 _
  0x03ed,  // 96 `
  0x000d,  // 97 a
  0x007d,  // 98 b
  0x003d,  // 99 c
  0x002d,  // 100 d
  0x0003,  // 101 e
  0x002f,  // 102 f
  0x006d,  // 103 g
  0x0035,  // 104 h
  0x000b,  // 105 i
  0x01af,  // 106 j
  0x00fd,  // 107 k
  0x001b,  // 108 l
  0x0037,  // 109 m
  0x000f,  // 110 n
  0x0007,  // 111 o
  0x003f,  // 112 p
  0x01fb,  // 113 q
  0x0015,  // 114 r
  0x001d,  // 115 s
  0x0005,  // 116 t
  0x003b,  // 117 u
  0x006f,  // 118 v
  0x006b,  // 119 w
  0x00fb,  // 120 x
  0x005d,  // 121 y
  0x0157,  // 122 z
  0x03b5,  // 123 {
  0x01bb,  // 124 |
  0x02b5,  // 125 }
  0x03ad,  // 126 ~
  0x02b7   // 127 (del)
};

Now we define the sinusoid data table of 513 bytes.  The first 512 bytes are 16 cycles of sine data ramping up from zero to full volume.

// 16 cycles of 32 samples each (512 bytes) of ramp-up
// sinusoid information.  There is an extra byte at the
// end of the table with the value 0x80 which allows the
// first byte to always be at the zero crossing point
// whether ramping up or down.
char data[] = {

// The last 32 bytes (33 with the extra on the end)
// define a single cycle of full amplitude sinusoid.
#define one  (&data[15*32])  // Sine table pointer for a one bit
#define zero (&data[16*32])      // Sine table pointer for a zero bit

// Useful macros for setting and resetting bits
#define cbi(sfr, bit) (_SFR_BYTE(sfr) &= ~_BV(bit))
#define sbi(sfr, bit) (_SFR_BYTE(sfr) |= _BV(bit))

The following variables are used in the interrupt service routine.  These variables define the 7 bit ASCII buffer of the text to be sent.  The idea is to maintain a head and tail index into the buffer where the head is the next character to be sent.  The tail is the place where new text will be inserted.  The buffer is circular and when head == tail, the buffer is empty and we stop the PSK31 transmission.

// Variables used by the timer ISR to generate sinusoidal information.
volatile char    rgchBuf[256];    // Buffer of text to send
volatile uint8_t head = 0;        // Buffer head (next character to send)
volatile uint8_t tail = 0;        // Buffer tail (next insert point)

The following variable holds the variable bit length character currently being sent.  Bits are sent least significant bit (LSB) first.  When two zero bits have been sent, the character is finished and the next character is fetched from the buffer above.

volatile uint16_t vcChar = 0;     // Current varicode char being sent
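To make the LSB-first convention concrete, here is a minimal sketch in plain C++ for a PC of how a table entry maps to the bits that go out.  It is not the ISR code and the helper names are made up for illustration.

```cpp
#include <cstdint>

// Number of varicode bits in a table entry, not counting the two zero
// bits that end the character (position of the highest set bit, plus one).
constexpr int varicodeLength(uint16_t code) {
  return code == 0 ? 0 : 1 + varicodeLength(static_cast<uint16_t>(code >> 1));
}

// The n-th bit to transmit (0-based), shifting out LSB first.  Once the
// code is exhausted, every further bit is a zero (the character terminator).
constexpr int varicodeBit(uint16_t code, int n) {
  return (n < 16) ? ((code >> n) & 1) : 0;
}

// Total bit times used by a character, including the two terminating zeros.
constexpr int varicodeBitTimes(uint16_t code) {
  return varicodeLength(code) + 2;
}
```

For example, the entry for 'e' (0x0003) goes out as 1, 1, 0, 0 and the entry for 'a' (0x000d) as 1, 0, 1, 1, 0, 0.  A space (0x0001) takes only three bit times, while NUL (0x0355) takes twelve.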

The following variable should be a constant as there is a constant number of phase points in 1/2 the PSK31 bit time of 1024 phase points.  I will fix this in the final version.

volatile int   cbHalfBit = 512;

The following variables keep track of the current phase point (index into the sine table) and the number of phase points that remain in the direction we are scanning the table.  Lastly, we keep track of whether we are currently sending a PSK31 one bit or a zero bit.

volatile char *pbSine = zero;
volatile int   cbDirection = 512;
volatile char  fSendOne = false;

The IX variable is the increment (+1 or -1) to add to the phase index to get to the next phase point to be processed.  The sign of the variable indicates whether we are processing the table in the forward or reverse direction.  Phase is the current phase of the sinusoid and is either +1 for no phase shift or -1 for 180 degree phase shift.  The fFullBit variable tells us if we are processing the first or second half of the PSK31 bit.

volatile char  ix      = -1;
volatile char  phase   = 1;
volatile char fFullBit = 0;

The cZeroBits counts the number of consecutive zero bits that have been sent in order to detect the end of a character.  The maxZeroBits variable tells us how many zero bits indicate the end of a character.  This is also used to send a few zero bits at the end of a transmission before turning off the tone.

volatile char cZeroBits = 0;
volatile char maxZeroBits = 2;

The following code sets up timer 2 to process our phase point generation.  This still needs to be adjusted to be 31.25 microseconds rather than the current 32 microseconds.  I will fix this, I promise.  It just has not been a priority.
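
For reference, the arithmetic behind those numbers can be sketched on the host.  This assumes a 16 MHz clock and the usual phase correct PWM behaviour, where the timer counts up to its top value and back down, so one period is 2 * top timer clocks.  The names CPU_HZ, phaseCorrectPeriodNs and topForPeriodNs are hypothetical helpers, not part of the sketch.

```c
#include <stdint.h>

#define CPU_HZ 16000000UL   // assumed 16 MHz Arduino clock

// Period in nanoseconds of one phase correct PWM cycle:
// the timer counts up to top and back down, so 2 * top * prescaler CPU clocks.
uint32_t phaseCorrectPeriodNs(uint16_t top, uint16_t prescaler) {
  uint32_t clocks = 2UL * top * prescaler;
  return (uint32_t)((uint64_t)clocks * 1000000000ULL / CPU_HZ);
}

// Top value that gives the requested period (rounded down).
uint16_t topForPeriodNs(uint32_t periodNs, uint16_t prescaler) {
  uint64_t clocks = (uint64_t)periodNs * CPU_HZ / 1000000000ULL;
  return (uint16_t)(clocks / (2UL * prescaler));
}
```

Under these assumptions a top value of 0xFF gives phaseCorrectPeriodNs(255, 1) = 31875 ns, close to the 32 microseconds quoted above, while topForPeriodNs(31250, 1) gives 250.  At 31.25 microseconds per phase point, 1024 phase points add up to the 32 millisecond PSK31 bit time.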

// Setup timer2 with prescaler = 1, PWM mode to phase correct PWM
// See the ATMega datasheet for all the gory details

// This is not quite right for PSK31 as it is 32 us vs. 31.25 us
void timer2Setup() {
  // Clock prescaler = 1
  sbi (TCCR2B, CS20);    // 001 = no prescaling
  cbi (TCCR2B, CS21);
  cbi (TCCR2B, CS22);

  // Phase Correct PWM
  cbi (TCCR2A, COM2A0);  // 10 = clear OC2A on compare match when up counting
  sbi (TCCR2A, COM2A1);  //      set OC2A on compare match when down counting

  // Mode 1
  sbi (TCCR2A, WGM20);   // 01 with WGM22 clear = Mode 1, top value 0xff
  cbi (TCCR2A, WGM21);   // (101 = Mode 5 uses OCR2A as top value rather than 0xff)
}

Now we have the main waveform generation state machine.  This code is still very rough and will need to be optimized once I get all functionality implemented.

// Timer 2 interrupt service routine (ISR).
// Grab the next phase point from the table and
// set the amplitude value of the sinusoid being
// constructed.  For a zero bit, set 512 phase points
// (16 amplitudes of 32 samples each) to ramp
// down to zero and then immediately back up to full
// amplitude for a total of 1024 phase points.
// For a one bit, there is no amplitude or phase
// change, so we just play 32 phase points of
// full amplitude data 32 times for a total of 1024
// phase points.
// Each end of the ramp-up table starts with a zero
// crossing byte, so there is one extra byte in
// the table (513 entries).  Ramping up plays bytes
// 0 -> 511 and ramping down plays bytes 512 -> 1
// allowing each direction to start at the zero
// crossing point.
ISR(TIMER2_OVF_vect) {   // Overflow vector, enabled by TOIE2 in setup()
  // Set current amplitude value for the sine wave
  // being constructed taking care to invert the
  // phase when processing the table in reverse order.
  OCR2A = *pbSine * ix * phase;
  pbSine += ix;

  // At the half bit time, we need to change phase
  // if generating a zero bit
  if (0 == --cbHalfBit) {
    cbHalfBit = 512;

    // Get the next varichar bit to send
    if (fFullBit) {
      // Count the number of sequential zero bits
      fSendOne = vcChar & 1;
      if (fSendOne) cZeroBits = 0; else cZeroBits++;

      // Shift off the least significant bit.
      vcChar >>= 1;

      // If we have sent two zero bits, end of character has occurred
      if (cZeroBits > maxZeroBits) {
        cZeroBits = 0;

        // If send buffer not empty, get next varicode character
        if (head != tail) {
          // Assumes a 256 byte buffer as index increments modulo 256
          vcChar = varicode[rgchBuf[head++]];
        } else {
          // Buffer empty (assumed else branch): first stretch the zero
          // run to 75 bits, then turn the timer interrupt off
          if (maxZeroBits > 2) cbi (TIMSK2,TOIE2); else maxZeroBits = 75;
        }
      }
    }

    fFullBit = !fFullBit;  // Toggle end of full bit flag

    // When we get done ramping down, phase needs to
    // change unless we are sending a one bit
    if (ix < 0 && !fSendOne) phase = -phase;
  }

  // At the end of the table for the bit being
  // generated, we need to change direction
  // and process the table in the other direction.
  if (0 == --cbDirection) {
    cbDirection = fSendOne ? 32 : 512;
    ix = -ix;
  }
}

Setup is going to set our pin modes, set up timer 2 and put some test text into the send buffer to be processed.  Once the timer is enabled, PSK generation is automagic.

void setup() {
  // PWM output for timer2 is pin 10 on the ATMega2560
  // If you use an ATMega328 (such as the UNO) you need
  // to make this pin 11
  // See
  pinMode(10, OUTPUT);   // Timer 2 PWM output on mega2560 is pin 10

  // Set up timer2 to a phase correct 32kHz clock
  timer2Setup();

  // Put something in the buffer to be sent
  strcpy((char *) &rgchBuf[0], "\ncq cq cq de ko7m ko7m ko7m"
                               "\ncq cq cq de ko7m ko7m ko7m pse k\n");
  tail = strlen((const char *) rgchBuf);
  head = 0;
  sbi (TIMSK2,TOIE2);    // Enable the timer 2 overflow interrupt.
}

Nothing to do (yet) in the main loop

void loop() {
}

Here is a screen shot of fldigi decoding the test message I have hard coded.

As always, your mileage may vary.  If you have any questions or comments I would love to hear from you by posting here or dropping me a line at ko7m at arrl dot org and I will do my best to help you out.