This reminds me a bit of the Pitch Drop Experiment, probably one of the longest-running scientific experiments I have ever encountered. Fortunately, my problem is reproducible in hours, not decades.
The audio signal path is, to say the least, complicated. Audio originating from a USB source is transmitted to a dedicated micro-controller. From there it traverses an I2S bus to a DSP where some sample rate conversion is done. The I2S bus then continues on to an Audinate network audio device, where the audio is deposited on a Dante channel and sent via an Ethernet network switch onto a private LAN. On the other side of the network, the audio goes through another network switch into another Audinate device and then across an I2S bus to an FPGA where signal switching can be accomplished. From there it crosses another I2S bus to another DSP where any effects and other audio mixing can be done. The composite signal then returns through the same path in reverse to the USB device that originated the signal, where it is recorded. What could possibly go wrong?
Well, surprisingly little actually does go wrong, but when it does it can be a bit subtle, so how do you track it down? There are eight signal path segments in each direction. My approach was to isolate each segment and verify them individually. I needed to keep the audio signal in the digital realm, as I certainly didn't want to introduce the complexity of digital-to-analogue conversions, and the issue reproduces in the digital realm anyway.
Since one piece of equipment in the scheme was from a third party, the first task was to verify that it could source and record digital audio without error and without using any of the suspect equipment. So I purchased a digital audio USB interface that would allow me to loop back the audio while keeping it in the digital realm. I chose a commercially available USB to S/PDIF adapter and simply looped the S/PDIF output back to the input. Now the USB device could play back and record its own signal, and I could check whether an exact copy was produced. No problems found.
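The "exact copy" check on a digital loopback can be sketched roughly as follows. This is a minimal illustration, not the tool I actually used: the function names, the `max_delay` and `probe` parameters, and the assumption that the samples are already available as lists of integers are all hypothetical. It also assumes the start of the test signal is not silent or periodic, otherwise the alignment step is ambiguous.

```python
def find_offset(sent, received, max_delay=4096, probe=64):
    """Return the delay (in samples) at which `received` starts to
    match the first `probe` samples of `sent`, or None if not found."""
    head = sent[:probe]
    for delay in range(max_delay + 1):
        if received[delay:delay + probe] == head:
            return delay
    return None


def first_mismatch(sent, received, delay):
    """Return the index in `sent` of the first sample that differs from
    the delay-compensated recording, or None if every sample matches."""
    aligned = received[delay:delay + len(sent)]
    for i, (a, b) in enumerate(zip(sent, aligned)):
        if a != b:
            return i
    return None
```

A loopback always adds some latency, so the recording has to be aligned with the source before comparing sample for sample. Once aligned, any single differing sample means the path is not bit-exact.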
So, in the interest of divide and conquer, I started by sending audio all the way to the FPGA, then intercepting the outbound signal and routing it to a 32-channel hard disk audio recorder. This allowed recording of the digital signal at the mid-point of the signal path while simultaneously recording it at the return end of the signal path. Multiple four-hour runs were free of errors at the signal mid-point, but still had glitches at the return end of the signal path. This eliminated all but one signal path segment in the outbound direction. To eliminate the remaining segment, I looped the audio signal back at the DSP at the far end of the path to the FPGA before sending it to the digital audio recorder. Still clean.
So, with half the signal path eliminated, I generated a 997 Hz sinusoidal tone (997 is a prime number) in the same DSP that was used to loop back the signal above, in order to isolate the return path. I have audio glitch testing software that analyzes the 48,000 audio samples per second over four hours for abnormalities. This ends up being several gigabytes of data. Once a reproducible error is found, I find it useful to generate a triangle waveform in which each sample starts at zero, rises by one per sample to the maximum value, and then decreases by one per sample to the minimum. My analysis software can find glitches (basically looking for vertical edges over some threshold) with either a sinusoidal or this kind of ramp-up/ramp-down waveform. When it comes to looking for firmware issues such as dropped samples or buffering problems, I usually use a sawtooth waveform. At a sample rate of 48 kHz, a 16-bit ramp from zero to maximum followed by a sudden drop to zero takes about 1.365 seconds. It just sounds like a roughly 1 hertz click. At 24 bits it is a much slower-changing waveform, taking nearly 350 seconds. This kind of signal makes the expected data predictable, which makes it easier to recognize when things are not working, and its period is long enough that the waveform cannot repeat itself inside any digital data buffering scheme. Additionally, specialized code can be written to examine the binary data flowing across the system at any point and predict what should be happening at any given time.
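To make the sawtooth idea concrete, here is a small sketch of a ramp generator and a checker for dropped or repeated samples. It is not my actual analysis software; the function names are made up for illustration, and it assumes unsigned samples that wrap from the maximum value back to zero and are already available as a list of integers.

```python
def sawtooth(num_samples, bits=16, start=0):
    """Unsigned ramp that rises by one per sample and wraps to zero."""
    modulus = 1 << bits
    return [(start + n) % modulus for n in range(num_samples)]


def find_glitches(samples, bits=16):
    """Return (index, step) pairs wherever consecutive samples do not
    differ by exactly one (modulo the sample width). A step of 0 means
    a repeated sample; a step of k > 1 suggests k - 1 dropped samples."""
    modulus = 1 << bits
    glitches = []
    for i in range(1, len(samples)):
        step = (samples[i] - samples[i - 1]) % modulus
        if step != 1:
            glitches.append((i, step))
    return glitches


def ramp_period_seconds(bits, sample_rate=48_000):
    """Time for one full ramp from zero to maximum at the given rate."""
    return (1 << bits) / sample_rate
```

Because every sample is predictable from its neighbour, the checker needs no reference recording and no alignment step. `ramp_period_seconds(16)` and `ramp_period_seconds(24)` give the 1.365 second and roughly 350 second figures quoted above.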
Simplifying the Ethernet segment by removing the external network switches and connecting the pieces of equipment directly failed to resolve the issue.
The return audio path ended up being the source of the audio glitch. With the ability to record the audio at each end of the complete audio path, the next task was to move one signal segment at a time down the return path and loop it back to the recorder. This allowed me to isolate each signal segment while relying on previously tested paths to perform the recording. Again using divide and conquer, I did a binary search of the remaining signal segments. If I had 8 segments to check, I went halfway down and looped back to the recorder. If a problem was found, I divided that portion of the signal path in half and retested, repeating until the offending segment was found.
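The binary search over segments can be sketched like this. It is only an illustration of the procedure: `path_is_clean` is a hypothetical callback standing in for a full test run with the loopback placed after a given number of segments, and the sketch assumes a single faulty segment, a clean result with zero segments included, and a glitchy result with all of them included.

```python
def find_faulty_segment(num_segments, path_is_clean):
    """Return the 1-based index of the first segment whose inclusion
    makes the looped-back recording glitch.

    path_is_clean(k) must return True when a recording looped back
    after the first k segments is glitch-free."""
    lo, hi = 0, num_segments  # clean through lo, glitchy through hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if path_is_clean(mid):
            lo = mid   # fault lies further down the path
        else:
            hi = mid   # fault is at or before the mid-point
    return hi
```

With 8 segments this needs only 3 test runs instead of up to 8, which matters when each run takes hours.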
In the end, resolving the issue involved some restructuring of DMA priorities on one of the micro-controllers and some DSP work.