I am posting this issue here first because I am using Arduino-Pico, but it could well be a problem in the underlying Pico SDK or even in the hardware. I am going to do further testing, but below I describe what I have tried so far.
This is a bit long-winded, mainly because writing it out is a good way for me to make sure I have covered everything I can think of. Doing so has also helped me conclude that there really must be an issue here somewhere.
OK, so I have noticed some strange behaviour when sending data from one Pico to another over I2C. I am sending 2 arrays of data from one Pico to the other. The first array is 84 bytes long and the second is 48 bytes long. I understand that the Wire library implementation for the Pico uses a 128-byte transmit buffer, so both are within the buffer size.
The format of the data I am sending isn't really relevant, but it helped me spot the issue. I am sending 2 arrays of 32-bit float values: the first array is 21 floats (84 bytes) and the second is 12 floats (48 bytes). For each float array I write each float's 4 bytes in turn with write() until all the values have been placed in the buffer, then call endTransmission() to send the data. The sequence I use for each array is:
Wire1.beginTransmission(slave_address); Wire1.write(float1byte1); Wire1.write(float1byte2); Wire1.write(float1byte3); Wire1.write(float1byte4); Wire1.write(float2byte1); … remaining bytes … Wire1.endTransmission();
So I am sending 2 separate I2C transmissions: the first of 84 bytes, followed immediately by a second of 48 bytes.
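The byte-by-byte write() sequence above amounts to copying each float's four bytes, in memory order, into the Wire transmit buffer. Below is a minimal desktop C++ sketch of that packing step. The helper name packFloats is hypothetical, and it assumes byte1..byte4 in the sequence above are the float's bytes in memory order. The static_asserts check both frames against the 128-byte transmit buffer figure quoted above (taken from this post, not verified against the library source).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "expects 32-bit floats");

// Transmit buffer size quoted in the post for the Arduino-Pico Wire library.
constexpr std::size_t kWireTxBufferSize = 128;
constexpr std::size_t kFirstCount = 21;   // 21 floats -> 84 bytes
constexpr std::size_t kSecondCount = 12;  // 12 floats -> 48 bytes

static_assert(kFirstCount * sizeof(float) <= kWireTxBufferSize, "first frame exceeds TX buffer");
static_assert(kSecondCount * sizeof(float) <= kWireTxBufferSize, "second frame exceeds TX buffer");

// Hypothetical helper: copy N floats into a byte array in memory order.
// The result is the same byte stream as calling write() once per byte,
// byte1..byte4 for each float in turn.
template <std::size_t N>
std::array<std::uint8_t, N * sizeof(float)> packFloats(const std::array<float, N>& values) {
    std::array<std::uint8_t, N * sizeof(float)> bytes{};
    std::memcpy(bytes.data(), values.data(), bytes.size());
    return bytes;
}
```

On the Pico the packed buffer could then be sent in one call with Wire1.write(bytes.data(), bytes.size()) between beginTransmission() and endTransmission(). That produces the same bytes on the wire as the per-byte loop.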
On the receiving Pico I check the size of the data as reported in the onReceive interrupt and, depending on the size, copy the bytes into one of 2 correctly sized byte arrays for later processing in the main loop.
In the main loop I convert each byte array back into a float array, giving me the original 21 floats (84 bytes) and 12 floats (48 bytes). I am using i2c1 (Wire1) on both Picos because its pins better suit my requirements and layout.
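The receive-side logic just described can be modelled without hardware: route each frame by its byte count, then decode to floats later in loop(). In this sketch the struct and function names are hypothetical. onReceiveModel stands in for the body of a real onReceive callback, which would fill its data with Wire1.read(). Because the callback runs from an interrupt, a real sketch would also need the shared buffers and ready flags to be volatile or otherwise protected.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kFirstBytes = 84;   // 21 floats
constexpr std::size_t kSecondBytes = 48;  // 12 floats

// Hypothetical receive-side state shared between the callback and loop().
struct RxState {
    std::array<std::uint8_t, kFirstBytes> first{};
    std::array<std::uint8_t, kSecondBytes> second{};
    bool firstReady = false;
    bool secondReady = false;
    std::size_t rejected = 0;  // frames whose length matched neither size
};

// Models the onReceive body: route the frame by its length, reject anything else.
void onReceiveModel(RxState& s, const std::uint8_t* data, std::size_t n) {
    if (n == kFirstBytes) {
        std::memcpy(s.first.data(), data, n);
        s.firstReady = true;
    } else if (n == kSecondBytes) {
        std::memcpy(s.second.data(), data, n);
        s.secondReady = true;
    } else {
        ++s.rejected;  // e.g. 47 bytes reported instead of 48
    }
}

// Models the loop() side: reinterpret the bytes as floats.
// Assumes sender and receiver share the same float layout (both RP2040).
template <std::size_t NBytes>
std::array<float, NBytes / sizeof(float)> decodeFloats(const std::array<std::uint8_t, NBytes>& bytes) {
    static_assert(NBytes % sizeof(float) == 0, "buffer must hold whole floats");
    std::array<float, NBytes / sizeof(float)> out{};
    std::memcpy(out.data(), bytes.data(), NBytes);
    return out;
}
```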
During testing I noticed that the second array on the receiving side was sometimes not being populated. Digging further, I found this was because the onReceive interrupt sometimes reported only 47 bytes for the second transmission rather than 48, so the routine inside the interrupt ignored that transmission. To see what was missing, I modified the code so that it filled the second array even when fewer than the expected 48 bytes were received. I found that whenever only 47 bytes were reported, all of the floats were corrupted: they were way off their expected values, close to their maximum values. This indicates that the missing byte was at the start of the I2C transmission, which shifted the boundaries of the 4-byte groups that make up each float value.
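The following sketch shows why a single missing leading byte corrupts every value rather than just one. It packs a known float sequence, drops the first byte, and decodes the remainder in 4-byte groups, so each decoded group straddles two of the original floats. The garbage values it produces depend on the data and are not meant to reproduce the ones seen in the project. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Decode a byte stream as consecutive 4-byte floats, ignoring any trailing partial group.
std::vector<float> decodeGroups(const std::vector<std::uint8_t>& bytes) {
    std::vector<float> out(bytes.size() / sizeof(float));
    if (!out.empty()) std::memcpy(out.data(), bytes.data(), out.size() * sizeof(float));
    return out;
}

// Pack floats, drop the first byte (as if it were lost on reception), then decode.
// Each decoded value is built from the tail of one float and the head of the next.
std::vector<float> decodeAfterLosingFirstByte(const std::vector<float>& values) {
    assert(!values.empty());
    std::vector<std::uint8_t> bytes(values.size() * sizeof(float));
    std::memcpy(bytes.data(), values.data(), bytes.size());
    bytes.erase(bytes.begin());  // e.g. 48 bytes become 47
    return decodeGroups(bytes);  // 47 / 4 = 11 whole groups, all misaligned
}

// Compare two floats by bit pattern, so NaNs produced by misalignment compare sanely.
bool sameBits(float a, float b) {
    std::uint32_t x, y;
    std::memcpy(&x, &a, sizeof x);
    std::memcpy(&y, &b, sizeof y);
    return x == y;
}
```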
Up to this point I had mainly been using serial print debugging for this issue, so I used some more spare pins as debug signal outputs and connected up the scope to find out what was going on. On the receiving side I set up one of the new debug pins to output a low pulse when fewer bytes than expected were received for the second I2C transmission.
In the scope traces, purple is the I2C clock (SCL), blue is SDA, the yellow low pulse is the error marker from the debug pin, and green marks the data processing and error detection code running.
The pair of transmissions is shown here. Rather than the second being sent immediately after the first, in this trace they are separated by a 1 ms gap, which is one of the things I changed later.
By triggering off the debug pin and counting the clock pulses for the address and data transfers, I was able to confirm that the correct number of bytes was being sent on the wire, despite what the onReceive interrupt reported. I also had some serial debugging on the sending Pico to report any errors returned by endTransmission(), and none were ever reported.
The error rate was high: around 50% of the second transmissions reported 47 bytes rather than 48. I was using an I2C clock of 1 MHz (Fast-mode Plus), which the documentation states is the fastest supported speed. I had also changed the pull-up resistors from 4.7k to 1k to sharpen the edges. This also increased the actual clock speed: with 4.7k I was only getting around 700 kHz (presumably due to clock stretching), whereas with 1k pull-ups I get 868 kHz. The 1k pull-ups made no difference to the error rate, but the rising edges were now steeper.
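The effect of the pull-up change can be estimated with the RC rise-time approximation used in the I2C specification (NXP UM10204): t_r ≈ 0.8473 × R_p × C_b for the 30% to 70% transition. The specification's maximum rise times are 1000 ns (Standard-mode), 300 ns (Fast-mode) and 120 ns (Fast-mode Plus). The 50 pF bus capacitance used below is an assumed figure for illustration, not a measurement of this setup, and the function name is hypothetical.

```cpp
#include <cassert>

// 30%-to-70% rise time of an RC-limited open-drain line: t_r = 0.8473 * Rp * Cb.
// Inputs: pull-up in ohms, bus capacitance in farads. Returns seconds.
double riseTimeSeconds(double pullUpOhms, double busCapFarads) {
    assert(pullUpOhms > 0.0 && busCapFarads > 0.0);
    return 0.8473 * pullUpOhms * busCapFarads;
}

// Maximum allowed SCL/SDA rise time for Fast-mode Plus (1 MHz) per the I2C specification.
constexpr double kFmPlusMaxRiseSeconds = 120e-9;
```

With the assumed 50 pF, 4.7 kΩ gives roughly 199 ns, which is above the 120 ns Fm+ limit. 1 kΩ gives roughly 42 ns, which is consistent with the steeper edges seen on the scope.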
I was sending the pair of transmissions at 50 Hz (20 ms intervals), which left plenty of space between each pair of I2C transmissions. Using debug output pulses, I could confirm that everything had finished its work well before the next onReceive interrupt. Some of the processing was being done on core1 using loop1(), so I eliminated this and ran everything on core0 by commenting out loop1(). This made no difference.
On the scope I noticed quite a delay (around 220 µs) between the end of the block of clock pulses and the start of the onReceive interrupt. This meant the next I2C transmission was arriving before the interrupt had fired, resulting in a short break in the SCL pulses while the interrupt was handled. I thought this might be the issue, but then why was it only intermittent?
To rule this out, I added a 500 µs delay on the transmitting side between the first and second transmissions, which dropped the error rate to around 10%. I could confirm on the scope that all processing related to the first transmission was complete before the second transmission started, yet data was still going missing. I then increased the delay to 600 µs, just for the heck of it, and the error rate dropped further to around 5%.
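A rough way to compare the 220 µs interrupt latency and the 500/600 µs gaps with the time each transaction occupies on the bus: every byte, address or data, takes 9 SCL clocks (8 data bits plus the ACK/NACK bit). This sketch ignores START/STOP overhead and any clock stretching, so it gives a lower bound. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Lower-bound duration of an I2C write: (1 address byte + data bytes) * 9 clocks / f_SCL.
// Ignores START/STOP setup and hold times and any clock stretching by the receiver.
double i2cWriteMicros(std::size_t dataBytes, double sclHz) {
    assert(sclHz > 0.0);
    const double clocks = 9.0 * static_cast<double>(dataBytes + 1);
    return clocks / sclHz * 1e6;
}
```

At the measured 868 kHz this gives roughly 881 µs for the 84-byte frame and 508 µs for the 48-byte frame. A 220 µs delay corresponds to about 21 byte-times at that clock, so without a gap a good part of the second frame would already be on the wire before the first interrupt ran.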
I have previously seen random delays in pin interrupts firing on a Pico when using USB serial, because that also relies on interrupts. So I wondered whether the USB serial communication on the receiving side was having an effect, as some serial debug output was being produced between the first and second transmission blocks. Until I added the 500 µs delay on the transmitter, this output would have overlapped with reception of the second transmission. I moved the serial debug output completely outside the pair of receptions, so that it ran in the dead time of the 20 ms period between pairs. This further reduced the error rate to around 3%.
So far I have been able to confirm that the transmitting side was working correctly and that the scope showed the correct number of clock pulses. I had spent a lot of time debugging this and made a mess of the code branch with all the extra debugging code, so I decided to create a test harness project with a dedicated sketch for each Pico.
The test harness I created to debug this issue is here on my GitHub account:
For the test harness I created 2 float arrays of sequential values (1.1, 1.2, 1.3, etc.), the same sizes as in my project. I dumped them to the serial port as their byte arrays in hex and binary to aid debugging on the scope. I used a lower rate of 10 Hz for sending the pair of arrays and decoded them back into floats on the receiving side as before. This time I also processed the first transmission when data was missing, as until then I had focused purely on the second transmission. I kept the delay between the first and second transmissions, increasing it to 1000 µs.
The only real differences I noticed with the new test code were that I sometimes had 2 missing bytes rather than 1, and that the missing byte(s) appear to be at the end rather than at the beginning, since only the last float value was corrupted.
My test code confirmed that the issue is still there and also occurs in the first transmission, but at only around half the error rate of the second. I also used the protocol decoder on my scope to check the bytes against the hex values output over serial. By triggering off a low pulse from a debug pin indicating missing data, I was able to confirm that all of the bytes were indeed correct and present on the wire, despite what the onReceive interrupt reported.
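The test pattern and hex/binary dump described above could look something like the sketch below. The generator formula (start + 0.1 × i) is an assumption about how the sequential values were produced, and the function names are hypothetical. The dump prints bytes in the order they sit in memory, which is also the order they are written to the bus.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Sequential test values: start, start + 0.1, start + 0.2, ...
std::vector<float> makePattern(float start, std::size_t count) {
    std::vector<float> v(count);
    for (std::size_t i = 0; i < count; ++i) v[i] = start + 0.1f * static_cast<float>(i);
    return v;
}

// Upper-case hex dump with one space between bytes, e.g. "CD CC 8C 3F".
std::string hexDump(const std::uint8_t* data, std::size_t n) {
    std::string out;
    char buf[4];  // " XX" plus terminator
    for (std::size_t i = 0; i < n; ++i) {
        std::snprintf(buf, sizeof buf, i == 0 ? "%02X" : " %02X", static_cast<unsigned>(data[i]));
        out += buf;
    }
    return out;
}

// Binary dump of one byte, most significant bit first, e.g. 0x40 -> "01000000".
std::string binByte(std::uint8_t b) {
    std::string out;
    for (int bit = 7; bit >= 0; --bit) out += ((b >> bit) & 1) ? '1' : '0';
    return out;
}
```

A float array p can be dumped with hexDump(reinterpret_cast<const std::uint8_t*>(p.data()), p.size() * sizeof(float)).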
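Because the harness sends a known pattern, the receiver can go one step further than counting bytes and report exactly which float slots are wrong. This distinguishes a byte lost at the start (every slot wrong) from one lost at the end (only the last slot wrong). The sketch below uses hypothetical names, and a slot with fewer than 4 received bytes is treated as corrupted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Return indices of float slots whose received bytes do not match the expected pattern.
// A slot with fewer than 4 received bytes counts as corrupted.
std::vector<std::size_t> badSlots(const std::vector<float>& expected,
                                  const std::vector<std::uint8_t>& received) {
    std::vector<std::size_t> bad;
    for (std::size_t slot = 0; slot < expected.size(); ++slot) {
        const std::size_t off = slot * sizeof(float);
        if (off + sizeof(float) > received.size() ||
            std::memcmp(&expected[slot], received.data() + off, sizeof(float)) != 0) {
            bad.push_back(slot);
        }
    }
    return bad;
}

// Helper for the model: serialise floats in memory order, as the sender does.
std::vector<std::uint8_t> toBytes(const std::vector<float>& v) {
    std::vector<std::uint8_t> b(v.size() * sizeof(float));
    if (!b.empty()) std::memcpy(b.data(), v.data(), b.size());
    return b;
}
```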
The 48 bytes of the second transmission are as follows, with bytes separated by dashes and floats by commas for clarity: (float1byte1-float1byte2-float1byte3-float1byte4, float2byte1-float2byte2-float2byte3-float2byte4, ...)
Here is the protocol decoder output from the wire:
In the scope image above, an error has been detected on the second transmission of the pair. The yellow low pulse is the error marker, purple is the I2C clock, blue is SDA, and green marks the code that processes the data and generates the yellow pulse when there is an error. The protocol decoder has decoded the data of the second transmission; the last byte is 0x40, which is clearly visible on the blue trace. The interrupt handler reported only 47 bytes for this transmission, and the last value in the decoded float array was corrupted because the 0x40 was not received.
OK, so armed with the test harness, and having confirmed that the correct bytes are always sent regardless of whether the receiver detects an error, I tried the following:
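For reference when reading the decoder output: the RP2040 is little-endian, so a float copied straight from memory goes out least-significant byte first. The sketch below (hypothetical helper name) splits a float the same way using shifts, so it gives the same answer on any host. It also shows why a final byte of 0x40 is plausible for this test data. Every positive float from 2.0 up to (but not including) 8.0 has 0x40 as its most significant byte, and that byte is the last of the four on the wire.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "expects 32-bit IEEE-754 floats");

// Split a float into its four bytes, least significant first (RP2040 memory/wire order).
std::array<std::uint8_t, 4> wireBytes(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return {static_cast<std::uint8_t>(bits),
            static_cast<std::uint8_t>(bits >> 8),
            static_cast<std::uint8_t>(bits >> 16),
            static_cast<std::uint8_t>(bits >> 24)};
}
```

For example, 1.1 (0x3F8CCCCD) is sent as CD-CC-8C-3F in the dash notation used above.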
Reduced the transmission rate (i.e. increased the gap between each pair of transmissions) to 1 Hz: errors still occurred at around the same percentage. I changed it back to 10 Hz for the rest of the changes below.
Reduced the I2C clock to 400 kHz (about 365 kHz measured): errors dropped to around 0.5 to 1%.
Moved some serial debug output back in between the first and second transmission blocks, as it was originally: errors increased to around 1 to 2%.
Reduced the I2C clock further to 100 kHz (about 95 kHz measured): errors dropped to around 0.007%, and I had to leave it running for a while to collect enough failures to calculate an average failure rate.
The circuit on which I originally encountered this issue is built on protoboard with soldered wires. The connections between the I2C pins of the two Picos are very short (around 15 mm). However, there is an ADC connected to the transmitting Pico, which also clocks the ADC using the Pico’s internal 12 MHz clock output on a pin. The receiving Pico was actually a Pico W, and I wanted to rule that out too, even though no WiFi code was being used at this stage.
To rule out my hardware configuration, I rebuilt a simple version on a breadboard using 2 brand-new Picos (no Pico W involved). This time the I2C wires are a bit longer, around 50 mm, due to the breadboard layout. As before, I connected the VBUS pins together so that the USB connection to one Pico also powers the other. Obviously, the ground pins are also connected together.
Breadboard test setup:
My actual project prototype hardware, on which I originally encountered this issue:
The results were the same on the breadboard setup: the issue persisted, with failure rates at around the same percentages.
I then connected the VSYS pins of the two Picos together and powered VSYS from 4 NiMH cells in series (around 5 V). I used VSYS rather than VBUS to take advantage of the on-board diode, which avoids back-powering the laptop's USB port when a USB cable is connected to the Pico.
I left it running with nothing else connected: no USB cable, no scope, no test wires sticking out of the breadboard, etc. After some time had passed, I connected the USB cable to the receiving Pico to check the serial output and found that many errors had built up, at around the same rate as when I was powering from USB with the scope connected. I left this running overnight and it accumulated around 12,000 errors across both the first and second transmissions of the pair.
Minimal battery-operated setup:
The reason for this test was to rule out any problems caused by noisy USB power. Since the 3.3 V regulator on the Pico is a buck-boost converter, this was unlikely to make any difference, and sure enough it did not.
As I have not yet determined whether this is a hardware or software issue, my next steps are as follows:
Attempt to read the expected number of bytes inside the onReceive interrupt rather than relying on the count supplied to the handler function. This would not, however, account for data missing at the start of the transmission, as observed initially.
Bypass the Arduino Wire API and make use of the underlying Pico SDK directly, while also having a good look at the Wire API implementation.
Swap out just the receiving Pico for a different dev board, say a Teensy 4.
Swap out just the transmitting Pico for a different dev board, say a Teensy 4.
As I am currently using i2c1 (Wire1) on both Picos, change the code and pins to use i2c0 (Wire).
Go back to the 2 Picos and power them both from an external 3.3 V linear regulator, bypassing the on-board switching regulator.
Play around with the amount of data being sent in each transmission to see what difference it makes.
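The first two next steps involve counting bytes inside the handler and bypassing Wire with the Pico SDK. For both, the receive path can be reduced to a small state machine that counts every byte itself and only closes a frame when the controller signals the end of the transfer. In the pico-sdk's pico_i2c_slave library, the handler registered with i2c_slave_init() receives I2C_SLAVE_RECEIVE and I2C_SLAVE_FINISH events, and bytes are read with i2c_read_byte_raw(). The desktop model below instead uses its own stand-in enum and hypothetical class name so it can be tested without hardware. It is not the Arduino-Pico Wire implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the SDK's receive/finish events:
// Receive = a byte is waiting, Finish = STOP or repeated START seen.
enum class RxEvent { Receive, Finish };

// Hypothetical frame counter: counts bytes itself instead of trusting a reported length.
class FrameCounter {
public:
    // Called once per event; 'byte' is only meaningful for RxEvent::Receive.
    void onEvent(RxEvent ev, std::uint8_t byte = 0) {
        if (ev == RxEvent::Receive) {
            if (count_ < buf_.size()) buf_[count_] = byte;
            ++count_;  // count every byte, even one that would overflow the buffer
            return;
        }
        // Finish: classify the completed frame by the number of bytes actually read.
        lastLength_ = count_;
        if (count_ == 84) ++good84_;
        else if (count_ == 48) ++good48_;
        else ++badLength_;
        count_ = 0;  // buffer contents stay until overwritten by the next frame
    }
    std::size_t lastLength() const { return lastLength_; }
    std::size_t good84() const { return good84_; }
    std::size_t good48() const { return good48_; }
    std::size_t badLength() const { return badLength_; }

private:
    std::array<std::uint8_t, 128> buf_{};
    std::size_t count_ = 0, lastLength_ = 0;
    std::size_t good84_ = 0, good48_ = 0, badLength_ = 0;
};
```

Logging lastLength() alongside the count passed to the Wire onReceive callback for the same frame might show whether bytes go missing in the hardware FIFO handling or in the library's own accounting.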