Can SPI chip select timing be ensured without delay_ns? #9
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
ENC424J600 requires that the chip select CS_n (known as
NSS
on STM32 datasheets) must go low or high in-between each SPI transactions, as described on p.41 of its datasheet:The timing diagram on p.148 might also suggest a 60 ns delay is needed after data reception and before CS_n goes from low to high.
Currently (since
1add94c12e77924747f6bc5e9a3858f85fe3115d
), we added this delay by calling a user-provided delay function right afterembedded_hal::blocking::spi::transfer()
returns. It's been discussed on embedded_hal's repo thattransfer()
doens't guarantee the whole SCK period is passed during the final byte read operation (NB the definition of "final" is determined by the user, since the SPI slave usually keeps shifting out more data on next addresses), especially on STM32F4xx, so without a delay the chip might got "deselected" too early and the SPI transaction might be incomplete.Although we have implemented the delay mechanism, it might or might not be the only solution, especially when the user application doesn't want to provide a hard-defined delay function to the driver.
@occheung When you're free again, would you please show us how to test that this behaviour has been completely eliminated?
Take a step back and get a feel for the timescales involved! How long does one CPU clock cycle (instruction) take? How long is the SPI clock period? How many instructions do you think are there between asserting CS and the SPI transaction starting or between it ending and CS deassertion? Or just take a scope and look at the timing.
Hi Robert, it was stupid of me to phrase my question like this. Clearly, using a scope or logic analyser is suitable enough to check how CS changes w.r.t. data transactions (indicated by SCK pulses).
Actually, what I intend to ask is: under what situations would SPI transactions fail if there is not enough delay before CS gets deasserted and before CS gets asserted for the next contiguous transaction? Obviously, timing issues don't happen always (empirically speaking) - using my PmodNIC100 module on an STM32F407 board, the Ethernet controller can always be initialised reliably, and incoming/outgoing packets seem to be fine. It should be possible to test the packets thoroughly by having the board repeatedly transmit a packet at certain regular intervals, and use
tshark
to check if there is any lost or malformed packet.Still, I used a logic analyser to capture the SPI signals under two types of situation: during initialisation, and during smoltcp's polling. The captures reveal the following: (edit: these are empirical findings)
The current code adds a 60ns delay in each of the following:
Without changing these delays, CS always got deasserted at least 400ns after the final SCK pulse (min. req. 50ns), and CS always stayed asserted for at least 400ns in-between contiguous SPI transactions (min. req. 20ns).
When these additional delays were removed, the CS assertion between transactions would often be skipped (i.e. NSS would stay low between two back-to-back SPI commands). However, the controller would still send the correct data (aligned to SCK) for every command -- which is only because an SPI transaction must always be initiated by the master, which marks the beginning of each transaction.
In the occasion where CS assertions wasn't skipped without the additional delays, CS always got deasserted ~250ns after the final SCK pulse, and stayed up for ~250ns, which would still meet the timing requirements.
By skim-reading, there was no difference in terms of data received from the controller, if the delays were removed.
To clarify my thoughts and doubts:
The issue on embedded_hal might still hold. On both Booster and my own testing firmware, CPU is clocked at 168MHz. In my firmware, SPI SCK is set to 5MHz (capped at 14MHz by ENC424J600). Since the HAL blocks to poll the RXNE flag, which is asserted at the final sampling edge on SCK (RM0090 Rev 18, p.883), and there's only a volatile read operation on the RX buffer (via SPI_DR) right before the transfer function returns the read bytes to our driver, followed by deasserting CS, theoretically CS could be deasserted before the complete SPI SCK cycle during the slave's final output stage. This issue could very well still happen if we kept the delay that holds CS high, though I haven't tested.
Because I saw no difference in terms of data emitted by the slave, I am confused by ENC424J600 datasheet's statement:
I really doubt that we would ever need to assert/deassert CS in-between SPI transactions that are back-to-back. Eliminating such operations would greatly simplify our SPI implementation and reduce spinning time.
(edit: observing empirical results and disregarding actual specs is often dangerous)
I made a mistake writing the title for this open issue. It's not intended to say that @occheung's delay logic is going to fail - I'm rather confident it works to guarantee the SPI timing. I simpply meant to say that this might not be the only way to meet the timing requirements. And folks over on Booster's repo seem to want us to use embedded_hal's
DelayUs
, which is more coarse than both its CPU and ENC424J600's SPI clocks.Alternatively, we could use cortex_m's
asm::delay
to dedicate a fixed number of CPU spinning cycles for CS assertion/deassertion. But I don't think it's good to assume that all user applications usingembedded_hal
must operate at several hundred MHz CPU clocks.DelayUs
would be better.SPI NSS (chip select) might go high too early after data receivedto Can SPI chip select timing be ensured without delay_ns?To all reading this: most of my comments are for my own record and future reference. I am not requesting for any comments or answers from others, but if someone do I would greatly appreciate their help!
Update:
I read the datasheet more carefully and found two different types of valid CS behaviour:
If there are contiguous transactions that are not fixed-length, CS must be deasserted when the transaction ends.
Currently, all SPI transactions are done this way. Reads/writes of register data are done in the typical "Command - Address - Data" sequence. Reads/writes of SRAM data (i.e. Tx/Rx buffers) are done by sending a specific 8-byte opcode, followed by the data. Thus, in our current code, as long as CS is asserted, data will keep getting out/in while the address keeps incrementing.
If there are contiguous transactions (aka "instructions" on the datasheet) that are fixed-length, CS can be kept asserted.
This can actually apply to several SPI transactions we need during init as well as packet Tx/Rx. This means CS can indeed keep asserted for certain sequences of SPI transfers. The total duration of such SPI commands can be shortened by the simpler, fixed-length transactions as well as reduced CS deassertion.
Considering these two types of behaviour, I am going to keep the delays for (i) and remove CS deassertion logic for consecutive type-(ii) transactions. I'll repeatedly test my implementation, but that would only show empirical results.
To answer my original question, enforcing embedded_hal's
DelayUs
is still too coarse, so I am for keeping the delays in nanosecond scale. The problem is cortex-m'sasm::delay()
, which is based on instruction cycles, is effectively the only API for achieving nanosecond delays, and it depends on the CPU speed set by the user.(Side-note:
asm::delay()
takes 3-4 cycles whileasm::nop()
takes 1, butNOP
might not be necessary time-consuming on certain platforms, including STM32F4xx's Cortex-M4 (see ARM's manual ARM DUI 0553B, section 3.12.8). So, we can't go more precise than several CPU cycles, irrespectively of the CPU speed.)Because this driver has been implemented as "universal", I am against hardcoding CS delays this way, which is specific to STM32F4xx's max SYSCLK frequency and the Cortex-M cores. I will approach the rust-embedded community to try to bring the lack of nanosecond delay support to their attention.
Hi, for all who are interested, I've tried my best to properly address this issue in #11. Indeed, despite empirical findings, this driver should observe the timing requirements as stated by the datasheets. Thus, I kept the nanosecond delays for CS_n, but switched from a user-provided
delay_ns
closure to Cortex-M instruction-based delay (provided by the cortex-m crate). I also reduced the overall SPI transfer durations by using the fixed-byte instructions, which don't require CS_n deassertion at the end of the transfer.Although #11 only adds the delay for Cortex-M platforms, it can easily be extended for other CPUs with feature gates. Most platforms use hundreds of MHz clocks so the instruction-based delays should be precise enough.
I have already tested this new delay implementation with logic analyser, so the empirical proof should also be good enough. Therefore, we can close this discussion for now.