fix (workaround) drtioaux packets being corrupted #176

Merged
sb10q merged 3 commits from mwojcik/artiq-zynq:fix_drtio_corruption into master 2022-04-01 14:15:14 +08:00

Following the thread on m-labs forum: https://forum.m-labs.hk/d/377-drtio-with-zc706-master-and-kasli-v11-satellite/6

Issue was reproduced on Kasli-SoC - for some reason writes to DRTIOAUX memory may fail, not changing the contents, causing the satellite to report CRC issues. Investigation turned nothing really (AXI-specific instructions/burst modes were thought to be the culprit, but it turned out there were none)

There is a little more to be said here - if there's no kernel running, master sends DestinationStatusRequest packets periodically to the satellite with no problems at all; but once a kernel is run, the corruption starts occuring (and does not stop on its own). More weird is that running the kernel doesn't really change the communications (there are no additional packets sent), but it does starts corrupting.

Running a loop that checks the contents and rewrites if necessary solves the problem, at least from a practical standpoint.

However, this does not fix self.core.get_rtio_destination_status - the function in Zynq is not implemented (yet). That function currently returns true only if the destination is 0 (that is, if the master is up). It will be implemented soon, however.

Following the thread on m-labs forum: https://forum.m-labs.hk/d/377-drtio-with-zc706-master-and-kasli-v11-satellite/6 Issue was reproduced on Kasli-SoC - for some reason writes to DRTIOAUX memory may fail, not changing the contents, causing the satellite to report CRC issues. Investigation turned nothing really (AXI-specific instructions/burst modes were thought to be the culprit, but it turned out there were none) There is a little more to be said here - if there's no kernel running, master sends ``DestinationStatusRequest`` packets periodically to the satellite with no problems at all; but once a kernel is run, the corruption starts occuring (and does not stop on its own). More weird is that running the kernel doesn't really change the communications (there are no additional packets sent), but it does starts corrupting. Running a loop that checks the contents and rewrites if necessary solves the problem, at least from a practical standpoint. However, this *does not* fix ``self.core.get_rtio_destination_status`` - the function in Zynq is not implemented (*yet*). That function currently returns ``true`` only if the destination is 0 (that is, if the master is up). It will be implemented soon, however.

There is a little more to be said here - if there's no kernel running, master sends DestinationStatusRequest packets periodically to the satellite with no problems at all; but once a kernel is run, the corruption starts occuring (and does not stop on its own). More weird is that running the kernel doesn't really change the communications (there are no additional packets sent), but it does starts corrupting.

Did you check caches/write buffers?

> There is a little more to be said here - if there's no kernel running, master sends DestinationStatusRequest packets periodically to the satellite with no problems at all; but once a kernel is run, the corruption starts occuring (and does not stop on its own). More weird is that running the kernel doesn't really change the communications (there are no additional packets sent), but it does starts corrupting. Did you check caches/write buffers?
Poster
Owner

Cache could make sense from what I read about them - with being involved in AXI transactions quite directly, for one. However... this is not the case.

I tried disabling L2 cache (well, not enabling it in the first place) - didn't help; experimented with various combinations of flushing and invalidating the cache as well, from the parts that would be affected, to the entire content, to no avail.

The entire AXI thing is managed by a memory controller actually (according to TRM), that's why there aren't any specific instructions in the disassembly that I looked through. I think I'll have a deeper look into that later.

Cache could make sense from what I read about them - with being involved in AXI transactions quite directly, for one. However... this is not the case. I tried disabling L2 cache (well, not enabling it in the first place) - didn't help; experimented with various combinations of flushing and invalidating the cache as well, from the parts that would be affected, to the entire content, to no avail. The entire AXI thing is managed by a memory controller actually (according to TRM), that's why there aren't any specific instructions in the disassembly that I looked through. I think I'll have a deeper look into that later.

It could also just be some bugs with the AXI implementation of the SRAM. It's easy to get it wrong. http://zipcpu.com/ has some information about common AXI problems and verification of AXI cores.

It could also just be some bugs with the AXI implementation of the SRAM. It's easy to get it wrong. http://zipcpu.com/ has some information about common AXI problems and verification of AXI cores.
mwojcik force-pushed fix_drtio_corruption from b1d0c1d8a2 to 77ca020450 2022-03-30 17:27:20 +08:00 Compare
Poster
Owner

Because the fix seems like it's doing nothing in particular, I feel like it requires a bit of explanation.

Alright so: the implementation of the axi2csr bridge is alright - it does the job in very specific conditions - there are no bursts, all writes are 32-bit, as there's no granularity for write enable, reads can be arbitrary. Of course throughput is limited, but it's OK for most CSR uses (single word writes and reads) and given not-so-big sizes of DRTIO AUX buffers the performance penalty is somewhat acceptable. Sure, it would be good if it behaved natively, but that would require a massive overhaul and implementing most of the AXI specification.

If bursts are generated by the core, then this module will cause issues and unwritten bytes, which would (and did in previous tests) cause issues down the line. Adding to that, there was problem with unaligned access - while permitted by the specification, caused a data abort on Zynq. Thus the work buffer was created, so the DRTIO modules could write freely in the RAM, which would be then copied over to DRTIO AUX transmitter. The good part is that Zynq TRM (ug585) mentions straightforwardly that bursts are not generated for 8- and 16-bit writes/reads, so 16-bit copy loop was used. And that worked, for startup and establishing a connection.

As you may notice, there was a lot of worry about the burst modes, and the other condition was omitted (or rather, went unnoticed) - all writes are 32-bit - because the bus is 32 bit wide, and even the 2 lsbs of the address are discarded; and AXI's WSTRB signal (which would indicate width of the write) is not even translated over, but a single we is used.

Thus, looking at it I'm not even sure how it worked in the way it did before.

The AXI master controller would send a normal 16-bit write with some garbage on the unused lanes of the data bus, and that garbage would get written as well, since the axi2csr bridge doesn't work with WSTRB, causing corruption (and then CRC errors). So, at least in my view, there are three ways of dealing with this:

  • amending axi2csr with some interpretation of wstrb (probably a tiny extra "bus" parallel to internal_csr, only connected to Memory/SRAM we with increased granularity) - feels like a workaround still and doesn't really solve anything, just attaching a hack on top of a hack, plus would need a PR in migen-axi with me explaining why another hack is a necessity

  • rewriting axi2csr (or SRAM/memory controller, or AXI slave bus) with full AXI support - that's a lot of work involved, would probably take months to get it all down correctly within migen-axi

  • writing 32 bits at a time in software, making sure bursts don't happen still - there's a bit of reliance on trying to confuse the memory controller on Zynq, but if this works (spoiler: it does), it will fix the original issue while also not adding more complexity to the FPGA part.

So I went with the last option and tested it with Kasli-SoC/Kasli 1.1 configuration. Neither internal checks detected any discrepancies, nor did Kasli satellite complain about CRC errors anymore.

Because the fix seems like it's doing nothing in particular, I feel like it requires a bit of explanation. Alright so: the implementation of the axi2csr bridge is *alright* - it does the job in very specific conditions - there are no bursts, all writes are 32-bit, as there's no granularity for write enable, reads can be arbitrary. Of course throughput is limited, but it's OK for most CSR uses (single word writes and reads) and given not-so-big sizes of DRTIO AUX buffers the performance penalty is somewhat acceptable. Sure, it would be good if it behaved natively, but that would require a massive overhaul and implementing most of the AXI specification. If bursts are generated by the core, then this module will cause issues and unwritten bytes, which would (and did in previous tests) cause issues down the line. Adding to that, there was problem with unaligned access - while permitted by the specification, caused a data abort on Zynq. Thus the work buffer was created, so the DRTIO modules could write freely in the RAM, which would be then copied over to DRTIO AUX transmitter. The good part is that Zynq TRM (ug585) mentions straightforwardly that bursts are not generated for 8- and 16-bit writes/reads, so 16-bit copy loop was used. And that worked, for startup and establishing a connection. As you may notice, there was a lot of worry about the burst modes, and the other condition was omitted (or rather, went unnoticed) - *all writes are 32-bit* - because the bus is 32 bit wide, and even the 2 lsbs of the address are discarded; and AXI's ``WSTRB`` signal (which would indicate width of the write) is not even translated over, but a single ``we`` is used. Thus, looking at it I'm not even sure how it worked in the way it did before. The AXI master controller would send a normal 16-bit write with some garbage on the unused lanes of the data bus, and that garbage would get written as well, since the axi2csr bridge doesn't work with ``WSTRB``, causing corruption (and then CRC errors). So, at least in my view, there are three ways of dealing with this: - amending ``axi2csr`` with some interpretation of ``wstrb`` (probably a tiny extra "bus" parallel to internal_csr, only connected to Memory/SRAM ``we`` with increased granularity) - feels like a workaround still and doesn't really solve anything, just attaching a hack on top of a hack, plus would need a PR in migen-axi with me explaining why another hack is a necessity - rewriting ``axi2csr`` (or SRAM/memory controller, or AXI slave bus) with full AXI support - that's a lot of work involved, would probably take months to get it all down correctly within migen-axi - writing 32 bits at a time in software, making sure bursts don't happen still - there's a bit of reliance on trying to confuse the memory controller on Zynq, but if this works (spoiler: it does), it will fix the original issue while also not adding more complexity to the FPGA part. So I went with the last option and tested it with Kasli-SoC/Kasli 1.1 configuration. Neither internal checks detected any discrepancies, nor did Kasli satellite complain about CRC errors anymore.

Adding to that, there was problem with unaligned access - while permitted by the specification, caused a data abort on Zynq.

IIRC they work on Zynq but you need the MMU enabled. The asm startup code used by ARTIQ should enable it already. Are you pulsing POR before each JTAG load?

> Adding to that, there was problem with unaligned access - while permitted by the specification, caused a data abort on Zynq. IIRC they work on Zynq but you need the MMU enabled. The asm startup code used by ARTIQ should enable it already. Are you pulsing POR before each JTAG load?
Poster
Owner

Well then that didn't exactly work then, not sure why then. However in current situation writes and reads have to be aligned and 32 bit wide anyway, since separate bytes aren't addressable with this setup. Even if that didn't cause a data abort (maybe something axi-related is the case actually?) that could still lead to corruption. I think it's something to take a closer look at when the gateware side is rewritten.

Well then that didn't exactly work then, not sure why then. However in current situation writes and reads have to be aligned and 32 bit wide anyway, since separate bytes aren't addressable with this setup. Even if that didn't cause a data abort (maybe something axi-related is the case actually?) that could still lead to corruption. I think it's something to take a closer look at when the gateware side is rewritten.
sb10q reviewed 2022-03-31 10:47:09 +08:00
@ -67,1 +65,4 @@
for i in (0..(len/4)).step_by(2) {
//mix offsets to prevent bursts
*dst.offset(i+1) = *src.offset(i+1);
*dst.offset(i) = *src.offset(i);

I think the compiler is allowed to reorder those, unless the memory is marked as volatile.

I think the compiler is allowed to reorder those, unless the memory is marked as volatile.
mwojcik added 1 commit 2022-03-31 12:26:56 +08:00
Poster
Owner

Looks like there's no need for volatile memory - data memory barrier (dmb) seems to work as a good substitute.

I think bursts are not a problem with other (consecutive) CSR accesses as they're done within API usually and there's a bit of logic in between so they're not scheduled as bursts.

Except csr timestamps as their type is i64 - they should be looked at and in case of issues this could be a culprit.

Looks like there's no need for volatile memory - data memory barrier (``dmb``) seems to work as a good substitute. I think bursts are not a problem with other (consecutive) CSR accesses as they're done within API usually and there's a bit of logic in between so they're not scheduled as bursts. Except csr timestamps as their type is ``i64`` - they should be looked at and in case of issues this could be a culprit.

I think bursts are not a problem with other (consecutive) CSR accesses as they're done within API usually and there's a bit of logic in between so they're not scheduled as bursts.

That doesn't sound reliable and we may want to use dmb there as well.

> I think bursts are not a problem with other (consecutive) CSR accesses as they're done within API usually and there's a bit of logic in between so they're not scheduled as bursts. That doesn't sound reliable and we may want to use ``dmb`` there as well.
Poster
Owner

That doesn't sound reliable and we may want to use dmb there as well.

That will have to be done internally within auto-generated CSR rust file (which is done within misoc, rather than this repo) as dmb would have to be added between consecutive u32 reads. Adding it post read (ie. in csr code for runtime) won't help. Actually, have been any issues noticed for that?

I'm also testing returning axi.Response.slverr on bursts longer than 1, but that's part of migen-axi too.

> That doesn't sound reliable and we may want to use ``dmb`` there as well. That will have to be done internally within auto-generated CSR rust file (which is done within misoc, rather than this repo) as ``dmb`` would have to be added between consecutive ``u32`` reads. Adding it post read (ie. in csr code for runtime) won't help. Actually, have been any issues noticed for that? I'm also testing returning ``axi.Response.slverr`` on bursts longer than 1, but that's part of migen-axi too.

Sure, send a misoc patch.

Sure, send a misoc patch.
Poster
Owner

I'm also testing returning axi.Response.slverr on bursts longer than 1, but that's part of migen-axi too.

I went quite deep into it (e.g. learned that despite errors, axi transactions still need to go through in full), turns out the master happily ignores the error (absolutely no change in behavior, even trying to force something out by using u64).

While testing that I also tried a variant where it would deliberately hang if a burst was issued - and that seemed to work still. Looks like csr writes/reads over 32 bit (which are broken down beforehand) don't generate bursts after all.

I think this could be mergable for now - with CSR/Misoc fix losing priority, and not really being a part of this PR.

> I'm also testing returning ``axi.Response.slverr`` on bursts longer than 1, but that's part of migen-axi too. I went quite deep into it (e.g. learned that despite errors, axi transactions still need to go through in full), turns out the master happily ignores the error (absolutely no change in behavior, even trying to force something out by using ``u64``). While testing that I also tried a variant where it would deliberately hang if a burst was issued - and that seemed to work still. Looks like csr writes/reads over 32 bit (which are broken down beforehand) don't generate bursts after all. I think this could be mergable for now - with CSR/Misoc fix losing priority, and not really being a part of this PR.
sb10q merged commit dcfb28ce61 into master 2022-04-01 14:15:14 +08:00
Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/artiq-zynq#176
There is no content yet.