Post clock merge: test_dma fails #222

Closed
opened 2023-02-21 13:57:43 +08:00 by mwojcik · 1 comment

test_full_stack fails with:

test_dma_noerror (test_dma.TestDMA) ... ok
test_full_stack (test_dma.TestDMA) ... ERROR

======================================================================
ERROR: test_full_stack (test_dma.TestDMA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/spaqin/m-labs/artiq-zynq/src/gateware/test_dma.py", line 228, in test_full_stack
    run_simulation(tb, {"sys": [
  File "/nix/store/nb17b218w86c23a2hdrjxhs16nfzhdv7-python3-3.10.9-env/lib/python3.10/site-packages/migen/sim/core.py", line 414, in run_simulation
    s.run()
  File "/nix/store/nb17b218w86c23a2hdrjxhs16nfzhdv7-python3-3.10.9-env/lib/python3.10/site-packages/migen/sim/core.py", line 403, in run
    self._process_generators(cd)
  File "/nix/store/nb17b218w86c23a2hdrjxhs16nfzhdv7-python3-3.10.9-env/lib/python3.10/site-packages/migen/sim/core.py", line 357, in _process_generators
    request = generator.send(reply)
  File "/home/spaqin/m-labs/artiq-zynq/src/gateware/test_dma.py", line 105, in do_dma
    raise RTIOUnderflow
artiq.coredevice.exceptions.RTIOUnderflow

This started after the changes in mainline ARTIQ. Since the DMA tests pass there, I assume it's due to some small incompatibility in the Zynq DMA module.

Running a vcdiff tool against the previous iteration pinpoints the first differences:

diff #148
==================
(good_dma.vcd).fullstacktb.rtio_record0_we	= 1
(dma.vcd).fullstacktb.rtio_record0_we     	= 0

diff #156
==================
(good_dma.vcd).fullstacktb.cri_master_cri_o_status[2:0]	= 000
(dma.vcd).fullstacktb.cri_master_cri_o_status[2:0]     	= 010

(A status of 010 already indicates an RTIO underflow; I assume it is a consequence of the first difference.)
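For readers comparing the two status values, here is a minimal sketch of decoding the 3-bit `o_status` field. The bit assignment (bit 0 = wait, bit 1 = underflow, bit 2 = destination unreachable) is taken from my reading of the comments in mainline ARTIQ's CRI interface and should be treated as an assumption; `decode_o_status` is a hypothetical helper, not part of ARTIQ.

```python
# Assumed bit layout of cri.o_status (from mainline ARTIQ's CRI comments):
#   bit 0 = wait, bit 1 = underflow, bit 2 = destination unreachable
O_STATUS_FLAGS = {0: "wait", 1: "underflow", 2: "destination unreachable"}


def decode_o_status(value):
    """Return the names of the flags set in a 3-bit o_status value."""
    return [name for bit, name in sorted(O_STATUS_FLAGS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    # Values from the vcdiff output above.
    print("good_dma.vcd:", decode_o_status(0b000))
    print("dma.vcd:     ", decode_o_status(0b010))
```

Under that assumed layout, the failing run has only the underflow bit set, which matches the `RTIOUnderflow` raised in `do_dma`.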

mwojcik self-assigned this 2023-02-21 13:57:43 +08:00

I found the culprit.

It turns out not to be a big deal, but since DMA is important, I could not let it slide.

The DMA operation did not fit within minimum_coarse_timestamp in RTIO SED.
This is the line that caused the issue: https://github.com/m-labs/artiq/commit/ad000609ced68ab84457265bd5b12b28429cf0a4#diff-5e06a98d471b8294533a49a8a654b41349ff57e21557a0cb21e5a66d554eec15L69

There used to be coarse_ts and coarse_ts_sys, with the latter used here. Looking at the VCD waveforms, I found that they differed by 4.

The thing about coarse_ts_sys was that it was essentially coarse_ts from the RTIO domain, transferred to the SYS domain. The transfer took 4 cycles in total (two syncs + MultiReg).
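To illustrate why a multi-stage transfer makes the copy lag behind, here is a simplified pure-Python model. It assumes both domains tick at the same rate and treats the transfer as a plain 4-stage register pipeline; it is not the actual Migen logic, and `delayed_counter` is a hypothetical name.

```python
from collections import deque


def delayed_counter(cycles, latency=4):
    """Yield (coarse_ts, coarse_ts_sys) pairs, one per cycle.

    coarse_ts is a free-running counter; coarse_ts_sys is the same
    counter after passing through `latency` register stages.
    """
    # Registers start at 0, like signals after reset.
    pipeline = deque([0] * latency, maxlen=latency)
    for coarse_ts in range(cycles):
        coarse_ts_sys = pipeline[0]   # value leaving the last stage
        yield coarse_ts, coarse_ts_sys
        pipeline.append(coarse_ts)    # shift in the new value, drop the oldest


if __name__ == "__main__":
    for ts, ts_sys in delayed_counter(10):
        print(f"coarse_ts={ts:2d}  coarse_ts_sys={ts_sys:2d}  diff={ts - ts_sys}")
```

Once the pipeline has filled, the difference settles at the number of stages, which is the constant offset of 4 seen in the waveforms.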

Essentially, at any given moment coarse_ts_sys = coarse_ts - 4.
Thus minimum_coarse_timestamp should be coarse_ts + 12, rather than coarse_ts + 16. After I decreased that value, the test passed.
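The arithmetic can be checked with a few lines of Python. This sketch assumes, following the description above, that the old code computed the threshold as `coarse_ts_sys + 16`; the constant names and the sample timestamp are illustrative only.

```python
CDC_LATENCY = 4    # observed lag of coarse_ts_sys behind coarse_ts
OLD_MARGIN = 16    # margin assumed to have been applied to coarse_ts_sys
NEW_MARGIN = OLD_MARGIN - CDC_LATENCY  # margin to apply to coarse_ts directly

coarse_ts = 1000                          # arbitrary sample value
coarse_ts_sys = coarse_ts - CDC_LATENCY   # what the SYS domain used to see

threshold_before_merge = coarse_ts_sys + OLD_MARGIN
threshold_after_merge = coarse_ts + OLD_MARGIN   # margin left unchanged
threshold_after_fix = coarse_ts + NEW_MARGIN

if __name__ == "__main__":
    print("before merge:        ", threshold_before_merge)
    print("after merge (no fix):", threshold_after_merge)
    print("after fix:           ", threshold_after_fix)
```

With the margin left at 16, the threshold is effectively 4 cycles stricter than before the merge, which is why an event that previously fit now underflows.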

I will make a PR for mainline ARTIQ fixing that tiny omission once I have verified that the change does not break any tests in ARTIQ either.

Reference: M-Labs/artiq-zynq#222