enhanced RTIO event submission (ACPKI) #55
https://github.com/m-labs/artiq/issues/1167#issuecomment-427188287
Implementation already done at Oxford, needs integration and testing (some potential for difficult bugs).
https://git.m-labs.hk/pca006132/artiq-zynq/src/branch/rtio/src
Wide output is now implemented. It takes ~490ns for a normal output and up to ~650ns for a wide output. We may be able to reduce the latency by enabling the L2 cache, but that is currently problematic as it breaks drivers.
That's quite slow compared to the numbers reported in https://github.com/m-labs/artiq/issues/1167#issuecomment-427188287. Any idea as to the origin of the discrepancy?
I think that is probably due to the L2 cache; we have not enabled it yet.
If I move the status check before sending events to RTIO, i.e. overlap the time the CPU spends executing instructions with the time the gateware takes to send back the status buffer, I could do 300ns or less. So the time is probably spent on code execution.
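A minimal sketch of that reordering, in the spirit of the artiq-zynq runtime but with placeholder CSR names and addresses (the real gateware register map differs):

```rust
use core::ptr::{read_volatile, write_volatile};

// Placeholder memory-mapped CSR addresses, not the actual artiq-zynq map.
const RTIO_O_TARGET: *mut u32 = 0x4000_0000 as *mut u32;
const RTIO_O_DATA: *mut u32 = 0x4000_0004 as *mut u32;
const RTIO_O_WE: *mut u32 = 0x4000_0008 as *mut u32;
const RTIO_O_STATUS: *mut u32 = 0x4000_000C as *mut u32;

/// Check the status of the *previous* event before submitting the next
/// one, so the gateware's status round trip overlaps with CPU execution.
unsafe fn output(target: u32, data: u32) {
    // By the time we get here, the previous event's status buffer has had
    // a full event's worth of CPU work to make it back from the gateware.
    let status = read_volatile(RTIO_O_STATUS);
    if status != 0 {
        handle_rtio_error(status); // hypothetical error path
    }
    write_volatile(RTIO_O_TARGET, target);
    write_volatile(RTIO_O_DATA, data);
    write_volatile(RTIO_O_WE, 1); // kick the event into the gateware
}

unsafe fn handle_rtio_error(_status: u32) { /* raise an RTIO exception */ }
```

The trade-off is that an error surfaces one submission late, which touches the exception-granularity question discussed further down.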
I'm currently working on enabling the L2 cache, which requires some configuration in the MMU etc.
With the L2 cache enabled, with instruction and data prefetch, the time is reduced to ~390ns, but that is still pretty slow compared to the number reported in the issue.
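For context, a rough sketch of a PL310 (L2C-310) bring-up on the Zynq-7000, using register offsets from the Zynq TRM; the MMU/page-table side (marking regions cacheable and shareable) is omitted here:

```rust
use core::ptr::{read_volatile, write_volatile};

// PL310 L2 cache controller registers on the Zynq-7000.
const L2C_BASE: usize = 0xF8F0_2000;
const REG1_CONTROL: *mut u32 = (L2C_BASE + 0x100) as *mut u32;
const REG1_AUX_CONTROL: *mut u32 = (L2C_BASE + 0x104) as *mut u32;
const REG7_INV_WAY: *mut u32 = (L2C_BASE + 0x77C) as *mut u32;

unsafe fn enable_l2_cache() {
    // Instruction (bit 29) and data (bit 28) prefetch, as mentioned above;
    // the auxiliary control register must be written while L2 is disabled.
    let aux = read_volatile(REG1_AUX_CONTROL);
    write_volatile(REG1_AUX_CONTROL, aux | (1 << 29) | (1 << 28));

    // Invalidate all 8 ways and wait for the operation to complete.
    write_volatile(REG7_INV_WAY, 0xFF);
    while read_volatile(REG7_INV_WAY) & 0xFF != 0 {}

    // Set the enable bit.
    write_volatile(REG1_CONTROL, 1);
}
```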
I don't have much idea about the discrepancy; could it be related to static linking of the kernel? Not sure whether that could account for ~100ns of difference.
changed title from enhanced RTIO event submission to enhanced RTIO event submission (ACPKI)

The branch for ACPKI with L2 cache enabled: https://git.m-labs.hk/pca006132/artiq-zynq/src/branch/l2-cache
With the optimization level changed from `z` to `s`, the time is decreased by 10ns. I would expect some further improvement later when we change the optimization level to 2; not sure if we can eventually get to 280ns.
Sorry for the late update, I have been working on RPC optimization for some time. After toggling some CPU options and optimization options, the sustained output rate for the ACPKI interface can reach 300ns but not lower. I will try to finish the implementation in the coming days.
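The CPU options aren't spelled out here; as an illustration of the kind of Cortex-A9 knob involved (not necessarily the ones actually toggled), enabling L1 data prefetch via the Auxiliary Control Register looks like this:

```rust
use core::arch::asm;

/// Illustrative Cortex-A9 tweak: set the L1 data prefetch enable bit
/// (ACTLR bit 2). Must run in a privileged mode.
unsafe fn enable_l1_data_prefetch() {
    let mut actlr: u32;
    asm!("mrc p15, 0, {0}, c1, c0, 1", out(reg) actlr);
    actlr |= 1 << 2; // DP bit: L1 D-side prefetch enable
    asm!("mcr p15, 0, {0}, c1, c0, 1", in(reg) actlr);
}
```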
great! So that brings this on a par with Chris' implementation.
I assume the position is now that there isn't likely to be any big improvement left without sacrificing some level of exception granularity (e.g. some form of batching API as Chris suggested), which would be a separate project...
"Batching" is not particularly relevant when there is regular DMA.
It's generally nice to do things without relying on pre-recorded DMA sequences. If batching allowed us to reduce the time for compound RTIO events like DDS setting to be comparable to a single RTIO event, we wouldn't need DMA in several places, which would be a big win for us.
Isn't batching just like DMA but with a different syntax and slightly different performance trade-offs?
That's not my understanding of Chris' proposal, but I'm also not best qualified to comment. Anyway, the main thing is that we think the current implementation is about as good as it's going to get for bare RTIO.
Batching would give a significant performance improvement over single RTIO events: I tried reordering before (when the time was ~490ns) and got a pretty large improvement (~50%). However, the improvement should be smaller now, as the instruction-execution overhead has been reduced by the CPU option tweaks.
Batching would probably be faster than DMA, as we don't have to send the slice to core0 and we don't have to wait for the replay of the DMA sequence. However, I think this claim needs a benchmark to prove or disprove it. I could write a simple batching implementation and do a micro-benchmark if that is needed.
If that's something you feel like doing, I would be curious to see how the numbers pan out.
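For reference, a minimal sketch of what the proposed batching micro-benchmark's submission path could look like; the event layout, register addresses, and the single deferred status check are all hypothetical, not an existing artiq-zynq API:

```rust
use core::ptr::{read_volatile, write_volatile};

// Placeholder memory-mapped CSRs, not the actual artiq-zynq register map.
const RTIO_O_TIMESTAMP: *mut u64 = 0x4000_0000 as *mut u64;
const RTIO_O_TARGET: *mut u32 = 0x4000_0008 as *mut u32;
const RTIO_O_DATA: *mut u32 = 0x4000_000C as *mut u32;
const RTIO_O_STATUS: *mut u32 = 0x4000_0010 as *mut u32;

struct Event {
    timestamp: u64,
    target: u32,
    data: u32,
}

/// Submit a pre-built batch back to back, deferring the status check to
/// the end. Compared to per-event checks this trades exception
/// granularity (an error can no longer be pinned to one event) for
/// speed; unlike DMA, nothing is copied to core0 or replayed.
unsafe fn submit_batch(batch: &[Event]) -> u32 {
    for ev in batch {
        write_volatile(RTIO_O_TIMESTAMP, ev.timestamp);
        write_volatile(RTIO_O_TARGET, ev.target);
        write_volatile(RTIO_O_DATA, ev.data); // data write submits the event
    }
    read_volatile(RTIO_O_STATUS) // one status read for the whole batch
}
```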