poor Ethernet performance #30
IIRC just a couple MB/s from @astro's tests.
2 MB/s
First approach: set the DDR pages bufferable (writeback) in the MMU. Unfortunately, the eth descriptors (not the buffers) need to be placed into non-bufferable pages as they're smaller than a cacheline. That means they cannot be invalidated from L1 individually.
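To illustrate why a single descriptor cannot be invalidated on its own, here is a small host-side sketch of the address arithmetic. It assumes a 32-byte L1 cache line (Cortex-A9) and 8-byte descriptors (two 32-bit words); both constants and the helper names are assumptions made for this sketch, not code from the driver.

```rust
/// L1 data cache line size in bytes (assumed: 32 on the Cortex-A9).
const CACHE_LINE: usize = 32;
/// Size of one Ethernet buffer descriptor in bytes (assumed: two 32-bit words).
const DESC_SIZE: usize = 8;

/// Byte range [start, end) that cache maintenance really covers when it is
/// asked to operate on `len` bytes at `addr`: always whole cache lines.
fn maintained_range(addr: usize, len: usize) -> (usize, usize) {
    let start = addr & !(CACHE_LINE - 1);
    let end = (addr + len + CACHE_LINE - 1) & !(CACHE_LINE - 1);
    (start, end)
}

/// Indices of all descriptors in a ring at `base` that are hit when only
/// descriptor `index` is invalidated.
fn descriptors_affected(base: usize, count: usize, index: usize) -> Vec<usize> {
    let (start, end) = maintained_range(base + index * DESC_SIZE, DESC_SIZE);
    (0..count)
        .filter(|&i| {
            let addr = base + i * DESC_SIZE;
            addr < end && addr + DESC_SIZE > start
        })
        .collect()
}
```

With a line-aligned ring, invalidating descriptor 1 also discards any cached state of descriptors 0, 2 and 3, which is why the descriptors need uncached memory or line-sized padding.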
What is the Xilinx code doing?
The Xilinx embeddedsw doesn't contain cache/barrier instructions. It probably runs w/o bufferable pages.
The Linux driver uses only barriers, which I have so far been unable to adapt in a way that behaves reliably.
One non-bufferable MMU page just for the descriptors seems to be a promising solution...
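As a rough sketch of what such a mapping could look like, the following builds ARMv7 short-descriptor first-level section entries (1 MiB each) with the C and B bits chosen per region. It assumes TEX = 0 and TEX remap disabled; the function and constant names are hypothetical and this is not the MMU code used in the project.

```rust
/// Descriptor type bits for a 1 MiB section.
const SECTION: u32 = 0b10;
/// B (bufferable) bit of a section entry.
const B: u32 = 1 << 2;
/// C (cacheable) bit of a section entry.
const C: u32 = 1 << 3;
/// AP[1:0] = 0b11: read/write access (assumed for this sketch).
const AP_FULL_ACCESS: u32 = 0b11 << 10;
/// A section maps 1 MiB.
const SECTION_SIZE: u32 = 1 << 20;

/// Builds a first-level section entry for the 1 MiB region containing
/// `phys_addr`. With TEX = 0: C = B = 1 is write-back normal memory,
/// C = B = 0 is strongly-ordered (uncached, unbuffered).
fn section_entry(phys_addr: u32, cacheable: bool, bufferable: bool) -> u32 {
    let mut entry = (phys_addr & !(SECTION_SIZE - 1)) | AP_FULL_ACCESS | SECTION;
    if cacheable {
        entry |= C;
    }
    if bufferable {
        entry |= B;
    }
    entry
}
```

The idea would be to map ordinary DDR sections with both bits set and the one section holding the descriptors with both bits clear.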
Enabling MMU bufferable pages has doubled throughput.
M-Labs/artiq-zynq#55
Enabling the L2 cache completely breaks the Ethernet driver. This is probably a cache invalidation issue, as a lot of the polls return a buffer of length 0. There are delays of several seconds.
By adding `dcci_slice` for the rx buffer, and returning `Err(None)` from the `recv_next` function when len is 0, the delay is reduced, but there is still an occasional latency spike of up to ~500 ms which does not occur with the L2 cache turned off.

Also, there is still a problem with the RTIO analyzer buffer transmission: the stream is not closed even if I flush the data at the end of the transmission. The stream does close if I open another connection, though.
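A minimal host-side sketch of that receive path, for illustration only. `MockRing`, `MockDesc`, `RxError` and `dcci_slice_stub` are hypothetical stand-ins: the stub merely counts calls where the real `dcci_slice` would do cache maintenance on the slice, and the actual signature of `recv_next` in the driver may differ.

```rust
#[derive(Debug, PartialEq)]
enum RxError {
    /// Reported length exceeds the buffer (hypothetical error case).
    TooLong,
}

struct MockDesc {
    owned_by_cpu: bool,
    len: usize,
}

struct MockRing {
    descs: Vec<MockDesc>,
    buffers: Vec<Vec<u8>>,
    next: usize,
    invalidations: usize,
}

/// Stand-in for cache clean+invalidate on a slice; only counts calls here.
fn dcci_slice_stub(counter: &mut usize, _buf: &[u8]) {
    *counter += 1;
}

/// Returns the next packet, `Err(None)` if nothing is ready yet, or
/// `Err(Some(_))` on a real error.
fn recv_next(ring: &mut MockRing) -> Result<Vec<u8>, Option<RxError>> {
    let i = ring.next;
    if !ring.descs[i].owned_by_cpu {
        return Err(None);
    }
    let len = ring.descs[i].len;
    if len == 0 {
        // Treat a zero-length result as "not ready"; do not advance the
        // ring, so the same descriptor is polled again later.
        return Err(None);
    }
    let result = if len > ring.buffers[i].len() {
        Err(Some(RxError::TooLong))
    } else {
        // Drop stale cache lines before the CPU reads the received data.
        dcci_slice_stub(&mut ring.invalidations, &ring.buffers[i][..len]);
        Ok(ring.buffers[i][..len].to_vec())
    };
    // Hand the descriptor back to the hardware and move on.
    ring.descs[i].owned_by_cpu = false;
    ring.next = (i + 1) % ring.descs.len();
    result
}
```

Returning `Err(None)` without advancing means a zero-length descriptor is simply retried on the next poll instead of being delivered as an empty packet.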
The L2 cache problem is fixed by modifying the MMU setting for the uncached slice. However, the rx speed is still pretty slow compared to the tx speed. There are also a few other changes: I modified the cache flush for rx and prevented duplicates in the waker queue.
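The waker queue deduplication can be sketched roughly as follows. This is a generic stand-in, assuming the queue is a plain `Vec`; `push_unique` is a hypothetical helper, and in the real driver the elements would be wakers compared with something like `Waker::will_wake` rather than `PartialEq`.

```rust
/// Appends `item` only if an equal entry is not already queued.
/// Returns true if the item was added.
fn push_unique<T: PartialEq>(queue: &mut Vec<T>, item: T) -> bool {
    if queue.iter().any(|queued| *queued == item) {
        // Already queued: waking the same task twice would be wasted work.
        return false;
    }
    queue.push(item);
    true
}
```

A task that polls repeatedly before any packet arrives then occupies a single slot, so the queue does not grow with every poll.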
Branch:
We get good performance by changing the memcpy implementation and setting the optimization level to 's' or 2. Those changes will be made in artiq-zynq, so closing this now.