Low memcpy throughput #75
Tested with the following code in artiq-zynq:
This gives ~630 MiB/s throughput. The result is the same when using `u32`, so it should not be an alignment issue. That seems a bit low given that branch prediction, all caches, etc. are turned on.
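The original test code is not shown here. As a rough illustration only, a throughput measurement of this kind could look like the sketch below. It assumes a hosted target with `std` (artiq-zynq itself runs bare-metal, so on the board a hardware timer would replace `Instant`), and `memcpy_throughput_mib_s` is a hypothetical helper name. It relies on `copy_from_slice`, which the Rust documentation describes as copying via memcpy.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: copies a `size_bytes` buffer `iters` times and
/// returns the copy rate in MiB/s (bytes copied per second; the reads
/// of the source buffer are not counted separately).
fn memcpy_throughput_mib_s(size_bytes: usize, iters: u32) -> f64 {
    let src = vec![0xA5u8; size_bytes];
    let mut dst = vec![0u8; size_bytes];
    let start = Instant::now();
    for _ in 0..iters {
        // copy_from_slice performs a memcpy for Copy element types.
        dst.copy_from_slice(black_box(&src));
        // Keep the optimizer from discarding the repeated copies.
        black_box(&mut dst);
    }
    let secs = start.elapsed().as_secs_f64();
    let total_bytes = size_bytes as f64 * iters as f64;
    total_bytes / secs / (1024.0 * 1024.0)
}

fn main() {
    // A large buffer that should not fit in cache, and a 10 KB one that should.
    for &size in &[16 * 1024 * 1024usize, 10 * 1024] {
        let iters = ((256 * 1024 * 1024) / size) as u32;
        println!("{} bytes: {:.0} MiB/s", size, memcpy_throughput_mib_s(size, iters));
    }
}
```

Running the same helper with a small buffer is one way to approximate the "data resides in cache" case; timer resolution and loop overhead matter more there, so larger iteration counts help.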
The throughput is not much higher even when the data resides in cache.
Tested with 10 KB arrays, the throughput is roughly 2000 MiB/s. Definitely not normal.
If the cache is an SRAM with a 64-bit bus running at 800 MHz, the maximum total throughput is
800e6 × 64 / (8 × 1024 × 1024) ≈ 6103 MiB/s.
And memcpy needs to read and write at the same time, which roughly halves the achievable copy rate (to ~3050 MiB/s), so 2000 MiB/s is not that far off... OK, I forgot that the frequency is lower than on a typical PC CPU...
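The estimate above can be written out as a small calculation. This is a sketch under the same idealized assumption as the comment (one 64-bit transfer per cycle at 800 MHz, with reads and writes sharing that bandwidth); `peak_mib_s` is a hypothetical name.

```rust
/// Peak bandwidth in MiB/s of a bus `bus_bits` wide clocked at `freq_hz`,
/// assuming one full-width transfer per cycle (an idealized assumption).
fn peak_mib_s(freq_hz: f64, bus_bits: f64) -> f64 {
    freq_hz * bus_bits / (8.0 * 1024.0 * 1024.0)
}

fn main() {
    let peak = peak_mib_s(800e6, 64.0);
    // memcpy reads and writes every byte; if both share this bandwidth,
    // the copy rate is bounded by half of the peak.
    let copy_bound = peak / 2.0;
    println!("peak {:.1} MiB/s, copy bound {:.1} MiB/s", peak, copy_bound);
}
```

With these inputs it prints a peak of 6103.5 MiB/s and a copy bound of about 3051.8 MiB/s, which is the ceiling the measured ~2000 MiB/s in-cache figure should be compared against.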
By tweaking the prefetch offset and `alloc_one_way` to avoid cache pollution for `memcpy`, the throughput increased to about 780 MiB/s. Since memcpy both reads and writes every byte, the memory traffic is double the copy rate, so the real throughput is about 1500 MiB/s.
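The doubling argument can be sketched as follows, assuming each copied byte costs exactly one read and one write of memory traffic; `bus_traffic_mib_s` is a hypothetical helper, and the 630 and 780 MiB/s inputs are the figures reported in this thread.

```rust
/// memcpy reads each byte once and writes it once, so (under that
/// assumption) the traffic seen by the memory system is twice the copy rate.
fn bus_traffic_mib_s(copy_rate_mib_s: f64) -> f64 {
    2.0 * copy_rate_mib_s
}

fn main() {
    let before = 630.0; // copy rate before tuning, MiB/s
    let after = 780.0; // copy rate after tuning prefetch offset and alloc_one_way
    println!(
        "traffic before: {:.0} MiB/s, after: {:.0} MiB/s",
        bus_traffic_mib_s(before),
        bus_traffic_mib_s(after)
    );
}
```

This prints 1260 MiB/s before and 1560 MiB/s after tuning.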