Break when using custom compiler-builtins + opt-level change #93
Reference: M-Labs/artiq-zynq#93
The current master breaks when the optimization level is changed to 's', with our custom compiler-builtins providing a faster memcpy. We have to use a higher optimization level to get enough speed out of the Ethernet driver, as tested in the zynq-rs experiments.

Behavior:
no `artiq_coremgmt log` response; halts when doing RPC transfer.

Reproduce:
change the optimization level to 's' with the custom `compiler-builtins` (as opposed to the version from crates.io) referenced in `Cargo.toml`.
The behavior is much worse in my branch, which includes some performance improvements for RPC and an upgrade to the latest zynq-rs dependency. Note that my branch modifies the RPC protocol for faster performance, so you have to use my artiq fork if you need to do list/array RPC.
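For context, the point of the custom compiler-builtins is a faster `memcpy`. A rough sketch of the word-at-a-time idea behind such routines (illustrative only, not the actual implementation in our fork):

```rust
// Illustrative word-at-a-time memcpy sketch; a real compiler-builtins
// routine is more elaborate (alignment fixup, larger blocks, etc.).
unsafe fn memcpy_words(dst: *mut u8, src: *const u8, n: usize) {
    let words = n / 4;
    // Copy 4 bytes per iteration; read/write_unaligned keeps this sound
    // even when the buffers are not 4-byte aligned.
    for i in 0..words {
        let w = core::ptr::read_unaligned((src as *const u32).add(i));
        core::ptr::write_unaligned((dst as *mut u32).add(i), w);
    }
    // Copy the remaining tail bytes one at a time.
    for i in words * 4..n {
        *dst.add(i) = *src.add(i);
    }
}
```

The win over a naive byte loop at low opt-levels is that each iteration moves four bytes instead of one.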
Things I've found:
For the current master, we can override selected crates (smoltcp, libboard_zynq, libasync) with opt-level s/2 without triggering the bug (I've only done a few tests, so I'm not sure it is perfectly fine), but it still breaks in my branch.
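Per-crate overrides of this kind can be expressed with Cargo profile overrides. A sketch, assuming the release profile is the one being sized-optimized (the crate names are the ones mentioned above; the exact sections depend on the build setup):

```toml
# Sketch: keep the workspace optimized for size, but build the
# networking-critical crates with a higher opt-level.
[profile.release]
opt-level = "s"

[profile.release.package.smoltcp]
opt-level = 2

[profile.release.package.libboard_zynq]
opt-level = 2

[profile.release.package.libasync]
opt-level = 2
```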
This sounds like some classic race condition somewhere, e.g. missing barriers in the Ethernet driver…
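To illustrate the kind of bug meant here: if the descriptor's ownership bit is handed to the DMA engine before the buffer and descriptor writes are ordered, the hardware can read stale data. A minimal sketch (the descriptor layout, field names, and `OWNED_BY_HW` bit are hypothetical, not the actual libboard_zynq driver types):

```rust
use core::sync::atomic::{fence, Ordering};

// Hypothetical TX descriptor; illustrative layout only.
#[repr(C)]
struct TxDescriptor {
    addr: u32,
    status: u32, // bit 31 = "owned by hardware" in this sketch
}

const OWNED_BY_HW: u32 = 1 << 31;

fn submit(desc: &mut TxDescriptor, buf_addr: u32) {
    desc.addr = buf_addr;
    // Barrier: make the buffer/descriptor writes visible to the DMA
    // engine *before* ownership is transferred. Without this, the
    // compiler (or CPU) may reorder the stores.
    fence(Ordering::Release);
    // Volatile write so the ownership store cannot be reordered or elided.
    unsafe {
        core::ptr::write_volatile(&mut desc.status, OWNED_BY_HW);
    }
}
```

If the `fence` is missing, the code can appear to work at one opt-level and break at another, which matches the symptom in this issue.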
Fixed in
671968bac3
Well done!
Why does the comment say "start tcp transfer"? The ethernet layer doesn't know about TCP. Maybe it should say "start packet transfer"?
nice! Out of curiosity, what are the current ethernet performance numbers?
Yes, I'll fix that later.
I have some optimization plans, including changing the optimization level, the RPC protocol, and parts of the RPC implementation. Currently the numbers are:
With my patch for RPC
With the test https://github.com/pca006132/artiq/blob/rpc/artiq/test/coredevice/test_performance.py
Note that the current test_performance.py in master is broken, as async RPC on Zynq does not block until the buffer is sent, so we have to use normal RPC to test the throughput. We can absolutely do better later; we are currently optimized for size, not speed. :)
And we haven't done any hacks there.