acpki: Zero out request_data #469

Merged
sb10q merged 3 commits from :zero-request-data into master 2026-04-16 17:01:39 +08:00
See https://git.m-labs.hk/M-Labs/artiq-zynq/issues/466#issuecomment-58521.
Ghost added 1 commit 2026-04-15 04:07:49 +08:00
sb10q requested review from mwojcik 2026-04-15 07:21:28 +08:00
Owner

Interesting. The block you pointed out is exactly why I thought it was okay not to clean the data; but obviously it's not covering the whole transfer - we're again dealing with 32-bit chunks of data, which gets transferred 64 bits at a time.

To decrease the overhead would it be feasible to clear the data up to the nearest 64 bit boundary? That would be either 0 or 1 extra clear.

Interesting. The block you pointed out is exactly why I thought it was okay not to clean the data; but obviously it's not covering the whole transfer - we're again dealing with 32-bit chunks of data, which gets transferred 64 bits at a time. To decrease the overhead would it be feasible to clear the data up to the nearest 64 bit boundary? That would be either 0 or 1 extra clear.
Ghost added 1 commit 2026-04-15 18:53:04 +08:00
Owner

To decrease the overhead

Is that actually faster? Tests are not free either, especially since they may involve a jump that can be mispredicted and cause CPU pipeline flushes.

> To decrease the overhead Is that actually faster? Tests are not free either, especially since they may involve a jump that can be mispredicted and cause CPU pipeline flushes.

Both of these approaches assumedly have one mispredicted branch, as .fill(0) will also have to loop over the array? I guess I could rearrange this so we do a OUT_BUFFER.transactions[0].request_data[(data.len() - 1) | 1] = 0; before the copy_from_slice? I'm not really sure how best to benchmark this — the RTIO dispatch still dominates this function after all!

This all said, I do also wonder if there should be a bounds check at the start of all the output_wide methods, as currently writing an array >16 will panic the kernel.

Both of these approaches assumedly have one mispredicted branch, as `.fill(0)` will also have to loop over the array? I guess I could rearrange this so we do a `OUT_BUFFER.transactions[0].request_data[(data.len() - 1) | 1] = 0;` *before* the `copy_from_slice`? I'm not really sure how best to benchmark this — the RTIO dispatch still dominates this function after all! This all said, I do also wonder if there should be a bounds check at the start of all the `output_wide` methods, as currently writing an array >16 will panic the kernel.
Owner

If there is no noticeable performance improvement, then the simplest code is better.

If there is no noticeable performance improvement, then the simplest code is better.
Owner

Yeah let's keep it simple for the wide fn, the single function will do fine with a single extra fill w/o checks.

Yeah let's keep it simple for the wide fn, the single function will do fine with a single extra fill w/o checks.
Ghost added 1 commit 2026-04-16 16:17:39 +08:00
mwojcik approved these changes 2026-04-16 16:18:50 +08:00
sb10q merged commit bb9e3f49c6 into master 2026-04-16 17:01:39 +08:00
Sign in to join this conversation.
No Reviewers
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: M-Labs/artiq-zynq#469