corruption of kernel RPC values #8
Reference: M-Labs/artiq-zynq#8
This incorrectly prints `0.0`:
Making changes to the code, e.g. removing the t0 or t1 statement or reducing the size of the data buffer, makes the issue disappear.
Playing with stack/heap sizes does not help.
This seems to be the RPC buffer being written where it should not: changing the contents to `0xff` instead of `0x00` makes the program print `nan`.
My first approach is tightening permissions in the page tables (branch `mem_protect`) so that a bogus memory access immediately causes a CPU exception at the instruction that caused it.
It seems that both the intended destination and the victim of corruption are in the same readable+writable section, so that might not turn up anything - am I wrong?
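The `0x00` to `0.0` and `0xff` to `nan` observation is what one would expect if the printed value is a 64-bit IEEE 754 float whose eight bytes were overwritten by the buffer fill. That the value is an `f64` is an assumption here, and `f64_from_fill` is a hypothetical helper written only for this illustration, not code from the repository. A minimal sketch:

```rust
/// Hypothetical helper: build an f64 from eight copies of the same byte,
/// mimicking a float that was overwritten by a buffer filled with `byte`.
/// Assumes the corrupted value is a 64-bit IEEE 754 float.
fn f64_from_fill(byte: u8) -> f64 {
    // With eight identical bytes the endianness does not matter.
    f64::from_le_bytes([byte; 8])
}

/// Describe what a program would print for a float overwritten by `byte`.
fn describe(byte: u8) -> String {
    let value = f64_from_fill(byte);
    if value.is_nan() {
        // All-ones exponent with a non-zero mantissa is a NaN.
        String::from("nan")
    } else {
        format!("{:?}", value)
    }
}
```

With these definitions, `describe(0x00)` yields `0.0` and `describe(0xff)` yields `nan`, matching the two symptoms reported above.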
Title changed from "corruption of kernel values" to "corruption of kernel RPC values".
In the latest master, this does not print an incorrect value; instead it exhibits some other weird behaviors...
`rpc_send_common`:
`BorrowMutError`, as we did not correctly terminate the kernel.
More information regarding the data abort:
Next approach: dumping/diffing the kernel image before start and after crash.
After dumping the image to the SD card, it seems that the corruption is caused by both `rpc::send_args` and `core1_tx.send` in `rpc_send_common`.
It seems that the RPC buffer is written into the image.
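The dump-and-diff approach can be sketched as a byte-wise comparison of two snapshots of the kernel image that reports each contiguous range of changed bytes. This is only an illustration under stated assumptions: `diff_ranges` is a hypothetical name, the snapshots are assumed to be already loaded as byte slices, and it is not the code from the `corruption_test` branch.

```rust
use std::ops::Range;

/// Hypothetical helper: return the contiguous byte ranges in which two
/// image snapshots differ. Only the common prefix length is compared.
fn diff_ranges(before: &[u8], after: &[u8]) -> Vec<Range<usize>> {
    let len = before.len().min(after.len());
    let mut ranges = Vec::new();
    // Start offset of the differing run currently being tracked, if any.
    let mut start: Option<usize> = None;

    for i in 0..len {
        let differs = before[i] != after[i];
        match (differs, start) {
            // A new differing run begins here.
            (true, None) => start = Some(i),
            // The current run ended at the previous byte.
            (false, Some(s)) => {
                ranges.push(s..i);
                start = None;
            }
            // Either still inside a run, or still outside one.
            _ => {}
        }
    }
    // Close a run that extends to the end of the compared region.
    if let Some(s) = start {
        ranges.push(s..len);
    }
    ranges
}
```

Comparing a snapshot taken before a suspect call with one taken right after it narrows the corruption down to the offsets that call touched, which can then be matched against the image's symbols.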
Before `send_args` and after `send_args`:
Before `core0_tx.send` and after:
Code: https://git.m-labs.hk/pca006132/artiq-zynq/src/branch/corruption_test
Confirmed this works, well done @pca006132 !
Oh wow. Very nice find. I am very happy that this bug hunt is over.
Famous last words... there's still plenty of corruption and obscure crashes, and some seem to be triggered by RPCs.