Compiler accepts returning allocated types from kernel #1502
Bug Report
One-Line Summary
String RPC return values (and/or kernel return values?) seem to be getting corrupted, which can indirectly cause a kernel panic.
Issue Details
Steps to Reproduce
Run the following experiment:
Expected Behavior
Prints `a string` to the console.
Actual (undesired) Behavior
Prints `ring` to the console. And if you uncomment the `print` statement in the kernel it causes a kernel panic with the following message:
And on the host side:
Strangely, it prints the correct string before panicking. It almost looks like the components work fine independently -- printing a string literal from the kernel works fine, as does returning a string literal from the kernel, and based on the above output it seems that the RPC return value is at least mostly correct when received in the kernel (possibly with some extra non-printable byte(s) that cause the panic?) -- but when used in conjunction this issue appears.
Your System (omit irrelevant parts)
Also holds if I modify the types to `bytes`:
Output:
`Bring`
However if I uncomment the `print` statement I get something new:
Then, removing the `.decode()`:
And it gets weirder! (I might be having too much fun with this.) If I iterate over the bytes in-kernel and print them individually it changes the return value!
Output:
Most likely an allocation size isn't computed correctly somewhere (firmware and/or compiler) and receiving the string corrupts memory.
Maybe related to #1934?
After some research, I found that `rpc_send_async` receives corrupted data. In some cases parts of it (most often the second 4 bytes) may be correct. I can also confirm there is no corruption with copied/owned data. Moreover, slicing with `[:]` also produces normal output.
Still, not sure what causes such behavior.
Given how the compiler is currently implemented, returning a string from a kernel can't work, and neither can returning any other "allocated" type (arrays/lists), as the backing allocation for the elements is on the stack frame of `get_string()`, while the code marshalling the value back to the host is in an implicit `main`-type function generated by the compiler. I'm not sure why this isn't caught by the escape analysis; my previous comments in #1497 indicate that it used to be, though it might be a related issue. This is an "accepts invalid" bug – the code shouldn't compile, but does. We could think about changing the codegen such that the top-level function is special-cased to make the particular case of returning an array from a kernel work.
If you can make `print(my_str)` (or any other type of RPC) receive corrupted data without that, that's a separate bug, though; passing the data to an RPC while in `get_string()` should totally work, and is in fact the "intended" way of passing array data back to the host.
I can't seem to reproduce the print() crash on current master, though – unless you can, we can probably close the remaining issue as a duplicate of #1497/#1677.
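To make the lifetime argument concrete, here is a minimal host-side sketch in plain Python. It is a toy model, not ARTIQ's actual codegen or runtime: `ToyStack`, `marshal_to_host`, `get_string_returning` and `get_string_sending` are hypothetical names invented for illustration, and the 4-byte scratch area used by the marshalling step is an arbitrary assumption chosen so the effect is visible. It only shows why a pointer into a popped stack frame can be overwritten before the marshalling code reads it, and why sending the data while the frame is still live avoids that.

```python
class ToyStack:
    """Toy stack: a fixed byte buffer plus a stack pointer (hypothetical model)."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.sp = 0  # index of the next free byte; frames grow upward here

    def push_frame(self, nbytes):
        base = self.sp
        if base + nbytes > len(self.mem):
            raise MemoryError("toy stack overflow")
        self.sp += nbytes
        return base

    def pop_frame(self, base):
        # Popping does not clear memory; it only makes the bytes reusable.
        self.sp = base

    def write(self, ptr, data):
        self.mem[ptr:ptr + len(data)] = data

    def read(self, ptr, length):
        return bytes(self.mem[ptr:ptr + length])


def marshal_to_host(stack, ptr, length):
    """Stand-in for the marshalling code: it needs its own frame for scratch data."""
    base = stack.push_frame(4)  # assumption: 4 bytes of scratch space
    stack.write(base, b"\xde\xad\xbe\xef")
    data = stack.read(ptr, length)
    stack.pop_frame(base)
    return data


def get_string_returning(stack):
    """Returns (pointer, length) into its own frame, which is popped on return."""
    base = stack.push_frame(8)
    stack.write(base, b"a string")
    result = (base, 8)
    stack.pop_frame(base)
    return result


def get_string_sending(stack):
    """Sends the data while its frame is still live, then returns."""
    base = stack.push_frame(8)
    stack.write(base, b"a string")
    sent = marshal_to_host(stack, base, 8)
    stack.pop_frame(base)
    return sent


# Dangling case: the marshalling frame reuses the bytes the string lived in.
dangling = marshal_to_host(ToyStack(), *get_string_returning(ToyStack()))
stack = ToyStack()
dangling = marshal_to_host(stack, *get_string_returning(stack))

# Live case: the marshalling frame is placed above the still-active frame.
live = get_string_sending(ToyStack())

print(dangling)  # b'\xde\xad\xbe\xefring'
print(live)      # b'a string'
```

In this toy the first four bytes are clobbered only because the scratch area happens to be four bytes long; which bytes are actually overwritten on real hardware depends on what the generated code puts on the stack after `get_string()` returns.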
Checked latest master, and the crash no longer reproduces. The bytes/str corruption still persists, though.
To me it still looks like legitimate code; I'm not sure the compiler should reject it.
Right, this doesn't actually have to do with RPC but with returning a string from a kernel function.
Remember that in kernels we don't have a heap.
Also see https://github.com/m-labs/artiq/issues/1298.