Compiler accepts returning allocated types from kernel #1502

Closed
opened 2026-01-18 19:03:50 +08:00 by bradbqc · 10 comments

Migrated from GitHub: #1967


Bug Report

One-Line Summary

String RPC return values (and/or kernel return values?) seem to be getting corrupted, which can indirectly cause a kernel panic.

Issue Details

Steps to Reproduce

Run the following experiment:

from artiq.experiment import *


class BadString(EnvExperiment):
    def build(self):
        self.setattr_device('core')

    @rpc
    def get_string_rpc(self) -> TStr:
        return "a string"

    @kernel
    def get_string(self) -> TStr:
        self.core.reset()
        my_str = self.get_string_rpc()
        # seems to cause the kernel panic
        # print(my_str)
        return my_str

    def run(self):
        print(self.get_string())

Expected Behavior

Prints a string to the console.

Actual (undesired) Behavior

Prints ring to the console. And if you uncomment the print statement in the kernel it causes a kernel panic with the following message:

[ 64082.239740s]  INFO(kernel): panic at /nix/store/2w4h6z0k128aa52r1law0zysi68arp34-python3.9-artiq-7.8123.3038639/lib/python3.9/site-packages/artiq/firmware/libproto_artiq/rpc_proto.rs:209:69: called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }
[ 64082.264133s] ERROR(runtime::session): session aborted: unexpected request RunAborted from kernel CPU

And on the host side:

a string
Traceback (most recent call last):
  File "/nix/store/2w4h6z0k128aa52r1law0zysi68arp34-python3.9-artiq-7.8123.3038639/bin/.artiq_run-wrapped", line 9, in <module>
    sys.exit(main())
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 224, in main
    return run(with_file=True)
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 210, in run
    raise exn
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 203, in run
    exp_inst.run()
  File "bad_string.py", line 21, in run
    print(self.get_string())
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/language/core.py", line 54, in run_on_core
    return getattr(self, arg).run(run_on_core, ((self,) + k_args), k_kwargs)
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/core.py", line 140, in run
    self._run_compiled(kernel_library, embedding_map, symbolizer, demangler)
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/core.py", line 130, in _run_compiled
    self.comm.serve(embedding_map, symbolizer, demangler)
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 706, in serve
    self._read_header()
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 249, in _read_header
    sync_byte = self._read(1)[0]
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 237, in _read
    raise ConnectionResetError("Core device connection closed unexpectedly")
ConnectionResetError: Core device connection closed unexpectedly

Strangely, it prints the correct string before panicking. It almost looks like the components work fine independently -- printing a string literal from the kernel works fine, as does returning a string literal from the kernel, and based on the above output it seems that the RPC return value is at least mostly correct when received in the kernel (possibly with some extra non-printable byte(s) that cause the panic?) -- but when used in conjunction this issue appears.

Your System (omit irrelevant parts)

  • Operating System: Ubuntu 22.04
  • ARTIQ version: 7.8123.3038639
  • Version of the gateware and runtime loaded in the core device: 7.8123.3038639
  • Hardware involved: Kasli v1.1 (full device db can be found here)
> **Migrated from GitHub:** [#1967](https://github.com/m-labs/artiq/issues/1967) --- <!-- Above are non-Markdown tags for Github auto-prompting issue type. Template based on pylint: https://raw.githubusercontent.com/PyCQA/pylint/master/.github/ISSUE_TEMPLATE/ --> # Bug Report <!-- Thanks for reporting a bug report to ARTIQ! You can also discuss issues and ask questions on IRC (the [#m-labs channel on freenode](https://webchat.freenode.net/?channels=m-labs) or on the [forum](https://forum.m-labs.hk). Please check Github/those forums to avoid posting a repeat issue. Context helps us fix issues faster, so please include the following when relevant: --> ## One-Line Summary String RPC return values (and/or kernel return values?) seem to be getting corrupted, which can indirectly cause a kernel panic. ## Issue Details ### Steps to Reproduce Run the following experiment: ```python from artiq.experiment import * class BadString(EnvExperiment): def build(self): self.setattr_device('core') @rpc def get_string_rpc(self) -> TStr: return "a string" @kernel def get_string(self) -> TStr: self.core.reset() my_str = self.get_string_rpc() # seems to cause the kernel panic # print(my_str) return my_str def run(self): print(self.get_string()) ``` ### Expected Behavior Prints `a string` to the console. ### Actual (undesired) Behavior Prints `ring` to the console. And if you uncomment the `print` statement in the kernel it causes a kernel panic with the following message: ``` [ 64082.239740s] INFO(kernel): panic at /nix/store/2w4h6z0k128aa52r1law0zysi68arp34-python3.9-artiq-7.8123.3038639/lib/python3.9/site-packages/artiq/firmware/libproto_artiq/rpc_proto.rs:209:69: called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) } [ 64082.264133s] ERROR(runtime::session): session aborted: unexpected request RunAborted from kernel CPU ``` And on the host side: ``` a string Traceback (most recent call last): File "/nix/store/2w4h6z0k128aa52r1law0zysi68arp34-python3.9-artiq-7.8123.3038639/bin/.artiq_run-wrapped", line 9, in <module> sys.exit(main()) File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 224, in main return run(with_file=True) File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 210, in run raise exn File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 203, in run exp_inst.run() File "bad_string.py", line 21, in run print(self.get_string()) File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/language/core.py", line 54, in run_on_core return getattr(self, arg).run(run_on_core, ((self,) + k_args), k_kwargs) File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/core.py", line 140, in run self._run_compiled(kernel_library, embedding_map, symbolizer, demangler) File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/core.py", line 130, in _run_compiled self.comm.serve(embedding_map, symbolizer, demangler) File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 706, in serve self._read_header() File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 249, in _read_header sync_byte = self._read(1)[0] File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 237, in _read raise ConnectionResetError("Core device connection closed unexpectedly") ConnectionResetError: Core device connection closed unexpectedly ``` Strangely, it prints the correct string before panicking. It almost looks like the components work fine independently -- printing a string literal from the kernel works fine, as does returning a string literal from the kernel, and based on the above output it seems that the RPC return value is at least mostly correct when received in the kernel (possibly with some extra non-printable byte(s) that cause the panic?) -- but when used in conjunction this issue appears. ### Your System (omit irrelevant parts) * Operating System: Ubuntu 22.04 * ARTIQ version: 7.8123.3038639 * Version of the gateware and runtime loaded in the core device: 7.8123.3038639 * Hardware involved: Kasli v1.1 (full device db can be found [here](https://gitlab.com/duke-artiq/dax/-/blob/release-7-cleanup/test/hw_test/device_db.py)) <!-- For in-depth information on bug reporting, see: http://www.chiark.greenend.org.uk/~sgtatham/bugs.html https://developer.mozilla.org/en-US/docs/Mozilla/QA/Bug_writing_guidelines -->
bradbqc added the area:compiler label 2026-01-18 19:03:50 +08:00
esavkin was assigned by bradbqc 2026-01-18 19:03:50 +08:00
sb10q closed this issue 2026-01-18 19:03:50 +08:00
Author

Also holds if I modify the types to bytes:

from artiq.experiment import *


class BadString(EnvExperiment):
    def build(self):
        self.setattr_device('core')

    @rpc
    def get_string_rpc(self) -> TBytes:
        return bytes('a string', encoding='utf-8')

    @kernel
    def get_string(self) -> TBytes:
        self.core.reset()
        my_str = self.get_string_rpc()
        # print(my_str)
        return my_str

    def run(self):
        print(self.get_string().decode('utf-8'))

Output: Bring

However if I uncomment the print statement I get something new:

bytearray(b'a string')
Traceback (most recent call last):
  File "/nix/store/2w4h6z0k128aa52r1law0zysi68arp34-python3.9-artiq-7.8123.3038639/bin/.artiq_run-wrapped", line 9, in <module>
    sys.exit(main())
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 224, in main
    return run(with_file=True)
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 210, in run
    raise exn
  File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 203, in run
    exp_inst.run()
  File "bad_string.py", line 21, in run
    print(self.get_string().decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 0: invalid start byte

Then, removing the .decode():

bytearray(b'a string')
bytearray(b'\xa8\x05\x06E\x00\x00\x00\x00')
Also holds if I modify the types to `bytes`: ```python from artiq.experiment import * class BadString(EnvExperiment): def build(self): self.setattr_device('core') @rpc def get_string_rpc(self) -> TBytes: return bytes('a string', encoding='utf-8') @kernel def get_string(self) -> TBytes: self.core.reset() my_str = self.get_string_rpc() # print(my_str) return my_str def run(self): print(self.get_string().decode('utf-8')) ``` Output: `Bring` However if I uncomment the `print` statement I get something new: ``` bytearray(b'a string') Traceback (most recent call last): File "/nix/store/2w4h6z0k128aa52r1law0zysi68arp34-python3.9-artiq-7.8123.3038639/bin/.artiq_run-wrapped", line 9, in <module> sys.exit(main()) File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 224, in main return run(with_file=True) File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 210, in run raise exn File "/nix/store/7qs5shpgswzik9vsr7rkvz2l7znzmjz0-python3-3.9.13-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 203, in run exp_inst.run() File "bad_string.py", line 21, in run print(self.get_string().decode('utf-8')) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 0: invalid start byte ``` Then, removing the `.decode()`: ``` bytearray(b'a string') bytearray(b'\xa8\x05\x06E\x00\x00\x00\x00') ```
Author

And it gets weirder! (I might be having too much fun with this.) If I iterate over the bytes in-kernel and print them individually it changes the return value!

from artiq.experiment import *


class BadString(EnvExperiment):
    def build(self):
        self.setattr_device('core')

    @rpc
    def get_string_rpc(self) -> TBytes:
        return bytes('a string', encoding='utf-8')

    @kernel
    def get_string(self) -> TBytes:
        self.core.reset()
        my_str = self.get_string_rpc()
        print(my_str)
        for b in my_str:
            print(int(b))
        return my_str

    def run(self):
        my_str = self.get_string()
        print(my_str)
        for b in my_str:
            print(int(b))

output:

bytearray(b'a string')
97
32
115
116
114
105
110
103
bytearray(b'\x04\x00\x00D\xfc\x0f\x00\x00')
4
0
0
68
252
15
0
0
And it gets weirder! (I might be having too much fun with this.) If I iterate over the bytes in-kernel and print them individually it changes the return value! ```python from artiq.experiment import * class BadString(EnvExperiment): def build(self): self.setattr_device('core') @rpc def get_string_rpc(self) -> TBytes: return bytes('a string', encoding='utf-8') @kernel def get_string(self) -> TBytes: self.core.reset() my_str = self.get_string_rpc() print(my_str) for b in my_str: print(int(b)) return my_str def run(self): my_str = self.get_string() print(my_str) for b in my_str: print(int(b)) ``` output: ``` bytearray(b'a string') 97 32 115 116 114 105 110 103 bytearray(b'\x04\x00\x00D\xfc\x0f\x00\x00') 4 0 0 68 252 15 0 0 ```
Owner

Most likely an allocation size isn't computed correctly somewhere (firmware and/or compiler) and receiving the string corrupts memory.

Most likely an allocation size isn't computed correctly somewhere (firmware and/or compiler) and receiving the string corrupts memory.
pmldrmota commented 2026-01-18 19:03:51 +08:00
Owner

Maybe related to #1934?

Maybe related to #1934?

After some research, found out rpc_send_async receives corrupted data. In some cases some parts (second 4 bytes, more often) may be correct. Also, can confirm there is no corruption with copied/owned data. Moreover, slicing [:] also generates normal output.
Still, not sure what causes such behavior.

After some research, found out `rpc_send_async` receives corrupted data. In some cases some parts (second 4 bytes, more often) may be correct. Also, can confirm there is no corruption with copied/owned data. Moreover, slicing `[:]` also generates normal output. Still, not sure what causes such behavior.
Contributor

Given how the compiler is implemented currently, returning a string from a kernel can't work, and neither for any other "allocated" type (arrays/lists), as the backing allocation for the elements is on the stack frame of get_string(), while the code marshalling the value back to the host is in an implicit main-type function generated by the compiler. I'm not sure why this isn't caught by the escape analysis; my previous comments in #1497 indicate that it used to, though it might be a related issue. This is an "accepts invalid" bug – the code shouldn't compile, but does. We could think about changing the codegen such that the top-level function is special-cased to make the particular case of returning an array from a kernel work.

If you can make print(my_str) (or any other type of RPC) receive corrupted data without that, that's a separate bug, though; passing the data to an RPC while in get_string() should totally work, and is in fact the "intended" way of passing array data back to the host.

Given how the compiler is implemented currently, returning a string from a kernel can't work, and neither for any other "allocated" type (arrays/lists), as the backing allocation for the elements is on the stack frame of `get_string()`, while the code marshalling the value back to the host is in an implicit `main`-type function generated by the compiler. I'm not sure why this isn't caught by the escape analysis; my previous comments in #1497 indicate that it used to, though it might be a related issue. This is an "accepts invalid" bug – the code shouldn't compile, but does. We could think about changing the codegen such that the top-level function is special-cased to make the particular case of returning an array from a kernel work. If you can make `print(my_str)` (or any other type of RPC) receive corrupted data without that, that's a separate bug, though; passing the data to an RPC while in `get_string()` should totally work, and is in fact the "intended" way of passing array data back to the host.
Contributor

I can't seem to reproduce the print() crash on current master, though – unless you can, we can probably close the remaining issue as a duplicate of #1497/#1677.

I can't seem to reproduce the print() crash on current master, though – unless you can, we can probably close the remaining issue as a duplicate of #1497/#1677.

Checked latest master, and the crash is not being reproduced anymore. Though bytes/str corruption still persists.

This is an "accepts invalid" bug – the code shouldn't compile, but does.

As for me, it still looks like a legitimate code, I'm not sure that the compiler should not accept it.

Checked latest master, and the crash is not being reproduced anymore. Though bytes/str corruption still persists. >This is an "accepts invalid" bug – the code shouldn't compile, but does. As for me, it still looks like a legitimate code, I'm not sure that the compiler should not accept it.
Owner

Right, this doesn't actually have to do with RPC but with returning a string from a kernel function.

I'm not sure that the compiler should not accept it.

Remember that in kernels we don't have a heap.

Right, this doesn't actually have to do with RPC but with returning a string from a kernel function. > I'm not sure that the compiler should not accept it. Remember that in kernels we don't have a heap.
Contributor
Also see https://github.com/m-labs/artiq/issues/1298.
Sign in to join this conversation.