List temporaries/literals in loops can cause stack overflows without compiler diagnostics #1724

Open
opened 2026-01-18 19:06:27 +08:00 by Justinpiggy · 5 comments
Justinpiggy commented 2026-01-18 19:06:27 +08:00

Migrated from GitHub: #2919


Summary

Running the attached sample code freezes the ARTIQ core device. The demo repeatedly calls a function that returns the sum of the first elements of two lists.

from artiq.experiment import *

class TestExperiment(EnvExperiment):
    def build(self):
        self.setattr_device("core")

    def prepare(self):
        self.nLoops = 8250000 # Can run up to ~6280000 with .stack1 enlarged to 0x4000000

    @kernel
    def add_lists(self,a,b):
        return a[0]+b[0]

    @kernel
    def run(self):
        self.core.reset()
        print("Core reset")
        self.core.break_realtime()

        for j in range(self.nLoops):
            self.add_lists([0],[1])
            delay(2*us)

            if j % 10000 == 0:
                print(j)
                self.core.break_realtime()

        self.core.wait_until_mu(now_mu())

        print("Finished")
        self.core.break_realtime()
        

Expected behavior

Numbers are printed on the computer in increments of 10000 until self.nLoops is reached. Once the total delay of 2*us*self.nLoops has elapsed, "Finished" is printed.

Actual behavior

When self.nLoops is set to a large enough number, the core device freezes and "Finished" is never printed. The only way to recover seems to be power cycling the crate. Recompiling the firmware with a larger stack1 size raises the value of self.nLoops needed to trigger the problem.

I checked the LLVM IR dump. It seems that the two lists are allocated on the stack inside the loop body, while the function call itself is optimized away. The stack memory allocated this way is not released until the end of the main function, and the stack overflows before then.

for.head:                                         ; preds = %_Z41artiq.coredevice.core.Core.break_realtimeI26artiq.coredevice.core.CoreEzz.exit, %if.tail
  %IND = phi i32 [ 0, %_Z41artiq.coredevice.core.Core.break_realtimeI26artiq.coredevice.core.CoreEzz.exit ], [ %IND.new, %if.tail ], !dbg !21
  %CMP = icmp slt i32 %IND, %val.LOC.self.8.FLD.nLoops, !dbg !21
  br i1 %CMP, label %for.body, label %for.tail, !dbg !21

for.body:                                         ; preds = %for.head
  %.53 = alloca i32, align 4, !dbg !22
  %.54 = alloca { i32*, i32 }, align 4, !dbg !22
  %.55 = getelementptr inbounds { i32*, i32 }, { i32*, i32 }* %.54, i32 0, i32 0, !dbg !22
  store i32* %.53, i32** %.55, align 4, !dbg !22
  %.57 = getelementptr inbounds { i32*, i32 }, { i32*, i32 }* %.54, i32 0, i32 1, !dbg !22
  store i32 1, i32* %.57, align 4, !dbg !22
  store i32 0, i32* %.53, align 4, !dbg !22
  %.63 = alloca i32, align 4, !dbg !23
  %.64 = alloca { i32*, i32 }, align 4, !dbg !23
  %.65 = getelementptr inbounds { i32*, i32 }, { i32*, i32 }* %.64, i32 0, i32 0, !dbg !23
  store i32* %.63, i32** %.65, align 4, !dbg !23
  %.67 = getelementptr inbounds { i32*, i32 }, { i32*, i32 }* %.64, i32 0, i32 1, !dbg !23
  store i32 1, i32* %.67, align 4, !dbg !23
  store i32 1, i32* %.63, align 4, !dbg !23
  call void @delay_mu(i64 2000), !dbg !24
  %.5.i = srem i32 %IND, 10000, !dbg !25
  %UNN.3720 = icmp eq i32 %.5.i, 0, !dbg !25
  br i1 %UNN.3720, label %if.body, label %if.tail, !dbg !26

Although this test case seems quite arbitrary, it is actually quite relevant to some core device drivers. The Zotino core device driver contains several functions like set_dac() that have lists as input arguments. An infinite loop of updating Zotino outputs using set_dac() can easily crash the core device.
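Since the compiler gives no warning for this pattern, one way to look for affected call sites is to scan the experiment source on the host. The following is only a rough sketch using Python's standard `ast` module. It is not part of ARTIQ, the names `is_kernel` and `list_temporaries_in_loops` are made up for illustration, and it only recognizes decorators spelled literally as `kernel`. It flags list literals and list comprehensions inside loop bodies of such functions; other temporaries (for example results of array operations) are not detected.

```python
import ast

LOOPS = (ast.For, ast.While)
LIST_NODES = (ast.List, ast.ListComp)


def is_kernel(func):
    """True if the function has a decorator spelled `kernel` or `kernel(...)`."""
    for dec in func.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if isinstance(target, ast.Name) and target.id == "kernel":
            return True
    return False


def list_temporaries_in_loops(source):
    """Return sorted (line, column) pairs of list literals/comprehensions
    that appear inside loop bodies of @kernel functions."""
    hits = set()  # a set, so nested loops do not report a node twice
    for func in ast.walk(ast.parse(source)):
        if not (isinstance(func, ast.FunctionDef) and is_kernel(func)):
            continue
        for loop in ast.walk(func):
            if not isinstance(loop, LOOPS):
                continue
            # Only the loop body is inspected; the iterable of a `for`
            # loop is evaluated once and is deliberately skipped.
            for stmt in loop.body:
                for node in ast.walk(stmt):
                    if isinstance(node, LIST_NODES):
                        hits.add((node.lineno, node.col_offset))
    return sorted(hits)


SAMPLE = '''
@kernel
def run(self):
    while True:
        self.zotino0.set_dac([0.0], [20])
'''

for line, col in list_temporaries_in_loops(SAMPLE):
    print(f"line {line}, column {col}: list temporary inside a loop")
```

A hit is only a hint that the loop may grow the stack. Whether it actually does depends on what the optimizer leaves in place, so each reported site still needs to be checked by hand.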

Logging info

No log message is printed on the computer when the core device is frozen. "UndefinedInstruction" is printed on the Kasli-SoC UART.

System

Components involved: ARTIQ core device, compiler?
Operating system used: Windows 11 with MSYS2
ARTIQ version: ARTIQ v8.9006+17be223
Hardware involved: Kasli-SoC

sb10q added the area:compiler label 2026-01-18 19:06:27 +08:00

> return a[0]+b[0]

Returning lists from a kernel is not currently supported (the bug is actually that the compiler should have thrown an error but did not). It will be supported after the implementation of the CTRC proposal in NAC3.

> The Zotino core device driver contains several functions like set_dac() that have lists as input arguments. An infinite loop of updating Zotino outputs using set_dac() can easily crash the core device.

Are you sure?

Justinpiggy commented 2026-01-18 19:06:28 +08:00

> > return a[0]+b[0]
>
> Returning lists from a kernel is not currently supported (the bug is actually that the compiler should have thrown an error but did not). It will be supported after the implementation of the CTRC proposal in NAC3.

I believe this function is returning an integer value, not a list.

> > The Zotino core device driver contains several functions like set_dac() that have lists as input arguments. An infinite loop of updating Zotino outputs using set_dac() can easily crash the core device.
>
> Are you sure?

The following code also crashes the device.

from artiq.experiment import *

class TestExperiment(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("zotino0")

    def prepare(self):
        self.move_delay = 20*us
        self.nLoops = 8250000 # fails at 6290000>j>6280000 every time

    @kernel
    def run(self):
        self.core.reset()
        print("Core reset")
        self.core.break_realtime()

        self.zotino0.init()

        j = 0
        while j < self.nLoops:
            self.zotino0.set_dac([0.0],[20])
            delay(self.move_delay)
            j = j+1

        self.core.wait_until_mu(now_mu())

        print("Finished")
        self.core.break_realtime()
        

Oh I see, that's because you allocate a list by writing [0.0] and they aren't freed until the function returns. I suppose the MWE could be this?

@kernel
def run(self):
    while True:
        a = [0]
Justinpiggy commented 2026-01-18 19:06:28 +08:00

> Oh I see, that's because you allocate a list by writing [0.0] and they aren't freed until the function returns. I suppose the MWE could be this?
>
>     @kernel
>     def run(self):
>         while True:
>             a = [0]

I tried this briefly. It does run without crashing the core device, but the IR shows that the allocation of the local variable a is completely optimized away. Not surprising, maybe this is too 'minimal'...

Unoptimized version:

while.head:                                       ; preds = %while.body, %entry
  br i1 true, label %while.body, label %while.tail, !dbg !16

while.body:                                       ; preds = %while.head
  %.19 = alloca i32, align 4, !dbg !17
  %.20 = alloca { i32*, i32 }, align 4, !dbg !17
  %.21 = getelementptr inbounds { i32*, i32 }, { i32*, i32 }* %.20, i32 0, i32 0, !dbg !17
  store i32* %.19, i32** %.21, align 4, !dbg !17
  %.23 = getelementptr inbounds { i32*, i32 }, { i32*, i32 }* %.20, i32 0, i32 1, !dbg !17
  store i32 1, i32* %.23, align 4, !dbg !17
  %.25 = getelementptr inbounds { i32*, i32 }, { i32*, i32 }* %.20, i32 0, i32 0, !dbg !17
  %.26 = load i32*, i32** %.25, align 4, !dbg !17
  %.27 = getelementptr inbounds i32, i32* %.26, i32 0, !dbg !17
  store i32 0, i32* %.27, align 4, !dbg !17
  %ptr.ENV.a = getelementptr inbounds %env._Z46artiq_worker_main_frozen_22.TestExperiment.runzz.17, %env._Z46artiq_worker_main_frozen_22.TestExperiment.runzz.17* %ENV, i32 0, i32 1, !dbg !18
  store { i32*, i32 }* %.20, { i32*, i32 }** %ptr.ENV.a, align 4, !dbg !18
  br label %while.head, !dbg !18

while.tail:                                       ; preds = %while.head
  ret void, !dbg !18
}

Optimized version:

while.head:                                       ; preds = %while.head, %entry
  br label %while.head, !dbg !21
}
Contributor

This is indeed quite a brittle area in the current implementation, in that the compiler barely helps in ensuring the correct lifetimes. That being said, the aspects at play can be understood from a typical C/C++/… perspective, and as long as they are taken into account, code quite extensively reliant on arrays can be used in practice without issues. There are two different considerations here:

(1) the lifetime tracking of parameters/return values is subtly faulty (e.g. https://github.com/m-labs/artiq/issues/1497 and https://github.com/m-labs/artiq/issues/1677), which can affect lists, and
(2) temporary lists (from literals, array operations, etc.) are allocated on the caller stack using the equivalent of C's alloca(), and so only deallocated on function exit.

(1) is not an issue here, but (2) is. `a = [0.0]; b = [20]; while True: self.zotino.set_dac(a, b)` should work just fine, as would `while True: foo()` with `def foo(): self.zotino.set_dac([0.0], [20])`.
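To make point (2) concrete, here is a toy model in plain Python of a stack on which `alloca()`-style allocations are released only when the enclosing function returns. It is a sketch for illustration, not ARTIQ code: `ToyStack` and the three functions are invented names, and `LIST_BYTES = 12` is an assumed figure (a 4-byte element plus an 8-byte pointer/length header, as the `alloca i32` and `alloca { i32*, i32 }` lines in the IR above suggest for a 32-bit target, ignoring any padding).

```python
from contextlib import contextmanager

LIST_BYTES = 12  # assumed size of a one-element list temporary


class ToyStack:
    """Counts bytes in use on a simulated call stack."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    @contextmanager
    def frame(self):
        # Entering a function remembers the stack level; returning from
        # it releases everything that was alloca'd in between.
        saved = self.used
        try:
            yield
        finally:
            self.used = saved

    def alloca(self, nbytes):
        if self.used + nbytes > self.limit:
            raise OverflowError("simulated stack overflow")
        self.used += nbytes


def literal_in_loop(stack, n):
    """while ...: set_dac([0.0], [20]) -- temporaries pile up in one frame."""
    peak = 0
    with stack.frame():
        for _ in range(n):
            stack.alloca(LIST_BYTES)
            stack.alloca(LIST_BYTES)
            peak = max(peak, stack.used)
    return peak


def hoisted(stack, n):
    """a = [0.0]; b = [20]; while ...: set_dac(a, b) -- allocated once."""
    with stack.frame():
        stack.alloca(LIST_BYTES)
        stack.alloca(LIST_BYTES)
        peak = stack.used
        for _ in range(n):
            pass  # the loop body reuses the two existing lists
    return peak


def helper_call(stack, n):
    """while ...: foo() -- each call to foo() frees its lists on return."""
    peak = 0
    with stack.frame():
        for _ in range(n):
            with stack.frame():  # body of foo()
                stack.alloca(LIST_BYTES)
                stack.alloca(LIST_BYTES)
                peak = max(peak, stack.used)
    return peak
```

In this model, peak usage of `literal_in_loop` grows linearly with the iteration count and eventually exceeds any fixed limit, while `hoisted` and `helper_call` stay at the size of two lists no matter how long the loop runs. The real compiler and optimizer may behave differently in detail, so treat this as an aid to intuition only.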
