UndefinedInstruction error on Zynq #28

Closed
opened 2021-09-24 14:44:20 +08:00 by sb10q · 7 comments
```python
from language import *

@syscall
def rtio_init():
    raise NotImplementedError("syscall not simulated")

@kernel
class Demo:
    @kernel
    def run(self):
        rtio_init()


if __name__ == "__main__":
    Demo().run()
```

```
[     4.856408s]  INFO(runtime::kernel::core1): kernel starting
UndefinedInstruction
```
sb10q added the high-priority label 2021-09-24 14:44:20 +08:00
pca006132 was assigned by sb10q 2021-09-24 14:44:20 +08:00

I don't quite understand what is going on here... Is `rtio_init` defined elsewhere? Do we need special calling conventions for it?

It's the regular syscall: https://git.m-labs.hk/M-Labs/artiq-zynq/src/commit/7c336f77702b61d95ba0672029e98f7a87a4ba31/src/runtime/src/kernel/api.rs#L89

Cannot reproduce. I've compiled a firmware that logs when `rtio_init` is called.

```
▶ artiq_coremgmt log set_level DEBUG

▶ artiq_run ~/code/rust/nac3/nac3embedded/module.elf

▶ artiq_coremgmt log
[     0.000083s]  INFO(runtime): NAR3/Zynq7000 starting...
[     0.005265s]  INFO(runtime): detected gateware: NIST_QC2
[     0.039003s]  INFO(runtime): using internal RTIO clock (default)
[     0.047001s]  INFO(runtime): RTIO PLL locked
[     0.055429s]  INFO(runtime::comms): network addresses: MAC=02-00-00-00-00-52 IPv4=192.168.1.52 IPv6-LL=fe80::ff:fe00:52 IPv6: no configured address
[     4.500039s]  INFO(libboard_zynq::eth): eth: got Link { speed: S1000, duplex: Full }
[     9.275654s]  INFO(runtime::mgmt): received connection
[     9.291295s]  INFO(runtime::mgmt): Changing log level to DEBUG
[    12.312285s] DEBUG(runtime::kernel::core1): Core1 started
[    12.312306s] DEBUG(dyld): ELF target: 688 bytes, align to 10000, allocated at 08260000
[    12.312320s] DEBUG(dyld): Relocating 0 rela, 0 rel, 8 pltrel
[    12.312336s] DEBUG(runtime::kernel::core1): kernel loaded
[    12.312505s]  INFO(runtime::kernel::core1): kernel starting
[    12.318063s] DEBUG(runtime::rtio): rtio init
[    12.318071s]  INFO(runtime::kernel::core1): kernel finished
[    12.323961s]  INFO(runtime::comms): peer closed connection
[    17.012371s]  INFO(runtime::mgmt): received connection

▶ cat ~/code/rust/nac3/nac3embedded/demo.py
from language import *

@syscall
def rtio_init():
    raise NotImplementedError("syscall not simulated")

@kernel
class Demo:
    @kernel
    def run(self):
        rtio_init()


if __name__ == "__main__":
    Demo().run()


▶ llvm-objdump -S ~/code/rust/nac3/nac3embedded/module.elf

/home/pca006132/code/rust/nac3/nac3embedded/module.elf:	file format ELF32-arm-little


Disassembly of section .text:

000000f4 __modinit__:
      f4: 09 00 00 ea                  	b	#36 <$a>

Disassembly of section .plt:

00000100 $a:
     100: 04 e0 2d e5                  	str	lr, [sp, #-4]!
     104: 00 e6 8f e2                  	add	lr, pc, #0, #12
     108: 00 ea 8e e2                  	add	lr, lr, #0, #20
     10c: 9c f1 be e5                  	ldr	pc, [lr, #412]!

00000110 $d:
     110:	d4 d4 d4 d4	.word	0xd4d4d4d4
     114:	d4 d4 d4 d4	.word	0xd4d4d4d4
     118:	d4 d4 d4 d4	.word	0xd4d4d4d4
     11c:	d4 d4 d4 d4	.word	0xd4d4d4d4

00000120 $a:
     120: 00 c6 8f e2                  	add	r12, pc, #0, #12
     124: 00 ca 8c e2                  	add	r12, r12, #0, #20
     128: 84 f1 bc e5                  	ldr	pc, [r12, #388]!

0000012c $d:
     12c:	d4 d4 d4 d4	.word	0xd4d4d4d4
```

Can you provide the objdump output of your compiled binary? I'm compiling with the latest master, and the md5sum of the output is `ce46e9e7cf73d5cb5fa759c9104e3528`. I get exactly the same checksum on my own machine and on zeus with a freshly cloned nac3 master.
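The PLT stubs in the disassembly above can be decoded by hand to see which GOT word each one jumps through, i.e. which word the loader has to fill in before the kernel runs. The sketch below is an illustration, not something from the thread: it assumes the standard ARM (A32) encodings for `add` with a rotated 8-bit immediate and for `ldr` with a 12-bit offset, and that `pc` reads as the instruction address plus 8 in ARM state. The helper names (`decode_word`, `rotated_imm`, `ldr_imm_offset`, `plt_got_slot`) are hypothetical.

```python
from typing import Sequence


def decode_word(hex_bytes: str) -> int:
    """objdump lists ARM instruction bytes in little-endian memory order."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


def rotated_imm(word: int) -> int:
    """Immediate operand of an ARM data-processing insn: imm8 rotated right by 2*rot."""
    imm8 = word & 0xFF
    rot = ((word >> 8) & 0xF) * 2
    return ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF


def ldr_imm_offset(word: int) -> int:
    """12-bit LDR immediate, negated when the U (add) bit 23 is clear."""
    offset = word & 0xFFF
    return offset if (word >> 23) & 1 else -offset


def plt_got_slot(first_add_addr: int, words: Sequence[str]) -> int:
    """GOT address loaded by an `add Rd, pc; add Rd, Rd; ldr pc, [Rd, #off]!` sequence.

    first_add_addr is the address of the first `add`; reading pc there
    yields first_add_addr + 8. Arithmetic wraps at 32 bits like the CPU.
    """
    add1, add2, ldr = (decode_word(w) for w in words)
    base = first_add_addr + 8 + rotated_imm(add1)
    base += rotated_imm(add2)
    return (base + ldr_imm_offset(ldr)) & 0xFFFFFFFF


# PLT entry at 0x120 (the target of `b #36` at 0xf4 in __modinit__).
slot = plt_got_slot(0x120, ["00 c6 8f e2", "00 ca 8c e2", "84 f1 bc e5"])
print(hex(slot))  # 0x2ac
```

Both `add`s contribute 0 here (`#0, #12` and `#0, #20` are a zero immediate with different rotations), so the stub loads `pc` from offset 0x2ac, a word within the 688-byte image reported by `dyld`. By the same arithmetic, the `ldr pc, [lr, #412]!` in the header stub (first `add` at 0x104) reads from 0x2a8.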

Sorry, I must have mixed things up when trying to minimize the repro. This definitely crashes: https://git.m-labs.hk/M-Labs/nac3/src/commit/7ab762a17492508e201c71f13f5340e5e5132b4c/nac3embedded/demo.py
sb10q changed title from syscalls broken on Zynq to UndefinedInstruction error Zynq 2021-09-24 21:54:04 +08:00
sb10q changed title from UndefinedInstruction error Zynq to UndefinedInstruction error on Zynq 2021-09-24 21:54:10 +08:00

I have no idea what causes this. Here is a minimal example that crashes:

```python
def something() -> int32:
    return 1

def __modinit__():
    a = something()
```

LLVM IR after optimization (with only `passes.add_promote_memory_to_register_pass()` enabled):

```
; ModuleID = 'module0'
source_filename = "module0"
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

define void @__modinit__() {
init:
  call void @__main__something.0()
  ret void
}

define void @__main__something.0() {
init:
  ret void
}

; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #0

attributes #0 = { nounwind }
```

Assembly:

```
/home/pca006132/code/rust/nac3/nac3standalone/module.elf:	file format elf32-littlearm


Disassembly of section .text:

000000f4 <__modinit__>:
llvm-objdump: warning: '/home/pca006132/code/rust/nac3/nac3standalone/module.elf': failed to parse debug information for /home/pca006132/code/rust/nac3/nac3standalone/module.elf
      f4: 00 48 2d e9  	push	{r11, lr}
      f8: 0c 00 00 eb  	bl	#48 <$a>
      fc: 00 88 bd e8  	pop	{r11, pc}

00000100 <__main__something.0>:
     100: 1e ff 2f e1  	bx	lr

Disassembly of section .plt:

00000110 <$a>:
     110: 04 e0 2d e5  	str	lr, [sp, #-4]!
     114: 00 e6 8f e2  	add	lr, pc, #0, #12
     118: 00 ea 8e e2  	add	lr, lr, #0, #20
     11c: b4 f1 be e5  	ldr	pc, [lr, #436]!

00000120 <$d>:
     120:	d4 d4 d4 d4	.word	0xd4d4d4d4
     124:	d4 d4 d4 d4	.word	0xd4d4d4d4
     128:	d4 d4 d4 d4	.word	0xd4d4d4d4
     12c:	d4 d4 d4 d4	.word	0xd4d4d4d4

00000130 <$a>:
     130: 00 c6 8f e2  	add	r12, pc, #0, #12
     134: 00 ca 8c e2  	add	r12, r12, #0, #20
     138: 9c f1 bc e5  	ldr	pc, [r12, #412]!

0000013c <$d>:
     13c:	d4 d4 d4 d4	.word	0xd4d4d4d4
```

Is this a calling-convention problem?
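The `bl #48 <$a>` at 0xf8 can be checked by hand to see where the call to `__main__something.0` actually lands. A minimal sketch, assuming the standard A32 B/BL encoding (bits 27-25 = `101`, bit 24 selects BL, and a signed 24-bit word offset relative to the instruction address plus 8); `decode_branch` is a hypothetical helper, not an LLVM or ARTIQ API:

```python
from typing import Tuple


def decode_branch(insn_addr: int, hex_bytes: str) -> Tuple[bool, int]:
    """Return (is_bl, target) for an ARM-state B/BL given objdump's byte listing."""
    word = int.from_bytes(bytes.fromhex(hex_bytes), "little")
    if ((word >> 25) & 0b111) != 0b101:
        raise ValueError(f"{word:#010x} is not a B/BL instruction")
    imm24 = word & 0x00FFFFFF
    if imm24 & 0x00800000:  # sign-extend the 24-bit offset
        imm24 -= 1 << 24
    is_bl = bool((word >> 24) & 1)
    # pc reads as insn_addr + 8; the offset counts 4-byte words.
    return is_bl, (insn_addr + 8 + (imm24 << 2)) & 0xFFFFFFFF


is_bl, target = decode_branch(0xF8, "0c 00 00 eb")
print(is_bl, hex(target))  # True 0x130
```

The target is the PLT stub at 0x130, not `__main__something.0` at 0x100, so even this call to a function defined in the same module is routed through a GOT word that must hold the right address at run time. That fits the conclusion that the fault was in the artiq-zynq loader, although the thread does not spell out what the fix changed.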

This is a bug in the artiq-zynq loader: https://git.m-labs.hk/M-Labs/artiq-zynq/pulls/134

`demo.py` runs fine after this fix.

@pca006132 Good find!
sb10q closed this issue 2021-09-25 13:51:08 +08:00
Reference: M-Labs/nac3#28