Fix float**int with negative power #254

Merged
pca006132 merged 2 commits from neg_powi_fix into master 2022-04-04 22:43:20 +08:00
Collaborator

According to the LLVM LangRef (https://llvm.org/docs/LangRef.html#id519), llvm.powi.f64.i16 should handle negative powers, but in testing it produces garbage values, and after checking I could not find the problem in our code. For now, this patch works around it by taking the reciprocal when the power is negative.

Current behavior (before this patch):

@kernel
def run(self):
    print_float(1.0 ** -2)
    print_float(1.0 ** 0)
    print_float(1.0 ** 1)
    print_float(2.0 ** 0)
    print_float(2.0 ** 1)
    print_float(2.0 ** 2)
    print_float(2.0 ** -1)
    print_float(2.0 ** -2)

output:

print_float: 1.0
print_float: 1.0
print_float: 1.0
print_float: 1.0
print_float: 2.0
print_float: 4.0
print_float: inf
print_float: inf
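
The reciprocal workaround described above can be sketched in plain Python. This is only an illustration of the idea, not the code generated by NAC3: `powi_nonneg` is a hypothetical stand-in for the intrinsic restricted to non-negative powers, and `float_pow_int` is a hypothetical name for the wrapper.

```python
def powi_nonneg(base: float, exp: int) -> float:
    """Hypothetical stand-in for the intrinsic, only used with exp >= 0."""
    result = 1.0
    for _ in range(exp):
        result *= base
    return result


def float_pow_int(base: float, exp: int) -> float:
    """Evaluate base ** exp, routing negative powers through a reciprocal."""
    if exp < 0:
        # Negative power: compute the positive power, then invert it.
        return 1.0 / powi_nonneg(base, -exp)
    return powi_nonneg(base, exp)


for exp in (-2, -1, 0, 1, 2):
    print(exp, float_pow_int(2.0, exp))
```

Note that in this Python sketch a zero base with a negative power raises `ZeroDivisionError`, whereas a floating-point division in generated code would typically yield `inf`.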
Owner

On which architectures did you see the problem? It could just be a bug in LLVM or elsewhere outside NAC3.

Owner

And what is the assembly code generated? I think with constant folding optimizations disabled, it should be just a call to libm - if the bug is still present then, it would point to a problem with libm?

Contributor

What are the unoptimized and optimized IRs for this code prior to this patch?

Author
Collaborator

Thanks for the suggestions! I have tested on WSL2 on my laptop, on zeus with runkernel, and on rv32g with the device at 192.168.1.50; they all give the erroneous output.

For this code:

@nac3
class Demo:
    core: KernelInvariant[Core]
    def build(self):
        self.core = Core()
    @kernel
    def run(self):
        my_print(3.0 ** get_pow())

T = TypeVar('T')
@rpc
def my_print(v: T):
    print(v)

@rpc
def get_pow() -> int32:
    return int32(input("get pow: "))

outputs prior to this patch:

unoptimized (by setting OptimizationLevel::None at all places in nac3)
assembly for x86_64
	.text
	.file	"main"
	.globl	__modinit__
	.p2align	4, 0x90
	.type	__modinit__,@function
__modinit__:
.Lfunc_begin0:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception0
	pushq	%rax
	.cfi_def_cfa_offset 16
	movq	140014876172192@GOTPCREL(%rip), %rdi
	callq	.L__main__.Demo.run.0
	callq	.Lattributes_writeback
	popq	%rax
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end0:
	.size	__modinit__, .Lfunc_end0-__modinit__
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table0:
.Lexception0:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
	.uleb128 .Lfunc_begin0-.Lfunc_begin0
	.uleb128 .Lfunc_end0-.Lfunc_begin0
	.byte	0
	.byte	0
.Lcst_end0:
	.p2align	2

	.text
	.p2align	4, 0x90
	.type	.L__main__.Demo.run.0,@function
.L__main__.Demo.run.0:
.Lfunc_begin1:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception1
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	pushq	%rbx
	pushq	%rax
	.cfi_offset %rbx, -24
	movq	%rdi, -16(%rbp)
	movq	%rsp, %rbx
	movq	%rsp, %rdx
	leaq	.L2054164975901393949(%rip), %rsi
	movl	$83, %edi
	callq	rpc_send@PLT
	movq	%rbx, %rsp
	movq	%rsp, %rdi
	addq	$-16, %rdi
	.p2align	4, 0x90
.LBB1_1:
	movq	%rdi, %rsp
	callq	rpc_recv@PLT
	testl	%eax, %eax
	je	.LBB1_3
	movq	%rsp, %rdi
	movl	%eax, %eax
	leaq	15(,%rax,8), %rax
	andq	$-16, %rax
	subq	%rax, %rdi
	jmp	.LBB1_1
.LBB1_3:
	movq	%rbx, %rsp
	callq	print_float@PLT
	leaq	-8(%rbp), %rsp
	popq	%rbx
	popq	%rbp
	.cfi_def_cfa %rsp, 8
	retq
.Lfunc_end1:
	.size	.L__main__.Demo.run.0, .Lfunc_end1-.L__main__.Demo.run.0
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table1:
.Lexception1:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end1-.Lcst_begin1
.Lcst_begin1:
	.uleb128 .Lfunc_begin1-.Lfunc_begin1
	.uleb128 .Lfunc_end1-.Lfunc_begin1
	.byte	0
	.byte	0
.Lcst_end1:
	.p2align	2

	.text
	.p2align	4, 0x90
	.type	.Lattributes_writeback,@function
.Lattributes_writeback:
.Lfunc_begin2:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception2
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	pushq	%rbx
	pushq	%rax
	.cfi_offset %rbx, -24
	movq	%rsp, %rbx
	movq	%rsp, %rdx
	leaq	.L2400020124162657182(%rip), %rsi
	xorl	%edi, %edi
	callq	rpc_send@PLT
	movq	%rbx, %rsp
	xorl	%edi, %edi
	callq	rpc_recv@PLT
	leaq	-8(%rbp), %rsp
	popq	%rbx
	popq	%rbp
	.cfi_def_cfa %rsp, 8
	retq
.Lfunc_end2:
	.size	.Lattributes_writeback, .Lfunc_end2-.Lattributes_writeback
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table2:
.Lexception2:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end2-.Lcst_begin2
.Lcst_begin2:
	.uleb128 .Lfunc_begin2-.Lfunc_begin2
	.uleb128 .Lfunc_end2-.Lfunc_begin2
	.byte	0
	.byte	0
.Lcst_end2:
	.p2align	2

	.type	.Ltagptr83,@object
	.data
.Ltagptr83:
	.ascii	":i"
	.size	.Ltagptr83, 2

	.type	.L2054164975901393949,@object
	.p2align	3
.L2054164975901393949:
	.quad	.Ltagptr83
	.quad	2
	.size	.L2054164975901393949, 16

	.type	140014876172192,@object
	.globl	140014876172192
	.p2align	3
140014876172192:
	.quad	140014876171472
	.size	140014876172192, 8

	.type	140014876171472,@object
	.globl	140014876171472
	.p2align	3
140014876171472:
	.quad	0x3e112e0be826d695
	.size	140014876171472, 8

	.type	.Ltagptr0,@object
.Ltagptr0:
	.ascii	":n"
	.size	.Ltagptr0, 2

	.type	.L2400020124162657182,@object
	.p2align	3
.L2400020124162657182:
	.quad	.Ltagptr0
	.quad	2
	.size	.L2400020124162657182, 16

	.hidden	DW.ref.__nac3_personality
	.weak	DW.ref.__nac3_personality
	.section	.data.DW.ref.__nac3_personality,"aGw",@progbits,DW.ref.__nac3_personality,comdat
	.p2align	3
	.type	DW.ref.__nac3_personality,@object
	.size	DW.ref.__nac3_personality, 8
DW.ref.__nac3_personality:
	.quad	__nac3_personality
	.section	".note.GNU-stack","",@progbits

assembly for rv32g
	.text
	.attribute	4, 16
	.attribute	5, "rv32i2p0_m2p0_a2p0_f2p0_d2p0"
	.file	"main"
	.globl	__modinit__
	.p2align	2
	.type	__modinit__,@function
__modinit__:
.Lfunc_begin0:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception0
	addi	sp, sp, -16
	.cfi_def_cfa_offset 16
	sw	ra, 12(sp)
	.cfi_offset ra, -4
.LBB0_1:
	auipc	a0, %got_pcrel_hi(139988775571360)
	lw	a0, %pcrel_lo(.LBB0_1)(a0)
	call	.L__main__.Demo.run.0
	call	.Lattributes_writeback
	lw	ra, 12(sp)
	addi	sp, sp, 16
	ret
.Lfunc_end0:
	.size	__modinit__, .Lfunc_end0-__modinit__
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table0:
.Lexception0:
	.byte	255
	.byte	255
	.byte	3
	.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
	.word	.Lfunc_begin0-.Lfunc_begin0
	.word	.Lfunc_end0-.Lfunc_begin0
	.word	0
	.byte	0
.Lcst_end0:
	.p2align	2

	.section	.sdata,"aw",@progbits
	.p2align	3
.LCPI1_0:
	.quad	0x4008000000000000
	.text
	.p2align	2
	.type	.L__main__.Demo.run.0,@function
.L__main__.Demo.run.0:
.Lfunc_begin1:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception1
	addi	sp, sp, -32
	.cfi_def_cfa_offset 32
	sw	ra, 28(sp)
	sw	s0, 24(sp)
	sw	s1, 20(sp)
	sw	s2, 16(sp)
	.cfi_offset ra, -4
	.cfi_offset s0, -8
	.cfi_offset s1, -12
	.cfi_offset s2, -16
	addi	s0, sp, 32
	.cfi_def_cfa s0, 0
	sw	a0, -24(s0)
	mv	s2, sp
	mv	a2, sp
.LBB1_3:
	auipc	a1, %pcrel_hi(.L2054164975901393949)
	addi	a1, a1, %pcrel_lo(.LBB1_3)
	addi	a0, zero, 59
	call	rpc_send@plt
	mv	sp, s2
	addi	s1, sp, -16
	mv	sp, s1
	mv	a0, s1
	call	rpc_recv@plt
	beqz	a0, .LBB1_2
.LBB1_1:
	slli	a0, a0, 2
	addi	a0, a0, 15
	andi	a0, a0, -16
	sub	a0, sp, a0
	mv	sp, a0
	call	rpc_recv@plt
	bnez	a0, .LBB1_1
.LBB1_2:
	lh	a0, 0(s1)
	mv	sp, s2
.LBB1_4:
	auipc	a1, %pcrel_hi(.LCPI1_0)
	addi	a1, a1, %pcrel_lo(.LBB1_4)
	fld	fa0, 0(a1)
	call	__powidf2@plt
	call	print_float@plt
	addi	sp, s0, -32
	lw	s2, 16(sp)
	lw	s1, 20(sp)
	lw	s0, 24(sp)
	lw	ra, 28(sp)
	addi	sp, sp, 32
	ret
.Lfunc_end1:
	.size	.L__main__.Demo.run.0, .Lfunc_end1-.L__main__.Demo.run.0
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table1:
.Lexception1:
	.byte	255
	.byte	255
	.byte	3
	.uleb128 .Lcst_end1-.Lcst_begin1
.Lcst_begin1:
	.word	.Lfunc_begin1-.Lfunc_begin1
	.word	.Lfunc_end1-.Lfunc_begin1
	.word	0
	.byte	0
.Lcst_end1:
	.p2align	2

	.text
	.p2align	2
	.type	.Lattributes_writeback,@function
.Lattributes_writeback:
.Lfunc_begin2:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception2
	addi	sp, sp, -16
	.cfi_def_cfa_offset 16
	sw	ra, 12(sp)
	sw	s0, 8(sp)
	sw	s1, 4(sp)
	.cfi_offset ra, -4
	.cfi_offset s0, -8
	.cfi_offset s1, -12
	addi	s0, sp, 16
	.cfi_def_cfa s0, 0
	mv	s1, sp
	mv	a2, sp
.LBB2_1:
	auipc	a1, %pcrel_hi(.L2400020124162657182)
	addi	a1, a1, %pcrel_lo(.LBB2_1)
	mv	a0, zero
	call	rpc_send@plt
	mv	sp, s1
	mv	a0, zero
	call	rpc_recv@plt
	addi	sp, s0, -16
	lw	s1, 4(sp)
	lw	s0, 8(sp)
	lw	ra, 12(sp)
	addi	sp, sp, 16
	ret
.Lfunc_end2:
	.size	.Lattributes_writeback, .Lfunc_end2-.Lattributes_writeback
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table2:
.Lexception2:
	.byte	255
	.byte	255
	.byte	3
	.uleb128 .Lcst_end2-.Lcst_begin2
.Lcst_begin2:
	.word	.Lfunc_begin2-.Lfunc_begin2
	.word	.Lfunc_end2-.Lfunc_begin2
	.word	0
	.byte	0
.Lcst_end2:
	.p2align	2

	.type	.Ltagptr59,@object
	.section	.sdata,"aw",@progbits
.Ltagptr59:
	.ascii	":i"
	.size	.Ltagptr59, 2

	.type	.L2054164975901393949,@object
	.p2align	3
.L2054164975901393949:
	.word	.Ltagptr59
	.word	2
	.size	.L2054164975901393949, 8

	.type	139988775571360,@object
	.globl	139988775571360
	.p2align	3
139988775571360:
	.word	139988775570640
	.size	139988775571360, 4

	.type	139988775570640,@object
	.globl	139988775570640
	.p2align	3
139988775570640:
	.quad	0x3e112e0be826d695
	.size	139988775570640, 8

	.type	.Ltagptr0,@object
.Ltagptr0:
	.ascii	":n"
	.size	.Ltagptr0, 2

	.type	.L2400020124162657182,@object
	.p2align	3
.L2400020124162657182:
	.word	.Ltagptr0
	.word	2
	.size	.L2400020124162657182, 8

	.hidden	DW.ref.__nac3_personality
	.weak	DW.ref.__nac3_personality
	.section	.data.DW.ref.__nac3_personality,"aGw",@progbits,DW.ref.__nac3_personality,comdat
	.p2align	2
	.type	DW.ref.__nac3_personality,@object
	.size	DW.ref.__nac3_personality, 4
DW.ref.__nac3_personality:
	.word	__nac3_personality
	.section	".note.GNU-stack","",@progbits
assembly for cortexa9
	.text
	.syntax unified
	.eabi_attribute	67, "2.09"
	.eabi_attribute	6, 10
	.eabi_attribute	7, 65
	.eabi_attribute	8, 1
	.eabi_attribute	9, 2
	.fpu	neon-fp16
	.eabi_attribute	36, 1
	.eabi_attribute	34, 1
	.eabi_attribute	15, 1
	.eabi_attribute	16, 1
	.eabi_attribute	17, 2
	.eabi_attribute	20, 1
	.eabi_attribute	21, 0
	.eabi_attribute	23, 3
	.eabi_attribute	24, 1
	.eabi_attribute	25, 1
	.eabi_attribute	28, 1
	.eabi_attribute	38, 1
	.eabi_attribute	14, 0
	.file	"main"
	.globl	__modinit__
	.p2align	2
	.type	__modinit__,%function
	.code	32
__modinit__:
.Lfunc_begin0:
	.fnstart
	.save	{r11, lr}
	push	{r11, lr}
	ldr	r0, .LCPI0_0
.LPC0_0:
	ldr	r0, [pc, r0]
	bl	.L__main__.Demo.run.0
	bl	.Lattributes_writeback
	pop	{r11, pc}
	.p2align	2
.LCPI0_0:
.Ltmp0:
	.long	139790169087904(GOT_PREL)-((.LPC0_0+8)-.Ltmp0)
.Lfunc_end0:
	.size	__modinit__, .Lfunc_end0-__modinit__
	.globl	__nac3_personality
	.personality __nac3_personality
	.handlerdata
	.p2align	2
GCC_except_table0:
.Lexception0:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
	.uleb128 .Lfunc_begin0-.Lfunc_begin0
	.uleb128 .Lfunc_end0-.Lfunc_begin0
	.byte	0
	.byte	0
.Lcst_end0:
	.p2align	2
	.fnend

	.p2align	2
	.type	.L__main__.Demo.run.0,%function
	.code	32
.L__main__.Demo.run.0:
.Lfunc_begin1:
	.fnstart
	.save	{r4, r5, r6, r7, r11, lr}
	push	{r4, r5, r6, r7, r11, lr}
	.setfp	r11, sp, #16
	add	r11, sp, #16
	.pad	#24
	sub	sp, sp, #24
	ldr	r1, .LCPI1_0
	sub	r2, r11, #32
	str	r0, [r11, #-24]
	mov	r0, #59
.LPC1_0:
	add	r1, pc, r1
	mov	r4, sp
	bl	rpc_send
	sub	r0, r11, #36
	mov	sp, r4
	mov	r5, #7
.LBB1_1:
	bl	rpc_recv
	cmp	r0, #0
	beq	.LBB1_3
	add	r0, r5, r0, lsl #2
	bic	r0, r0, #7
	sub	r0, sp, r0
	mov	sp, r0
	b	.LBB1_1
.LBB1_3:
	vmov.f64	d0, #3.000000e+00
	ldrsh	r0, [r11, #-36]
	bl	__powidf2
	mov	sp, r4
	bl	print_float
	sub	sp, r11, #16
	pop	{r4, r5, r6, r7, r11, pc}
	.p2align	2
.LCPI1_0:
	.long	.L2054164975901393949-(.LPC1_0+8)
.Lfunc_end1:
	.size	.L__main__.Demo.run.0, .Lfunc_end1-.L__main__.Demo.run.0
	.globl	__nac3_personality
	.personality __nac3_personality
	.handlerdata
	.p2align	2
GCC_except_table1:
.Lexception1:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end1-.Lcst_begin1
.Lcst_begin1:
	.uleb128 .Lfunc_begin1-.Lfunc_begin1
	.uleb128 .Lfunc_end1-.Lfunc_begin1
	.byte	0
	.byte	0
.Lcst_end1:
	.p2align	2
	.fnend

	.p2align	2
	.type	.Lattributes_writeback,%function
	.code	32
.Lattributes_writeback:
.Lfunc_begin2:
	.fnstart
	.save	{r4, lr}
	push	{r4, lr}
	.pad	#8
	sub	sp, sp, #8
	ldr	r1, .LCPI2_0
	mov	r2, sp
	mov	r0, #0
	mov	r4, sp
.LPC2_0:
	add	r1, pc, r1
	bl	rpc_send
	mov	sp, r4
	mov	r0, #0
	bl	rpc_recv
	add	sp, sp, #8
	pop	{r4, pc}
	.p2align	2
.LCPI2_0:
	.long	.L2400020124162657182-(.LPC2_0+8)
.Lfunc_end2:
	.size	.Lattributes_writeback, .Lfunc_end2-.Lattributes_writeback
	.globl	__nac3_personality
	.personality __nac3_personality
	.handlerdata
	.p2align	2
GCC_except_table2:
.Lexception2:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end2-.Lcst_begin2
.Lcst_begin2:
	.uleb128 .Lfunc_begin2-.Lfunc_begin2
	.uleb128 .Lfunc_end2-.Lfunc_begin2
	.byte	0
	.byte	0
.Lcst_end2:
	.p2align	2
	.fnend

	.type	.Ltagptr59,%object
	.data
.Ltagptr59:
	.ascii	":i"
	.size	.Ltagptr59, 2

	.type	.L2054164975901393949,%object
	.p2align	2
.L2054164975901393949:
	.long	.Ltagptr59
	.long	2
	.size	.L2054164975901393949, 8

	.type	139790169087184,%object
	.globl	139790169087184
	.p2align	3
139790169087184:
	.long	3894859413
	.long	1041313291
	.size	139790169087184, 8

	.type	139790169087904,%object
	.globl	139790169087904
	.p2align	2
139790169087904:
	.long	139790169087184
	.size	139790169087904, 4

	.type	.Ltagptr0,%object
.Ltagptr0:
	.ascii	":n"
	.size	.Ltagptr0, 2

	.type	.L2400020124162657182,%object
	.p2align	2
.L2400020124162657182:
	.long	.Ltagptr0
	.long	2
	.size	.L2400020124162657182, 8

	.section	".note.GNU-stack","",%progbits
	.eabi_attribute	30, 1

IR
; ModuleID = 'main'
source_filename = "main"

%__main__.Demo = type { %min_artiq.Core* }
%min_artiq.Core = type { double }

@tagptr83 = private global [2 x i8] c":i"
@"2054164975901393949" = private global { i8*, i64 } { i8* getelementptr inbounds ([2 x i8], [2 x i8]* @tagptr83, i32 0, i32 0), i64 2 }
@"140014876172192" = global %__main__.Demo { %min_artiq.Core* @"140014876171472" }
@"140014876171472" = global %min_artiq.Core { double 1.000000e-09 }
@tagptr0 = private global [2 x i8] c":n"
@"2400020124162657182" = private global { i8*, i64 } { i8* getelementptr inbounds ([2 x i8], [2 x i8]* @tagptr0, i32 0, i32 0), i64 2 }

define void @__modinit__() personality i32 (...)* @__nac3_personality {
init:
  br label %body

body:                                             ; preds = %init
  call void @__main__.Demo.run.0(%__main__.Demo* @"140014876172192")
  call void @attributes_writeback()
  ret void
}

declare i32 @__nac3_personality(...)

define private void @__main__.Demo.run.0(%__main__.Demo* %0) personality i32 (...)* @__nac3_personality {
init:
  %self = alloca %__main__.Demo*, align 8
  store %__main__.Demo* %0, %__main__.Demo** %self, align 8
  br label %body

body:                                             ; preds = %init
  %rpc.stack = call i8* @llvm.stacksave()
  %argptr = alloca i8*, i32 0, align 8
  call void @rpc_send(i32 83, { i8*, i64 }* @"2054164975901393949", i8** %argptr)
  call void @llvm.stackrestore(i8* %rpc.stack)
  %rpc.ret.slot = alloca i32, align 4
  %rpc.ret.ptr = bitcast i32* %rpc.ret.slot to i8*
  br label %rpc.head

rpc.head:                                         ; preds = %rpc.continue, %body
  %rpc.ptr = phi i8* [ %rpc.ret.ptr, %body ], [ %rpc.alloc.ptr, %rpc.continue ]
  %rpc.size.next = call i32 @rpc_recv(i8* %rpc.ptr)
  %rpc.done = icmp eq i32 0, %rpc.size.next
  br i1 %rpc.done, label %rpc.tail, label %rpc.continue

rpc.continue:                                     ; preds = %rpc.head
  %rpc.alloc = alloca i8*, i32 %rpc.size.next, align 8
  %rpc.alloc.ptr = bitcast i8** %rpc.alloc to i8*
  br label %rpc.head

rpc.tail:                                         ; preds = %rpc.head
  %rpc.result = load i32, i32* %rpc.ret.slot, align 4
  call void @llvm.stackrestore(i8* %rpc.stack)
  %r_pow = trunc i32 %rpc.result to i16
  %f_pow_i = call double @llvm.powi.f64.i16(double 3.000000e+00, i16 %r_pow)
  call void @print_float(double %f_pow_i)
  ret void
}

; Function Attrs: nofree nosync nounwind willreturn
declare i8* @llvm.stacksave() #0

declare void @rpc_send(i32, { i8*, i64 }*, i8**)

; Function Attrs: nofree nosync nounwind willreturn
declare void @llvm.stackrestore(i8*) #0

declare i32 @rpc_recv(i8*)

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare double @llvm.powi.f64.i16(double, i16) #1

declare void @print_float(double)
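
The IR above narrows the 32-bit RPC result to i16 (`%r_pow = trunc i32 %rpc.result to i16`) before calling the intrinsic. A minimal Python sketch of that narrowing, assuming two's-complement wraparound (the helper name `trunc_i32_to_i16` is hypothetical), suggests that small negative powers such as -1 and -2 pass through the truncation unchanged, while values outside the i16 range wrap:

```python
def trunc_i32_to_i16(value: int) -> int:
    """Keep the low 16 bits and reinterpret them as a signed 16-bit integer."""
    low = value & 0xFFFF
    # If bit 15 is set, the 16-bit pattern represents a negative number.
    return low - 0x10000 if low & 0x8000 else low


for power in (-2, -1, 0, 1, 2, 40000):
    print(power, trunc_i32_to_i16(power))
```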
optimized (current nac3 settings)
assembly for x86_64
	.text
	.file	"main"
	.globl	__modinit__
	.p2align	4, 0x90
	.type	__modinit__,@function
__modinit__:
.Lfunc_begin0:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception0
	pushq	%rbx
	.cfi_def_cfa_offset 16
	subq	$16, %rsp
	.cfi_def_cfa_offset 32
	.cfi_offset %rbx, -16
	callq	.L__main__.Demo.run.0
	movq	%rsp, %rbx
	leaq	.L2400020124162657182(%rip), %rsi
	leaq	8(%rsp), %rdx
	xorl	%edi, %edi
	callq	rpc_send@PLT
	movq	%rbx, %rsp
	xorl	%edi, %edi
	callq	rpc_recv@PLT
	addq	$16, %rsp
	.cfi_def_cfa_offset 16
	popq	%rbx
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end0:
	.size	__modinit__, .Lfunc_end0-__modinit__
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table0:
.Lexception0:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
	.uleb128 .Lfunc_begin0-.Lfunc_begin0
	.uleb128 .Lfunc_end0-.Lfunc_begin0
	.byte	0
	.byte	0
.Lcst_end0:
	.p2align	2

	.text
	.p2align	4, 0x90
	.type	.L__main__.Demo.run.0,@function
.L__main__.Demo.run.0:
.Lfunc_begin1:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception1
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	pushq	%rbx
	subq	$24, %rsp
	.cfi_offset %rbx, -24
	movq	%rsp, %rbx
	leaq	.L2054164975901393949(%rip), %rsi
	leaq	-16(%rbp), %rdx
	movl	$83, %edi
	callq	rpc_send@PLT
	movq	%rbx, %rsp
	leaq	-20(%rbp), %rdi
	.p2align	4, 0x90
.LBB1_2:
	callq	rpc_recv@PLT
	testl	%eax, %eax
	je	.LBB1_3
	movl	%eax, %eax
	movq	%rsp, %rdi
	leaq	15(,%rax,8), %rax
	andq	$-16, %rax
	subq	%rax, %rdi
	movq	%rdi, %rsp
	jmp	.LBB1_2
.LBB1_3:
	movq	%rbx, %rsp
	callq	print_float@PLT
	leaq	-8(%rbp), %rsp
	popq	%rbx
	popq	%rbp
	.cfi_def_cfa %rsp, 8
	retq
.Lfunc_end1:
	.size	.L__main__.Demo.run.0, .Lfunc_end1-.L__main__.Demo.run.0
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table1:
.Lexception1:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end1-.Lcst_begin1
.Lcst_begin1:
	.uleb128 .Lfunc_begin1-.Lfunc_begin1
	.uleb128 .Lfunc_end1-.Lfunc_begin1
	.byte	0
	.byte	0
.Lcst_end1:
	.p2align	2

	.type	.Ltagptr83,@object
	.data
.Ltagptr83:
	.ascii	":i"
	.size	.Ltagptr83, 2

	.type	.L2054164975901393949,@object
	.p2align	3
.L2054164975901393949:
	.quad	.Ltagptr83
	.quad	2
	.size	.L2054164975901393949, 16

	.type	139916149378256,@object
	.globl	139916149378256
	.p2align	3
139916149378256:
	.quad	0x3e112e0be826d695
	.size	139916149378256, 8

	.type	139916149378976,@object
	.globl	139916149378976
	.p2align	3
139916149378976:
	.quad	139916149378256
	.size	139916149378976, 8

	.type	.Ltagptr0,@object
.Ltagptr0:
	.ascii	":n"
	.size	.Ltagptr0, 2

	.type	.L2400020124162657182,@object
	.p2align	3
.L2400020124162657182:
	.quad	.Ltagptr0
	.quad	2
	.size	.L2400020124162657182, 16

	.hidden	DW.ref.__nac3_personality
	.weak	DW.ref.__nac3_personality
	.section	.data.DW.ref.__nac3_personality,"aGw",@progbits,DW.ref.__nac3_personality,comdat
	.p2align	3
	.type	DW.ref.__nac3_personality,@object
	.size	DW.ref.__nac3_personality, 8
DW.ref.__nac3_personality:
	.quad	__nac3_personality
	.section	".note.GNU-stack","",@progbits

assembly for rv32g
	.text
	.attribute	4, 16
	.attribute	5, "rv32i2p0_m2p0_a2p0_f2p0_d2p0"
	.file	"main"
	.globl	__modinit__
	.p2align	2
	.type	__modinit__,@function
__modinit__:
.Lfunc_begin0:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception0
	addi	sp, sp, -16
	.cfi_def_cfa_offset 16
	sw	ra, 12(sp)
	sw	s0, 8(sp)
	.cfi_offset ra, -4
	.cfi_offset s0, -8
	call	.L__main__.Demo.run.0
	mv	s0, sp
.LBB0_1:
	auipc	a1, %pcrel_hi(.L2400020124162657182)
	addi	a1, a1, %pcrel_lo(.LBB0_1)
	mv	a2, sp
	mv	a0, zero
	call	rpc_send@plt
	mv	sp, s0
	mv	a0, zero
	call	rpc_recv@plt
	lw	s0, 8(sp)
	lw	ra, 12(sp)
	addi	sp, sp, 16
	ret
.Lfunc_end0:
	.size	__modinit__, .Lfunc_end0-__modinit__
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table0:
.Lexception0:
	.byte	255
	.byte	255
	.byte	3
	.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
	.word	.Lfunc_begin0-.Lfunc_begin0
	.word	.Lfunc_end0-.Lfunc_begin0
	.word	0
	.byte	0
.Lcst_end0:
	.p2align	2

	.section	.sdata,"aw",@progbits
	.p2align	3
.LCPI1_0:
	.quad	0x4008000000000000
	.text
	.p2align	2
	.type	.L__main__.Demo.run.0,@function
.L__main__.Demo.run.0:
.Lfunc_begin1:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception1
	addi	sp, sp, -32
	.cfi_def_cfa_offset 32
	sw	ra, 28(sp)
	sw	s0, 24(sp)
	sw	s1, 20(sp)
	.cfi_offset ra, -4
	.cfi_offset s0, -8
	.cfi_offset s1, -12
	addi	s0, sp, 32
	.cfi_def_cfa s0, 0
	mv	s1, sp
.LBB1_3:
	auipc	a1, %pcrel_hi(.L2054164975901393949)
	addi	a1, a1, %pcrel_lo(.LBB1_3)
	addi	a0, zero, 59
	addi	a2, s0, -16
	call	rpc_send@plt
	mv	sp, s1
	addi	a0, s0, -20
	call	rpc_recv@plt
	beqz	a0, .LBB1_2
.LBB1_1:
	slli	a0, a0, 2
	addi	a0, a0, 15
	andi	a0, a0, -16
	sub	a0, sp, a0
	mv	sp, a0
	call	rpc_recv@plt
	bnez	a0, .LBB1_1
.LBB1_2:
	lh	a0, -20(s0)
	mv	sp, s1
.LBB1_4:
	auipc	a1, %pcrel_hi(.LCPI1_0)
	addi	a1, a1, %pcrel_lo(.LBB1_4)
	fld	fa0, 0(a1)
	call	__powidf2@plt
	call	print_float@plt
	addi	sp, s0, -32
	lw	s1, 20(sp)
	lw	s0, 24(sp)
	lw	ra, 28(sp)
	addi	sp, sp, 32
	ret
.Lfunc_end1:
	.size	.L__main__.Demo.run.0, .Lfunc_end1-.L__main__.Demo.run.0
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table1:
.Lexception1:
	.byte	255
	.byte	255
	.byte	3
	.uleb128 .Lcst_end1-.Lcst_begin1
.Lcst_begin1:
	.word	.Lfunc_begin1-.Lfunc_begin1
	.word	.Lfunc_end1-.Lfunc_begin1
	.word	0
	.byte	0
.Lcst_end1:
	.p2align	2

	.type	.Ltagptr59,@object
	.section	.sdata,"aw",@progbits
.Ltagptr59:
	.ascii	":i"
	.size	.Ltagptr59, 2

	.type	.L2054164975901393949,@object
	.p2align	3
.L2054164975901393949:
	.word	.Ltagptr59
	.word	2
	.size	.L2054164975901393949, 8

	.type	140339428781264,@object
	.globl	140339428781264
	.p2align	3
140339428781264:
	.quad	0x3e112e0be826d695
	.size	140339428781264, 8

	.type	140339428781984,@object
	.globl	140339428781984
	.p2align	3
140339428781984:
	.word	140339428781264
	.size	140339428781984, 4

	.type	.Ltagptr0,@object
.Ltagptr0:
	.ascii	":n"
	.size	.Ltagptr0, 2

	.type	.L2400020124162657182,@object
	.p2align	3
.L2400020124162657182:
	.word	.Ltagptr0
	.word	2
	.size	.L2400020124162657182, 8

	.hidden	DW.ref.__nac3_personality
	.weak	DW.ref.__nac3_personality
	.section	.data.DW.ref.__nac3_personality,"aGw",@progbits,DW.ref.__nac3_personality,comdat
	.p2align	2
	.type	DW.ref.__nac3_personality,@object
	.size	DW.ref.__nac3_personality, 4
DW.ref.__nac3_personality:
	.word	__nac3_personality
	.section	".note.GNU-stack","",@progbits

assembly for cortexa9
	.text
	.syntax unified
	.eabi_attribute	67, "2.09"
	.eabi_attribute	6, 10
	.eabi_attribute	7, 65
	.eabi_attribute	8, 1
	.eabi_attribute	9, 2
	.fpu	neon-fp16
	.eabi_attribute	36, 1
	.eabi_attribute	34, 1
	.eabi_attribute	15, 1
	.eabi_attribute	16, 1
	.eabi_attribute	17, 2
	.eabi_attribute	20, 1
	.eabi_attribute	21, 0
	.eabi_attribute	23, 3
	.eabi_attribute	24, 1
	.eabi_attribute	25, 1
	.eabi_attribute	28, 1
	.eabi_attribute	38, 1
	.eabi_attribute	14, 0
	.file	"main"
	.globl	__modinit__
	.p2align	2
	.type	__modinit__,%function
	.code	32
__modinit__:
.Lfunc_begin0:
	.fnstart
	.save	{r4, lr}
	push	{r4, lr}
	.pad	#8
	sub	sp, sp, #8
	bl	.L__main__.Demo.run.0
	ldr	r1, .LCPI0_0
	mov	r2, sp
	mov	r0, #0
	mov	r4, sp
.LPC0_0:
	add	r1, pc, r1
	bl	rpc_send
	mov	sp, r4
	mov	r0, #0
	bl	rpc_recv
	add	sp, sp, #8
	pop	{r4, pc}
	.p2align	2
.LCPI0_0:
	.long	.L2400020124162657182-(.LPC0_0+8)
.Lfunc_end0:
	.size	__modinit__, .Lfunc_end0-__modinit__
	.globl	__nac3_personality
	.personality __nac3_personality
	.handlerdata
	.p2align	2
GCC_except_table0:
.Lexception0:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
	.uleb128 .Lfunc_begin0-.Lfunc_begin0
	.uleb128 .Lfunc_end0-.Lfunc_begin0
	.byte	0
	.byte	0
.Lcst_end0:
	.p2align	2
	.fnend

	.p2align	2
	.type	.L__main__.Demo.run.0,%function
	.code	32
.L__main__.Demo.run.0:
.Lfunc_begin1:
	.fnstart
	.save	{r4, r5, r6, r7, r11, lr}
	push	{r4, r5, r6, r7, r11, lr}
	.setfp	r11, sp, #16
	add	r11, sp, #16
	.pad	#16
	sub	sp, sp, #16
	ldr	r1, .LCPI1_0
	sub	r2, r11, #24
	mov	r0, #83
	mov	r4, sp
.LPC1_0:
	add	r1, pc, r1
	bl	rpc_send
	mov	sp, r4
	sub	r0, r11, #28
	bl	rpc_recv
	cmp	r0, #0
	beq	.LBB1_3
	mov	r5, #7
.LBB1_2:
	add	r0, r5, r0, lsl #2
	bic	r0, r0, #7
	sub	r0, sp, r0
	mov	sp, r0
	bl	rpc_recv
	cmp	r0, #0
	bne	.LBB1_2
.LBB1_3:
	vmov.f64	d0, #3.000000e+00
	ldrsh	r0, [r11, #-28]
	bl	__powidf2
	mov	sp, r4
	bl	print_float
	sub	sp, r11, #16
	pop	{r4, r5, r6, r7, r11, pc}
	.p2align	2
.LCPI1_0:
	.long	.L2054164975901393949-(.LPC1_0+8)
.Lfunc_end1:
	.size	.L__main__.Demo.run.0, .Lfunc_end1-.L__main__.Demo.run.0
	.globl	__nac3_personality
	.personality __nac3_personality
	.handlerdata
	.p2align	2
GCC_except_table1:
.Lexception1:
	.byte	255
	.byte	255
	.byte	1
	.uleb128 .Lcst_end1-.Lcst_begin1
.Lcst_begin1:
	.uleb128 .Lfunc_begin1-.Lfunc_begin1
	.uleb128 .Lfunc_end1-.Lfunc_begin1
	.byte	0
	.byte	0
.Lcst_end1:
	.p2align	2
	.fnend

	.type	.Ltagptr83,%object
	.data
.Ltagptr83:
	.ascii	":i"
	.size	.Ltagptr83, 2

	.type	.L2054164975901393949,%object
	.p2align	2
.L2054164975901393949:
	.long	.Ltagptr83
	.long	2
	.size	.L2054164975901393949, 8

	.type	140716975435680,%object
	.globl	140716975435680
	.p2align	2
140716975435680:
	.long	140716975434960
	.size	140716975435680, 4

	.type	140716975434960,%object
	.globl	140716975434960
	.p2align	3
140716975434960:
	.long	3894859413
	.long	1041313291
	.size	140716975434960, 8

	.type	.Ltagptr0,%object
.Ltagptr0:
	.ascii	":n"
	.size	.Ltagptr0, 2

	.type	.L2400020124162657182,%object
	.p2align	2
.L2400020124162657182:
	.long	.Ltagptr0
	.long	2
	.size	.L2400020124162657182, 8

	.section	".note.GNU-stack","",%progbits
	.eabi_attribute	30, 1

IR
; ModuleID = 'main'
source_filename = "main"

%min_artiq.Core = type { double }
%__main__.Demo = type { %min_artiq.Core* }

@tagptr83 = private global [2 x i8] c":i"
@"2054164975901393949" = private global { i8*, i32 } { i8* getelementptr inbounds ([2 x i8], [2 x i8]* @tagptr83, i32 0, i32 0), i32 2 }
@"139839314730192" = global %min_artiq.Core { double 1.000000e-09 }
@"139839314730912" = local_unnamed_addr global %__main__.Demo { %min_artiq.Core* @"139839314730192" }
@tagptr0 = private global [2 x i8] c":n"
@"2400020124162657182" = private global { i8*, i32 } { i8* getelementptr inbounds ([2 x i8], [2 x i8]* @tagptr0, i32 0, i32 0), i32 2 }

define void @__modinit__() local_unnamed_addr personality i32 (...)* @__nac3_personality {
init:
  %argptr1.i = alloca [0 x i8*], align 8
  tail call fastcc void @__main__.Demo.run.0()
  %0 = bitcast [0 x i8*]* %argptr1.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 0, i8* nonnull %0)
  %rpc.stack.i = tail call i8* @llvm.stacksave()
  %argptr1.sub.i = getelementptr inbounds [0 x i8*], [0 x i8*]* %argptr1.i, i64 0, i64 0
  call void @rpc_send(i32 0, { i8*, i32 }* nonnull @"2400020124162657182", i8** nonnull %argptr1.sub.i)
  call void @llvm.stackrestore(i8* %rpc.stack.i)
  %rpc_recv.i = call i32 @rpc_recv(i8* null)
  call void @llvm.lifetime.end.p0i8(i64 0, i8* nonnull %0)
  ret void
}

declare i32 @__nac3_personality(...)

define private fastcc void @__main__.Demo.run.0() unnamed_addr personality i32 (...)* @__nac3_personality {
init:
  %argptr1 = alloca [0 x i8*], align 8
  %rpc.stack = tail call i8* @llvm.stacksave()
  %argptr1.sub = getelementptr inbounds [0 x i8*], [0 x i8*]* %argptr1, i64 0, i64 0
  call void @rpc_send(i32 83, { i8*, i32 }* nonnull @"2054164975901393949", i8** nonnull %argptr1.sub)
  call void @llvm.stackrestore(i8* %rpc.stack)
  %rpc.ret.slot = alloca i32, align 4
  %rpc.ret.ptr = bitcast i32* %rpc.ret.slot to i8*
  %rpc.size.next2 = call i32 @rpc_recv(i8* nonnull %rpc.ret.ptr)
  %rpc.done3 = icmp eq i32 %rpc.size.next2, 0
  br i1 %rpc.done3, label %rpc.tail, label %rpc.continue

rpc.continue:                                     ; preds = %init, %rpc.continue
  %rpc.size.next4 = phi i32 [ %rpc.size.next, %rpc.continue ], [ %rpc.size.next2, %init ]
  %0 = zext i32 %rpc.size.next4 to i64
  %rpc.alloc = alloca i8*, i64 %0, align 8
  %rpc.alloc.ptr = bitcast i8** %rpc.alloc to i8*
  %rpc.size.next = call i32 @rpc_recv(i8* nonnull %rpc.alloc.ptr)
  %rpc.done = icmp eq i32 %rpc.size.next, 0
  br i1 %rpc.done, label %rpc.tail, label %rpc.continue

rpc.tail:                                         ; preds = %rpc.continue, %init
  %rpc.result = load i32, i32* %rpc.ret.slot, align 4
  call void @llvm.stackrestore(i8* %rpc.stack)
  %r_pow = trunc i32 %rpc.result to i16
  %f_pow_i = call double @llvm.powi.f64.i16(double 3.000000e+00, i16 %r_pow)
  call void @print_float(double %f_pow_i)
  ret void
}

; Function Attrs: mustprogress nofree nosync nounwind willreturn
declare i8* @llvm.stacksave() #0

declare void @rpc_send(i32, { i8*, i32 }*, i8**) local_unnamed_addr

; Function Attrs: mustprogress nofree nosync nounwind willreturn
declare void @llvm.stackrestore(i8*) #0

declare i32 @rpc_recv(i8*) local_unnamed_addr

; Function Attrs: mustprogress nofree nosync nounwind readnone speculatable willreturn
declare double @llvm.powi.f64.i16(double, i16) #1

declare void @print_float(double) local_unnamed_addr

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2

attributes #0 = { mustprogress nofree nosync nounwind willreturn }
attributes #1 = { mustprogress nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { argmemonly nofree nosync nounwind willreturn }

I could not learn much from the x86_64 assembly, but the rv32g and cortexa9 assembly shows that the function __powidf2 is called. The definition of __powidf2 that I could find is here, and it seems fine...
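
For reference, a routine like __powidf2 is usually written as a square-and-multiply loop that inverts the result for negative powers. The Python transcription below is a sketch under that assumption: it is not a copy of the linked definition, and the function name `powidf2` is just a label for illustration.

```python
def powidf2(a: float, b: int) -> float:
    """Square-and-multiply power with an integer exponent (assumed shape)."""
    recip = b < 0
    r = 1.0
    while True:
        if b & 1:
            r *= a
        b = int(b / 2)  # C-style division: truncates toward zero
        if b == 0:
            break
        a *= a
    # For a negative power, the loop computes a ** |b|; invert it at the end.
    return 1.0 / r if recip else r


for exp in (-2, -1, 0, 1, 2):
    print(exp, powidf2(3.0, exp))
```

With this shape, negative powers of 2.0 or 3.0 give finite results, which matches the observation that the routine itself looks fine.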

Author
Collaborator

Update:

It seems that the root of the problem is only on x86_64, and the reason the problem previously appeared on rv32g is that the constant optimization is done on x86_64..?

On rv32g, when using an @rpc function to get the negative power as in the code above, the result is fine, but directly calling my_print(3.0 ** -1) triggers the problem and prints inf, and the assembly reflects that:

@nac3
class Demo(EnvExperiment):
    core: KernelInvariant[Core]
    def build(self):
        self.setattr_device("core")
    @kernel
    def run(self):
        my_print(3.0 ** -1)

T = TypeVar('T')
@rpc
def my_print(v: T):
    print(v)
Output assembly for rv32g (even with OptimizationLevel::None, this constant optimization still seems to be applied...):
	.text
	.attribute	4, 16
	.attribute	5, "rv32i2p0_m2p0_a2p0_f2p0_d2p0"
	.file	"main"
	.section	.sdata,"aw",@progbits
	.p2align	3
.LCPI0_0:
	.quad	0x7ff0000000000000
	.text
	.globl	__modinit__
	.p2align	2
	.type	__modinit__,@function
__modinit__:
.Lfunc_begin0:
	.cfi_startproc
	.cfi_personality 155, DW.ref.__nac3_personality
	.cfi_lsda 27, .Lexception0
	addi	sp, sp, -16
	.cfi_def_cfa_offset 16
	sw	ra, 12(sp)
	sw	s0, 8(sp)
	.cfi_offset ra, -4
	.cfi_offset s0, -8
.LBB0_1:
	auipc	a0, %pcrel_hi(.LCPI0_0)
	addi	a0, a0, %pcrel_lo(.LBB0_1)
	fld	fa0, 0(a0)
	call	print_float@plt
	mv	s0, sp
.LBB0_2:
	auipc	a1, %pcrel_hi(.L2400020124162657182)
	addi	a1, a1, %pcrel_lo(.LBB0_2)
	mv	a2, sp
	mv	a0, zero
	call	rpc_send@plt
	mv	sp, s0
	mv	a0, zero
	call	rpc_recv@plt
	lw	s0, 8(sp)
	lw	ra, 12(sp)
	addi	sp, sp, 16
	ret
.Lfunc_end0:
	.size	__modinit__, .Lfunc_end0-__modinit__
	.cfi_endproc
	.section	.gcc_except_table,"a",@progbits
	.p2align	2
GCC_except_table0:
.Lexception0:
	.byte	255
	.byte	255
	.byte	3
	.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
	.word	.Lfunc_begin0-.Lfunc_begin0
	.word	.Lfunc_end0-.Lfunc_begin0
	.word	0
	.byte	0
.Lcst_end0:
	.p2align	2

	.type	139655143423904,@object
	.section	.sdata,"aw",@progbits
	.globl	139655143423904
	.p2align	3
139655143423904:
	.word	139655143423088
	.size	139655143423904, 4

	.type	139655143423088,@object
	.globl	139655143423088
	.p2align	3
139655143423088:
	.quad	0x3e112e0be826d695
	.size	139655143423088, 8

	.type	.Ltagptr0,@object
.Ltagptr0:
	.ascii	":n"
	.size	.Ltagptr0, 2

	.type	.L2400020124162657182,@object
	.p2align	3
.L2400020124162657182:
	.word	.Ltagptr0
	.word	2
	.size	.L2400020124162657182, 8

	.hidden	DW.ref.__nac3_personality
	.weak	DW.ref.__nac3_personality
	.section	.data.DW.ref.__nac3_personality,"aGw",@progbits,DW.ref.__nac3_personality,comdat
	.p2align	2
	.type	DW.ref.__nac3_personality,@object
	.size	DW.ref.__nac3_personality, 4
DW.ref.__nac3_personality:
	.word	__nac3_personality
	.section	".note.GNU-stack","",@progbits

Note the 0x7ff0000000000000 at .LCPI0_0, which is the bit pattern of inf for f64.
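A quick way to check that reading of the constant is to reinterpret the 64-bit pattern as an IEEE 754 double. This is a minimal sketch using only the Python standard library; the helper names are made up for this example.

```python
import struct


def f64_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def bits_from_f64(value: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


print(f64_from_bits(0x7FF0000000000000))   # inf
print(hex(bits_from_f64(3.0)))             # 0x4008000000000000
```

So the value loaded into fa0 before the call to print_float is already inf at compile time; no power routine runs on the device in this case.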

Owner

> I have tested on wsl2 on my laptop

Unrelated to the problem at hand, but you could use the MSYS2 version now, with native Windows executables.

Owner

Trying to see if other LLVM programs are also affected: Numba has its own implementation (`int_power_impl`) and I can't seem to get clang to use `llvm.powi`... what else could use it?

Contributor

Interestingly this works:

define void @test() {
body:
  %f_pow_i = call double @llvm.powi.f64(double 2.000000e+00, i32 -2)
  call void @output_float(double %f_pow_i)
  ret void
}

will be optimized to

define void @test() local_unnamed_addr {
body:
  call void @output_float(double 2.500000e-01)
  ret void
}

I wonder if the documentation is wrong:

> Generally, the only supported type for the exponent is the one matching with the C type int.

The listed exponent types are all `i32`, with just one `i16`. If LLVM mistakenly treats `llvm.powi.f64.i16` as `llvm.powi.f64.i32` by applying a `zext` to widen the exponent to `i32`, I guess the `inf` output does make sense in that case...
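As a rough illustration of the `zext` hypothesis above, here is a minimal Python sketch that models widening an `i16` exponent to `i32` by zero-extension versus sign-extension, then feeds the result to a simple repeated-squaring power. The helpers `zext_i16_to_i32`, `sext_i16_to_i32` and `powi` are invented for this example. They only model the arithmetic and are not how LLVM implements the intrinsic.

```python
import math


def zext_i16_to_i32(value: int) -> int:
    """Widen a 16-bit value by zero-extension: the sign bit is not propagated."""
    return value & 0xFFFF


def sext_i16_to_i32(value: int) -> int:
    """Widen a 16-bit value by sign-extension: negative values stay negative."""
    bits = value & 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


def powi(base: float, exponent: int) -> float:
    """Integer power by repeated squaring.

    Float multiplication saturates to inf on overflow instead of raising.
    A zero base with a negative exponent raises ZeroDivisionError in this sketch.
    """
    negative = exponent < 0
    n = -exponent if negative else exponent
    result = 1.0
    factor = base
    while n:
        if n & 1:
            result *= factor
        factor *= factor
        n >>= 1
    return 1.0 / result if negative else result


for base, exp in [(1.0, -2), (2.0, -1), (2.0, -2)]:
    print(base, exp,
          powi(base, zext_i16_to_i32(exp)),   # hypothesised faulty widening
          powi(base, sext_i16_to_i32(exp)))   # intended widening
```

Under this model, an exponent of `-2` becomes `65534` after zero-extension, so any base larger than 1 overflows to `inf` while a base of `1.0` still yields `1.0`. That matches the constant-folded outputs reported in the description, though it does not by itself confirm that this is what LLVM does internally.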

Contributor

I guess we should change `powi.f64.i16` to `powi.f64.i32` to fix this bug. I'm not sure whether this counts as a bug in LLVM, because the documentation says

> the only supported type for the exponent is the one matching with the C type int

so perhaps passing a type that does not match the C type int is considered undefined behavior? I'm not sure about this.

Author
Collaborator

On msys2, `print_float(3.0 ** -1)` is also translated into `call void @print_float(double 0x7FF0000000000000)`, so the error seems to be present there too.

Interestingly, this code

from min_artiq import *
from numpy import int32

@extern
def print_float(x: float):
    ...

@nac3
class Demo:
    core: KernelInvariant[Core]
    a: Kernel[int32]
    
    def __init__(self):
        self.core = Core()
        self.a = 1
        
    @kernel
    def run(self):
        print_float(3.0 ** self.a)

if __name__ == "__main__":
    Demo().run()

(with optimization turned on) gives the following IR, where no constant folding is done:

output ir
; ModuleID = 'main'
source_filename = "main"

%min_artiq.Core = type { double }
%__main__.Demo = type { %min_artiq.Core*, i32 }

@"140066231653856" = global %min_artiq.Core { double 1.000000e-09 }
@"140066231656064" = global %__main__.Demo { %min_artiq.Core* @"140066231653856", i32 1 }
@tagptr0 = private global [3 x i8] c"i:n"
@"5228770092274280312" = private global { i8*, i64 } { i8* getelementptr inbounds ([3 x i8], [3 x i8]* @tagptr0, i32 0, i32 0), i64 3 }

declare i32 @__nac3_personality(...)

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare double @llvm.powi.f64.i16(double, i16) #0

declare void @print_float(double)

define void @__modinit__() personality i32 (...)* @__nac3_personality {
init:
  %self.i = alloca %__main__.Demo*, align 8
  br label %body

body:                                             ; preds = %init
  %0 = bitcast %__main__.Demo** %self.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %0)
  store %__main__.Demo* @"140066231656064", %__main__.Demo** %self.i, align 8
  %load.i = load i32, i32* getelementptr inbounds (%__main__.Demo, %__main__.Demo* @"140066231656064", i32 0, i32 1), align 4
  %r_pow.i = trunc i32 %load.i to i16
  %f_pow_i.i = call double @llvm.powi.f64.i16(double 3.000000e+00, i16 %r_pow.i)
  call void @print_float(double %f_pow_i.i)
  %1 = bitcast %__main__.Demo** %self.i to i8*
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %1)
  call void @attributes_writeback()
  ret void
}

and runkernel gives `print_float: 0.0` regardless of the value assigned to `self.a`.

Running the above code on rv32g gives the correct output.

Using `powi.f64.i32` also fixes the problem. So I also think the documentation may be wrong, and LLVM somehow does not complain about `powi.f64.i16` even though `i16` does not match the C type int. I even tried `powi.f64.i5`, and LLVM silently gives garbage results as well.

Contributor

Use `powi.f64.i32` then.

ychenfo force-pushed neg_powi_fix from f4e9c2eb31 to 28a759202e 2022-04-04 22:10:25 +08:00 Compare
ychenfo force-pushed neg_powi_fix from 28a759202e to 23b7f4ef18 2022-04-04 22:10:59 +08:00 Compare
Author
Collaborator

OK, it uses `llvm.powi.f64.i32` now and is rebased on the current master branch.

pca006132 merged commit 0d10044d66 into master 2022-04-04 22:43:20 +08:00
ychenfo deleted branch neg_powi_fix 2022-04-04 23:24:14 +08:00
Reference: M-Labs/nac3#254