Fix float**int with negative power #254
Reference: M-Labs/nac3#254
According to the LLVM LangRef, `llvm.powi.f64.i16` can handle a negative power, but in testing it produces garbage values, and after checking I could not find a problem in our code. So for now I work around it by taking the reciprocal when the power is negative.

Current behavior:
output:
On which architectures did you see the problem? It could just be a bug in LLVM or elsewhere outside NAC3.
And what is the assembly code generated? I think with constant folding optimizations disabled, it should be just a call to libm - if the bug is still present then, it would point to a problem with libm?
What are the unoptimized and optimized IRs for this code prior to this patch?
Thanks for the suggestions! I have tested on WSL2 on my laptop, on zeus via `runkernel`, and on rv32g using the device at 192.168.1.50; they all give the erroneous output.

For this code:
outputs prior to this patch:
unoptimized (by setting `OptimizationLevel::None` at all places in nac3)
assembly for x86_64
assembly for rv32g
assembly for cortexa9
IR
optimized (current nac3 settings)
assembly for x86_64
assembly for rv32g
assembly for cortexa9
IR
I could not tell much from the x86_64 assembly, but from the rv32g and cortexa9 assembly it seems that the function `__powidf2` is called, and the definition of `__powidf2` that I could find seems fine though...

Update:
It seems that the root of the problem is only on x86_64, and the reason the problem previously appeared on rv32g is that the constant optimization is done on x86_64..?
On rv32g, when using an `@rpc` function to get the negative power like the code above, the result is fine, but directly calling `my_print(3.0 ** -1)` causes the problem and prints `inf`, and the assembly also reflects that:

output assembly for rv32g (even with `OptimizationLevel::None` this constant optimization still seems to be there...)

Note the `0x7ff0000000000000` representing `inf` for f64.

Unrelated to the problem at hand, but you could use the MSYS2 version now, with native Windows executables.
Trying to see if other LLVM programs are also affected: Numba has its own implementation (`int_power_impl`) and I can't seem to get clang to use `llvm.powi`... what else could use it?

Interestingly this works:
will be optimized to
I wonder if the documentation is wrong:
the listed exponent types are i32, with just one i16. And if LLVM mistakenly treats `llvm.powi.f64.i16` as `llvm.powi.f64.i32` by doing a `zext` on the exponent to convert it to `i32`, I guess the `inf` output does make sense in that case...

I guess we should change the `powi.f64.i16` to `powi.f64.i32` to fix this bug. Not sure if this can be considered a bug in LLVM, because they said:
So perhaps passing a type that does not match the C type `int` is considered undefined behavior? Not sure about this.
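To make the `zext` guess above concrete, here is a minimal Python sketch. It only models the hypothesis from this thread; it is not LLVM's actual code. The helper names (`zext_i16_to_i32`, `sext_i16_to_i32`, `powi`, `f64_bits`) are made up for illustration, and `powi` is a plain square-and-multiply loop assumed to behave like an integer-power routine with a signed exponent.

```python
import math
import struct

def zext_i16_to_i32(x: int) -> int:
    """Zero-extend: the low 16 bits are reinterpreted as an unsigned value."""
    return x & 0xFFFF

def sext_i16_to_i32(x: int) -> int:
    """Sign-extend: the signed value of the low 16 bits is preserved."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def powi(base: float, exp: int) -> float:
    """Hypothetical square-and-multiply power with a signed exponent.

    Plain float multiplication is used so that overflow yields inf,
    whereas Python's `**` would raise OverflowError.
    """
    recip = exp < 0
    n = abs(exp)
    r = 1.0
    while n:
        if n & 1:
            r *= base
        n >>= 1
        if n:
            base *= base
    if recip:
        # Guard the division: an underflowed result of 0.0 maps to +/-inf.
        return 1.0 / r if r != 0.0 else math.copysign(math.inf, r)
    return r

def f64_bits(x: float) -> int:
    """Raw IEEE 754 bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

wrong = powi(3.0, zext_i16_to_i32(-1))  # exponent becomes 65535
right = powi(3.0, sext_i16_to_i32(-1))  # exponent stays -1
print(wrong, hex(f64_bits(wrong)))
print(right)
```

Under this model, the zero-extended exponent is 65535, so the power overflows to `inf`, whose bit pattern is the `0x7ff0000000000000` seen in the generated code, while the sign-extended exponent gives the expected one third.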
On MSYS2, `print_float(3.0 ** -1)` is also translated into `call void @print_float(double 0x7FF0000000000000)`, so the error seems to be there too.

Interestingly, this code
(when turning on the optimization) gives the following IR, where no constant folding is done:
output IR
and runkernel gives
`print_float: 0.0`
regardless of what value we set to `self.a`.

Running the above code on rv32g gives correct output.
And using `powi.f64.i32` fixes the problem too. So I also think that maybe the documentation is wrong and that somehow LLVM did not complain about the use of `powi.f64.i16`, even though `i16` is not the C type `int`. I even tried `powi.f64.i5`, and LLVM also just silently gives garbage results.

Use `powi.f64.i32` then.

f4e9c2eb31 to 28a759202e
28a759202e to 23b7f4ef18
OK, using `llvm.powi.f64.i32` now, and rebased on the current master branch.
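For comparison, the reciprocal workaround from the first version of this patch can be modelled roughly as follows. This is a Python sketch of the idea only, not the NAC3 codegen; `powi_nonneg` and `float_pow_int` are hypothetical names, and `powi_nonneg` stands in for a power primitive that is only trusted for non-negative exponents.

```python
def powi_nonneg(base: float, exp: int) -> float:
    """Stand-in for a power primitive trusted only for exp >= 0."""
    assert exp >= 0
    result = 1.0
    for _ in range(exp):
        result *= base
    return result

def float_pow_int(base: float, exp: int) -> float:
    """float ** int, routing negative exponents through a reciprocal."""
    if exp < 0:
        # Raises ZeroDivisionError if the positive power evaluates to 0.0.
        return 1.0 / powi_nonneg(base, -exp)
    return powi_nonneg(base, exp)

print(float_pow_int(3.0, -1))
print(float_pow_int(2.0, -3))
```

With the exponent passed as `i32`, the intrinsic handles negative powers itself, so this extra branch is no longer needed.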