Implement core_log and rtio_log to ARTIQ #488

Merged
sb10q merged 8 commits from feature/core-rtio-log into master 2024-08-13 15:19:03 +08:00
Collaborator

I had to rollback a lot of script changes to check_demo.sh because it breaks how custom arguments are passed into NAC3, including regressions to the SSE problem encountered with mandelbrot.py.

Fixes #297.

I had to rollback a lot of script changes to `check_demo.sh` because it breaks how custom arguments are passed into NAC3, including regressions to the SSE problem encountered with `mandelbrot.py`. Fixes #297.
derppening added 1 commit 2024-08-12 20:49:50 +08:00
derppening requested review from sb10q 2024-08-12 20:50:11 +08:00
sb10q reviewed 2024-08-12 20:55:18 +08:00
@ -60,3 +61,3 @@
clang -o demo module.o demo.o $DEMO_LINALG_STUB -lm -Wl,--no-warn-search-mismatch
else
$nac3standalone --triple i686-unknown-linux-gnu "${nac3args[@]}"
$nac3standalone --triple i686-unknown-linux-gnu --target-features +sse2 "${nac3args[@]}"
Owner

@abdul124 I thought you said this sse2 option was no longer required when using i686 instead of i386?

@abdul124 I thought you said this sse2 option was no longer required when using i686 instead of i386?
Collaborator

@abdul124 I thought you said this sse2 option was no longer required when using i686 instead of i386?

sse2 should be enforced by default on rustc i686 target https://github.com/rust-lang/rust/issues/82435#issuecomment-783939789

> @abdul124 I thought you said this sse2 option was no longer required when using i686 instead of i386? sse2 should be enforced by default on rustc i686 target https://github.com/rust-lang/rust/issues/82435#issuecomment-783939789
Owner

Okay, but that comment is about Rust, not LLVM. nac3 is only using the latter here.

Okay, but that comment is about Rust, not LLVM. nac3 is only using the latter here.
Collaborator

Okay, but that comment is about Rust, not LLVM. nac3 is only using the latter here.

LLVM does interpret "i686" as implying sse2 for most part (Debian LLVM requires patch though). This commit 56d3ad9d23 describes the LLVM situation for sse2 support on LLVM i686 target.

> Okay, but that comment is about Rust, not LLVM. nac3 is only using the latter here. LLVM does interpret "i686" as implying sse2 for most part (Debian LLVM requires patch though). This commit https://github.com/google/boringssl/commit/56d3ad9d23bc130aa9404bfdd1957fe81b3ba498 describes the LLVM situation for sse2 support on LLVM i686 target.
Owner

So the present change should be reverted i.e. there is no need to add --target-features +sse2 ?

So the present change should be reverted i.e. there is no need to add ``--target-features +sse2`` ?
Collaborator

So the present change should be reverted i.e. there is no need to add --target-features +sse2 ?

Yes, since the feature is already implied by i686 target, --target-feature +sse2 is redundant.

> So the present change should be reverted i.e. there is no need to add ``--target-features +sse2`` ? Yes, since the feature is already implied by i686 target, ``--target-feature +sse2`` is redundant.
Author
Collaborator

I don't think any of that is true. The reason why BoringSSL requires SSE2 on i686 targets is because (from the commit message)

As far as I know, all our supported 32-bit x86 consumers require SSE2.

So it should be seen as OpenSSL's own project requirements, rather than a behavior of the compiler.

In fact, the quoted issue in LLVM's repo says that while SSE2 support is implied by i686, it is ultimately treated as an i386 target.

-m32 on Clang is i386 too.

When compiling without any flags, all below f-prefixed instructions are from the X87 ISA (List of X87 Instructions).

$ ../../target/debug/nac3standalone --triple i686-unknown-linux-gnu src/mandelbrot.py
$ llvm-objdump -Cd -M intel --no-show-raw-insn module.o
[...]
      14:       fldz
      16:       fld     dword ptr [ebx]
      1c:       fstp    qword ptr [esp + 32]
      20:       fst     qword ptr [esp + 16]
[...]
      30:       fstp    st(1)
      32:       mov     dword ptr [esp], 4294967295
      39:       fstp    qword ptr [esp + 8]
      3d:       call    0x3e <run+0x3e>
      42:       fld     qword ptr [esp + 24]
      46:       fld     qword ptr [esp + 8]
      4a:       faddp   st(1), st
[...]
      56:       fst     qword ptr [esp + 24]
      5a:       fmul    qword ptr [ebx]
      60:       fdiv    dword ptr [ebx]
      66:       fadd    qword ptr [ebx]
      6c:       fstp    qword ptr [esp + 8]
      70:       fld     qword ptr [esp + 16]
      74:       xor     edi, edi
      76:       fldz
[...]

With --target-features +sse2, all the above X87 instructions are replaced with their SSE2 counterparts (note the use of xmm registers):

$ ../../target/debug/nac3standalone --triple i686-unknown-linux-gnu --target-features +sse2 src/mandelbrot.py
$ llvm-objdump -Cd -M intel --no-show-raw-insn module.o
[...]
      14:       xorpd   xmm0, xmm0
      18:       movsd   xmm1, qword ptr [ebx]   # xmm1 = mem[0],zero
      20:       movsd   qword ptr [esp + 32], xmm1
      26:       movsd   xmm1, qword ptr [ebx]   # xmm1 = mem[0],zero
      2e:       movsd   qword ptr [esp + 24], xmm1
      34:       movsd   xmm1, qword ptr [ebx]   # xmm1 = mem[0],zero
      3c:       movsd   qword ptr [esp + 16], xmm1
      42:       movsd   xmm1, qword ptr [ebx]   # xmm1 = mem[0],zero
      4a:       movsd   qword ptr [esp + 72], xmm1
      50:       movsd   xmm1, qword ptr [ebx]   # xmm1 = mem[0],zero
      58:       movsd   qword ptr [esp + 64], xmm1
      5e:       movsd   xmm1, qword ptr [ebx]   # xmm1 = mem[0],zero
      66:       movsd   qword ptr [esp + 56], xmm1
      6c:       movsd   xmm1, qword ptr [ebx]   # xmm1 = mem[0],zero
      74:       movsd   qword ptr [esp + 48], xmm1
      7a:       movsd   xmm1, qword ptr [ebx]   # xmm1 = mem[0],zero
      82:       movsd   qword ptr [esp + 8], xmm1
[...]
      9c:       movsd   xmm0, qword ptr [esp + 40] # xmm0 = mem[0],zero
      a2:       addsd   xmm0, qword ptr [esp + 8]
[...]
      b2:       movsd   qword ptr [esp + 40], xmm0
      b8:       movapd  xmm7, xmm0
      bc:       mulsd   xmm7, qword ptr [esp + 32]
      c2:       divsd   xmm7, qword ptr [esp + 24]
      c8:       addsd   xmm7, qword ptr [esp + 16]
      ce:       xorpd   xmm0, xmm0
      d2:       xor     edi, edi
      d4:       movsd   qword ptr [esp + 80], xmm7
[...]

This also explains the test failures when removing --target-features +sse2 with src/mandelbrot.py.

I don't think any of that is true. The reason why BoringSSL requires SSE2 on i686 targets is because (from the commit message) > As far as I know, all our supported 32-bit x86 consumers require SSE2. So it should be seen as OpenSSL's own project requirements, rather than a behavior of the compiler. In fact, the [quoted issue in LLVM's repo](https://github.com/llvm/llvm-project/issues/61347) says that while SSE2 support is implied by `i686`, it is ultimately treated as an `i386` target. > -m32 on Clang is i386 too. When compiling without any flags, all below `f`-prefixed instructions are from the X87 ISA ([List of X87 Instructions](https://www2.math.uni-wuppertal.de/~fpf/Uebungen/GdR-SS02/opcode_f.html)). ``` $ ../../target/debug/nac3standalone --triple i686-unknown-linux-gnu src/mandelbrot.py $ llvm-objdump -Cd -M intel --no-show-raw-insn module.o [...] 14: fldz 16: fld dword ptr [ebx] 1c: fstp qword ptr [esp + 32] 20: fst qword ptr [esp + 16] [...] 30: fstp st(1) 32: mov dword ptr [esp], 4294967295 39: fstp qword ptr [esp + 8] 3d: call 0x3e <run+0x3e> 42: fld qword ptr [esp + 24] 46: fld qword ptr [esp + 8] 4a: faddp st(1), st [...] 56: fst qword ptr [esp + 24] 5a: fmul qword ptr [ebx] 60: fdiv dword ptr [ebx] 66: fadd qword ptr [ebx] 6c: fstp qword ptr [esp + 8] 70: fld qword ptr [esp + 16] 74: xor edi, edi 76: fldz [...] ``` With `--target-features +sse2`, all the above X87 instructions are replaced with their SSE2 counterparts (note the use of `xmm` registers): ``` $ ../../target/debug/nac3standalone --triple i686-unknown-linux-gnu --target-features +sse2 src/mandelbrot.py $ llvm-objdump -Cd -M intel --no-show-raw-insn module.o [...] 14: xorpd xmm0, xmm0 18: movsd xmm1, qword ptr [ebx] # xmm1 = mem[0],zero 20: movsd qword ptr [esp + 32], xmm1 26: movsd xmm1, qword ptr [ebx] # xmm1 = mem[0],zero 2e: movsd qword ptr [esp + 24], xmm1 34: movsd xmm1, qword ptr [ebx] # xmm1 = mem[0],zero 3c: movsd qword ptr [esp + 16], xmm1 42: movsd xmm1, qword ptr [ebx] # xmm1 = mem[0],zero 4a: movsd qword ptr [esp + 72], xmm1 50: movsd xmm1, qword ptr [ebx] # xmm1 = mem[0],zero 58: movsd qword ptr [esp + 64], xmm1 5e: movsd xmm1, qword ptr [ebx] # xmm1 = mem[0],zero 66: movsd qword ptr [esp + 56], xmm1 6c: movsd xmm1, qword ptr [ebx] # xmm1 = mem[0],zero 74: movsd qword ptr [esp + 48], xmm1 7a: movsd xmm1, qword ptr [ebx] # xmm1 = mem[0],zero 82: movsd qword ptr [esp + 8], xmm1 [...] 9c: movsd xmm0, qword ptr [esp + 40] # xmm0 = mem[0],zero a2: addsd xmm0, qword ptr [esp + 8] [...] b2: movsd qword ptr [esp + 40], xmm0 b8: movapd xmm7, xmm0 bc: mulsd xmm7, qword ptr [esp + 32] c2: divsd xmm7, qword ptr [esp + 24] c8: addsd xmm7, qword ptr [esp + 16] ce: xorpd xmm0, xmm0 d2: xor edi, edi d4: movsd qword ptr [esp + 80], xmm7 [...] ``` This also explains the test failures when removing `--target-features +sse2` with `src/mandelbrot.py`.
sb10q reviewed 2024-08-12 20:56:10 +08:00
@ -727,0 +740,4 @@
/// * `suffix` - String to terminate the printed string, if any.
/// * `as_repr` - Whether the `repr()` output of values instead of `str()`.
/// * `as_rtio` - Whether to print to `rtio_log` instead of `core_log`.
fn polymorphic_print<'ctx>(
Owner

I suppose this should make it straightforward now to implement the full print_rpc() with several arguments?

I suppose this should make it straightforward now to implement the full ``print_rpc()`` with several arguments?
Author
Collaborator

Not really. This only makes it straightforward to implement print_rpc/core_log/rtio_log with several arguments of the same type, as is required by the current implementation of varargs (Varargs are typechecked against its expected type, e.g. *int32 requires all arguments in the variadic position to be int32).

In order to implement these functions with variadic arguments of different types, we still need a type category which accepts any type (think TypeEnum::TAny), and that needs to be first implemented. The closest equivalent we have is *T (where T is a typevar), but this still requires all arguments to be of the same type T.

Not really. This only makes it straightforward to implement `print_rpc`/`core_log`/`rtio_log` with several arguments *of the same type*, as is required by the current implementation of varargs (Varargs are typechecked against its expected type, e.g. `*int32` requires all arguments in the variadic position to be `int32`). In order to implement these functions with variadic arguments of *different* types, we still need a type category which accepts any type (think `TypeEnum::TAny`), and that needs to be first implemented. The closest equivalent we have is `*T` (where `T` is a typevar), but this still requires all arguments to be of the same type `T`.
Owner

with several arguments of the same type, as is required by the current implementation of varargs

So the _log functions currently have the same limitation, correct?

> with several arguments of the same type, as is required by the current implementation of varargs So the _log functions currently have the same limitation, correct?
Author
Collaborator

_log functions are currently implemented with a single non-vararg parameter. I think we can merge this first before expanding support to homogeneous varargs?

`_log` functions are currently implemented with a single non-vararg parameter. I think we can merge this first before expanding support to homogeneous varargs?
Owner

Yes, just trying to understand where we are.

Yes, just trying to understand where we are.
sb10q reviewed 2024-08-12 20:57:51 +08:00
@ -1097,0 +1100,4 @@
/// Note that, similar to format constants in `<inttypes.h>`, these constants need to be prepended
/// with `%`.
#[must_use]
pub fn get_fprintf_format_constant<'ctx>(
Owner

Shouldn't this go into nac3artiq? I cannot think of a use for it in core.

Shouldn't this go into nac3artiq? I cannot think of a use for it in core.
Author
Collaborator

I have the opposite POV. It is because these format constants are not nac3artiq-specific which is why I put it in nac3core. My thought is that only ARTIQ-specific logic would be put in that module, and fprintf constants are "global" in the sense that it can be used by artiq or standalone alike.

I have the opposite POV. It is because these format constants are not `nac3artiq`-specific which is why I put it in `nac3core`. My thought is that only ARTIQ-specific logic would be put in that module, and `fprintf` constants are "global" in the sense that it can be used by artiq or standalone alike.
Owner

But only ARTIQ needs them in the forseeable future. nac3core is a compiler library, not a string formatter library. Keep it as simple as possible.

But only ARTIQ needs them in the forseeable future. nac3core is a compiler library, not a string formatter library. Keep it as simple as possible.
Author
Collaborator

Moved get_fprintf_format_constant to codegen.rs in nac3artiq.

Moved `get_fprintf_format_constant` to `codegen.rs` in `nac3artiq`.
derppening changed title from Implement `core_log` and `rtio_log` to ARTIQ to WIP: Implement `core_log` and `rtio_log` to ARTIQ 2024-08-13 12:24:49 +08:00
Author
Collaborator

Found a bug with the standalone scripts.

Found a bug with the standalone scripts.
derppening force-pushed feature/core-rtio-log from 91506f05f6 to 9b06640dec 2024-08-13 12:34:50 +08:00 Compare
derppening changed title from WIP: Implement `core_log` and `rtio_log` to ARTIQ to Implement `core_log` and `rtio_log` to ARTIQ 2024-08-13 12:34:57 +08:00
Author
Collaborator

Ready for re-review.

Ready for re-review.
derppening force-pushed feature/core-rtio-log from 9b06640dec to 274f493fe9 2024-08-13 15:18:00 +08:00 Compare
sb10q merged commit 6beff7a268 into master 2024-08-13 15:19:03 +08:00
sb10q deleted branch feature/core-rtio-log 2024-08-13 15:19:04 +08:00
Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#488
No description provided.