Fix codegen issues for 32-bit targets #477

Merged
sb10q merged 1 commits from fix/32bit-codegen-issues into master 2024-08-17 17:37:21 +08:00
Collaborator
  • Reimplementation of #468
    • Adds --mx32 and --m64 flags to standalone for emulating 32-bit and enforcing 64-bit size_t types respectively
    • Adds 32-bit execution tests to check_demo.sh
  • Update format specifiers for output_exception (Emitted by clang warnings)
  • Add cslice32 to demo.c to better reflect actual data type of list and str
  • Determine size_t using TargetMachine passed into LLVM (if not specified in command-line)
  • Add support for running with 32-bit size_t on 64-bit targets.

The primary cause of codegen errors is due to nac3core previously assuming that the bit width of size_t types and the bit width of the native word of the target platform (which is the type of sizeof expressions) is the same. This is true with nac3standalone (where size_t is defaulted to the host environment, i.e. 64-bit) and nac3artiq (where size_t is defaulted to the cross-compilation target, i.e. 32-bit). However, with the introduction of 32-bit size_t tests on 64-bit platform, there are now mismatches between this. As such, casts were added to promote size_t to the target platform word size where needed.

Fixes #469 - #476; Supersedes #468.

- Reimplementation of #468 - Adds `--mx32` and `--m64` flags to standalone for emulating 32-bit and enforcing 64-bit `size_t` types respectively - Adds 32-bit execution tests to `check_demo.sh` - Update format specifiers for `output_exception` (Emitted by clang warnings) - Add `cslice32` to `demo.c` to better reflect actual data type of `list` and `str` - Determine `size_t` using `TargetMachine` passed into LLVM (if not specified in command-line) - Add support for running with 32-bit `size_t` on 64-bit targets. The primary cause of codegen errors is due to `nac3core` previously assuming that the bit width of `size_t` types and the bit width of the native word of the target platform (which is the type of `sizeof` expressions) is the same. This is true with `nac3standalone` (where `size_t` is defaulted to the host environment, i.e. 64-bit) and `nac3artiq` (where `size_t` is defaulted to the cross-compilation target, i.e. 32-bit). However, with the introduction of 32-bit `size_t` tests on 64-bit platform, there are now mismatches between this. As such, casts were added to promote `size_t` to the target platform word size where needed. Fixes #469 - #476; Supersedes #468.
sb10q reviewed 2024-07-22 08:27:51 +08:00
@ -62,16 +62,21 @@ void output_asciiart(int32_t x) {
}
}
struct cslice32 {
Owner

Uh, why?

Uh, why?
Collaborator

I think this struct cslice is made to be identical to https://docs.rs/cslice/0.3.0/src/cslice/lib.rs.html#33-37:

#[repr(C)]
#[derive(Clone, Copy)]
pub struct CSlice<'a, T> {
    base: *const T,
    len: usize, // <--- size_t
    marker: PhantomData<&'a ()>
}

Extra info: struct Exception depends on cslice like so: b0d2705c38/artiq/firmware/libeh/eh_artiq.rs (L1C1-L17C1)

// ARTIQ Exception struct declaration
use cslice::CSlice;

// Note: CSlice within an exception may not be actual cslice, they may be strings that exist only
// in the host. If the length == usize:MAX, the pointer is actually a string key in the host.
#[repr(C)]
#[derive(Copy, Clone)]
pub struct Exception<'a> {
    pub id:       u32,
    pub file:     CSlice<'a, u8>, // <--- size_t
    pub line:     u32,
    pub column:   u32,
    pub function: CSlice<'a, u8>, // <--- size_t
    pub message:  CSlice<'a, u8>, // <--- size_t
    pub param:    [i64; 3]
}
I *think* this `struct cslice` is made to be identical to https://docs.rs/cslice/0.3.0/src/cslice/lib.rs.html#33-37: ```rust #[repr(C)] #[derive(Clone, Copy)] pub struct CSlice<'a, T> { base: *const T, len: usize, // <--- size_t marker: PhantomData<&'a ()> } ``` Extra info: `struct Exception` depends on `cslice` like so: https://github.com/m-labs/artiq/blob/b0d2705c385f64b6e6711c1726cd9178f40b598e/artiq/firmware/libeh/eh_artiq.rs#L1C1-L17C1 ```rust // ARTIQ Exception struct declaration use cslice::CSlice; // Note: CSlice within an exception may not be actual cslice, they may be strings that exist only // in the host. If the length == usize:MAX, the pointer is actually a string key in the host. #[repr(C)] #[derive(Copy, Clone)] pub struct Exception<'a> { pub id: u32, pub file: CSlice<'a, u8>, // <--- size_t pub line: u32, pub column: u32, pub function: CSlice<'a, u8>, // <--- size_t pub message: CSlice<'a, u8>, // <--- size_t pub param: [i64; 3] } ```
Owner

Sure, but with this PR introduces two cslices, one with uint32 and one with usize. Looks messy if not incorrect.

Sure, but with this PR introduces two cslices, one with uint32 and one with usize. Looks messy if not incorrect.
derppening marked this conversation as resolved
sb10q reviewed 2024-07-22 08:29:02 +08:00
@ -79,0 +84,4 @@
/// Generates code by setting `size_t` to 32 bits.
#[arg(long)]
mx32: bool,
Owner

Why the x?

Why the ``x``?
Author
Collaborator

This has been removed, but for clarity, GCC has an -mx32 flag for x86 targets which:

The -mx32 option sets "int", "long", and pointer types to 32 bits, and generates code for the x86-64 architecture.

Which better matches what we were doing back then, generating using 32-bit size_t but targeting 64-bit archs.

This has been removed, but for clarity, GCC has an `-mx32` flag for x86 targets which: ``` The -mx32 option sets "int", "long", and pointer types to 32 bits, and generates code for the x86-64 architecture. ``` Which better matches what we were doing back then, generating using 32-bit `size_t` but targeting 64-bit archs.
Owner

Ack.
We don't have to (and should not) implement every GCC option. And compiling for 686 is a better match to what is happening when we target the (current) ARTIQ devices.

Ack. We don't have to (and should not) implement every GCC option. And compiling for 686 is a better match to what is happening when we target the (current) ARTIQ devices.
derppening marked this conversation as resolved
sb10q reviewed 2024-07-22 08:34:46 +08:00
@ -277,0 +300,4 @@
(false, false) => None,
(false, true) => Some(32),
(true, false) => Some(64),
(true, true) => {
Owner

Quite messy. And also this isn't what clang -m32 does - the latter selects 32-bit x86 ISA, so there is no need to name such flags in the same way. The previous CLI with -s was better.

Or just use the existing --target and build a complete 32-bit binary, this way we don't have non-standard edge cases with 32-bit size_t on 64-bit.

Quite messy. And also this isn't what ``clang -m32`` does - the latter selects 32-bit x86 ISA, so there is no need to name such flags in the same way. The previous CLI with ``-s`` was better. Or just use the existing ``--target`` and build a complete 32-bit binary, this way we don't have non-standard edge cases with 32-bit size_t on 64-bit.
derppening marked this conversation as resolved
derppening force-pushed fix/32bit-codegen-issues from fe43ec00fa to 7f07017fc6 2024-07-25 16:01:53 +08:00 Compare
sb10q reviewed 2024-07-25 21:07:51 +08:00
flake.nix Outdated
@ -149,1 +149,4 @@
buildInputs = with pkgs; [
pkgsi686Linux.glibc
pkgsi686Linux.libgcc
pkgsi686Linux.binutils
Owner

Are all those three needed? My code was just an unrefined proof of concept.

Are all those three needed? My code was just an unrefined proof of concept.
Author
Collaborator

Updated to remove these packages.

Updated to remove these packages.
derppening marked this conversation as resolved
sb10q reviewed 2024-07-25 21:09:42 +08:00
@ -26,0 +21,4 @@
./interpret_demo.py "$demo" > interpreted.log
echo "...... Trying NAC3's 32-bit code generator output"
./run_demo.sh -m32 --out run_32.log "${nac3args[@]}" "$demo"
Owner

Won't this break the automatic tests? They don't have the 686 packages unless the flake is updated.

Won't this break the automatic tests? They don't have the 686 packages unless the flake is updated.
Author
Collaborator

nix build -L .#packages.x86_64-linux.nac3artiq works with only (pkgs.wrapClangMulti llvmPackages_14.clang) changed. I don't actually know why though.

`nix build -L .#packages.x86_64-linux.nac3artiq` works with only `(pkgs.wrapClangMulti llvmPackages_14.clang)` changed. I don't actually know why though.
sb10q reviewed 2024-07-25 21:10:15 +08:00
@ -65,3 +65,3 @@
struct cslice {
void *data;
usize len;
uint32_t len;
Owner

Is this correct?

Is this correct?
Author
Collaborator

Yes, both string slices and list slices use uint32_t as its length.

Yes, both string slices and list slices use uint32_t as its length.
Owner
This contradicts https://docs.rs/cslice/latest/cslice/
Author
Collaborator

This cslice struct is for list and string in NAC3 which are defined this way, not Rust's cslice. Could you propose a more suitable name for this struct?

This `cslice` struct is for `list` and `string` in NAC3 which are defined this way, not Rust's `cslice`. Could you propose a more suitable name for this struct?
Owner

Is there a reason not to follow the Rust cslice?

The firmware uses Rust cslice and this will become a problem when we start using it on 64-bit CPUs like Zynq MPSoC.

Is there a reason not to follow the Rust cslice? The firmware uses Rust cslice and this will become a problem when we start using it on 64-bit CPUs like Zynq MPSoC.
Author
Collaborator

Because list and str are currently defined as such in the compiler. I guess the solution moving forward would be the change the definition of list and str in the compiler to have its size as size_t instead?

Edit: Or maybe not, seems like list already uses 32-bit ints and my assumption is incorrect.

Because `list` and `str` are currently defined as such in the compiler. I guess the solution moving forward would be the change the definition of `list` and `str` in the compiler to have its size as `size_t` instead? Edit: Or maybe not, seems like `list` already uses 32-bit ints and my assumption is incorrect.
Owner

Yes, just like they are defined in any other serious compiler.

Yes, just like they are defined in any other serious compiler.
sb10q reviewed 2024-07-25 21:11:21 +08:00
@ -278,0 +304,4 @@
.get_bit_width()
});
if !matches!(size_t, 32 | 64) {
Owner

This isn't only about size_t.

This isn't only about size_t.
derppening marked this conversation as resolved
Author
Collaborator

Sorry for not marking this PR as WIP. Please give me tomorrow to clean up this patchset before submitting it for review.

Sorry for not marking this PR as WIP. Please give me tomorrow to clean up this patchset before submitting it for review.
derppening changed title from Fix codegen issues for 32-bit size_t on 64-bit targets to WIP: Fix codegen issues for 32-bit size_t on 64-bit targets 2024-07-25 21:12:00 +08:00
sb10q reviewed 2024-07-25 21:13:15 +08:00
@ -278,0 +306,4 @@
if !matches!(size_t, 32 | 64) {
let mut cmd = CommandLineArgs::command();
cmd.error(ErrorKind::InvalidValue, "Only -m32 or -m64 is supported").exit();
Owner

I don't think we need this -m flag at all. It looks superfluous, complex, redundant, and error prone. Just use -target and then get the information such as the bitwidth of size_t by querying the LLVM target object.

I don't think we need this -m flag at all. It looks superfluous, complex, redundant, and error prone. Just use ``-target`` and then get the information such as the bitwidth of size_t by querying the LLVM target object.
Owner

It's also not only about the bitwidth of size_t. The code suggests it is.

It's also not only about the bitwidth of size_t. The code suggests it is.
derppening marked this conversation as resolved
derppening force-pushed fix/32bit-codegen-issues from 7f07017fc6 to 3468f30985 2024-07-26 12:14:10 +08:00 Compare
derppening changed title from WIP: Fix codegen issues for 32-bit size_t on 64-bit targets to WIP: Fix codegen issues for 32-bit targets 2024-07-26 12:14:30 +08:00
Author
Collaborator

Only issue remaining is mandelbrot.py not working in 32-bit mode with -O0.

### Checking src/mandelbrot.py...
>>>>>> Running src/mandelbrot.py with the Python interpreter
...... Trying NAC3's 32-bit code generator output
--- interpreted.log     2024-07-26 12:15:26.196480612 +0800
+++ run_32.log  2024-07-26 12:15:26.413149164 +0800
@@ -6,7 +6,7 @@
        .......,,,,,,,,,,,,,,,,,,,,,--------:::;+MMhM :------,,,,,.............
       .....,,,,,,,,,,,,,,,,,,,,,---------::::ii+M $hi;::------,,,,,,..........
      ....,,,,,,,,,,,,,,,,,,,,,---------:::;;h      * +;::::----,,,,,,,........
-    ....,,,,,,,,,,,,,,,,,,,---------::;;;;ii+$      @hi;;:::::---,,,,,,.......
+     ...,,,,,,,,,,,,,,,,,,,---------::;;;;ii+$      @hi;;:::::---,,,,,,.......
     ..,,,,,,,,,,,,,,,,,,,-------:::;h+Mh++HHH$      *Hhh+i;;ihi:--,,,,,,,.....
    ..,,,,,,,,,,,,,,,,,------:::::;;i     #               HH M*M+:--,,,,,,,....
   ..,,,,,,,,,,,,,,,,---::::::::;;;i+h#                        h;::-,,,,,,,,...
@@ -26,7 +26,7 @@
   ..,,,,,,,,,,,,,,,,---::::::::;;;i+h#                        h;::-,,,,,,,,...
    ..,,,,,,,,,,,,,,,,,------:::::;;i     #               HH M*M+:--,,,,,,,....
     ..,,,,,,,,,,,,,,,,,,,-------:::;h+Mh++HHH$      *Hhh+i;;ihi:--,,,,,,,.....
-    ....,,,,,,,,,,,,,,,,,,,---------::;;;;ii+$      @hi;;:::::---,,,,,,.......
+     ...,,,,,,,,,,,,,,,,,,,---------::;;;;ii+$      @hi;;:::::---,,,,,,.......
      ....,,,,,,,,,,,,,,,,,,,,,---------:::;;h      * +;::::----,,,,,,,........
       .....,,,,,,,,,,,,,,,,,,,,,---------::::ii+M $hi;::------,,,,,,..........
        .......,,,,,,,,,,,,,,,,,,,,,--------:::;+MMhM :------,,,,,.............
Only issue remaining is `mandelbrot.py` not working in 32-bit mode with `-O0`. ``` ### Checking src/mandelbrot.py... >>>>>> Running src/mandelbrot.py with the Python interpreter ...... Trying NAC3's 32-bit code generator output --- interpreted.log 2024-07-26 12:15:26.196480612 +0800 +++ run_32.log 2024-07-26 12:15:26.413149164 +0800 @@ -6,7 +6,7 @@ .......,,,,,,,,,,,,,,,,,,,,,--------:::;+MMhM :------,,,,,............. .....,,,,,,,,,,,,,,,,,,,,,---------::::ii+M $hi;::------,,,,,,.......... ....,,,,,,,,,,,,,,,,,,,,,---------:::;;h * +;::::----,,,,,,,........ - ....,,,,,,,,,,,,,,,,,,,---------::;;;;ii+$ @hi;;:::::---,,,,,,....... + ...,,,,,,,,,,,,,,,,,,,---------::;;;;ii+$ @hi;;:::::---,,,,,,....... ..,,,,,,,,,,,,,,,,,,,-------:::;h+Mh++HHH$ *Hhh+i;;ihi:--,,,,,,,..... ..,,,,,,,,,,,,,,,,,------:::::;;i # HH M*M+:--,,,,,,,.... ..,,,,,,,,,,,,,,,,---::::::::;;;i+h# h;::-,,,,,,,,... @@ -26,7 +26,7 @@ ..,,,,,,,,,,,,,,,,---::::::::;;;i+h# h;::-,,,,,,,,... ..,,,,,,,,,,,,,,,,,------:::::;;i # HH M*M+:--,,,,,,,.... ..,,,,,,,,,,,,,,,,,,,-------:::;h+Mh++HHH$ *Hhh+i;;ihi:--,,,,,,,..... - ....,,,,,,,,,,,,,,,,,,,---------::;;;;ii+$ @hi;;:::::---,,,,,,....... + ...,,,,,,,,,,,,,,,,,,,---------::;;;;ii+$ @hi;;:::::---,,,,,,....... ....,,,,,,,,,,,,,,,,,,,,,---------:::;;h * +;::::----,,,,,,,........ .....,,,,,,,,,,,,,,,,,,,,,---------::::ii+M $hi;::------,,,,,,.......... .......,,,,,,,,,,,,,,,,,,,,,--------:::;+MMhM :------,,,,,............. ```
Owner

float32 vs. float64 precision?

float32 vs. float64 precision?
Author
Collaborator

float32 vs. float64 precision?

I don't think we use float32 anywhere in codegen. Might be worth investigating though.

> float32 vs. float64 precision? I don't think we use `float32` anywhere in codegen. Might be worth investigating though.
derppening force-pushed fix/32bit-codegen-issues from 3468f30985 to 0b28c19dd9 2024-07-26 12:26:00 +08:00 Compare
Author
Collaborator

Looks like a rounding error to me at if z_r*z_r + z_i*z_i > 4.0.

./check_demo.sh --debug -- -O0 src/mandelbrot.py
### Checking src/mandelbrot.py...
>>>>>> Running src/mandelbrot.py with the Python interpreter
...... Trying NAC3's 32-bit code generator output
--- interpreted.log     2024-07-26 12:37:48.904156742 +0800
+++ run_32.log  2024-07-26 12:37:49.094158169 +0800
@@ -7346,13 +7346,10 @@
 
 4        # x
 8        # y
 0        # i
 4.000000 # z_r*z_r + z_i*z_i
-False    # z_r*z_r + z_i*z_i > 4.0
-1
-5.230769
 True
 
 5
 8
 0
@@ -54592,13 +54589,10 @@
 
 4
 28
 0
 4.000000
-False
-1
-5.230769
 True
 
 5
 28
 0

Under normal circumstances, that evaluates to False and will continue one iteration, but compiling with i386 target instead evaluates that to True and terminates the loop early.

Looks like a rounding error to me at `if z_r*z_r + z_i*z_i > 4.0`. ``` ./check_demo.sh --debug -- -O0 src/mandelbrot.py ### Checking src/mandelbrot.py... >>>>>> Running src/mandelbrot.py with the Python interpreter ...... Trying NAC3's 32-bit code generator output --- interpreted.log 2024-07-26 12:37:48.904156742 +0800 +++ run_32.log 2024-07-26 12:37:49.094158169 +0800 @@ -7346,13 +7346,10 @@ 4 # x 8 # y 0 # i 4.000000 # z_r*z_r + z_i*z_i -False # z_r*z_r + z_i*z_i > 4.0 -1 -5.230769 True 5 8 0 @@ -54592,13 +54589,10 @@ 4 28 0 4.000000 -False -1 -5.230769 True 5 28 0 ``` Under normal circumstances, that evaluates to `False` and will continue one iteration, but compiling with `i386` target instead evaluates that to `True` and terminates the loop early.
Owner

Note that the x87 FPU internally has higher-precision registers with 80 bits. Might have to do with that perhaps. I think use of the higher precision can be disabled.

Note that the x87 FPU internally has higher-precision registers with 80 bits. Might have to do with that perhaps. I think use of the higher precision can be disabled.
Author
Collaborator

Note that the x87 FPU internally has higher-precision registers with 80 bits. Might have to do with that perhaps. I think use of the higher precision can be disabled.

Works if I use --mcpu=pentium4 to compile the module. Should I set that (or another CPU model supporting SSE) or use --target-features sse4.1?

> Note that the x87 FPU internally has higher-precision registers with 80 bits. Might have to do with that perhaps. I think use of the higher precision can be disabled. Works if I use `--mcpu=pentium4` to compile the module. Should I set that (or another CPU model supporting SSE) or use `--target-features sse4.1`?
Owner

The second option sounds more specific and therefore better.

The second option sounds more specific and therefore better.
derppening force-pushed fix/32bit-codegen-issues from 0b28c19dd9 to f74e9de682 2024-07-26 13:31:44 +08:00 Compare
derppening force-pushed fix/32bit-codegen-issues from f74e9de682 to b655d2dd39 2024-07-26 13:33:23 +08:00 Compare
derppening changed title from WIP: Fix codegen issues for 32-bit targets to Fix codegen issues for 32-bit targets 2024-07-26 13:33:32 +08:00
derppening requested review from sb10q 2024-07-26 13:33:39 +08:00
derppening force-pushed fix/32bit-codegen-issues from b655d2dd39 to 8c5ba37d09 2024-07-26 13:36:04 +08:00 Compare
sb10q reviewed 2024-07-27 21:53:31 +08:00
@ -12,3 +12,3 @@
case "$1" in
--help)
echo "Usage: run_demo.sh [--help] [--out OUTFILE] [--lli] [--debug] -- [NAC3ARGS...]"
echo "Usage: run_demo.sh [--help] [--out OUTFILE] [--debug] [-m32] -- [NAC3ARGS...]"
Owner

call it -i386
-m32 is generic, this one is not.

call it ``-i386`` ``-m32`` is generic, this one is not.
Owner

This is generally an improvement over a pretty terrible situation, so merging now. The two issues still need fixing.

This is generally an improvement over a pretty terrible situation, so merging now. The two issues still need fixing.
sb10q merged commit 8c5ba37d09 into master 2024-07-27 21:55:36 +08:00
sb10q deleted branch fix/32bit-codegen-issues 2024-07-27 21:55:37 +08:00
Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#477
No description provided.