Replace ld.lld with nac3ld for non-host targets #292

Merged
sb10q merged 4 commits from occheung/nac3:nac3ld into master 2022-06-06 17:13:25 +08:00

Description

This PR adds the nac3ld module to NAC3. Non-host targets (i.e. ARM and RISC-V) are supported. Several other PRs are needed to make nac3ld functional.

Related PRs

#291, ARTIQ PR 1889 (https://github.com/m-labs/artiq/pull/1899)

GOT/PLT prevention

ARM

By default, LLVM generates a short branch veneer for each branch. Its jump range can be insufficient in a large binary. Since a short branch veneer takes only 1 instruction, there is no in-place workaround that can reliably jump to any address. With this patch, the long-calls LLVM feature is enabled to generate long branch veneers. A PLT is unnecessary in this case, as the range of a long branch veneer covers the entire address space.
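The range limit can be illustrated with a small sketch. It assumes the A32 B/BL encoding (a signed 24-bit offset counted in 4-byte words, relative to the instruction address plus 8); the function name and the addresses are hypothetical and are not taken from nac3ld.

```python
# Assumption: A32 B/BL encoding, where the branch offset is a signed 24-bit
# immediate counted in 4-byte words, relative to PC (instruction address + 8).
ARM_PC_BIAS = 8
IMM24_MIN = -(1 << 23)
IMM24_MAX = (1 << 23) - 1


def fits_short_branch(insn_addr: int, target_addr: int) -> bool:
    """Return True if a single B/BL at insn_addr can reach target_addr."""
    offset = target_addr - (insn_addr + ARM_PC_BIAS)
    if offset % 4 != 0:
        return False  # A32 branch targets must be word-aligned
    return IMM24_MIN <= offset // 4 <= IMM24_MAX
```

Under this assumption a single branch reaches roughly ±32 MiB, so a target such as a distant runtime symbol may be out of range, whereas a multi-instruction long call can materialize a full 32-bit address.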

See the patches regarding the R_ARM_TARGET2 relocation and linkage demotion for GOT prevention.

RISC-V

RISC-V codegen always accounts for the possibility that the required address could be very far away. Therefore, PLT/GOT-related relocations are constructed using 2 instructions:

  • Load the PC (or 0), offset by an upper immediate, into a register
  • Offset the register by a lower 12-bit value, and potentially jump to (could be PLT) / load from (could be GOT) / hold the resulting address.
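The split between the two instructions can be sketched as follows. This is a minimal illustration of the usual RISC-V hi20/lo12 convention, in which the low part is a sign-extended 12-bit immediate and the upper part is rounded to compensate; the function names are hypothetical and this is not the nac3ld implementation.

```python
from typing import Tuple


def split_hi20_lo12(offset: int) -> Tuple[int, int]:
    """Split a 32-bit offset into the (hi20, lo12) parts of an instruction pair.

    The second instruction sign-extends its 12-bit immediate, so the upper
    part is rounded by adding 0x800 before shifting.
    """
    rounded = (offset + 0x800) >> 12
    hi20 = rounded & 0xFFFFF           # 20-bit field of the first instruction
    lo12 = offset - (rounded << 12)    # signed value in [-2048, 2047]
    return hi20, lo12


def combine(hi20: int, lo12: int) -> int:
    """Recompute the 32-bit value that the instruction pair produces."""
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF
```

For example, the offset 0x12345FFF splits into an upper part of 0x12346 and a lower part of -1, because the lower immediate is signed.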

Instead of performing these relocations faithfully, nac3ld applies the following workarounds:

  • PLT: Instead of jumping to a PLT entry, directly jump to the desired address.
  • GOT: Instead of loading the GOT entry, just calculate the memory location of the value.

If a relocation cannot be resolved (i.e. the symbol has no defined value), the same relocation is propagated to the runtime linker. The same workaround is performed in the mainline ARTIQ runtime linker patch to the NAC3 branch.

Test

The nac3devices example in mainline ARTIQ can be linked successfully by both the static linker (nac3ld) and the runtime linker on both Kasli and Kasli-SOC.


We don't want to write to a file when using nac3ld - the code was previously written like this because (sadly) files are the only way to communicate with the regular linkers.

occheung force-pushed nac3ld from da4504a208 to 80120a5229 2022-05-31 15:18:51 +08:00
occheung force-pushed nac3ld from 80120a5229 to 65e152dd66 2022-05-31 15:21:50 +08:00

65e152d.

Identical to da4504a and 80120a5, but re-pushed because of spelling errors and because it was not merging properly.

compile_method_to_file used to be a sub-function of compile_method_to_mem. As a result, compile_method_to_mem wrote to a file even when linking with nac3ld.

This commit factors out the part that builds the module into a separate function, compile_method, which takes a closure that completes the linking step (using ld.lld or nac3ld).
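The shape of this refactoring can be sketched as follows. This is a Python analogue for illustration only: the real functions are Rust, and every parameter shown here (build_module, link, link_in_memory, link_via_files, filename) is a hypothetical stand-in rather than the actual NAC3 signature.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def compile_method(build_module: Callable[[], bytes],
                   link: Callable[[bytes], T]) -> T:
    """Build the object code once, then hand it to the caller's link step."""
    obj = build_module()
    return link(obj)


def compile_method_to_mem(build_module, link_in_memory):
    # In-memory path: the closure receives the object bytes directly,
    # so nothing is written to disk.
    return compile_method(build_module, link_in_memory)


def compile_method_to_file(build_module, link_via_files, filename):
    # File-based path: only this closure knows about the output file.
    return compile_method(build_module,
                          lambda obj: link_via_files(obj, filename))
```

With this split, whether a file is touched depends only on the closure that is passed in, not on the shared module-building code.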


Performance:
Measured the time taken to run artiq_compile on the nac3devices.py example on mainline ARTIQ. Completed 100 trials for each result. The mean & standard deviation on RISC-V & ARM targets are tabulated below.

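| ld.lld    | RV32IMA  | RV32G    | CortexA9 |
| --------- | -------- | -------- | -------- |
| Mean (s)  | 0.39056  | 0.40184  | 0.38955  |
| Stdev (s) | 0.02417  | 0.04277  | 0.01816  |

| nac3ld    | RV32IMA  | RV32G    | CortexA9 |
| --------- | -------- | -------- | -------- |
| Mean (s)  | 0.46504  | 0.44645  | 0.38136  |
| Stdev (s) | 0.15365  | 0.13773  | 0.02006  |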
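A measurement of this kind could be scripted roughly as below. This is only a sketch: the exact command line, the working directory, and whether the reported figure is a sample or population standard deviation are not stated in the PR, so the artiq_compile invocation and the use of the sample standard deviation are assumptions, and all function names are hypothetical.

```python
import statistics
import subprocess
import time
from typing import Callable, List, Tuple


def time_trials(run_once: Callable[[], None], trials: int = 100) -> List[float]:
    """Run the workload repeatedly and return wall-clock durations in seconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the durations."""
    return statistics.mean(samples), statistics.stdev(samples)


def run_artiq_compile() -> None:
    # Assumption: artiq_compile is on PATH and nac3devices.py (plus whatever
    # device database it needs) is in the current directory.
    subprocess.run(["artiq_compile", "nac3devices.py"], check=True)
```

A run would then look like `mean, stdev = summarize(time_trials(run_artiq_compile))`, repeated once per target.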
occheung force-pushed nac3ld from 0ad389f52f to 50ed04b787 2022-06-06 15:05:53 +08:00

Replaced the unsafe code in dwarf.rs with ByteOrder. Performance on RISC-V targets improved.

| nac3ld    | RV32IMA  | RV32G    | CortexA9 |
| --------- | -------- | -------- | -------- |
| Mean (s)  | 0.37124  | 0.38281  | 0.38526  |
| Stdev (s) | 0.00310  | 0.02594  | 0.02493  |

Rebased to master after 0ad389f5.

sb10q merged commit 50ed04b787 into master 2022-06-06 17:13:25 +08:00
Reference: M-Labs/nac3#292