get rid of LLD #18

Closed
opened 2021-09-24 10:44:59 +08:00 by sb10q · 20 comments

Currently we are creating temporary files and calling ld.lld as a subprocess. This is not very clean and may also impact performance due to unnecessary filesystem access (on Windoze this also potentially goes through the antivirus).
I inherited this approach from the current ARTIQ compiler because it is known to work and simple to implement - not much of the infrastructure seems to be in place for using lld from Rust and documentation is scarce.
Still, the LLD documentation claims You can embed LLD to your program by linking against it and calling the linker’s entry point function lld::elf::link.

Currently we are creating temporary files and calling ld.lld as a subprocess. This is not very clean and may also impact performance due to unnecessary filesystem access (on Windoze this also potentially goes through the antivirus). I inherited this approach from the current ARTIQ compiler because it is known to work and simple to implement - not much of the infrastructure seems to be in place for using lld from Rust and documentation is scarce. Still, [the LLD documentation](https://lld.llvm.org/NewLLD.html) claims ``You can embed LLD to your program by linking against it and calling the linker’s entry point function lld::elf::link.``
ychenfo was assigned by sb10q 2021-09-24 10:44:59 +08:00
sb10q added the
low-priority
label 2021-10-03 17:06:57 +08:00
Poster
Owner
function defined here: https://github.com/llvm/llvm-project/blob/main/lld/ELF/Driver.cpp
Poster
Owner

It's a bit silly, we'd still have to create temporary files and pass command line arguments into the library function call... maybe LLVM would accept a patch that improves the situation...

lld even has zero support for reading anything from the standard input.

It's a bit silly, we'd still have to create temporary files and pass command line arguments into the library function call... maybe LLVM would accept a patch that improves the situation... lld even has zero support for reading anything from the standard input.
Poster
Owner

Alternatively we can perhaps link at the LLVM IR level with module.link_in_module():

  • gets rid of temporary files
  • gets rid of LLD
  • requires modifying the ARTIQ runtime kernel loader to accept the ELF object file directly from LLVM instead of a shared library (the very first version of ARTIQ did that and it worked fine, though apparently that object file format is not "standard")
  • impact on performance?
  • impact on LTO?
Alternatively we can perhaps link at the LLVM IR level with ``module.link_in_module()``: * gets rid of temporary files * gets rid of LLD * requires modifying the ARTIQ runtime kernel loader to accept the ELF object file directly from LLVM instead of a shared library (the very first version of ARTIQ did that and it worked fine, though apparently that object file format is not "standard") * impact on performance? * impact on LTO?
sb10q changed title from use lld as a library to get rid of LLD 2021-12-10 09:33:47 +08:00
Poster
Owner

Alternatively we can perhaps link at the LLVM IR level with module.link_in_module():

Seems we are going towards that.
The runtime could perhaps accept both ELF formats, it is sometimes useful to switch between legacy ARTIQ and NAC3 during debugging without reflashing the device.

> Alternatively we can perhaps link at the LLVM IR level with module.link_in_module(): Seems we are going towards that. The runtime could perhaps accept both ELF formats, it is sometimes useful to switch between legacy ARTIQ and NAC3 during debugging without reflashing the device.

An incomplete prototype to accept simple kernel objects (e.g. blinky).
It is probably better to add a PLT for symbols that might need rebinding in runtime.

An incomplete prototype to accept simple kernel objects (e.g. blinky). It is probably better to add a PLT for symbols that might need rebinding in runtime.

It's a bit silly, we'd still have to create temporary files and pass command line arguments into the library function call... maybe LLVM would accept a patch that improves the situation...

lld even has zero support for reading anything from the standard input.

Maybe not. This is the function they use to read the files into memory buffers: b1d9136da1/lld/ELF/Driver.cpp (L210)

void LinkerDriver::addFile(StringRef path, bool withLOption) {
  using namespace sys::fs;

  Optional<MemoryBufferRef> buffer = readFile(path);
  if (!buffer.hasValue())
    return;
  MemoryBufferRef mbref = *buffer;

Maybe we could somehow modify lld to pass memory buffers into it? It might be easier than modifying our runtime to accept object files instead of ELF...

> It's a bit silly, we'd still have to create temporary files and pass command line arguments into the library function call... maybe LLVM would accept a patch that improves the situation... > > lld even has zero support for reading anything from the standard input. Maybe not. This is the function they use to read the files into memory buffers: https://github.com/llvm/llvm-project/blob/b1d9136da176152f9c2767b88faef6ee8b400b5b/lld/ELF/Driver.cpp#L210 ```cpp void LinkerDriver::addFile(StringRef path, bool withLOption) { using namespace sys::fs; Optional<MemoryBufferRef> buffer = readFile(path); if (!buffer.hasValue()) return; MemoryBufferRef mbref = *buffer; ``` Maybe we could somehow modify lld to pass memory buffers into it? It might be easier than modifying our runtime to accept object files instead of ELF...
Poster
Owner

@occheung What is the issue with more complex kernels?

@pca006132 I don't see a good justification for adding all this LLD bloat; plus many parts of LLD will need modifications to support specifying the linker script and other options. What function does LLD really perform in our case? We are only "linking" one object file.

@occheung What is the issue with more complex kernels? @pca006132 I don't see a good justification for adding all this LLD bloat; plus many parts of LLD will need modifications to support specifying the linker script and other options. What function does LLD really perform in our case? We are only "linking" one object file.

https://github.com/occheung/artiq/tree/dyld-obj

Pushed the modifications of runtime to the dyld-obj branch of my artiq fork from the nac3 branch.
libdyld is practically rewritten while mostly keeping the original interfaces.

So far the memory layout of the loaded image is as such:

+---+------------------------------------------+ <- image starting addr
| 1 | PT_LOAD section header                   |
| 2 | PT_GNU_EH_FRAME sec. head.               |
| 3 | Compacted sec. table for PROGBITS & RELA |
| 4 | All PROGBITS & RELA from object file     |
| 5 | .eh_frame_hdr                            |
+---+------------------------------------------+

The PT_LOAD, PT_GNU_EH_FRAME and .eh_Frame_hdr are there for llvm-libunwind to work properly.
The compacted section table is to allow RELA section sections to match its target PROGBITS section. Loading RELA into the image is probably unnecssary.

Tried raising an exception in the following code (no particular reason of choosing DMAError):

from artiq.experiment import *
from artiq.coredevice.core import Core
from artiq.coredevice.exceptions import *


@nac3
class NAC3Devices(EnvExperiment):
    core: KernelInvariant[Core]

    def build(self):
        self.setattr_device("core")

    @kernel
    def run(self):
        self.syscall_print()
        try:
            raise DMAError("error raised")
        except DMAError:
            raise

    @rpc
    def syscall_print(self):
        print("Hello World!")

    @rpc
    def syscall_err(self):
        print("Index Error!")

Got a traceback as the following:

WARNING:artiq.coredevice.comm_kernel:Mismatch between gateware (8.unknown.beta) and software (8.0.unknown.beta) versions
Hello World!
Core Device Traceback:
Traceback (most recent call first):
  File "nac3devices.py", line 17, column 14, in artiq_run_nac3devices.NAC3Devices.run.0
    raise DMAError("error raised")
     ^
  File "main", line 0, in .LBB1_7 (RA=+0x354)
    <unknown>
  File "main", line 0, in .LBB1_7 (RA=+0x410)
    <unknown>
artiq.coredevice.exceptions.DMAError(0): error raised

End of Core Device Traceback

Traceback (most recent call last):
  File "/nix/store/rdfbgldbpkqx1h1bsx7jrbbpn9nfgcxc-python3.9-artiq-8.0.unknown.beta/bin/.artiq_run-wrapped", line 9, in <module>
    sys.exit(main())
  File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 171, in main
    return run(with_file=True)
  File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 157, in run
    raise exn
  File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 152, in run
    exp_inst.run()
  File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/language/core.py", line 75, in run_on_core
    self.core.run(fake_method, args, kwargs)
  File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/coredevice/core.py", line 96, in run
    self.comm.serve(self.embedding_map, symbolizer)
  File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 790, in serve
    self._serve_exception(embedding_map, symbolizer)
  File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 772, in _serve_exception
    raise python_exn
artiq.coredevice.exceptions.DMAError: error raised

Note: DMA is still not supported as the PLT generation is not programmed in.

https://github.com/occheung/artiq/tree/dyld-obj Pushed the modifications of runtime to the dyld-obj branch of my artiq fork from the nac3 branch. libdyld is practically rewritten while mostly keeping the original interfaces. So far the memory layout of the loaded image is as such: ``` +---+------------------------------------------+ <- image starting addr | 1 | PT_LOAD section header | | 2 | PT_GNU_EH_FRAME sec. head. | | 3 | Compacted sec. table for PROGBITS & RELA | | 4 | All PROGBITS & RELA from object file | | 5 | .eh_frame_hdr | +---+------------------------------------------+ ``` The PT_LOAD, PT_GNU_EH_FRAME and .eh_Frame_hdr are there for llvm-libunwind to work properly. The compacted section table is to allow RELA section sections to match its target PROGBITS section. Loading RELA into the image is probably unnecssary. Tried raising an exception in the following code (no particular reason of choosing DMAError): ``` from artiq.experiment import * from artiq.coredevice.core import Core from artiq.coredevice.exceptions import * @nac3 class NAC3Devices(EnvExperiment): core: KernelInvariant[Core] def build(self): self.setattr_device("core") @kernel def run(self): self.syscall_print() try: raise DMAError("error raised") except DMAError: raise @rpc def syscall_print(self): print("Hello World!") @rpc def syscall_err(self): print("Index Error!") ``` Got a traceback as the following: ``` WARNING:artiq.coredevice.comm_kernel:Mismatch between gateware (8.unknown.beta) and software (8.0.unknown.beta) versions Hello World! Core Device Traceback: Traceback (most recent call first): File "nac3devices.py", line 17, column 14, in artiq_run_nac3devices.NAC3Devices.run.0 raise DMAError("error raised") ^ File "main", line 0, in .LBB1_7 (RA=+0x354) <unknown> File "main", line 0, in .LBB1_7 (RA=+0x410) <unknown> artiq.coredevice.exceptions.DMAError(0): error raised End of Core Device Traceback Traceback (most recent call last): File "/nix/store/rdfbgldbpkqx1h1bsx7jrbbpn9nfgcxc-python3.9-artiq-8.0.unknown.beta/bin/.artiq_run-wrapped", line 9, in <module> sys.exit(main()) File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 171, in main return run(with_file=True) File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 157, in run raise exn File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/frontend/artiq_run.py", line 152, in run exp_inst.run() File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/language/core.py", line 75, in run_on_core self.core.run(fake_method, args, kwargs) File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/coredevice/core.py", line 96, in run self.comm.serve(self.embedding_map, symbolizer) File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 790, in serve self._serve_exception(embedding_map, symbolizer) File "/nix/store/ks1ra1nis8qfsm1x8vwcssclcpfbqkl5-python3-3.9.6-env/lib/python3.9/site-packages/artiq/coredevice/comm_kernel.py", line 772, in _serve_exception raise python_exn artiq.coredevice.exceptions.DMAError: error raised ``` Note: DMA is still not supported as the PLT generation is not programmed in.

Also, .eh_frame and .eh_frame_hdr are in dwarf format. Maybe we should factor out the code for reading/writing dwarf (from libeh/dwarf.rs) and perhaps name it libdwarf like artiq-zynq.

So far I just make a copy in libdyld and modify as needed.

Also, .eh_frame and .eh_frame_hdr are in dwarf format. Maybe we should factor out the code for reading/writing dwarf (from `libeh/dwarf.rs`) and perhaps name it libdwarf like artiq-zynq. So far I just make a copy in libdyld and modify as needed.
Poster
Owner
  File "main", line 0, in .LBB1_7 (RA=+0x354)
    <unknown>
  File "main", line 0, in .LBB1_7 (RA=+0x410)
    <unknown>

Those look problematic. What is .LBB1_7 about? Maybe llvm-addr2line gets confused when reading the backtrace?

Note: DMA is still not supported as the PLT generation is not programmed in.

What is the connection between DMA and PLT?

``` File "main", line 0, in .LBB1_7 (RA=+0x354) <unknown> File "main", line 0, in .LBB1_7 (RA=+0x410) <unknown> ``` Those look problematic. What is ``.LBB1_7`` about? Maybe ``llvm-addr2line`` gets confused when reading the backtrace? ``` Note: DMA is still not supported as the PLT generation is not programmed in. ``` What is the connection between DMA and PLT?
Poster
Owner

Let's try to get this for prealpha - seems to be the only remaining issue on Windows (MSYS2 lld breaks when called by NAC3 for some reason, so installing it isn't a quick workaround), and supporting Windows would significantly increase the user base for testing.

Let's try to get this for prealpha - seems to be the only remaining issue on Windows (MSYS2 lld breaks when called by NAC3 for some reason, so installing it isn't a quick workaround), and supporting Windows would significantly increase the user base for testing.
occheung was assigned by sb10q 2022-03-24 22:44:50 +08:00
ychenfo was unassigned by sb10q 2022-03-24 22:44:51 +08:00
sb10q removed the
low-priority
label 2022-03-24 22:44:56 +08:00
sb10q added this to the Prealpha milestone 2022-03-24 22:45:11 +08:00

Regarding problem with lld on Windows: Can we compile lld in our LLVM build? Does that work?

Regarding problem with lld on Windows: Can we compile lld in our LLVM build? Does that work?
Poster
Owner

Maybe - but LLD is bloated anyway and we should ditch it. I don't see a good reason for having a linker when we have only one object file.

Maybe - but LLD is bloated anyway and we should ditch it. I don't see a good reason for having a linker when we have only one object file.
Poster
Owner

If the linker was small and well-written it would be acceptable, but LLD is neither. And GNU LD is even worse.

If the linker was small and well-written it would be acceptable, but LLD is neither. And GNU LD is even worse.
Poster
Owner

LLD is bloated

And especially so on Windows where:

  1. it would link LLVM statically, resulting in 10s of MB or even 100MB of bloat in the executable.
  2. it might make full copies of the executable (lld.exe, ld.lld.exe, wasm-ld.exe...) due to poor symlink support. Sure, we can delete them, but it's yet another hack.

You could fit an entire Linux distro in the space taken by LLD.

> LLD is bloated And especially so on Windows where: 1. it would link LLVM statically, resulting in 10s of MB or even 100MB of bloat in the executable. 2. it might make full copies of the executable (lld.exe, ld.lld.exe, wasm-ld.exe...) due to poor symlink support. Sure, we can delete them, but it's yet another hack. You could fit an entire Linux distro in the space taken by LLD.
Poster
Owner

This is exactly what happens:

Z:\home\sb\lld-13.0.1.src\build\bin>dir
Volume in drive Z has no label.
Volume Serial Number is 0000-0000

Directory of Z:\home\sb\lld-13.0.1.src\build\bin

 3/27/2022   8:44 AM  <DIR>         .
 3/27/2022   8:43 AM  <DIR>         ..
 3/27/2022   8:44 AM    70,069,451  ld.lld.exe
 3/27/2022   8:44 AM    70,069,451  ld64.lld.darwinnew.exe
 3/27/2022   8:44 AM    70,069,451  ld64.lld.darwinold.exe
 3/27/2022   8:44 AM    70,069,451  ld64.lld.exe
 3/27/2022   8:44 AM    70,069,451  lld.exe
 3/27/2022   8:44 AM    70,069,451  lld-link.exe
 3/27/2022   8:44 AM    70,069,451  wasm-ld.exe
       7 files              490,486,157 bytes
       2 directories  8,029,806,133,248 bytes free

And this is with MachO support disabled (as it didn't compile), which I had to do by editing the source, since it doesn't even have options to disable this sort of feature.

I'll see if simply copying ld.lld.exe actually works, maybe 70MB of additional bloat and questionable LLVM code is an acceptable temporary workaround.

This is exactly what happens: ``` Z:\home\sb\lld-13.0.1.src\build\bin>dir Volume in drive Z has no label. Volume Serial Number is 0000-0000 Directory of Z:\home\sb\lld-13.0.1.src\build\bin 3/27/2022 8:44 AM <DIR> . 3/27/2022 8:43 AM <DIR> .. 3/27/2022 8:44 AM 70,069,451 ld.lld.exe 3/27/2022 8:44 AM 70,069,451 ld64.lld.darwinnew.exe 3/27/2022 8:44 AM 70,069,451 ld64.lld.darwinold.exe 3/27/2022 8:44 AM 70,069,451 ld64.lld.exe 3/27/2022 8:44 AM 70,069,451 lld.exe 3/27/2022 8:44 AM 70,069,451 lld-link.exe 3/27/2022 8:44 AM 70,069,451 wasm-ld.exe 7 files 490,486,157 bytes 2 directories 8,029,806,133,248 bytes free ``` And this is with MachO support disabled (as it didn't compile), which I had to do by editing the source, since it doesn't even have options to disable this sort of feature. I'll see if simply copying ``ld.lld.exe`` actually works, maybe 70MB of additional bloat and questionable LLVM code is an acceptable temporary workaround.
Poster
Owner

It doesn't...

# artiq_compile nac3devices.py
Traceback (most recent call last):
  File "C:\msys64\mingw64\bin\artiq_compile-script.py", line 33, in <module>
    sys.exit(load_entry_point('artiq==8.0b0', 'console_scripts', 'artiq_compile')())
  File "C:/msys64/home/wfvm/artiq/artiq/frontend/artiq_compile.py", line 66, in main
    exp_inst.core.compile(exp_inst.run, [], {}, embedding_map, file_output=output)
  File "C:/msys64/home/wfvm/artiq/artiq/coredevice/core.py", line 80, in compile
    self.compiler.compile_method_to_file(obj, name, args, file_output, embedding_map)
nac3artiq.CompileError: linker returned non-zero status code
It doesn't... ``` # artiq_compile nac3devices.py Traceback (most recent call last): File "C:\msys64\mingw64\bin\artiq_compile-script.py", line 33, in <module> sys.exit(load_entry_point('artiq==8.0b0', 'console_scripts', 'artiq_compile')()) File "C:/msys64/home/wfvm/artiq/artiq/frontend/artiq_compile.py", line 66, in main exp_inst.core.compile(exp_inst.run, [], {}, embedding_map, file_output=output) File "C:/msys64/home/wfvm/artiq/artiq/coredevice/core.py", line 80, in compile self.compiler.compile_method_to_file(obj, name, args, file_output, embedding_map) nac3artiq.CompileError: linker returned non-zero status code ```
Poster
Owner

Just needs .exe suffix.

Just needs .exe suffix.
sb10q removed this from the Prealpha milestone 2022-03-27 18:43:56 +08:00
Poster
Owner

We probably want to get rid of the PLT which is slowing things down (such as RTIO speed), and the GOT which is unnecessary. Since the ARTIQ runtime only loads one copy of the library, direct .text and .data relocations have no disadvantage.

The legacy compiler also uses the PLT, so this will be a runtime speed improvement over it.

We probably want to get rid of the PLT which is slowing things down (such as RTIO speed), and the GOT which is unnecessary. Since the ARTIQ runtime only loads one copy of the library, direct ``.text`` and ``.data`` relocations have no disadvantage. The legacy compiler also uses the PLT, so this will be a runtime speed improvement over it.
Poster
Owner
https://git.m-labs.hk/M-Labs/nac3/pulls/292
sb10q closed this issue 2022-06-06 17:13:52 +08:00
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#18
There is no content yet.