Random segmentation faults on running artiq_sinara_tester #275

Closed
opened 2022-04-26 12:14:12 +08:00 by mwojcik · 3 comments

It only happens on startup, regardless of whether the device is connected. Running the command again usually helps, though after a few more runs the SIGSEGV pops up again. It also happens regardless of the device_db file, but I have attached one just in case.

The output I get is this:

```
[spaqin@hera:~/m-labs/artiq_nac3]$ artiq_sinara_tester
****** Sinara system tester ******

Segmentation fault (core dumped)
```

and that's it.

I am attaching the device_db I've been using. I would also like to attach the core dump file, but it is larger than the 4 MB limit (just ask me if you need it and I'll send it over by other means).
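Since the crash is intermittent, it may help to measure how often it occurs before and after a candidate fix. The following is only a rough sketch, not part of the original report: the helper name `count_segfaults` and the run count are made up for illustration, and it assumes a Unix system where SIGSEGV is signal 11 and `artiq_sinara_tester` is on the `PATH`.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, Stdio};

/// Hypothetical helper: runs `program` with `args` `runs` times and returns
/// how many of those runs were terminated by SIGSEGV.
fn count_segfaults(program: &str, args: &[&str], runs: usize) -> std::io::Result<usize> {
    // Assumption: SIGSEGV is signal 11, as on Linux.
    const SIGSEGV: i32 = 11;
    let mut crashes = 0;
    for _ in 0..runs {
        // Silence the child so only the summary is printed.
        let status = Command::new(program)
            .args(args)
            .stdin(Stdio::null())
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status()?;
        // `signal()` is `Some(n)` only when the child was killed by signal `n`.
        if status.signal() == Some(SIGSEGV) {
            crashes += 1;
        }
    }
    Ok(crashes)
}

fn main() -> std::io::Result<()> {
    let runs = 20; // arbitrary number of attempts
    let crashes = count_segfaults("artiq_sinara_tester", &[], runs)?;
    println!("{crashes} of {runs} runs ended with SIGSEGV");
    Ok(())
}
```

Because the segfault happens at startup, closing stdin should not hide it, but a run that gets past startup will most likely exit with an ordinary error and is simply not counted.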
sb10q added the high-priority label 2022-04-27 10:56:25 +08:00
sb10q added this to the Alpha milestone 2022-04-27 10:56:27 +08:00
ychenfo was assigned by sb10q 2022-04-27 10:56:36 +08:00
Collaborator

It's a bit weird, but this line, which sets up the pass population, seems to be the cause of the problem: https://git.m-labs.hk/M-Labs/nac3/src/commit/8e6e4d6715bdfa4a5639068e4a78e89b714fb8d7/nac3core/src/codegen/mod.rs#L224

Simply commenting it out solves the problem, at the expense of less optimization.

Replacing this line with the following, as written in the [inkwell example](https://github.com/TheDan64/inkwell/blob/7253f9250d8a5e5e83125f1f6f85ba8b80d12acc/examples/kaleidoscope/main.rs#L1268-L1277)

```rust
passes.add_instruction_combining_pass();
passes.add_reassociate_pass();
passes.add_gvn_pass();
passes.add_cfg_simplification_pass();
passes.add_basic_alias_analysis_pass();
passes.add_promote_memory_to_register_pass();
passes.add_instruction_combining_pass();
passes.add_reassociate_pass();
passes.initialize();
```

also avoids the segfault and produces more optimized code than commenting the line out, although the result still seems less optimized than the current output.

Check the source code of `populateFunctionPassManager` in LLVM?
sb10q removed the high-priority label 2022-07-04 18:09:17 +08:00
ychenfo was unassigned by sb10q 2022-07-04 18:09:22 +08:00
z78078 was assigned by sb10q 2022-07-04 18:09:22 +08:00
sb10q removed this from the Alpha milestone 2022-07-04 18:09:29 +08:00
Collaborator

As of e49b760e, no segfaults have been observed when running `artiq_sinara_tester`. I suspect the changes to improve multithreading in LLVM have addressed this issue.
sb10q closed this issue 2023-10-30 15:11:32 +08:00
Reference: M-Labs/nac3#275