Phaser: "Call parameter does not match function signature!" #276

Closed
opened 2022-04-26 17:26:19 +08:00 by mwojcik · 3 comments
```
...
!0 = !{!"branch_weights", i32 2000, i32 1}
Got an error: explicit panic
"Call parameter type does not match function signature!\n  %load6 = load %artiq.coredevice.phaser.Phaser.0*, %artiq.coredevice.phaser.Phaser.0** %gep5, align 8\n %artiq.coredevice.phaser.Phaser*  call void @artiq.coredevice.phaser.Phaser.write8.0(%artiq.coredevice.phaser.Phaser.0* %load6, i32 %add, i32 %rshift)\nCall parameter type does not match function signature!\n  %load12 = load %artiq.coredevice.phaser.Phaser.0*, %artiq.coredevice.phaser.Phaser.0** %gep5, align 8\n %artiq.coredevice.phaser.Phaser*  call void @artiq.coredevice.phaser.Phaser.write8.0(%artiq.coredevice.phaser.Phaser.0* %load12, i32 %add8, i32 %1)\n"
thread '<unnamed>' panicked at 'explicit panic', /build/source/nac3core/src/codegen/mod.rs:236:13
Got an error: explicit panic
thread '<unnamed>' panicked at 'tasks panicked', nac3core/src/codegen/mod.rs:192:13
```

^ The above (or something similar) appears, along with the entire code dump.

I thought it was caused by some ``i32`` shenanigans (mentioned in #117), so I wrapped `int32` around every arithmetic and logical operation in the phaser coredevice driver, and even around constants, sometimes two or three times, and around variables too. I thought I was getting somewhere because the error message would shrink over time, but then it would grow again. It looks like I was also a victim of #190: running the same code gives me different results. Either way, I had to give up before I lost all my hair.
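For illustration, the kind of wrapping attempted looks roughly like this. This is a hypothetical host-Python snippet (the function name and values are made up, not the actual phaser driver code), showing `int32()` applied redundantly around operators and constants:

```python
from numpy import int32

# Hypothetical example of the workaround style described above: force every
# intermediate result back to int32, even around constants, in the hope of
# avoiding the i32 mismatches mentioned in #117.
def split_write8_args(addr, data):
    high = int32((int32(data) >> int32(8)) & int32(0xff))
    low = int32(int32(data) & int32(0xff))
    return int32(addr), high, low

a, hi, lo = split_write8_args(int32(0x10), int32(0x1234))
print(int(a), int(hi), int(lo))  # 16 18 52
```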

``` ... !0 = !{!"branch_weights", i32 2000, i32 1} Got an error: explicit panic "Call parameter type does not match function signature!\n %load6 = load %artiq.coredevice.phaser.Phaser.0*, %artiq.coredevice.phaser.Phaser.0** %gep5, align 8\n %artiq.coredevice.phaser.Phaser* call void @artiq.coredevice.phaser.Phaser.write8.0(%artiq.coredevice.phaser.Phaser.0* %load6, i32 %add, i32 %rshift)\nCall parameter type does not match function signature!\n %load12 = load %artiq.coredevice.phaser.Phaser.0*, %artiq.coredevice.phaser.Phaser.0** %gep5, align 8\n %artiq.coredevice.phaser.Phaser* call void @artiq.coredevice.phaser.Phaser.write8.0(%artiq.coredevice.phaser.Phaser.0* %load12, i32 %add8, i32 %1)\n" thread '<unnamed>' panicked at 'explicit panic', /build/source/nac3core/src/codegen/mod.rs:236:13 Got an error: explicit panic thread '<unnamed>' panicked at 'tasks panicked', nac3core/src/codegen/mod.rs:192:13 ``` ^ or similar, with entire code dump. I thought it's because of some ``i32`` shenanigans (mentioned in #117) and I put int32 in phaser coredevice driver around all operators that use any arithmetic or logical operation, and even around constants. Sometimes even 2-3 times, around variables too. I thought I was getting somewhere as the error message would get smaller over time, but then it would get bigger again. Looks like I was a victim of #190 - running the same code gets me different results. Either way had to give up before I lost all my hair.
Collaborator

Tried to reproduce this bug by making a call to `set_duc_phase_mu`, via `self.phaser0.channel[0].set_duc_phase_mu(0)` (guessed from the IR).

It seems that this is caused by how LLVM handles opaque types of the same name when linking different modules, as discussed [here](https://groups.google.com/g/llvm-dev/c/H3Y4rsj8YSw). I am thinking about the best way to solve this, since the LLVM `Context` is not `Send`. Maybe use a single LLVM `Context` behind a lock, together with a global type cache for opaque types during codegen?

A temporary workaround is simply to use a single thread for codegen (then everything ends up in a single LLVM module). I am also guessing that the random segfault in #275 is caused by the same inconsistency between LLVM modules, since using a single thread seems to solve that problem too.
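For reference, a minimal reproduction along the lines of the call above might look like the following sketch. It assumes a device_db with `core` and `phaser0` entries; only the `set_duc_phase_mu` call itself is taken from this comment.

```python
from artiq.experiment import EnvExperiment, kernel


class PhaserRepro(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("phaser0")

    @kernel
    def run(self):
        self.core.reset()
        # With parallel codegen, the Phaser driver methods can end up in a
        # different LLVM module than this call site, so the opaque struct
        # type for Phaser may be duplicated and mismatch at link time.
        self.phaser0.channel[0].set_duc_phase_mu(0)
```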


Let's measure how much time parallel codegen really saves on a representative example. We keep having issues with it (e.g. non-deterministic behavior still remains), many parts ended up not being parallelized, it makes the codebase more complex and harder to work with, and nobody has time to work on those problems. If it's not much faster, let's get rid of all the thread stuff.

sb10q added the high-priority label 2022-04-27 10:56:19 +08:00
sb10q added this to the Alpha milestone 2022-04-27 10:56:21 +08:00
ychenfo was assigned by sb10q 2022-04-27 10:56:49 +08:00
Collaborator

> Let's measure how much time parallel codegen really saves on a representative example

For benchmarking the speed improvement from multithreading, I wrote a simple script to randomly generate code with a given number of functions and a given number of lines per function, and then ran some timing tests on it. I have also tried compiling `nac3devices.py` with different numbers of threads.

The results are below (50 LOC per function; compilation stops after linking all the LLVM modules generated by nac3).
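As a rough sketch of what such a generator could look like (hypothetical names and function bodies, not the actual script used for these numbers):

```python
import random

def gen_function(name: str, n_lines: int) -> str:
    """Emit one function whose body is n_lines of simple integer arithmetic."""
    lines = [f"def {name}(x0: int32) -> int32:"]
    lines += [f"    x{i} = x{i - 1} * {random.randint(2, 9)} + {random.randint(0, 99)}"
              for i in range(1, n_lines)]
    lines.append(f"    return x{n_lines - 1}")
    return "\n".join(lines)

def gen_module(n_functions: int, lines_per_function: int = 50) -> str:
    """Generate one source module containing n_functions such functions."""
    return "\n\n".join(gen_function(f"f{i}", lines_per_function)
                       for i in range(n_functions))

if __name__ == "__main__":
    print(gen_module(n_functions=10))
```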

| #threads \ #functions | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | 110 | 120 | 130 | 140 | 150 |
| --------------------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- |
| 1 | 0.43426 | 0.71871 | 1.00324 | 1.27270 | 1.59457 | 1.92298 | 2.29284 | 2.66872 | 3.04991 | 3.4882 | 3.85791 | 4.2864 | 4.7402 | 5.1258 | 5.60339 |
| 2 | 0.38343 | 0.61268 | 0.81729 | 1.02287 | 1.24567 | 1.47682 | 1.73611 | 1.96663 | 2.20381 | 2.4653 | 2.71497 | 2.97901 | 3.2733 | 3.49916 | 3.76381 |
| 3 | 0.36986 | 0.56715 | 0.76161 | 0.94898 | 1.15193 | 1.3579 | 1.56997 | 1.78241 | 1.98548 | 2.2410 | 2.4305 | 2.64326 | 2.89478 | 3.06611 | 3.31408 |
| 4 | 0.36335 | 0.55004 | 0.73611 | 0.90098 | 1.09461 | 1.29332 | 1.50530 | 1.71033 | 1.90691 | 2.09377 | 2.3224 | 2.50326 | 2.74885 | 2.89775 | 3.12817 |
| 8 | 0.35300 | 0.52765 | 0.70544 | 0.86577 | 1.03324 | 1.22613 | 1.41654 | 1.59702 | 1.78105 | 1.96741 | 2.14730 | 2.32104 | 2.5655 | 2.71278 | 2.91608 |
| 16 | 0.34950 | 0.53884 | 0.70027 | 0.86249 | 1.03299 | 1.22663 | 1.41152 | 1.59118 | 1.78939 | 1.9538 | 2.11694 | 2.30541 | 2.5199 | 2.69922 | 2.9065 |

| `nac3devices.py` \ #threads | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --------------------------- | ------- | ------- | ------ | ------- | ------- | ------- | ------- | ------- |
| time | 1.15624 | 1.04921 | 1.0739 | 1.03593 | 1.04692 | 1.07034 | 1.07185 | 1.07667 |

So it seems that multithreading does make things noticeably faster.

Looking into the details of the `get_llvm_type` function, I also found that even with a single thread, the LLVM type of a non-polymorphic class sometimes gets evaluated multiple times, resulting in different opaque types for the same class. I need to look into this further.

The detailed benchmark data is attached.

sb10q closed this issue 2022-07-04 14:39:34 +08:00