SUServo: Variable with the same name doesn't get overwritten in some cases? #285

Closed
opened 2022-05-06 14:49:32 +08:00 by mwojcik · 9 comments

I had a hunch about the SUServo that maybe the compiler calculated constants differently (different order of operations, maybe) and did a quick check. Well, that wasn't the case, but I found something else.

Consider this function:
https://github.com/m-labs/artiq/blob/nac3/artiq/coredevice/suservo.py#L130

Within the context of set_config - simple address + value, nothing complicated.

Throw in some print_rpc to see what's happening with the variable:

        self.core.delay(5.*ms)
        print_rpc("writing:")
        print_rpc(addr)  # 1
        addr |= WE
        print_rpc(addr)  # 2
        value &= (1 << COEFF_WIDTH) - 1
        value |= (addr >> 8) << COEFF_WIDTH
        addr = (addr & 0xff)
        print_rpc(addr)  # 3
        addr = (self.channel << 8) | addr
        self.core.delay(5.*ms)
        rtio_output(addr, value)
        delay_mu(self.ref_period_mu)

Output:

writing:
3072  # 1
3072  # 2
3072  # 3

Change the names of the variables a little bit so the compiler doesn't get too lost:

        self.core.delay(5.*ms)
        print_rpc("writing:")
        print_rpc(addr)  # 1
        addr |= WE
        print_rpc(addr)  # 2
        value &= (1 << COEFF_WIDTH) - 1
        value |= (addr >> 8) << COEFF_WIDTH
        addr1 = addr & 0xff
        print_rpc(addr1)  # 3
        addr2 = (self.channel << 8) | addr1
        self.core.delay(5.*ms)
        rtio_output(addr2, value)

Output:

writing:
3072  # 1
3072  # 2
0  # 3

and suservo driver finally gets status back but still no output - status is not OK, maybe some other math is also not applied

Obviously, this caused config to be written in a wrong place, and prevented suservo from working. Looks like a problem with the compiler.

I had a hunch about the SUServo that maybe the compiler calculated constants differently (different order of operations, maybe) and did a quick check. Well, that wasn't the case, but I found something else. Consider this function: https://github.com/m-labs/artiq/blob/nac3/artiq/coredevice/suservo.py#L130 Within the context of ``set_config`` - simple address + value, nothing complicated. Throw in some print_rpc to see what's happening with the variable: ```python self.core.delay(5.*ms) print_rpc("writing:") print_rpc(addr) # 1 addr |= WE print_rpc(addr) # 2 value &= (1 << COEFF_WIDTH) - 1 value |= (addr >> 8) << COEFF_WIDTH addr = (addr & 0xff) print_rpc(addr) # 3 addr = (self.channel << 8) | addr self.core.delay(5.*ms) rtio_output(addr, value) delay_mu(self.ref_period_mu) ``` Output: ``` writing: 3072 # 1 3072 # 2 3072 # 3 ``` Change the names of the variables a little bit so the compiler doesn't get too lost: ```python self.core.delay(5.*ms) print_rpc("writing:") print_rpc(addr) # 1 addr |= WE print_rpc(addr) # 2 value &= (1 << COEFF_WIDTH) - 1 value |= (addr >> 8) << COEFF_WIDTH addr1 = addr & 0xff print_rpc(addr1) # 3 addr2 = (self.channel << 8) | addr1 self.core.delay(5.*ms) rtio_output(addr2, value) ``` Output: ``` writing: 3072 # 1 3072 # 2 0 # 3 ``` and suservo driver finally gets status back but still no output - status is not OK, maybe some other math is also not applied Obviously, this caused config to be written in a wrong place, and prevented suservo from working. Looks like a problem with the compiler.
sb10q added the
high-priority
label 2022-05-06 14:55:06 +08:00
sb10q added this to the Alpha milestone 2022-05-06 14:55:08 +08:00
Poster
Owner

Weirdly enough, if I put it as a standalone test like this:

from numpy import int32

from artiq.experiment import *
from artiq.coredevice.core import Core
from artiq.coredevice.ttl import TTLOut
from artiq.coredevice.suservo import SUServo, Channel as SUServoChannel

COEFF_WIDTH = 18
COEFF_DEPTH = 10 + 1
WE = 1 << COEFF_DEPTH + 1

@nac3
class SUServoDemo(EnvExperiment):
    core: KernelInvariant[Core]
    led0: KernelInvariant[TTLOut]
    suservo0: KernelInvariant[SUServo]
    suservo0_ch0: KernelInvariant[SUServoChannel]

    def build(self):
        self.setattr_device("core")
        self.setattr_device("led0")
        self.setattr_device("suservo0")
        self.setattr_device("suservo0_ch0")

    @kernel
    def write(self, addr: int32, value: int32):
        self.core.delay(5.*ms)

        print_rpc("writing:")
        print_rpc(addr)  # 1
        addr |= WE
        print_rpc(addr)  # 2
        value &= (1 << COEFF_WIDTH) - 1
        value |= (addr >> 8) << COEFF_WIDTH
        addr = addr & 0xff
        print_rpc(addr)  # 3
        self.core.delay(5.*ms)

    @kernel
    def run(self):
        addr = 3072
        value = 1
        self.write(addr, value)

it prints correct values:

writing:
3072  # 1
7168  # 2
0  # 3
Weirdly enough, if I put it as a standalone test like this: ``` python from numpy import int32 from artiq.experiment import * from artiq.coredevice.core import Core from artiq.coredevice.ttl import TTLOut from artiq.coredevice.suservo import SUServo, Channel as SUServoChannel COEFF_WIDTH = 18 COEFF_DEPTH = 10 + 1 WE = 1 << COEFF_DEPTH + 1 @nac3 class SUServoDemo(EnvExperiment): core: KernelInvariant[Core] led0: KernelInvariant[TTLOut] suservo0: KernelInvariant[SUServo] suservo0_ch0: KernelInvariant[SUServoChannel] def build(self): self.setattr_device("core") self.setattr_device("led0") self.setattr_device("suservo0") self.setattr_device("suservo0_ch0") @kernel def write(self, addr: int32, value: int32): self.core.delay(5.*ms) print_rpc("writing:") print_rpc(addr) # 1 addr |= WE print_rpc(addr) # 2 value &= (1 << COEFF_WIDTH) - 1 value |= (addr >> 8) << COEFF_WIDTH addr = addr & 0xff print_rpc(addr) # 3 self.core.delay(5.*ms) @kernel def run(self): addr = 3072 value = 1 self.write(addr, value) ``` it prints correct values: ``` writing: 3072 # 1 7168 # 2 0 # 3 ```
ychenfo was assigned by sb10q 2022-05-06 15:13:30 +08:00
Collaborator

I am not sure if I missed anything.. I tried to reproduce the problem using the following setup (artiq: nac3 branch, nac3: current master branch, device_db.py as attached)

[ychenfo@zeus:~/code/artiq/artiq/examples/nac3devices]$ git diff
diff --git a/artiq/coredevice/suservo.py b/artiq/coredevice/suservo.py
index 07655ba4..a48e6813 100644
--- a/artiq/coredevice/suservo.py
+++ b/artiq/coredevice/suservo.py
@@ -135,11 +135,14 @@ class SUServo:
         :param addr: Memory location address.
         :param value: Data to be written.
         """
+        print_rpc(addr)
         addr |= WE
+        print_rpc(addr)
         value &= (1 << COEFF_WIDTH) - 1
         value |= (addr >> 8) << COEFF_WIDTH
         addr = addr & 0xff
-        rtio_output((self.channel << 8) | addr, value)
+        print_rpc(addr)
+        # rtio_output((self.channel << 8) | addr, value)
         delay_mu(self.ref_period_mu)

     @kernel

[ychenfo@zeus:~/code/artiq/artiq/examples/nac3devices]$ cat ./assign_fail.py
from numpy import int32

from artiq.experiment import *
from artiq.coredevice.core import Core
from artiq.coredevice.ttl import TTLOut
from artiq.coredevice.suservo import SUServo, Channel as SUServoChannel

COEFF_WIDTH = 18
COEFF_DEPTH = 10 + 1
WE = 1 << COEFF_DEPTH + 1

@nac3
class SUServoDemo(EnvExperiment):
    core: KernelInvariant[Core]
    suservo0: KernelInvariant[SUServo]

    def build(self):
        self.setattr_device("core")
        self.setattr_device("suservo0")


    @kernel
    def run(self):
        self.suservo0.write(3072, 1)

[ychenfo@zeus:~/code/artiq/artiq/examples/nac3devices]$ artiq_run ./assign_fail.py
WARNING:artiq.coredevice.comm_kernel:Mismatch between gateware (7.8046.8a7af3f.beta) and software (8.0.unknown.beta) versions
3072
7168
0

and the output seems fine?

This is indeed almost like the standalone test above, is there any more context I need to put the code in?

I am not sure if I missed anything.. I tried to reproduce the problem using the following setup (artiq: [nac3 branch](https://github.com/m-labs/artiq/tree/fc95dffd0b879f5d49213971815e9d13ccd0f419), nac3: [current master branch](https://git.m-labs.hk/M-Labs/nac3/src/commit/a022005183c59c5d5c6b7375637ee09a17726aad), `device_db.py` as attached) ```diff [ychenfo@zeus:~/code/artiq/artiq/examples/nac3devices]$ git diff diff --git a/artiq/coredevice/suservo.py b/artiq/coredevice/suservo.py index 07655ba4..a48e6813 100644 --- a/artiq/coredevice/suservo.py +++ b/artiq/coredevice/suservo.py @@ -135,11 +135,14 @@ class SUServo: :param addr: Memory location address. :param value: Data to be written. """ + print_rpc(addr) addr |= WE + print_rpc(addr) value &= (1 << COEFF_WIDTH) - 1 value |= (addr >> 8) << COEFF_WIDTH addr = addr & 0xff - rtio_output((self.channel << 8) | addr, value) + print_rpc(addr) + # rtio_output((self.channel << 8) | addr, value) delay_mu(self.ref_period_mu) @kernel [ychenfo@zeus:~/code/artiq/artiq/examples/nac3devices]$ cat ./assign_fail.py from numpy import int32 from artiq.experiment import * from artiq.coredevice.core import Core from artiq.coredevice.ttl import TTLOut from artiq.coredevice.suservo import SUServo, Channel as SUServoChannel COEFF_WIDTH = 18 COEFF_DEPTH = 10 + 1 WE = 1 << COEFF_DEPTH + 1 @nac3 class SUServoDemo(EnvExperiment): core: KernelInvariant[Core] suservo0: KernelInvariant[SUServo] def build(self): self.setattr_device("core") self.setattr_device("suservo0") @kernel def run(self): self.suservo0.write(3072, 1) [ychenfo@zeus:~/code/artiq/artiq/examples/nac3devices]$ artiq_run ./assign_fail.py WARNING:artiq.coredevice.comm_kernel:Mismatch between gateware (7.8046.8a7af3f.beta) and software (8.0.unknown.beta) versions 3072 7168 0 ``` and the output seems fine? This is indeed almost like the [standalone test above](https://git.m-labs.hk/M-Labs/nac3/issues/285#issuecomment-5402), is there any more context I need to put the code in?

@ychenfo you probably need more of the original code to reproduce the problem. It seems that unrelated code changes create some corruption in the compiler.

@ychenfo you probably need more of the original code to reproduce the problem. It seems that unrelated code changes create some corruption in the compiler.
Collaborator

you probably need more of the original code to reproduce the problem. It seems that unrelated code changes create some corruption in the compiler.

Thanks! Read the issue more carefully and now I am able to reproduce this. Seems that in IR after constant propagation the value is already wrong, looking into this now.

> you probably need more of the original code to reproduce the problem. It seems that unrelated code changes create some corruption in the compiler. Thanks! Read the issue more carefully and now I am able to reproduce this. Seems that in IR after constant propagation the value is already wrong, looking into this now.
Collaborator

Turns out to be the problem with KernelInvariant handling, in set_config, passing 3072 instead of CONFIG_ADDR gives the correct output, looking into how to solve it in codegen.

Turns out to be the problem with KernelInvariant handling, in [`set_config`](https://github.com/m-labs/artiq/blob/nac3/artiq/coredevice/suservo.py#L158-L174), passing `3072` instead of `CONFIG_ADDR` gives the correct output, looking into how to solve it in codegen.
sb10q closed this issue 2022-05-14 16:00:38 +08:00

@ychenfo would you be able to test this (and the rest of SU-Servo) on a remote system?

@ychenfo would you be able to test this (and the rest of SU-Servo) on a remote system?
sb10q reopened this issue 2022-05-14 16:02:31 +08:00
Collaborator

would you be able to test this (and the rest of SU-Servo) on a remote system?

The fix of this bug of assigning variables is tested on the remote system.

For testing the rest of suservo, does it mean the tests related to suservo in artiq_sinara_tester? I think I must have missed something because the result I got by running artiq_sinara_tester -o suservos is the same before and after the fix? They are both the following:

****** Sinara system tester ******

Mismatch between gateware (7.8047.16393ef.beta) and software (8.8247.fc95dff.beta) versions
*** Testing SUServos.
Initializing modules...
suservo0
...done
Setting up SUServo channels...
...done
Enabling...
suservo0
...done
Each Sampler channel applies proportional amplitude control
on the respective Urukul0 (ADC 0-3) and Urukul1 (ADC 4-7, if
present) channels.
Frequency: 10 MHz, output power: about -9 dBm at 0 V and about -15 dBm at 1.5 V
Verify frequency and power behavior.
Press ENTER when done.
> would you be able to test this (and the rest of SU-Servo) on a remote system? The fix of this bug of assigning variables is tested on the remote system. For testing the rest of suservo, does it mean the tests related to suservo in `artiq_sinara_tester`? I think I must have missed something because the result I got by running `artiq_sinara_tester -o suservos` is the same before and after the fix? They are both the following: ``` ****** Sinara system tester ****** Mismatch between gateware (7.8047.16393ef.beta) and software (8.8247.fc95dff.beta) versions *** Testing SUServos. Initializing modules... suservo0 ...done Setting up SUServo channels... ...done Enabling... suservo0 ...done Each Sampler channel applies proportional amplitude control on the respective Urukul0 (ADC 0-3) and Urukul1 (ADC 4-7, if present) channels. Frequency: 10 MHz, output power: about -9 dBm at 0 V and about -15 dBm at 1.5 V Verify frequency and power behavior. Press ENTER when done. ```
Poster
Owner

Hey, yes that behavior is exactly what I experienced - so the output is correct. Affected would be the actual configuration of the SUServo, and the behavior mentioned by the tester would not work correctly.

I could test it on actual hardware with checking the output later when I get a spare Sampler.

Hey, yes that behavior is exactly what I experienced - so the output is correct. Affected would be the actual configuration of the SUServo, and the behavior mentioned by the tester would not work correctly. I could test it on actual hardware with checking the output later when I get a spare Sampler.
https://git.m-labs.hk/M-Labs/nac3/issues/270#issuecomment-5530
sb10q closed this issue 2022-05-28 12:56:34 +08:00
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#285
There is no content yet.