zc706 networking never comes up if RTIO PLL lock fails #181

Closed
opened 2022-04-07 02:16:18 +08:00 by ljstephenson · 4 comments

If the RTIO PLL lock fails, networking never comes up, meaning that it is impossible to reload gateware without removing the SD card. This is highly impractical in the NIST setups as the SD card is difficult to insert/remove.

Somewhat related to #179 in that being able to reload gateware remotely is important to us.

Owner

Hm, mainline ARTIQ doesn't seem to get stuck in a loop if PLL lock fails; it just prints an error to the logs, because the check is done only once rather than in a loop.

@sb10q - Should we apply the same behavior for Zynq? Or was there any particular reason why it was a loop in the first place? I haven't seen a case where it would fail to lock at the beginning and eventually succeed.

At least that would allow coremgt to run, and firmware or settings to be changed, as PLL lock failure may also happen if the external clock setting is enabled (e.g. by mistake) but no external clock is connected.

Either way, there seems to be a problem with RtioClockMultiplier: it doesn't work as intended on some Zynq-based units, in some cases (it works every few reboots/PORs?). The same code works fine with Kasli/KC705, and some units seem to work consistently. I will have to check the documentation; maybe something is out of spec.
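The behavior proposed in this comment (check the lock a bounded number of times, log the failure, and carry on so that networking and coremgt still start) could be sketched roughly as below. This is not the actual artiq-zynq or mainline ARTIQ code: `PllStatus`, `SimPll`, `ClockError`, `check_pll_lock` and `startup` are hypothetical names, and the PLL is simulated in software so the example is self-contained.

```rust
/// Hypothetical stand-in for reading the RTIO PLL lock status bit.
pub trait PllStatus {
    fn locked(&mut self) -> bool;
}

/// Simulated PLL (assumption, for illustration only).
/// `Some(n)`: reports lock after `n` unsuccessful polls.
/// `None`: never locks, e.g. no external clock connected.
pub struct SimPll {
    polls_until_lock: Option<u32>,
    polls: u32,
}

impl SimPll {
    pub fn new(polls_until_lock: Option<u32>) -> Self {
        SimPll { polls_until_lock, polls: 0 }
    }
}

impl PllStatus for SimPll {
    fn locked(&mut self) -> bool {
        self.polls += 1;
        match self.polls_until_lock {
            Some(n) => self.polls > n,
            None => false,
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum ClockError {
    PllNotLocked,
}

/// Bounded check: poll at most `max_polls` times, then give up
/// and return an error instead of spinning forever.
pub fn check_pll_lock<P: PllStatus>(pll: &mut P, max_polls: u32) -> Result<(), ClockError> {
    for _ in 0..max_polls {
        if pll.locked() {
            return Ok(());
        }
    }
    Err(ClockError::PllNotLocked)
}

/// Startup sketch: a PLL failure is logged, but the code that brings up
/// networking is reached either way.
pub fn startup<P: PllStatus>(pll: &mut P) -> Vec<&'static str> {
    let mut log = Vec::new();
    match check_pll_lock(pll, 1) {
        Ok(()) => log.push("RTIO PLL locked"),
        Err(ClockError::PllNotLocked) => log.push("RTIO PLL failed to lock"),
    }
    log.push("networking started");
    log
}
```

With a PLL that never locks, `startup(&mut SimPll::new(None))` still reaches the networking step, which is the point of dropping the unbounded loop: the board stays reachable so the gateware or clock settings can be fixed remotely.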

Owner

I don't recall why it is a loop. Following the behavior of the RISC-V ARTIQ firmware sounds fine to me.

sb10q closed this issue 2022-04-12 14:19:19 +08:00
Owner

@ljstephenson do you know why the RTIO PLL failed in the first place?

Author

There were a couple of unrelated issues - I stumbled into this using the DRTIO master zc706 gateware that would not lock to any clock, even when using a known good clock input.

It also turned out that when I first saw it there was no clock input anyway due to some hardware failures. (A loose connection had caused a large enough voltage drop that a voltage regulator was undervolted, and so a CMOS->LVDS clock converter had no output.)

The standalone zc706 gateware hasn't failed to lock on a good input that I know of. Although we have seen the PLL fail occasionally, I am reasonably convinced that it was just this loose connection changing resistance slightly and killing the clock input. I will keep an eye on it, since this seems to be an intermittent failure with Kasli-SoC?

Reference: M-Labs/artiq-zynq#181