zc706 networking never comes up if RTIO PLL lock fails #181
Reference: M-Labs/artiq-zynq#181
If the RTIO PLL lock fails, networking never comes up, which makes it impossible to reload gateware without removing the SD card. This is highly impractical in the NIST setups, as the SD card is difficult to insert and remove.
Somewhat related to #179 in that being able to reload gateware remotely is important to us.
Hm, mainline ARTIQ doesn't seem to get stuck in a loop if PLL lock fails; it just prints an error to the logs, because the check is done only once rather than in a loop.
@sb10q Should we apply the same behavior to Zynq? Or was there a particular reason why it was a loop in the first place? I haven't seen a case where it failed to lock at first and eventually succeeded.
At least that would allow coremgmt to run, so that firmware or settings could still be changed. PLL lock failure can also happen when the external clock setting is enabled (e.g. by mistake) but no external clock is connected.
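To make the difference concrete, here is a minimal Rust sketch of the two behaviors: blocking until lock versus checking once and continuing the boot. It is not the actual artiq-zynq or mainline ARTIQ code. All names (ClockError, wait_for_lock_forever, check_lock_once, boot) are hypothetical, and the PLL lock status is modeled as a closure instead of a real register read.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not the actual artiq-zynq type.
#[derive(Debug, PartialEq)]
pub enum ClockError {
    PllNotLocked,
}

impl fmt::Display for ClockError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClockError::PllNotLocked => write!(f, "RTIO PLL failed to lock"),
        }
    }
}

/// Looping variant (sketch): spins until the PLL reports lock.
/// With no usable clock input this never returns, so nothing after it
/// in the boot sequence (including networking) ever runs.
#[allow(dead_code)]
pub fn wait_for_lock_forever(mut is_locked: impl FnMut() -> bool) {
    while !is_locked() {}
}

/// Single-check variant (sketch): samples the lock status once and
/// reports failure instead of blocking.
pub fn check_lock_once(mut is_locked: impl FnMut() -> bool) -> Result<(), ClockError> {
    if is_locked() {
        Ok(())
    } else {
        Err(ClockError::PllNotLocked)
    }
}

/// Hypothetical boot sequence: a lock failure is only logged, and
/// networking is brought up either way so the board stays reachable.
/// Returns the log lines it produced.
pub fn boot(is_locked: impl FnMut() -> bool) -> Vec<String> {
    let mut log = Vec::new();
    if let Err(e) = check_lock_once(is_locked) {
        log.push(format!("error: {}", e));
    }
    log.push("network up".to_string());
    log
}
```

With this shape, boot(|| false) logs the lock error and still brings the network up, whereas wait_for_lock_forever(|| false) never returns. Keeping the board reachable is what matters for reloading gateware remotely.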
Either way, there may be a problem with RtioClockMultiplier, which doesn't seem to work as intended on some Zynq-based units (the same code works fine with Kasli/KC705, and some units seem to work consistently); in some cases it only works once every few reboots/PORs? I will have to check the documentation; maybe something is out of spec.
I don't recall why it is a loop. Following the behavior of the RISC-V ARTIQ firmware sounds fine to me.
@ljstephenson do you know why the RTIO PLL failed in the first place?
There were a couple of unrelated issues. I stumbled into this using the DRTIO master zc706 gateware, which would not lock to any clock, even with a known-good clock input.
It also turned out that when I first saw it, there was no clock input anyway due to some hardware failures. (A loose connection had caused a voltage drop large enough that a voltage regulator was undervolted, so a CMOS-to-LVDS clock converter had no output.)
As far as I know, the standalone zc706 gateware has never failed to lock on a good input. Although we have seen the PLL fail occasionally, I am reasonably convinced that it was just this loose connection changing resistance slightly and killing the clock input. I will keep an eye on it, since it seems this is an intermittent failure with Kasli-SoC?