Kasli-soc: add WRPLL clock recovery #282

Merged
sb10q merged 1 commits from morgan/artiq-zynq:WRPLL into master 2024-08-17 17:37:23 +08:00
Owner

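Prerequisite

  • flake update for the zynq-rs/109 (https://git.m-labs.hk/M-Labs/zynq-rs/pulls/109) and zynq-rs/110 (https://git.m-labs.hk/M-Labs/zynq-rs/pulls/110) patches
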
Summary

[Image: overview (/attachments/65d874a8-bd72-47ed-bec5-62536117a39d)]

This PR implements WRPLL with Si549 DCXOs on Kasli-SoC.
The WRPLL first uses a frequency counter to pull in the frequency and acquire a BASE_ADPLL.
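
A rough sketch of how a frequency-counter reading could be turned into a starting ADPLL word. Everything here is an assumption rather than code from this PR: the function name, the 0.0001164 ppm-per-LSB step size and the 24-bit signed range are placeholders that should be checked against the Si549 datasheet.

```python
def base_adpll(measured_hz: float, target_hz: float,
               ppm_per_lsb: float = 0.0001164) -> int:
    """Estimate the ADPLL word that pulls measured_hz onto target_hz.

    ppm_per_lsb is an assumed step size (hypothetical default).
    """
    # Fractional frequency error expressed in parts per million.
    error_ppm = (target_hz - measured_hz) / target_hz * 1e6
    adpll = round(error_ppm / ppm_per_lsb)
    # Clamp to an assumed 24-bit signed register range.
    lo, hi = -(1 << 23), (1 << 23) - 1
    return max(lo, min(hi, adpll))


if __name__ == "__main__":
    # A DCXO running 1 kHz low at a 125 MHz target needs a positive correction.
    print(base_adpll(124_999_000.0, 125_000_000.0))
```

The PI controllers would then apply their corrections on top of this base value.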

Then two PLLs are enabled:

  • Helper PLL
    • A PI controller
    • triggers on every GTX tag
    • uses the GTX beating period to maintain the correct offset frequency $f_{helper} = \dfrac{f_{GTX} \times 32767}{32768}$
  • Main PLL
    • A PI controller
    • triggers when both the GTX and main tags are ready
    • uses DDMTD to lock its frequency and phase to the GTX CDR
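
As a worked example of the helper offset formula, the snippet below evaluates the helper frequency and the resulting beat frequency for a 125 MHz GTX clock. The function names are illustrative and the 125 MHz input is just the nominal value used elsewhere in this PR.

```python
from fractions import Fraction

N = 32768  # denominator of the helper offset ratio from the formula above


def helper_frequency(f_gtx: int) -> Fraction:
    """f_helper = f_GTX * (N - 1) / N, kept exact with Fraction."""
    return Fraction(f_gtx) * (N - 1) / N


def beat_frequency(f_gtx: int) -> Fraction:
    """Beat between the GTX clock and the helper clock: f_GTX / N."""
    return Fraction(f_gtx) - helper_frequency(f_gtx)


if __name__ == "__main__":
    f_gtx = 125_000_000
    print(float(helper_frequency(f_gtx)))  # helper clock in Hz
    print(float(beat_frequency(f_gtx)))    # beat frequency in Hz
```

For a 125 MHz GTX clock the beat comes out at roughly 3.8 kHz, which is the rate at which DDMTD tags arrive.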

Currently, the PR supports most of the existing functionality provided by the Si5324:

  • Runtime
    • supports the internal DCXO and locking the PLL to SMA CLKIN
    • For a 10/80/100MHz SMA CLKIN, an MMCM multiplies the frequency to 125MHz before passing it to the WRPLL.
    • Limitation
      • 150MHz mode is not supported
      • PLL bypass mode is not supported
  • Satman:
    • supports clock recovery from the GTX CDR and skew offset
    • calibration can be enabled with the calibrate_wrpll_skew feature
    • Limitation
      • 150MHz mode is not supported
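
To make the SMA CLKIN multiplication concrete, the overall ratio the MMCM has to realise for each supported input frequency can be written as a reduced fraction. This is plain arithmetic only; it does not show the MMCM multiplier and divider settings chosen in the PR, and the function name is illustrative.

```python
from fractions import Fraction

TARGET_MHZ = 125  # WRPLL reference frequency stated above


def required_ratio(clkin_mhz: int) -> Fraction:
    """Overall multiply/divide ratio needed to reach 125 MHz from clkin_mhz."""
    return Fraction(TARGET_MHZ, clkin_mhz)


if __name__ == "__main__":
    for clkin in (10, 80, 100):
        print(f"{clkin} MHz -> x {required_ratio(clkin)}")
```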

Changelog

Gateware

  • kasli-soc

    • use enable_wrpll from json to switch between si5324 and si549
    • all variants: add wrpll
    • all variants: add gtx & main tag nFIQ
    • all variants: add WRPLL_REF_CLK config
    • standalone & master: add wrpll_refclk
    • satman: add skewtester
  • ddmtd (in helper clock domain)

    • add DDMTD and deglitcher
  • si549

    • add i2c and adpll programmer
  • wrpll

    • add helper clockdomain
    • add frequency counter
    • add gtx & main tag collection
    • add gtx & main tag eventmanager for FIQ
    • add skewtester
    • add SMA frequency multiplier to generate the 125MHz refclk

Firmware

  • rtio_clocking

    • add wrpll refclk setup
  • runtime main

    • drive CLK_SEL to true when si549 is used
    • allow standalone to use io_expander
  • satman main

    • drive CLK_SEL to true when si549 is used
    • add main & helper si549 setup
    • add WRPLL select_recovered_clock
  • io_expander

    • set CLK_SEL pin to output when si549 is used
    • gate virtual leds for standalone
  • si549

    • add bit bang i2c
    • add si549 programming
    • add main & helper setup
    • wrpll
      • add frequency counter to set BASE_ADPLL
      • add main & helper PLL
      • add gtx & main tags interrupt handler
      • add skew tester and tag offset calibration
    • tag collector
      • add gtx & main tags processing
      • add TAG_OFFSET for Satman WRPLL
      • add TAG_OFFSET getter & setter for calibration
    • wrpll_refclk
      • add bit bang mmcm dynamic configuration
  • FIQ

    • add wrpll custom handler
  • cargo template

    • remove dummy_fiq_handler to fix compilation error
    • add calibrate_wrpll_skew feature

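Measurement

  • phase noise when wrpll lock to CDR: https://git.m-labs.hk/M-Labs/artiq-zynq/pulls/282#issuecomment-8929
  • phase noise when wrpll lock to SMA and network benchmark: https://git.m-labs.hk/M-Labs/artiq-zynq/pulls/282#issuecomment-9316
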
TODO

  • Replace the PI controllers with a proper filter
  • Change the current ddmtd deglitcher to median edge
sb10q reviewed 2023-12-20 17:26:47 +08:00
@ -0,0 +72,4 @@
]
class Collector(Module):
Owner

Maybe this should move to firmware?

Author
Owner

I tried moving the gateware collector to firmware with a gtx & main tag interrupt setup, but the CPU is not fast enough to sample two consecutive gtx tags while executing the helper PLL + main PLL. If a firmware collector is preferred, the main & helper PLLs need to run alternately to avoid cycle slipping in the beating period error calculation.

Firmware collector idea:

collect 2 gtx tags -> helper PLL -> collect gtx & main tags -> main PLL -> repeat

Owner

> the CPU is not fast enough to sample two consecutive gtx tags

On a 1GHz Cortex A9 I find it hard to believe. Most likely this can be solved by proper design of the CPU/gateware interface i.e. still collect the related tags in gateware, but do not process them any further and just send them as-is to the CPU.

sb10q reviewed 2023-12-20 17:27:42 +08:00
@ -206,3 +208,3 @@
 class GenericMaster(SoCCore):
-    def __init__(self, description, acpki=False):
+    def __init__(self, description, acpki=False, si5324=False):
Owner

Si5324 should be the default until WRPLL is completed.
Historically this flag was called --with-wrpll.

sb10q reviewed 2023-12-20 17:28:45 +08:00
@ -13,2 +13,4 @@
}
#[cfg(has_si549)]
const USE_SI549: u8 = 0xFF;
Owner

Something called USE_ should be a bool. Rename or restructure.

Owner

Remove Q# from the diagram. It doesn't add anything and is just clutter.

morgan force-pushed WRPLL from 0a4c34b1f2 to 7dc57dc24e 2024-01-04 16:24:01 +08:00 Compare
morgan force-pushed WRPLL from 7dc57dc24e to 2f57ccf617 2024-01-08 14:40:15 +08:00 Compare
morgan force-pushed WRPLL from 2f57ccf617 to ec506fd0ce 2024-01-08 16:49:19 +08:00 Compare
Author
Owner

Force push to address above comments, add an I term for main PLL, improve deglitcher, general clean up and update commit messages. Ready to review.

morgan force-pushed WRPLL from ec506fd0ce to 3ec537a5da 2024-02-07 17:07:42 +08:00 Compare
Author
Owner

Force push changelog:

  • update helper PLL error formula to n * ideal_beating_period - GTX_TAG(n) to use the same KP/KI pair for both main & helper PLL
  • change the beating period to ~3.8kHz and update KP/KI pair
  • change WRPLL from IRQ to FIQ to avoid the PLLs being disabled during a mutex lock (e.g. during UART communication)
  • si549 i2c gateware is now in sys clockdomain
  • general clean up and update doc string
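
A minimal sketch of the helper error term `n * ideal_beating_period - GTX_TAG(n)` feeding a PI controller. The class, the gains and the beating period value below are illustrative placeholders, not the values or code used in this PR.

```python
class PIController:
    """Plain PI loop: output = base + kp * error + ki * sum(errors)."""

    def __init__(self, kp: int, ki: int, base_adpll: int = 0):
        self.kp = kp
        self.ki = ki
        self.base_adpll = base_adpll
        self.integrator = 0

    def update(self, error: int) -> int:
        self.integrator += error
        return self.base_adpll + self.kp * error + self.ki * self.integrator


def helper_error(n: int, ideal_beating_period: int, gtx_tag: int) -> int:
    """Error of the n-th GTX tag against where an ideal beat would place it."""
    return n * ideal_beating_period - gtx_tag


if __name__ == "__main__":
    IDEAL_PERIOD = 1000          # hypothetical tag-counter units per beat
    pll = PIController(kp=2, ki=1)
    for n, tag in enumerate([1000, 2004, 3004], start=1):
        err = helper_error(n, IDEAL_PERIOD, tag)
        print(n, err, pll.update(err))
```

Because the error is measured against an absolute ideal tag position rather than against the previous tag, it behaves like a phase error, which is what would let the same KP/KI pair serve both PLLs.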
morgan force-pushed WRPLL from 3ec537a5da to b7694af295 2024-02-14 15:35:33 +08:00 Compare
Author
Owner

Force push changelog:

  • allow custom divider settings for main & helper si549
    • add 100MHz options, locked with a 100MHz si549 master
  • increase the blind period again, as there are more glitches at 100MHz
  • remove the unused helper frequency counter
  • rebase with current master
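
To illustrate what a blind period does in a deglitcher, here is a small software model: after accepting a rising edge, further edges are ignored for a fixed number of samples. This is a sketch of the general idea only, written for this discussion; it is not the gateware in this PR, and the sample data is made up.

```python
def first_edge_deglitch(samples, blind_period):
    """Return indices of accepted rising edges.

    After each accepted edge, the next blind_period samples are ignored,
    so glitches around the transition do not produce extra tags.
    """
    edges = []
    blind = 0
    prev = 0
    for i, s in enumerate(samples):
        if blind > 0:
            blind -= 1            # still inside the blind window
        elif s and not prev:
            edges.append(i)       # first rising edge outside the window
            blind = blind_period
        prev = s
    return edges


if __name__ == "__main__":
    noisy = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(first_edge_deglitch(noisy, 0))  # every rising edge is reported
    print(first_edge_deglitch(noisy, 5))  # glitches after the first edge are masked
```

A longer blind period masks more glitches but must stay shorter than the beating period, otherwise genuine edges would be dropped.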

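125MHz phase noise measurement on satellite MMCX J3

[Image: 125Mhz_phase_noise (/attachments/62a261c8-49f4-4405-926b-d1eacb0c59bb)]

100MHz phase noise measurement on satellite MMCX J3

  • si5324 is missing due to #285

[Image: 100Mhz_phase_noise (/attachments/f8d873f1-e3bc-4098-9f52-53a839ef8611)]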
sb10q reviewed 2024-03-01 14:29:32 +08:00
@ -0,0 +1,551 @@
use core::result::Result::Ok;
Owner

I don't see this line anywhere else in the codebase.

Author
Owner

Yea, no need for that line. Will be removed in next push. I think it was added by rust-analyzer plugin from vscodium.

sb10q reviewed 2024-03-01 14:31:03 +08:00
@ -0,0 +291,4 @@
static M_INTEGRATOR: Mutex<i32> = Mutex::new(0);
#[derive(Clone, Copy)]
pub enum FIQ {
Owner

I wouldn't call it FIQ because we're going to port that to MiSoC/Kasli where there's no FIQ, and the firmware divergence should be minimized (ideally, artiq-zynq and artiq firmware should be merged at some point).

Owner

We should probably move enable-wrpll to the JSON so it can be easily enabled on AFWS builds.

morgan force-pushed WRPLL from b7694af295 to 673e2fa6de 2024-03-11 14:54:29 +08:00 Compare
morgan force-pushed WRPLL from 673e2fa6de to 0d0f9b5a6b 2024-03-11 14:55:01 +08:00 Compare
morgan force-pushed WRPLL from 0d0f9b5a6b to cbe48f9412 2024-03-11 15:04:23 +08:00 Compare
Author
Owner

Force push to rebase, cleanup and add support for runtime wrpll. Internal 150Mhz si549 and PLL bypass mode are not supported.

Changelog

  • Gateware
    • ddmtd & wrpll
      • refactor to accept clockdomain instead of just grtio
      • rename variables with gtx to the more generic ref
    • wrpll
      • refactor to use platform variable
      • put si549 submodule into wrpll
      • use refclk's name instead of hardcoding the domain name in frequency counter
      • add SMAFrequencyMultiplier submodule with mmcm bypass options
    • kasli_soc
      • refactor clocksynthesis into a submodule as all 3 variants need the same code for the main si549 clk
      • add WRPLL_REF_CLK config
      • add wrpll for standalone and master
      • add wrpll_refclk using SMAFrequencyMultiplier to generate 125MHz from SMA CLKIN
  • Firmware
    • remove has_wrpll in favor of just using has_si549
    • libboard_artiq cargo template
      • remove dummy_fiq_handler feature to fix compilation error
    • io_expander
      • gate virtual leds behind cfg to fix compilation warning on standalone
      • allow standalone to use io_expander
    • si549
      • rename variables with fiq to the more generic isr
      • rename variables with gtx to the more generic ref
      • rename csr functions after putting si549 as a submodule of wrpll
      • DividerConfig is used as reference following si5324
      • gate refclk frequency counter csr function behind wrpll_ref_clk="SMA_CLKIN"
      • add wrpll_refclk to generate a 125MHz refclk using mmcm
        • sma clkin --> mmcm --> 125MHz refclk --> wrpll
        • add mmcm drp bit bang control
        • add mmcm bypass options to reduce phase noise for 125MHz
        • gate wrpll_refclk behind sma clkin only
    • runtime main
      • allow standalone to init io_expander
      • gate driving CLK_SEL to true behind has_si549
      • gate async io_expanders_service behind has_virtual_leds as standalone doesn't need to change virtual leds
    • rtio clocking
      • add main & helper si549 configs and clocking info!
      • add mmcm drp register configs (directly copied from preconfig mmcm)
      • add wrpll setup after the sys clk switch, as syscd CSRs reset after the switch (in particular, the EventManagers cannot remember their enable state if they are set up before the sys clk switch)

Network testing

  • with WRPLL enabled, async throughput loses 0.75MiB/s
## standalone using internal 125Mhz si549 only
 test_kernel_overhead (artiq.test.coredevice.test_performance.KernelOverheadTest.test_kernel_overhead) ... 0.04203264157999911 s
ok
test_array_large (artiq.test.coredevice.test_performance.TransferTest.test_array_large) ... ok
test_array_small (artiq.test.coredevice.test_performance.TransferTest.test_array_small) ... ok
test_async_throughput (artiq.test.coredevice.test_performance.TransferTest.test_async_throughput) ... Async throughput:  61.78MiB/s
ok
test_byte_list_large (artiq.test.coredevice.test_performance.TransferTest.test_byte_list_large) ... ok
test_byte_list_small (artiq.test.coredevice.test_performance.TransferTest.test_byte_list_small) ... ok
test_bytes_large (artiq.test.coredevice.test_performance.TransferTest.test_bytes_large) ... ok
test_bytes_small (artiq.test.coredevice.test_performance.TransferTest.test_bytes_small) ... ok
test_list_large (artiq.test.coredevice.test_performance.TransferTest.test_list_large) ... ok
test_list_small (artiq.test.coredevice.test_performance.TransferTest.test_list_small) ... ok

| Test                 | Mean (MiB/s) |  std (MiB/s) |
| -------------------- | ------------ | ------------ |
| I32 Array (1MB) H2D  |        71.68 |         2.51 |
| I32 Array (1MB) D2H  |        57.05 |         0.42 |
| I32 Array (1KB) H2D  |         4.93 |         1.17 |
| I32 Array (1KB) D2H  |         4.17 |         0.43 |
| Bytes List (1MB) H2D |        51.69 |         1.30 |
| Bytes List (1MB) D2H |        39.64 |         0.52 |
| Bytes List (1KB) H2D |         6.03 |         1.10 |
| Bytes List (1KB) D2H |         5.29 |         0.69 |
| Bytes (1MB) H2D      |        76.99 |         0.94 |
| Bytes (1MB) D2H      |        55.89 |         0.38 |
| Bytes (1KB) H2D      |         6.24 |         1.37 |
| Bytes (1KB) D2H      |         5.67 |         0.85 |
| I32 List (1MB) H2D   |        60.28 |         1.32 |
| I32 List (1MB) D2H   |        56.10 |         0.33 |
| I32 List (1KB) H2D   |         6.11 |         1.33 |
| I32 List (1KB) D2H   |         5.53 |         0.77 |

## standalone WRPLL and 125Mhz SMA CLKIN
test_kernel_overhead (artiq.test.coredevice.test_performance.KernelOverheadTest.test_kernel_overhead) ... 0.04144362593000096 s
ok
test_array_large (artiq.test.coredevice.test_performance.TransferTest.test_array_large) ... ok
test_array_small (artiq.test.coredevice.test_performance.TransferTest.test_array_small) ... ok
test_async_throughput (artiq.test.coredevice.test_performance.TransferTest.test_async_throughput) ... Async throughput:  61.03MiB/s
ok
test_byte_list_large (artiq.test.coredevice.test_performance.TransferTest.test_byte_list_large) ... ok
test_byte_list_small (artiq.test.coredevice.test_performance.TransferTest.test_byte_list_small) ... ok
test_bytes_large (artiq.test.coredevice.test_performance.TransferTest.test_bytes_large) ... ok
test_bytes_small (artiq.test.coredevice.test_performance.TransferTest.test_bytes_small) ... ok
test_list_large (artiq.test.coredevice.test_performance.TransferTest.test_list_large) ... ok
test_list_small (artiq.test.coredevice.test_performance.TransferTest.test_list_small) ... ok

| Test                 | Mean (MiB/s) |  std (MiB/s) |
| -------------------- | ------------ | ------------ |
| I32 Array (1MB) H2D  |        70.58 |         3.45 |
| I32 Array (1MB) D2H  |        56.57 |         0.33 |
| I32 Array (1KB) H2D  |         4.56 |         0.77 |
| I32 Array (1KB) D2H  |         4.09 |         0.42 |
| Bytes List (1MB) H2D |        50.90 |         1.48 |
| Bytes List (1MB) D2H |        39.25 |         0.48 |
| Bytes List (1KB) H2D |         4.89 |         1.32 |
| Bytes List (1KB) D2H |         4.84 |         0.70 |
| Bytes (1MB) H2D      |        76.65 |         1.70 |
| Bytes (1MB) D2H      |        55.26 |         0.47 |
| Bytes (1KB) H2D      |         5.00 |         1.35 |
| Bytes (1KB) D2H      |         5.49 |         1.16 |
| I32 List (1MB) H2D   |        58.57 |         1.70 |
| I32 List (1MB) D2H   |        55.41 |         0.48 |
| I32 List (1KB) H2D   |         5.17 |         1.59 |
| I32 List (1KB) D2H   |         4.87 |         0.36 |
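
The async throughput figures from the two runs above can be compared directly; this small calculation reproduces the quoted 0.75MiB/s loss and expresses it as a percentage. It uses only the two numbers printed by `test_async_throughput`; the variable names are illustrative.

```python
# Async throughput reported by test_async_throughput in the two runs (MiB/s).
internal_si549_only = 61.78   # standalone, internal 125MHz si549 only
wrpll_sma_clkin = 61.03       # standalone, WRPLL locked to 125MHz SMA CLKIN

loss = round(internal_si549_only - wrpll_sma_clkin, 2)
loss_percent = round(100 * loss / internal_si549_only, 2)

if __name__ == "__main__":
    print(f"loss: {loss} MiB/s ({loss_percent} %)")
```

The difference is a little over one percent, the same order as the standard deviations reported for the large transfers in the tables.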

Phase noise measurement

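Standalone WRPLL performance compared to standalone si5324 & clocker

[Image: standalone_ph_noise_diffpll (/attachments/8686e91c-82e8-4c16-8cdf-df7f30547020)]

Effect of different reference sources on standalone WRPLL

[Image: standalone_ph_noise_diff_src (/attachments/4b37e794-e90d-40f8-a0f3-fbeb79db79bc)]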

morgan force-pushed WRPLL from cbe48f9412 to ff68521bfd 2024-03-19 11:05:54 +08:00 Compare
morgan force-pushed WRPLL from ff68521bfd to 1acd9da095 2024-03-19 11:08:51 +08:00 Compare
Author
Owner

Force push changelog

  • Flake
    • update artiq for the enable_wrpll json field
  • Gateware
    • replace the --use-wrpll arg with the enable_wrpll field in json
    • add SkewTester
  • Firmware
    • add back has_wrpll cfg
    • si549
      • move set_adpll, helper_setup outside of wrpll to separate si549 driver from wrpll module
      • add TAG_OFFSET for Satman WRPLL to meet the setup/hold constraints of rx synchronizer
      • add skew tester and calibration for Satman WRPLL
      • gate calibration behind calibrate_wrpll_skew feature
    • cargo template
      • add calibrate_wrpll_skew feature

Skew calibration

  • to enable calibration, manually edit the makefile: --features=target_$(TARGET),calibrate_wrpll_skew
  • two kasli-soc cards were tested; both have a calibrated result of around 19000
  • 19050 is used as the calibrated tag offset; it is the average result of 10 consecutive runs on one of the cards.
calibration successful, error zone: 2304 -> 3112, width: 31960 (351deg), middle of working region: 19092
calibration successful, error zone: 2176 -> 3008, width: 31936 (350deg), middle of working region: 18976
calibration successful, error zone: 2296 -> 3120, width: 31944 (350deg), middle of working region: 19092
calibration successful, error zone: 2304 -> 3128, width: 31944 (350deg), middle of working region: 19100
calibration successful, error zone: 2208 -> 3008, width: 31968 (351deg), middle of working region: 18992
calibration successful, error zone: 2304 -> 3136, width: 31936 (350deg), middle of working region: 19104
calibration successful, error zone: 2216 -> 3032, width: 31952 (351deg), middle of working region: 19008
calibration successful, error zone: 2192 -> 3024, width: 31936 (350deg), middle of working region: 18992
calibration successful, error zone: 2208 -> 3032, width: 31944 (350deg), middle of working region: 19004
calibration successful, error zone: 2312 -> 3112, width: 31968 (351deg), middle of working region: 19096
calibration successful, error zone: 2296 -> 3136, width: 31928 (350deg), middle of working region: 19100
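
The numbers in the log lines above are mutually consistent if the tag counter wraps at 32768. The sketch below recomputes width, degrees and middle from each error zone under that assumption and averages the middles. The function name and the 32768 tag range are assumptions inferred from the log, not taken from the firmware source.

```python
TAG_RANGE = 32768  # assumed wrap-around of the tag counter


def working_region(err_start: int, err_end: int, tag_range: int = TAG_RANGE):
    """Return (width, degrees, middle) of the region outside the error zone."""
    width = tag_range - (err_end - err_start)
    degrees = width * 360 // tag_range          # truncated, as in the log
    middle = (err_end + width // 2) % tag_range
    return width, degrees, middle


# Error zones copied from the calibration log above.
error_zones = [
    (2304, 3112), (2176, 3008), (2296, 3120), (2304, 3128),
    (2208, 3008), (2304, 3136), (2216, 3032), (2192, 3024),
    (2208, 3032), (2312, 3112), (2296, 3136),
]

middles = [working_region(start, end)[2] for start, end in error_zones]
mean_middle = sum(middles) / len(middles)

if __name__ == "__main__":
    print(working_region(2304, 3112))
    print(mean_middle)
```

Averaging the logged middles lands within one tag of the 19050 offset chosen above.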
Owner

GTX_CDR -> GT_CDR

We'll want to run this on other boards with different transceiver types.

sb10q reviewed 2024-03-19 11:30:56 +08:00
@ -0,0 +691,4 @@
fn one_clock_cycle(timer: &mut GlobalTimer) {
unsafe {
csr::wrpll_refclk::mmcm_dclk_write(1);
timer.delay_us(1);
Owner

Maximum DCLK frequency is 200MHz as per https://www.mouser.com/datasheet/2/903/ds191_XC7Z030_XC7Z045_data_sheet-1923769.pdf

You can remove the timer and delays.

morgan force-pushed WRPLL from 1acd9da095 to 6cc02cc460 2024-03-19 11:39:01 +08:00 Compare
morgan force-pushed WRPLL from 6cc02cc460 to 005f96a924 2024-03-19 12:45:48 +08:00 Compare
Author
Owner

Force push changelog

  • clean up some of the commit messages
  • Gateware:
    • rename GTX_CDR -> GT_CDR
  • Firmware:
    • si549: cfg rename GTX_CDR -> GT_CDR
    • wrpll_refclk: remove timer delay from one_clock_cycle (registers are verified to be written properly)
morgan changed title from satellite: add WRPLL clock recovery to Kasli-soc: add WRPLL clock recovery 2024-03-25 10:21:28 +08:00
sb10q reviewed 2024-03-25 17:44:49 +08:00
@ -47,6 +48,19 @@ eem_iostandard_dict = {
def eem_iostandard(eem):
return IOStandard(eem_iostandard_dict[eem])
class ClockSynthesis(Module):
Owner

This is a poorly chosen name since there isn't a PLL or anything like it inside. Maybe just inline it anyway? It's just IBUFGDS really and some constraints that may be redundant and/or are dependent on the board.

morgan force-pushed WRPLL from 005f96a924 to 8fd524d36b 2024-03-26 12:10:49 +08:00 Compare
morgan force-pushed WRPLL from 8fd524d36b to 137489da93 2024-03-26 13:23:25 +08:00 Compare
Author
Owner

Force push changelog

  • kasli soc
    • revert the ClockSynthesis refactor
    • add clk_synth, IBUFGDS & period constraint for master and satellite
  • Testing
    • master & standalone: locked with external 125 MHz
    • satellite: locked with master, and the skew met the timing constraint
sb10q reviewed 2024-03-26 15:08:40 +08:00
@ -0,0 +133,4 @@
)
]
# PL->PS interrupt
Owner

On Kasli we won't have PS/PL. But this can be addressed later.

sb10q reviewed 2024-03-26 15:26:32 +08:00
@ -0,0 +304,4 @@
csr::wrpll::main_dcxo_adpll_stb_write(1);
csr::wrpll::main_dcxo_adpll_stb_write(0);
if csr::wrpll::main_dcxo_nack_read() == 1 {
Owner

This test can only be done after the transaction is completed and seems misplaced. Should be right after the while loop I guess, and the core should probably reset the nack status on stb.

sb10q reviewed 2024-03-26 15:26:50 +08:00
@ -0,0 +302,4 @@
csr::wrpll::main_dcxo_adpll_write(adpll as u32);
csr::wrpll::main_dcxo_adpll_stb_write(1);
csr::wrpll::main_dcxo_adpll_stb_write(0);
Owner

Writing 0 is superfluous if you use CSR() in the core.
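The ordering suggested in the two comments above can be sketched as follows: strobe once, wait for the transfer to finish, and only then look at the NACK flag. The `AdpllCsr` trait and the `MockDcxo` type are hypothetical stand-ins for the generated `csr::wrpll` accessors; the mock assumes a self-clearing strobe and a NACK flag that is reset by the next strobe.

```rust
/// Hypothetical view of the Si549 ADPLL programmer CSRs.
trait AdpllCsr {
    fn adpll_write(&mut self, value: u32);
    fn adpll_stb_write(&mut self, value: u32);
    fn busy_read(&mut self) -> u32;
    fn nack_read(&mut self) -> u32;
}

/// Write a new ADPLL value and report a NACK only after the transfer ends.
fn set_adpll<C: AdpllCsr>(csr: &mut C, adpll: i32) -> Result<(), &'static str> {
    csr.adpll_write(adpll as u32);
    // Assumed self-clearing strobe, so there is no matching write of 0.
    csr.adpll_stb_write(1);
    // Wait until the I2C transaction has completed.
    while csr.busy_read() == 1 {}
    // The NACK flag is only meaningful once the transfer is over.
    if csr.nack_read() == 1 {
        return Err("Si549 did not acknowledge the ADPLL write");
    }
    Ok(())
}

/// Mock core: stays busy for three polls, then latches the NACK result.
struct MockDcxo {
    busy_polls: u32,
    will_nack: bool,
    nack: bool,
    last_adpll: u32,
}

impl AdpllCsr for MockDcxo {
    fn adpll_write(&mut self, value: u32) {
        self.last_adpll = value;
    }
    fn adpll_stb_write(&mut self, value: u32) {
        if value == 1 {
            self.nack = false; // a new strobe resets the latched NACK
            self.busy_polls = 3;
        }
    }
    fn busy_read(&mut self) -> u32 {
        if self.busy_polls == 0 {
            return 0;
        }
        self.busy_polls -= 1;
        if self.busy_polls == 0 {
            self.nack = self.will_nack; // result latched at end of transfer
        }
        1
    }
    fn nack_read(&mut self) -> u32 {
        self.nack as u32
    }
}
```

Reading the NACK flag before the busy loop would observe the state left over from the previous transfer, which is why the check sits after the loop here.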

Owner

Wording of messages and code comments could be improved.

sb10q reviewed 2024-03-26 15:31:28 +08:00
@ -0,0 +364,4 @@
static REF_TAG: Mutex<u32> = Mutex::new(0);
static REF_TAG_READY: Mutex<bool> = Mutex::new(false);
static MAIN_TAG: Mutex<u32> = Mutex::new(0);
static MAIN_TAG_READY: Mutex<bool> = Mutex::new(false);
Owner

Why do we need all those mutexes? Only one thread is accessing those, no?

Author
Owner

Yes, will change it to static mut
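A rough sketch of the `static mut` variant for one tag, relying on the premise from the comment above that only one execution context ever touches these statics. The helper names `store_ref_tag` and `take_ref_tag` are hypothetical and only illustrate the access pattern.

```rust
// Plain statics replacing the Mutex-wrapped versions.
static mut REF_TAG: u32 = 0;
static mut REF_TAG_READY: bool = false;

/// Record a new reference tag (hypothetical helper).
/// Safety assumption: no other thread or interrupt context accesses
/// these statics concurrently.
fn store_ref_tag(tag: u32) {
    unsafe {
        REF_TAG = tag;
        REF_TAG_READY = true;
    }
}

/// Consume the pending tag, if any, and clear the ready flag
/// (hypothetical helper). The statics are only read and written by
/// value, so no reference to a `static mut` is ever created.
fn take_ref_tag() -> Option<u32> {
    unsafe {
        if REF_TAG_READY {
            REF_TAG_READY = false;
            Some(REF_TAG)
        } else {
            None
        }
    }
}
```

If a second context ever starts accessing the tags, this pattern stops being sound and some form of synchronization is needed again.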

sb10q reviewed 2024-03-26 15:34:16 +08:00
@ -0,0 +650,4 @@
info!("warming up refclk...");
// refclk need a couple seconds for freq counter to read it properly
timer.delay_us(20_000_000);
Owner

That's 20 seconds, not "a couple".
What is refclk and why do we need to wait so long?

Author
Owner

It's waiting for the gtx_cdr. Sometimes the CDR is not locked or stable yet, and the frequency counter will then read the wrong value.
Owner

Wouldn't it be better to just wait for the CDR lock then?
When is select_recovered_clock() called?

Author
Owner

Yes, that will also save time. Let me look into it.
select_recovered_clock(true) is called directly after drtiosat_link_rx_up (https://git.m-labs.hk/M-Labs/artiq-zynq/src/commit/1f7c53b8d0dfab4a0194e6a860e57bb17e6e9b66/src/satman/src/main.rs#L914)
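The idea of waiting for the lock instead of sleeping for a fixed 20 seconds can be sketched as a poll with a timeout. This is only an illustration and not necessarily what the PR ended up doing; `wait_for_lock` and its parameters are hypothetical, and the `locked` and `delay_us` closures stand in for a CDR lock status read and `timer.delay_us`.

```rust
/// Poll `locked` until it returns true or `timeout_us` has elapsed.
/// Returns the time spent waiting, in microseconds.
/// `poll_interval_us` must be non-zero, otherwise a never-locking
/// input with a non-zero timeout would spin forever.
fn wait_for_lock(
    mut locked: impl FnMut() -> bool,
    mut delay_us: impl FnMut(u64),
    poll_interval_us: u64,
    timeout_us: u64,
) -> Result<u64, &'static str> {
    let mut waited_us = 0;
    loop {
        if locked() {
            return Ok(waited_us);
        }
        if waited_us >= timeout_us {
            return Err("timed out waiting for CDR lock");
        }
        delay_us(poll_interval_us);
        waited_us += poll_interval_us;
    }
}
```

With this shape the common case returns as soon as the lock is reported, and the worst case is bounded by `timeout_us` rather than by a hard-coded sleep.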
sb10q reviewed 2024-03-26 15:35:03 +08:00
@ -0,0 +799,4 @@
let locked = unsafe { csr::wrpll_refclk::mmcm_locked_read() == 1 };
if !locked {
    return Err("mmcm failed to generate 125Mhz ref clock from SMA CLKIN");
Owner

MHz

morgan force-pushed WRPLL from 20732c0de1 to aa4bb8bae8 2024-04-09 17:34:28 +08:00 Compare
morgan force-pushed WRPLL from aa4bb8bae8 to 14fa038118 2024-04-12 16:45:37 +08:00 Compare
Author
Owner

Force pushed changelog

  • gateware
    • doc cleanup
    • increase blind period to 400 to improve lock stability for satman
    • si549
      • strobe now uses CSR
      • the nack status is latched during the adpll transfer, and the next adpll stb resets it
    • frequency counter
      • fix an issue where the counter value was not reset properly
      • add a busy CSRStatus
  • firmware
    • doc cleanup
    • replace mutex with static mut
    • set_adpll
      • remove main_dcxo_adpll_stb_write(0)
      • move nack check to just after busy
    • select_recovered_clock
      • remove 20 s delay
      • add 200 us delay for adpll reset
    • frequency counter
      • remove the 150_000 us wait in favor of using the busy CSRStatus
    • PLLs
      • use a different form of PI expression, from the PID-IIR primer (https://hackmd.io/IACbwcOTSt6Adj3_F9bKuw?view#Integral-wind-up-and-output-limiting)
      • add integrator anti-windup to prevent wind-up from forcing the PLL out of capture range
      • use 4 ppm as adpll limit
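To illustrate the anti-windup item above, here is a minimal PI controller that uses integrator clamping, one common anti-windup technique. It is a sketch under stated assumptions and not the PI form used in the PR: the `PiController` type, the integer gains, and the `limit` field are all hypothetical. In this context `limit` would be the 4 ppm bound expressed in ADPLL LSBs, and that conversion is left to the caller.

```rust
/// Hypothetical PI controller with integrator clamping as anti-windup.
struct PiController {
    kp: i64,
    ki: i64,
    integrator: i64,
    /// Symmetric bound on both the integrator and the output.
    /// Must be non-negative, otherwise `clamp` panics.
    limit: i64,
}

impl PiController {
    /// Feed one error sample and return the bounded control output.
    fn update(&mut self, error: i64) -> i64 {
        let proportional = self.kp * error;
        // Anti-windup: the integrator never accumulates past the limit,
        // so it can unwind immediately once the error changes sign.
        self.integrator = (self.integrator + self.ki * error)
            .clamp(-self.limit, self.limit);
        // Output limiting keeps the command inside the allowed range.
        (proportional + self.integrator).clamp(-self.limit, self.limit)
    }
}
```

Without the clamp on `integrator`, a long run of large errors would leave a large accumulated value that keeps the output saturated well after the error has reversed, which matches the failure mode described in the changelog.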

Test

  • Satman PI integrator anti-windup test
    • without the clamp there is about a 5% chance (around one run in 20) that the integrator pushes the PLL out of capture range and it fails to lock
    • 50 consecutive runs were tested with the clamp and no lock failure occurred
  • Satman long-term stability test
    • with a blind period of 300, there is a small chance that a glitched input kicks the controller out of lock for a couple of cycles before it locks again. This can usually be observed with an oscilloscope in persistence mode
    • a 2 hr lock test was run with a blind period of 400 and persistence mode on. The lock held for the test duration.
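
As a rough sanity check on the anti-windup test, assuming the runs are independent and taking the reported 5% failure rate at face value, the probability of seeing 50 clean runs in a row with no fix applied would be:

```latex
P(\text{50 successes}) = (1 - 0.05)^{50} = 0.95^{50} \approx 0.077
```

So 50 consecutive locks are reasonably strong evidence that the clamp helps, although roughly 1 in 13 such test series would pass by chance alone under these assumptions.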
sb10q merged commit 14fa038118 into master 2024-04-12 17:23:11 +08:00
Reference: M-Labs/artiq-zynq#282