DRTIO

This page is intended to help users troubleshoot their DRTIO systems.

Description (from user experience)

Distributed Real Time Input/Output (DRTIO) allows several satellites to be connected almost seamlessly to one master crate, so that all the crates can be controlled as a single crate. The crates are connected either by passive copper direct attach cables (suitable for one-crate setups) or by optical fibers with SFP+ adapters (suitable for multiple crates, which can be up to several kilometers apart). The DRTIO protocol is not compatible with Ethernet; moreover, satellites do not have any network access and can be controlled only by the master. Both star (2 levels) and tree topologies are supported. The default is the star (one master and up to 3-4 directly connected satellites); if any chaining is needed, a routing table must be set up. To switch between the satellite, master, and standalone variants, flash the appropriate firmware and set the respective base field in the JSON description.
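
As a quick sanity check before flashing, the base field of a description can be read back with a few lines of Python. This is a minimal sketch: the helper name `drtio_role` is hypothetical, the accepted values are assumed to be spelled exactly as the three variants named above, and the example description is trimmed to a single field (real files contain many more).

```python
import json

# Role names taken from the text above; the exact spelling used in a real
# description file is an assumption.
VALID_BASES = {"standalone", "master", "satellite"}


def drtio_role(description_text: str) -> str:
    """Return the `base` field of a JSON system description, or raise."""
    description = json.loads(description_text)
    base = description.get("base")
    if base not in VALID_BASES:
        raise ValueError(f"unexpected base value: {base!r}")
    return base


# Hypothetical, heavily trimmed description; real files contain more fields.
example = '{"base": "satellite"}'
print(drtio_role(example))
```

A missing or misspelled base raises `ValueError`, which makes it easier to catch a description that does not match the firmware variant you intend to flash.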

The master will attempt to connect to a satellite whenever it detects that an SFP is plugged in. To do so, it pings the satellite until the connection is established. The connection process can be observed in the logs:

// successful connection
[ 5385.011286s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[ 5390.219274s] INFO(runtime::rtio_mgt::drtio): [LINK#1] remote replied after 27 packets
[ 5390.257152s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link initialization completed
[ 5390.264854s] INFO(runtime::rtio_mgt::drtio): [DEST#2] destination is up
[ 5390.271567s] INFO(runtime::rtio_mgt::drtio): [DEST#2] buffer space is 128

// unsuccessful connection
[    95.269811s]  INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[   115.076772s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] ping failed
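
When a log is long, the link state can be extracted automatically. The following is a minimal sketch that assumes the log lines have exactly the format shown above; the function name `link_status` and the three state names are illustrative and not part of any tool.

```python
import re

# Matches lines such as:
# [ 5385.011286s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
LINE_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)s\]\s+(?P<level>[A-Z]+)"
    r"\(runtime::rtio_mgt::drtio\):\s+"
    r"\[(?P<kind>LINK|DEST)#(?P<num>\d+)\]\s+(?P<msg>.*)"
)


def link_status(log_text: str) -> dict:
    """Map each LINK number to 'up', 'failed', or 'pinging'."""
    status = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m is None or m["kind"] != "LINK":
            continue  # skip unrelated lines and DEST messages
        link = int(m["num"])
        msg = m["msg"].strip()
        if msg == "link initialization completed":
            status[link] = "up"
        elif msg == "ping failed":
            status[link] = "failed"
        elif msg.startswith("link RX became up"):
            status[link] = "pinging"
    return status


failed_log = """\
[    95.269811s]  INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[   115.076772s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] ping failed
"""
print(link_status(failed_log))
```

A link that stays in the pinging state or ends up as failed points to one of the causes listed under "Common problems".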

While the crates are connected, the clock signal is distributed over the link, effectively synchronizing the clocks across crates.

Common problems

Master and satellite do not connect with each other

  • Low-quality cables and SFP adapters are often the cause. Use adapters from reputable sources or, better, the ones we ship. You may also contact our helpdesk for help in choosing the right adapters.
  • The adapter is not fully inserted. You shouldn't be able to pull an adapter out without pulling its petal/handle.
  • The fiber is not properly connected. You shouldn't be able to pull it out without squeezing the handle. The optics may also be dirty or damaged.
  • Wrong setup, such as master to master or standalone to standalone. Mixing up the SFP ports generally makes the system unusable, although the connection should still be established in most cases.
  • The fiber adapters are not interchangeable between the two ends. If one end is labeled 1270/1330, the other end should be labeled 1330/1270.
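
The label rule in the last item can be written down as a small check. This is only an illustrative sketch: `is_matched_pair` is a hypothetical helper, and it assumes that labels are written as two wavelengths separated by a slash, as in the example above.

```python
def is_matched_pair(label_a: str, label_b: str) -> bool:
    """True if label_b lists the same two wavelengths as label_a, reversed."""
    a = [part.strip() for part in label_a.split("/")]
    b = [part.strip() for part in label_b.split("/")]
    # Both labels need two distinct wavelengths, and one must mirror the other.
    return len(a) == 2 and len(b) == 2 and a[0] != a[1] and a == b[::-1]


print(is_matched_pair("1270/1330", "1330/1270"))  # complementary adapters
print(is_matched_pair("1270/1330", "1270/1330"))  # two identical adapters
```

The first call reports a matched pair and the second does not, since two identical adapters carry the same label.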

Master-satellite interrupted/unstable connection

This is often caused by overheating. Check that the Kasli/SoC fans are working properly, and try installing rack fans to increase the airflow.