The ARTIQ Real-Time I/O design employs several concepts to achieve its goals of high timing resolution on the nanosecond scale and low latency on the microsecond scale while still not sacrificing a readable and extensible language.
In a typical environment two very different classes of hardware need to be controlled.
One class is the vast arsenal of diverse laboratory hardware that interfaces with and is controlled from a typical PC.
The other is specialized real-time hardware
that requires tight coupling and a low-latency interface to a CPU.
The ARTIQ code that describes a given experiment is composed of two types of "programs":
regular Python code that is executed on the host and ARTIQ *kernels* that are executed on a *core device*.
The CPU that executes the ARTIQ kernels has direct access to specialized programmable I/O timing logic (part of the *gateware*).
The two types of code can invoke each other and transitions between them are seamless.
The ARTIQ kernels do not interface with the real-time gateware directly.
That would lead to imprecise, indeterminate, and generally unpredictable timing.
Instead the CPU operates at one end of a bank of FIFO (first-in-first-out) buffers while the real-time gateware at the other end guarantees the *all or nothing* level of excellent timing precision.
A FIFO for an output channel hold timestamps and event data describing when and what is to be executed.
The CPU feeds events into this FIFO.
A FIFO for an input channel contains timestamps and event data for events that have been recorded by the real-time gateware and are waiting to be read out by
the CPU on the other end.
The timeline
------------
The set of all input and output events on all channels constitutes the *timeline*.
A high resolution wall clock (``rtio_counter``) counts clock cycles and causes output events to be executed when their timestamp matches the clock and input events to be recorded and stamped with the current clock value accordingly.
The kernel runtime environment maintains a timeline cursor (called ``now``) used as the timestamp when output events are submitted to the FIFOs.
This timeline cursor can be moved forward or backward on the timeline relative to its current value using :func:`artiq.language.core.delay` and :func:`artiq.language.core.delay_mu`, the later being a delay given in *machine units* as opposed to SI units.
The absolute value of ``now`` on the timeline can be retrieved using :func:`artiq.language.core.now_mu` and it can be set using :func:`artiq.language.core.at_mu`.
RTIO timestamps, the timeline cursor, and the ``rtio_counter`` wall clock are all relative to the core device startup/boot time.
The wall clock keeps running across experiments.
Absolute timestamps can be large numbers.
They are represented internally as 64-bit integers with a resolution of typically a nanosecond and a range of hundreds of years.
Conversions between such a large integer number and a floating point representation can cause loss of precision through cancellation.
When computing the difference of absolute timestamps, use ``self.core.mu_to_seconds(t2-t1)``, not ``self.core.mu_to_seconds(t2)-self.core.mu_to_seconds(t1)`` (see :meth:`artiq.coredevice.Core.mu_to_seconds`).
The :meth:`artiq.coredevice.ttl.TTLOut.pulse` method advances the timeline cursor (using ``delay()``) while other methods such as :meth:`artiq.coredevice.ttl.TTLOut.on`, :meth:`artiq.coredevice.ttl.TTLOut.off`, :meth:`artiq.coredevice.ad9914.set`. The latter are called *zero-duration* methods.
To track down ``RTIOUnderflows`` in an experiment there are a few approaches:
* Exception backtraces show where underflow has occurred while executing the
code.
* The :any:`integrated logic analyzer <core-device-rtio-analyzer-tool>` shows the timeline context that lead to the exception. The analyzer is always active and supports plotting of RTIO slack. RTIO slack is the difference between timeline cursor and wall clock time (``now - rtio_counter``).
A sequence error happens when the sequence of coarse timestamps cannot be supported by the gateware. For example, there may have been too many timeline rewinds.
Internally, the gateware stores output events in an array of FIFO buffers (the "lanes") and the timestamps in each lane must be strictly increasing. If an event with a decreasing or equal timestamp is submitted, the gateware selects the next lane, wrapping around if the final lane is reached. If this lane also contains an event with a timestamp beyond the one being submitted then a sequence error occurs. See `this issue <https://github.com/m-labs/artiq/issues/1081>`_ for a real-life example of how this works.
* Whether a particular sequence of timestamps causes a sequence error or not is fully deterministic (starting from a known RTIO state, e.g. after a reset). Adding a constant offset to the whole sequence does not affect the result.
* Zero-duration methods (such as :meth:`artiq.coredevice.ttl.TTLOut.on()`) do not advance the timeline and so will consume additional lanes if they are scheduled simultaneously. Adding a tiny delay will prevent this (e.g. ``delay_mu(np.int64(self.core.ref_multiplier))``, at least one coarse rtio cycle).
The offending event is discarded and the RTIO core keeps operating.
This error is reported asynchronously via the core device log: for performance reasons with DRTIO, the CPU does not wait for an error report from the satellite after writing an event. Therefore, it is not possible to raise an exception precisely.
A collision happens when more than one event is submitted on a given channel with the same coarse timestamp, and that channel does not implement replacement behavior or the fine timestamps are different.
Coarse timestamps correspond to the RTIO system clock (typically around 125MHz) whereas fine timestamps correspond to the RTIO SERDES clock (typically around 1GHz). Different channels may have different ratios between the coarse and fine timestamp clock frequencies.
The offending event is discarded and the RTIO core keeps operating.
This error is reported asynchronously via the core device log: for performance reasons with DRTIO, the CPU does not wait for an error report from the satellite after writing an event. Therefore, it is not possible to raise an exception precisely.
Busy errors
-----------
A busy error happens when at least one output event could not be executed because the channel was already busy executing a previous event.
The offending event was discarded.
This error is reported asynchronously via the core device log.
The :meth:`artiq.coredevice.ttl.TTLInOut.count` method of an input channel will often lead to a situation of negative slack (timeline cursor ``now`` smaller than the current wall clock ``rtio_counter``):
The :meth:`artiq.coredevice.ttl.TTLInOut.gate_rising` method leaves the timeline cursor at the closing time of the gate. ``count()`` must necessarily wait until the gate closing event has actually been executed, at which point ``rtio_counter > now``: ``count()`` synchronizes timeline cursor (``now``) and wall clock (``rtio_counter``). In these situations, a ``delay()`` is necessary to re-establish positive slack so that further output events can be placed.
The RTIO input channels buffer input events received while an input gate is open, or at certain points in time when using the sampling API (:meth:`artiq.coredevice.ttl.TTLInOut.sample_input`).
The events are kept in a FIFO until the CPU reads them out via e.g. :meth:`artiq.coredevice.ttl.TTLInOut.count`, :meth:`artiq.coredevice.ttl.TTLInOut.timestamp_mu` or :meth:`artiq.coredevice.ttl.TTLInOut.sample_get`.
The condition is converted into an :class:`artiq.coredevice.exceptions.RTIOOverflow` exception that is raised on a subsequent invocation of one of the readout methods (e.g. ``count()``, ``timestamp_mu()``, ``sample_get()``).
Here, ``run()`` calls ``k1()`` which exits leaving the cursor one second after the rising edge and ``k2()`` then submits a falling edge at that position.
The seamless handover of the timeline (cursor and events) across kernels and experiments implies that a kernel can exit long before the events it has submitted have been executed.
If a previous kernel sets timeline cursor far in the future this effectively locks the system.
When a kernel should wait until all the events have been executed, use the :meth:`artiq.coredevice.core.Core.wait_until_mu` with a timestamp after (or at) the last event:
Therefore, when switching experiments it can be adequate to clear the RTIO FIFOs and initialize the timeline cursor to "sometime in the near future" using :meth:`artiq.coredevice.core.Core.reset`.
The example idle kernel implements this mechanism.
Since it never waits for any input, it will rapidly fill the output FIFOs and would produce a large positive slack.
To avoid large positive slack and to accommodate for seamless handover the idle kernel will only run when no other experiment is pending and the example will wait before submitting events until there is significant negative slack.