The central part of our code is our ``LED`` class, that derives from :class:`artiq.language.environment.EnvExperiment`. Among other features, ``EnvExperiment`` calls our ``build`` method and provides the ``setattr_device`` method that interfaces to the device database to create the appropriate device drivers and make those drivers accessible as ``self.core`` and ``self.led``. The ``@kernel`` decorator tells the system that the ``run`` method must be executed on the core device (instead of the host). The decorator uses ``self.core`` internally, which is why we request the core device using ``setattr_device`` like any other.
Copy the file ``device_db.pyon`` (containing the device database) from the ``examples/master`` folder of ARTIQ into the same directory as ``led.py`` (alternatively, you can use the ``--device-db`` option of ``artiq_run``). You can open PYON database files using a text editor - their contents are in a human-readable format. You will probably want to set the IP address of the core device in ``device_db.pyon`` so that the computer can connect to it (it is the ``host`` parameter of the ``comm`` entry). See :ref:`device-db` for more information. The example device database is designed for the NIST QC1 hardware on the KC705; see :ref:`board-ports` for RTIO channel assignments if you need to adapt the device database to a different hardware platform.
If the ``led`` device is a bidirectional TTL (i.e. ``TTLInOut`` instead of ``TTLOut``), you need to put it in output (driving) mode. Add the following at the beginning of ``run``: ::
A method or function running on the core device (which we call a "kernel") may communicate with the host by calling non-kernel functions that may accept parameters and may return a value. The "remote procedure call" (RPC) mechanisms handle automatically the communication between the host and the device of which function to call, with which parameters, and what the returned value is.
What happens is the ARTIQ compiler notices that the ``input_led_state`` function does not have a ``@kernel`` decorator and thus must be executed on the host. When the core device calls it, it sends a request to the host to execute it. The host displays the prompt, collects user input, and sends the result back to the core device, which sets the LED state accordingly.
RPC functions must always return a value of the same type. When they return a non-``None`` value, the compiler should be informed in advance of the type of the value, which is what the ``-> TBool`` annotation is for.
The ``break_realtime`` call is necessary to waive the real-time requirements of the LED state change (as the ``input_led_state`` function can take an arbitrarily long time). This will become clearer later as we explain timing control.
A number of Python algorithmic features can be used inside a kernel for compilation and execution on the core device. They include ``for`` and ``while`` loops, conditionals (``if``, ``else``, ``elif``), functions, exceptions (without parameter), and statically typed variables of the following types:
For a demonstration of some of these features, see the ``mandelbrot.py`` example.
Real-time I/O
-------------
The point of running code on the core device is the ability to meet demanding real-time constraints. In particular, the core device can respond to an incoming stimulus or the result of a measurement with a low and predictable latency. We will see how to use inputs later; first, we must familiarize ourselves with how time is managed in kernels.
Create a new file ``rtio.py`` containing the following: ::
If ``ttl0`` is a bidirectional channel (``TTLInOut``), it is in input (non-driving) mode by default. You need to call ``self.ttl0.output()`` as explained above for the LED.
Connect an oscilloscope or logic analyzer to TTL0 and run ``artiq_run.py led.py``. Notice that the generated signal's period is precisely 4 microseconds, and that it has a duty cycle of precisely 50%. This is not what you would expect if the delay and the pulse were implemented with CPU-controlled GPIO: overhead from the loop management, function calls, etc. would increase the signal's period, and asymmetry in the overhead would cause duty cycle distortion.
Instead, inside the core device, output timing is generated by the gateware and the CPU only programs switching commands with certain timestamps that the CPU computes. This guarantees precise timing as long as the CPU can keep generating timestamps that are increasing fast enough. In case it fails to do that (and attempts to program an event with a timestamp in the past), the :class:`artiq.coredevice.exceptions.RTIOUnderflow` exception is raised. The kernel causing it may catch it (using a regular ``try... except...`` construct), or it will be propagated to the host.
Try reducing the period of the generated waveform until the CPU cannot keep up with the generation of switching events and the underflow exception is raised. Then try catching it: ::
The core device records the real-time I/O waveforms into a circular buffer. It is possible to dump any Python object so that it appears alongside the waveforms using the ``rtio_log`` function, which accepts a channel name (i.e. a log target) as the first argument: ::
Afterwards, the recorded data can be extracted and written to a VCD file using ``artiq_coreanalyzer -w rtio.vcd`` (see: :ref:`core-device-rtio-analyzer-tool`). VCD files can be viewed using third-party tools such as GtkWave.
It is often necessary that several pulses overlap one another. This can be expressed through the use of ``with parallel`` constructs, in which all statements execute at the same time. The execution time of the ``parallel`` block is the execution time of its longest statement.
Within a parallel block, some statements can be made sequential again using a ``with sequential`` construct. Observe the pulses generated by this code: ::
Branches of a ``parallel`` block are executed one after another, with a reset of the internal RTIO time variable before moving to the next branch. If a branch takes a lot of CPU time, it may cause an underflow when the next branch begins its execution.
The ARTIQ compiler runs many optimizations, most of which perform well on code that has pristine Python semantics. It also contains more powerful, and more invasive, optimizations that require opt-in to activate.
Fast-math flags
+++++++++++++++
The compiler does not normally perform algebraically equivalent transformations on floating-point expressions, because this can dramatically change the result. However, it can be instructed to do so if all of the following is true:
* Arguments and results will not be not-a-number or infinity values;
* The sign of a zero value is insignificant;
* Any algebraically equivalent transformations, such as reassociation or replacing division with multiplication by reciprocal, are legal to perform.
If this is the case for a given kernel, a ``fast-math`` flag can be specified to enable more aggressive optimization for this specific kernel: ::
@kernel(flags={"fast-math"})
def calculate(x, y, z):
return x * z + y * z
This flag particularly benefits loops with I/O delays performed in fractional seconds rather than machine units, as well as updates to DDS phase and frequency.
Kernel invariants
+++++++++++++++++
The compiler attempts to remove or hoist out of loops any redundant memory load operations, as well as propagate known constants into function bodies, which can enable further optimization. However, it must make conservative assumptions about code that it is unable to observe, because such code can change the value of the attribute, making the optimization invalid.
When an attribute is known to never change while the kernel is running, it can be marked as a *kernel invariant* to enable more aggressive optimization for this specific attribute: ::
class Converter:
kernel_invariants = {"ratio"}
def __init__(self, ratio=1.0):
self.ratio = ratio
@kernel
def convert(self, value):
return value * self.ratio ** 2
In the synthetic example above, the compiler will be able to detect that the result of evaluating ``self.ratio ** 2`` never changes and replace it with a constant, removing an expensive floating-point operation.