doc: Rewrite and overhaul of 'Compiler' page

This commit is contained in:
architeuthis 2024-06-27 10:32:55 +08:00 committed by Sébastien Bourdeauducq
parent 9087c8698d
commit 81106f3567
2 changed files with 138 additions and 68 deletions

View File

@ -1,34 +1,52 @@
Compiler
========
The ARTIQ compiler transforms the Python code of the kernels into machine code executable on the core device. It is invoked automatically when calling a function that uses the ``@kernel`` decorator.
The ARTIQ compiler transforms the Python code of the kernels into machine code executable on the core device. For limited purposes (normally, obtaining executable binaries of idle and startup kernels), it can be accessed through ``artiq_compile``. Otherwise it is invoked automatically whenever a function with an applicable decorator is called.
Supported Python features
-------------------------
ARTIQ kernel code accepts *nearly,* but not quite, a strict subset of Python 3. The necessities of real-time operation impose a harsher set of limitations; as a result, many Python features are necessarily omitted, and there are some specific discrepancies (see also :ref:`compiler-pitfalls`).
A number of Python features can be used inside a kernel for compilation and execution on the core device. They include ``for`` and ``while`` loops, conditionals (``if``, ``else``, ``elif``), functions, exceptions, and statically typed variables of the following types:
In general, ARTIQ Python supports only statically typed variables; it implements no heap allocation or garbage collection systems, essentially disallowing any heap-based data structures (although lists and arrays remain available in a stack-based form); and it cannot use runtime dispatch, meaning that, for example, all elements of an array must be of the same type. Nonetheless, technical details aside, a basic knowledge of Python is entirely sufficient to write useful and coherent ARTIQ experiments.
* Booleans
* 32-bit signed integers (default size)
* 64-bit signed integers (use ``numpy.int64`` to convert)
* Double-precision floating point numbers
* Lists of any supported types
* String constants
* User-defined classes, with attributes of any supported types (attributes that are not used anywhere in the kernel are ignored)
.. note::
The ARTIQ compiler is now in its second iteration. The third generation, known as NAC3, is `currently in development <https://git.m-labs.hk/M-Labs/nac3>`_, and available for pre-alpha experimental use. NAC3 represents a major overhaul of ARTIQ compilation, and will feature much faster compilation speeds, a greatly improved type system, and more predictable and transparent operation. It is compatible with ARTIQ firmware starting at ARTIQ-7. Instructions for installation and basic usage differences can also be found `on the M-Labs Forum <https://forum.m-labs.hk/d/392-nac3-new-artiq-compiler-3-prealpha-release>`_. While NAC3 is a work in progress and many important features remain unimplemented, installation and feedback is welcomed.
For a demonstration of some of these features, see the ``mandelbrot.py`` example.
ARTIQ Python code
-----------------
When several instances of a user-defined class are referenced from the same kernel, every attribute must have the same type in every instance of the class.
A variety of short experiments can be found in the subfolders of ``artiq/examples``, especially under ``kc705_nist_clock/repository`` and ``no_hardware/repository``. Reading through these will give you a general idea of what ARTIQ Python is capable of and how to use it.
Remote procedure calls
----------------------
Functions and decorators
^^^^^^^^^^^^^^^^^^^^^^^^^
Kernel code can call host functions without any additional ceremony. However, such functions are assumed to return `None`, and if a value other than `None` is returned, an exception is raised. To call a host function returning a value other than `None` its return type must be annotated using the standard Python syntax, e.g.: ::
The ARTIQ compiler recognizes several specialized decorators, which determine the way the decorated function will be compiled and handled.
def return_four() -> TInt32:
return 4
``@kernel`` (see :meth:`~artiq.language.core.kernel`) designates kernel functions, which will be compiled for and wholly executed on the core device; the basic setup and background for kernels is detailed on the :doc:`getting_started_core` page. ``@subkernel`` (:meth:`~artiq.language.core.subkernel`) designates subkernel functions, which are largely similar to kernels except that they are executed on satellite devices in a DRTIO setting, with some associated limitations; they are described in more detail on the :doc:`using_drtio_subkernels` page.
The Python types correspond to ARTIQ type annotations as follows:
``@rpc`` (:meth:`~artiq.language.core.rpc`) designates functions to be executed on the host machine, which are compiled and run in regular Python, outside of the core device's real-time limitations. Notably, functions without decorators are assumed to be host-bound by default, and treated identically to an explicitly marked ``@rpc``. As a result, the explicit decorator is only really necessary when specifying additional flags (for example, ``flags={"async"}``, see below).
``@portable`` (:meth:`~artiq.language.core.portable`) designates functions to be executed *on the same device they are called.* In other words, when called from a kernel, a portable is executed as a kernel; when called from a subkernel, it is executed as a kernel, on the same satellite device as the calling subkernel; when called from a host function, it is executed on the host machine.
``@host_only`` (:meth:`~artiq.language.core.host_only`) functions are executed fully on the host, similarly to ``@rpc``, but calling them from a kernel as an RPC will be refused by the compiler. It can be used to mark functions which should only ever be called by the host.
.. warning::
ARTIQ goes to some lengths to cache code used in experiments correctly, so that experiments run according to the state of the code when they were started, even if the source is changed during the run time. Python itself annoyingly fails to implement this (see also `issue #416 <https://github.com/m-labs/artiq/issues/416>`_), necessitating a workaround on ARTIQ's part. One particular downstream limitation is that the ARTIQ compiler is unable to recognize decorators with path prefixes, i.e.: ::
import artiq.experiment as aq
[...]
@aq.kernel
def run(self):
pass
will fail to compile. As long as ``from artiq.experiment import *`` is used as in the examples, this is never an issue. If prefixes are strongly preferred, a possible workaround is to import decorators separately, as e.g. ``from artiq.language.core import kernel``.
.. _compiler-types:
ARTIQ types
^^^^^^^^^^^
Python/NumPy types correspond to ARTIQ types as follows:
+---------------+-------------------------+
| Python | ARTIQ |
@ -43,6 +61,10 @@ The Python types correspond to ARTIQ type annotations as follows:
+---------------+-------------------------+
| str | TStr |
+---------------+-------------------------+
| bytes | TBytes |
+---------------+-------------------------+
| bytearray | TByteArray |
+---------------+-------------------------+
| list of T | TList(T) |
+---------------+-------------------------+
| NumPy array | TArray(T, num_dims) |
@ -54,56 +76,109 @@ The Python types correspond to ARTIQ type annotations as follows:
| numpy.int64 | TInt64 |
+---------------+-------------------------+
| numpy.float64 | TFloat |
+---------------+-------------------------+
+---------------+-------------------------+
Integers are 32-bit by default but may be converted to 64-bit with ``numpy.int64``.
The ARTIQ compiler can be thought of as overriding all built-in Python types, and types in kernel code cannot always be assumed to behave as they would in host Python. In particular, normally heap-allocated types such as arrays, lists, and strings are very limited in what they support. Strings must be constant and lists and arrays must be of constant size. Methods like ``append``, ``push``, and ``pop`` are unavailable as a matter of principle, and will not compile. Certain types, notably dictionaries, have no ARTIQ implementation and cannot be used in kernels at all.
.. tip::
Instead of pushing or appending, preallocate for the maximum number of elements you expect with a list comprehension, i.e. ``x = [0 for _ in range(1024)]``, and then keep a variable ``n`` noting the last filled element of the array. Afterwards, ``x[0:n]`` will give you a list with that number of elements.
Multidimensional arrays are allowed (using NumPy syntax). Element-wise operations (e.g. ``+``, ``/``), matrix multiplication (``@``) and multidimensional indexing are supported; slices and views (currently) are not.
User-defined classes are supported, provided their attributes are of other supported types (attributes that are not used in the kernel are ignored and thus unrestricted). When several instances of a user-defined class are referenced from the same kernel, every attribute must have the same type in every instance of the class.
Basic ARTIQ Python
^^^^^^^^^^^^^^^^^^
Basic Python features can broadly be used inside kernels without further compunctions. This includes loops (``for`` / ``while`` / ``break`` / ``continue``), conditionals (``if``, ``else``, ``elif``), functions, exceptions, ``try`` / ``except`` / ``else`` blocks, and statically typed variables of any supported types.
Kernel code can call host functions without any additional ceremony. However, such functions are assumed to return ``None``, and if a value other than ``None`` is returned, an exception is raised. To call a host function returning a value other than ``None`` its return type must be annotated, using the standard Python syntax, e.g.: ::
def return_four() -> TInt32:
return 4
Kernels can freely modify attributes of objects shared with the host. However, by necessity, these modifications are actually applied to local copies of the objects, as the latency of immediate writeback would be unsupportable in a real-time environment. Instead, modifications are written back *when the kernel completes;* notably, this means RPCs called by a kernel itself will only have access to the unmodified host version of the object, as the kernel hasn't finished execution yet. In some cases, accessing data on the host is better handled by calling RPCs specifically to make the desired modifications.
.. warning::
Kernels *cannot and should not* return lists, arrays, or strings they have created, or any objects containing them; in the absence of a heap, the way these values are allocated means they cannot outlive the kernels they are created in. Trying to do so will normally be discovered by lifetime tracking and result in compilation errors, but in certain cases lifetime tracking will fail to detect a problem and experiments will encounter memory corruption at runtime. For example: ::
def func(a):
return a
class ProblemReturn1(EnvExperiment):
def build(self):
self.setattr_device("core")
@kernel
def run(self):
# results in memory corruption
return func([1, 2, 3])
will compile, **but corrupts at runtime.** On the other hand, lists, arrays, or strings can and should be used as inputs for RPCs, and this is the preferred method of returning data to the host. In this way the data is inherently read and sent before the kernel completes and there are no allocation issues.
Available built-in functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ARTIQ makes various useful built-in and mathematical functions from Python, NumPy, and SciPy available in kernel code. They are not guaranteed to be perfectly equivalent to their host namesakes (for example, ``numpy.rint()`` normally rounds-to-even, but in kernel code rounds toward zero) but their behavior should be basically predictable.
.. list-table::
:header-rows: 1
+ * Reference
* Functions
+ * `Python built-ins <https://docs.python.org/3/library/functions.html>`_
* - ``len()``, ``round()``, ``abs()``, ``min()``, ``max()``
- ``print()`` (for specifics see warning below)
- all basic type conversions (``int()``, ``float()`` etc.)
+ * `NumPy mathematic utilities <https://numpy.org/doc/stable/reference/routines.math.html>`_
* - ``sqrt()``, ``cbrt```
- ``fabs()``, ``fmax()``, ``fmin()``
- ``floor()``, ``ceil()``, ``trunc()``, ``rint()``
+ * `NumPy exponents and logarithms <https://numpy.org/doc/stable/reference/routines.math.html#exponents-and-logarithms>`_
* - ``exp()``, ``exp2()``, ``expm1()``
- ``log()``, ``log2()``, ``log10()``
+ * `NumPy trigonometric and hyperbolic functions <https://numpy.org/doc/stable/reference/routines.math.html#trigonometric-functions>`_
* - ``sin()``, ``cos()``, ``tan()``,
- ``arcsin()``, ``arccos()``, ``arctan()``
- ``sinh()``, ``cosh()``, ``tanh()``
- ``arcsinh()``, ``arccosh()``, ``arctanh()``
- ``hypot()``, ``arctan2()``
+ * `NumPy floating point routines <https://numpy.org/doc/stable/reference/routines.math.html#floating-point-routines>`_
* - ``copysign``, ``nextafter()``
+ * `SciPy special functions <https://docs.scipy.org/doc/scipy/reference/special.html>`_
* - ``erf()``, ``erfc()``
- ``gamma()``, ``gammaln()``
- ``j0()``, ``j1()``, ``y0()``, ``y1()``
Basic NumPy array handling (``np.array()``, ``numpy.transpose()``, ``numpy.full``, ``@``, element-wise operation, etc.) is also available. NumPy functions are implicitly broadcast when applied to arrays.
.. warning::
A kernel ``print`` call normally specifically prints to the host machine, either the terminal of ``artiq_run`` or the dashboard log when using ``artiq_dashboard``. This makes it effectively an RPC, with some of the corresponding limitations. In subkernels and whenever compiled through ``artiq_compile``, where RPCs are not supported, it is instead considered a print to the local log (UART only in the case of satellites, UART and core log for master/standalone devices).
.. _compiler-pitfalls:
Pitfalls
--------
The ARTIQ compiler accepts *nearly* a strict subset of Python 3. However, by necessity there
is a number of differences that can lead to bugs.
Arbitrary-length integers are not supported at all on the core device; all integers are
either 32-bit or 64-bit. This especially affects calculations that result in a 32-bit signed
overflow; if the compiler detects a constant that doesn't fit into 32 bits, the entire expression
will be upgraded to 64-bit arithmetics, however if all constants are small, 32-bit arithmetics
will be used even if the result will overflow. Overflows are not detected.
The result of calling the builtin ``round`` function is different when used with
the builtin ``float`` type and the ``numpy.float64`` type on the host interpreter; ``round(1.0)``
returns an integer value 1, whereas ``round(numpy.float64(1.0))`` returns a floating point value
``numpy.float64(1.0)``. Since both ``float`` and ``numpy.float64`` are mapped to
the builtin ``float`` type on the core device, this can lead to problems in functions marked
``@portable``; the workaround is to explicitly cast the argument of ``round`` to ``float``:
``round(float(numpy.float64(1.0)))`` returns an integer on the core device as well as on the host
interpreter.
Empty lists do not have valid list element types, so they cannot be used in the kernel.
In kernels, lifetime of allocated values (e.g. lists or numpy arrays) might not be correctly
tracked across function calls (see `#1497 <https://github.com/m-labs/artiq/issues/1497>`_,
`#1677 <https://github.com/m-labs/artiq/issues/1677>`_) like in this example ::
Arbitrary-length integers are not supported at all on the core device; all integers are either 32-bit or 64-bit. This especially affects calculations that result in a 32-bit signed overflow. If the compiler detects a constant that can't fit into 32 bits, the entire expression will be upgraded to 64-bit arithmetic, but if all constants are small, 32-bit arithmetic is used even if the result will overflow. Overflows are not detected.
@kernel
def func(a):
return a
The result of calling the builtin ``round`` function is different when used with the builtin ``float`` type and the ``numpy.float64`` type on the host interpreter; ``round(1.0)`` returns an integer value 1, whereas ``round(numpy.float64(1.0))`` returns a floating point value ``numpy.float64(1.0)``. Since both ``float`` and ``numpy.float64`` are mapped to the builtin ``float`` type on the core device, this can lead to problems in functions marked ``@portable``; the workaround is to explicitly cast the argument of ``round`` to ``float``: ``round(float(numpy.float64(1.0)))`` returns an integer on the core device as well as on the host interpreter.
class ProblemReturn1(EnvExperiment):
def build(self):
self.setattr_device("core")
Flags and optimizations
-----------------------
@kernel
def run(self):
# results in memory corruption
return func([1, 2, 3])
This results in memory corruption at runtime.
The ARTIQ compiler runs many optimizations, most of which perform well on code that has pristine Python semantics. It also contains more powerful, and more invasive, optimizations that require opt-in to activate.
Asynchronous RPCs
-----------------
^^^^^^^^^^^^^^^^^
If an RPC returns no value, it can be invoked in a way that does not block until the RPC finishes
execution, but only until it is queued. (Submitting asynchronous RPCs too rapidly, as well as
submitting asynchronous RPCs with arguments that are too large, can still block until completion.)
If an RPC returns no value, it can be invoked in a way that does not block until the RPC finishes execution, but only until it is queued. (Submitting asynchronous RPCs too rapidly, as well as submitting asynchronous RPCs with arguments that are too large, can still block until completion.)
To define an asynchronous RPC, use the ``@rpc`` annotation with a flag: ::
@ -111,17 +186,12 @@ To define an asynchronous RPC, use the ``@rpc`` annotation with a flag: ::
def record_result(x):
self.results.append(x)
Additional optimizations
------------------------
The ARTIQ compiler runs many optimizations, most of which perform well on code that has pristine Python semantics. It also contains more powerful, and more invasive, optimizations that require opt-in to activate.
Fast-math flags
+++++++++++++++
^^^^^^^^^^^^^^^
The compiler does not normally perform algebraically equivalent transformations on floating-point expressions, because this can dramatically change the result. However, it can be instructed to do so if all of the following is true:
The compiler does not normally perform algebraically equivalent transformations on floating-point expressions, because this can dramatically change the result. However, it can be instructed to do so if all of the following are true:
* Arguments and results will not be not-a-number or infinity values;
* Arguments and results will not be Not-a-Number or infinite;
* The sign of a zero value is insignificant;
* Any algebraically equivalent transformations, such as reassociation or replacing division with multiplication by reciprocal, are legal to perform.
@ -134,7 +204,7 @@ If this is the case for a given kernel, a ``fast-math`` flag can be specified to
This flag particularly benefits loops with I/O delays performed in fractional seconds rather than machine units, as well as updates to DDS phase and frequency.
Kernel invariants
+++++++++++++++++
^^^^^^^^^^^^^^^^^
The compiler attempts to remove or hoist out of loops any redundant memory load operations, as well as propagate known constants into function bodies, which can enable further optimization. However, it must make conservative assumptions about code that it is unable to observe, because such code can change the value of the attribute, making the optimization invalid.
@ -171,7 +241,7 @@ In the synthetic example above, the compiler will be able to detect that the res
delay(self.worker.interval / 5.0)
self.worker.work()
In the synthetic example above, the compiler will be able to detect that the result of evaluating ``self.interval / 5.0`` never changes, even though it neither knows the value of ``self.worker.interval`` beforehand nor can it see through the ``self.worker.work()`` function call, and hoist the expensive floating-point division out of the loop, transforming the code for ``loop`` into an equivalent of the following: ::
In the synthetic example above, the compiler will be able to detect that the result of evaluating ``self.interval / 5.0`` never changes, even though it neither knows the value of ``self.worker.interval`` beforehand nor can see through the ``self.worker.work()`` function call, and thus can hoist the expensive floating-point division out of the loop, transforming the code for ``loop`` into an equivalent of the following: ::
@kernel
def loop(self):

View File

@ -73,7 +73,7 @@ You can then turn the LED off and on by entering 0 or 1 at the prompt that appea
What happens is that the ARTIQ compiler notices that the :meth:`input_led_state` function does not have a ``@kernel`` decorator (:func:`~artiq.language.core.kernel`) and thus must be executed on the host. When the function is called on the core device, it sends a request to the host, which executes it. The core device waits until the host returns, and then continues the kernel; in this case, the host displays the prompt, collects user input, and the core device sets the LED state accordingly.
The return type of all RPC functions must be known in advance. If the return value is not ``None``, the compiler requires a type annotation, like ``-> TBool`` in the example above.
The return type of all RPC functions must be known in advance. If the return value is not ``None``, the compiler requires a type annotation, like ``-> TBool`` in the example above. See also :ref:`compiler-types`.
Without the :meth:`~artiq.coredevice.core.Core.break_realtime` call, the RTIO events emitted by :func:`self.led.on()` or :func:`self.led.off()` would be scheduled at a fixed and very short delay after entering :meth:`~artiq.language.environment.Experiment.run()`. These events would fail because the RPC to :meth:`input_led_state()` can take an arbitrarily long amount of time, and therefore the deadline for the submission of RTIO events would have long passed when :func:`self.led.on()` or :func:`self.led.off()` are called (that is, the ``rtio_counter_mu`` wall clock will have advanced far ahead of the timeline cursor ``now_mu``, and an :exc:`~artiq.coredevice.exceptions.RTIOUnderflow` would result; see :ref:`artiq-real-time-i-o-concepts` for the full explanation of wall clock vs. timeline.) The :meth:`~artiq.coredevice.core.Core.break_realtime` call is necessary to waive the real-time requirements of the LED state change. Rather than delaying by any particular time interval, it reads ``rtio_counter_mu`` and moves up the ``now_mu`` cursor far enough to ensure it's once again safely ahead of the wall clock.