
Getting started with the core language
======================================

.. _connecting-to-the-core-device:

Connecting to the core device
-----------------------------

As a very first step, we will turn on an LED on the core device. Create a file ``led.py`` containing the following: ::

    from artiq.experiment import *


    class LED(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("led")

        @kernel
        def run(self):
            self.led.on()

The central part of our code is our ``LED`` class, which derives from :class:`artiq.language.environment.EnvExperiment`. Among other features, ``EnvExperiment`` calls our ``build`` method and provides the ``setattr_device`` method, which interfaces with the device database to create the appropriate device drivers and make them accessible as ``self.core`` and ``self.led``. The ``@kernel`` decorator tells the system that the ``run`` method must be executed on the core device (instead of the host). The decorator uses ``self.core`` internally, which is why we request the core device using ``setattr_device`` like any other device.

Copy the file ``device_db.pyon`` (containing the device database) from the ``examples/master`` folder of ARTIQ into the same directory as ``led.py`` (alternatively, you can use the ``--device-db`` option of ``artiq_run``). You can open PYON database files in a text editor; their contents are in a human-readable format. You will probably want to set the IP address of the core device in ``device_db.pyon`` so that the computer can connect to it (it is the ``host`` parameter of the ``comm`` entry). See :ref:`device-db` for more information. The example device database is designed for the NIST QC1 hardware on the KC705; see :ref:`board-ports` for RTIO channel assignments if you need to adapt the device database to a different hardware platform.

.. note::
    If the ``led`` device is a bidirectional TTL (i.e. ``TTLInOut`` instead of ``TTLOut``), you need to put it in output (driving) mode. Add the following at the beginning of ``run``: ::

        self.led.output()
        delay(0.1*us)

Run your code using ``artiq_run``, which is part of the ARTIQ front-end tools: ::

    $ artiq_run led.py

The LED of the device should turn on. Congratulations! You have a basic ARTIQ system up and running.

Host/core device interaction
----------------------------

A method or function running on the core device (which we call a "kernel") may communicate with the host by calling non-kernel functions that may accept parameters and may return a value. The "remote procedure call" (RPC) mechanism automatically handles the communication between the host and the device: which function to call, with which parameters, and what value is returned.

Modify the code as follows: ::

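    from artiq.experiment import *


    def input_led_state() -> TBool:
        # bool() of any non-empty string (including "0") is True,
        # so compare the entered text explicitly.
        return input("Enter desired LED state: ") == "1"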

    class LED(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("led")

        @kernel
        def run(self):
            s = input_led_state()
            self.core.break_realtime()
            if s:
                self.led.on()
            else:
                self.led.off()


You can then turn the LED off and on by entering 0 or 1 at the prompt that appears: ::

    $ artiq_run led.py
    Enter desired LED state: 1
    $ artiq_run led.py
    Enter desired LED state: 0

Here, the ARTIQ compiler notices that the ``input_led_state`` function does not have a ``@kernel`` decorator and thus must be executed on the host. When the core device calls the function, it sends a request to the host to execute it. The host displays the prompt, collects user input, and sends the result back to the core device, which sets the LED state accordingly.

RPC functions must always return a value of the same type. When they return a non-``None`` value, the compiler should be informed in advance of the type of the value, which is what the ``-> TBool`` annotation is for.
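
Because the return type is fixed, the host-side function has to turn whatever the user typed into a proper boolean. Note that in plain Python, ``bool()`` of any non-empty string, including ``"0"``, is ``True``, so the text needs to be compared explicitly. A minimal host-side sketch, where ``parse_led_state`` is a hypothetical helper and not part of ARTIQ: ::

    def parse_led_state(text: str) -> bool:
        """Map the text typed at the prompt to a boolean LED state."""
        # Only "1" (ignoring surrounding whitespace) switches the LED on;
        # anything else, including "0" and empty input, switches it off.
        return text.strip() == "1"


    states = [parse_led_state(t) for t in ["1", "0", " 1 ", ""]]
    print(states)

An RPC function such as ``input_led_state`` could then return ``parse_led_state(input("Enter desired LED state: "))``.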

The ``break_realtime`` call is necessary to waive the real-time requirements of the LED state change (as the ``input_led_state`` function can take an arbitrarily long time). This will become clearer later as we explain timing control.

Algorithmic features
--------------------

A number of Python algorithmic features can be used inside a kernel for compilation and execution on the core device. They include ``for`` and ``while`` loops, conditionals (``if``, ``else``, ``elif``), functions, exceptions (without parameters), and statically typed variables of the following types:

* Booleans
* 32-bit signed integers (default size)
* 64-bit signed integers (use ``int(n, width=64)`` to convert)
* Signed rational numbers with 64-bit numerator and 64-bit denominator
* Double-precision floating point numbers
* Lists of the above types. Lists can be arbitrarily nested.

For a demonstration of some of these features, see the ``mandelbrot.py`` example.
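
As a rough illustration of this subset, the function below uses only a ``for`` loop, a conditional, integers and floating-point numbers. It is a hypothetical helper written for this page, not the contents of ``mandelbrot.py``, and it is shown here as plain Python running on the host: ::

    def escape_count(c_re, c_im, max_iter):
        """Count iterations of z -> z*z + c before |z| exceeds 2."""
        z_re = 0.0
        z_im = 0.0
        for i in range(max_iter):
            # Compare the squared magnitude to avoid a square root.
            if z_re*z_re + z_im*z_im > 4.0:
                return i
            new_re = z_re*z_re - z_im*z_im + c_re
            z_im = 2.0*z_re*z_im + c_im
            z_re = new_re
        return max_iter


    inside = escape_count(0.0, 0.0, 50)   # the origin never escapes
    outside = escape_count(2.0, 2.0, 50)  # escapes after one iteration
    print(inside, outside)

Every variable keeps a single type throughout, which is the kind of code the statically typed kernel language is designed for.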

Real-time I/O
-------------

The point of running code on the core device is the ability to meet demanding real-time constraints. In particular, the core device can respond to an incoming stimulus or the result of a measurement with a low and predictable latency. We will see how to use inputs later; first, we must familiarize ourselves with how time is managed in kernels.

Create a new file ``rtio.py`` containing the following: ::

    from artiq.experiment import *


    class Tutorial(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("ttl0")

        @kernel
        def run(self):
            for i in range(1000000):
                self.ttl0.pulse(2*us)
                delay(2*us)


.. note::
    If ``ttl0`` is a bidirectional channel (``TTLInOut``), it is in input (non-driving) mode by default. You need to call ``self.ttl0.output()`` as explained above for the LED.


Connect an oscilloscope or logic analyzer to TTL0 and run ``artiq_run rtio.py``. Notice that the generated signal's period is precisely 4 microseconds, and that it has a duty cycle of precisely 50%. This is not what you would expect if the delay and the pulse were implemented with CPU-controlled GPIO: overhead from the loop management, function calls, etc. would increase the signal's period, and asymmetry in the overhead would cause duty cycle distortion.

Instead, inside the core device, output timing is generated by the gateware and the CPU only programs switching commands with certain timestamps that the CPU computes. This guarantees precise timing as long as the CPU can keep generating timestamps that are increasing fast enough. In case it fails to do that (and attempts to program an event with a timestamp in the past), the :class:`artiq.coredevice.exceptions.RTIOUnderflow` exception is raised. The kernel causing it may catch it (using a regular ``try... except...`` construct), or it will be propagated to the host.
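
The mechanism can be pictured with a toy host-side model. This is a simplified sketch and not how ARTIQ is implemented: the names ``ToyTimeline`` and ``ToyUnderflow``, the fixed CPU cost per event and the initial slack are all assumptions made for illustration. Each event is stamped with the timeline cursor, which advances only through ``delay`` and ``pulse``, while a simulated wall clock advances every time the CPU programs an event: ::

    class ToyUnderflow(Exception):
        """Stand-in for RTIOUnderflow in this toy model."""


    class ToyTimeline:
        def __init__(self, slack_ns, cpu_cost_ns):
            self.now_ns = slack_ns          # cursor starts ahead of the wall clock
            self.wall_ns = 0                # simulated CPU wall-clock time
            self.cpu_cost_ns = cpu_cost_ns  # assumed CPU time per programmed event
            self.events = []                # list of (timestamp_ns, level)

        def _submit(self, level):
            self.wall_ns += self.cpu_cost_ns
            if self.now_ns < self.wall_ns:
                # The event would have to happen in the past.
                raise ToyUnderflow("event at %d ns, wall clock at %d ns"
                                   % (self.now_ns, self.wall_ns))
            self.events.append((self.now_ns, level))

        def delay(self, ns):
            self.now_ns += ns

        def pulse(self, ns):
            self._submit(1)
            self.delay(ns)
            self._submit(0)


    def run_loop(timeline, n, high_ns, low_ns):
        for _ in range(n):
            timeline.pulse(high_ns)
            timeline.delay(low_ns)
        return timeline.events


    slow = ToyTimeline(slack_ns=10000, cpu_cost_ns=500)
    events = run_loop(slow, 3, high_ns=2000, low_ns=2000)
    print(events)

    fast = ToyTimeline(slack_ns=10000, cpu_cost_ns=500)
    try:
        run_loop(fast, 1000, high_ns=100, low_ns=100)
    except ToyUnderflow as exc:
        print("underflow:", exc)

With a 4 microsecond period, the cursor outruns the simulated CPU and the edges stay exactly 2 microseconds apart regardless of the CPU cost. With a 200 nanosecond period, the simulated CPU falls behind the cursor and the model raises its stand-in exception.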

Try reducing the period of the generated waveform until the CPU cannot keep up with the generation of switching events and the underflow exception is raised. Then try catching it: ::

    from artiq.experiment import *


    def print_underflow():
        print("RTIO underflow occurred")

    class Tutorial(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("ttl0")

        @kernel
        def run(self):
            try:
                for i in range(1000000):
                    self.ttl0.pulse(...)
                    delay(...)
            except RTIOUnderflow:
                print_underflow()

RTIO analyzer
-------------

The core device records the real-time I/O waveforms into a circular buffer. It is possible to dump any Python object so that it appears alongside the waveforms using the ``rtio_log`` function, which accepts a channel name (i.e. a log target) as the first argument: ::

    from artiq.experiment import *


    class Tutorial(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("ttl0")

        @kernel
        def run(self):
            for i in range(100):
                self.ttl0.pulse(...)
                rtio_log("ttl0", "i", i)
                delay(...)

Afterwards, the recorded data can be extracted and written to a VCD file using ``artiq_coreanalyzer -w rtio.vcd`` (see :ref:`core-device-rtio-analyzer-tool`). VCD files can be viewed using third-party tools such as GTKWave.

Parallel and sequential blocks
------------------------------

It is often necessary that several pulses overlap one another. This can be expressed through the use of ``with parallel`` constructs, in which all statements execute at the same time. The execution time of the ``parallel`` block is the execution time of its longest statement.

Try the following code and observe the generated pulses on a 2-channel oscilloscope or logic analyzer: ::

    for i in range(1000000):
        with parallel:
            self.ttl0.pulse(2*us)
            self.ttl1.pulse(4*us)
        delay(4*us)

Within a parallel block, some statements can be made sequential again using a ``with sequential`` construct. Observe the pulses generated by this code: ::

    for i in range(1000000):
        with parallel:
            with sequential:
                self.ttl0.pulse(2*us)
                delay(1*us)
                self.ttl0.pulse(1*us)
            self.ttl1.pulse(4*us)
        delay(4*us)

.. note::
    Branches of a ``parallel`` block are executed one after another, with a reset of the internal RTIO time variable before moving to the next branch. If a branch takes a lot of CPU time, it may cause an underflow when the next branch begins its execution.
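
The timeline bookkeeping described in the note can be sketched in plain Python. This is a toy model with hypothetical names (``run_branch``, ``run_parallel``), not ARTIQ code; it assumes times in nanoseconds and ignores CPU time entirely. Each branch starts from the same cursor value, and the block ends at the latest branch end: ::

    def run_branch(start_ns, steps):
        """Run steps one after another; return (events, end time)."""
        now = start_ns
        events = []
        for step in steps:
            if step[0] == "pulse":
                _, channel, ns = step
                events.append((now, channel, 1))  # rising edge
                now += ns
                events.append((now, channel, 0))  # falling edge
            else:  # ("delay", ns)
                now += step[1]
        return events, now


    def run_parallel(start_ns, branches):
        """Run each branch from the same start time; end at the longest one."""
        events = []
        end_ns = start_ns
        for steps in branches:
            # The cursor is reset to start_ns for every branch.
            branch_events, branch_end = run_branch(start_ns, steps)
            events.extend(branch_events)
            end_ns = max(end_ns, branch_end)
        return sorted(events), end_ns


    branches = [
        # Corresponds to the "with sequential" branch on ttl0.
        [("pulse", "ttl0", 2000), ("delay", 1000), ("pulse", "ttl0", 1000)],
        # Corresponds to the single pulse on ttl1.
        [("pulse", "ttl1", 4000)],
    ]
    events, end_ns = run_parallel(0, branches)
    print(end_ns)
    for event in events:
        print(event)

In this model both branches happen to end at 4000 ns, so the block lasts 4000 ns and both channels fall at the same instant.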

Additional optimizations
------------------------

The ARTIQ compiler runs many optimizations, most of which perform well on code that has pristine Python semantics. It also contains more powerful, and more invasive, optimizations that require opt-in to activate.

Fast-math flags
+++++++++++++++

The compiler does not normally perform algebraically equivalent transformations on floating-point expressions, because this can dramatically change the result. However, it can be instructed to do so if all of the following are true:

* Arguments and results will not be not-a-number or infinity values;
* The sign of a zero value is insignificant;
* Any algebraically equivalent transformations, such as reassociation or replacing division with multiplication by reciprocal, are legal to perform.

If this is the case for a given kernel, a ``fast-math`` flag can be specified to enable more aggressive optimization for this specific kernel: ::

    @kernel(flags={"fast-math"})
    def calculate(x, y, z):
        return x * z + y * z

This flag particularly benefits loops with I/O delays performed in fractional seconds rather than machine units, as well as updates to DDS phase and frequency.
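
To see why the conditions listed above matter, the plain-Python snippet below (run on the host, no ARTIQ required) shows that reassociating a floating-point sum can change its result, and that a negative zero is distinguishable from a positive one. The particular values are chosen for illustration only: ::

    import math

    big = 1e16

    # Same three terms, different grouping.
    left = (big + -big) + 1.0   # the large terms cancel first, then 1.0 is added
    right = big + (-big + 1.0)  # 1.0 is rounded away when added to -1e16
    print(left, right)

    # Negative zero compares equal to zero but carries a sign.
    neg_zero = -0.0
    same = (neg_zero == 0.0)
    sign = math.copysign(1.0, neg_zero)
    print(same, sign)

A kernel whose results depend on such differences is not a good candidate for the ``fast-math`` flag.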

Kernel invariants
+++++++++++++++++

The compiler attempts to remove or hoist out of loops any redundant memory load operations, as well as propagate known constants into function bodies, which can enable further optimization. However, it must make conservative assumptions about code that it is unable to observe, because such code can change the value of an attribute, which would make the optimization invalid.

When an attribute is known to never change while the kernel is running, it can be marked as a *kernel invariant* to enable more aggressive optimization for this specific attribute: ::

    class Converter:
        kernel_invariants = {"ratio"}

        def __init__(self, ratio=1.0):
            self.ratio = ratio

        @kernel
        def convert(self, value):
            return value * self.ratio ** 2

In the synthetic example above, the compiler will be able to detect that the result of evaluating ``self.ratio ** 2`` never changes and replace it with a constant, removing an expensive floating-point operation.
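
Conceptually, the effect resembles precomputing the value by hand. The host-side sketch below is an analogy only, not a description of what the compiler emits; ``ratio_squared`` and ``convert_hoisted`` are hypothetical names introduced for this comparison: ::

    class Converter:
        def __init__(self, ratio=1.0):
            self.ratio = ratio
            # Computed once; only valid because ratio never changes afterwards.
            self.ratio_squared = ratio ** 2

        def convert(self, value):
            # Recomputes the power on every call.
            return value * self.ratio ** 2

        def convert_hoisted(self, value):
            # Reuses the precomputed constant.
            return value * self.ratio_squared


    converter = Converter(3.0)
    plain = converter.convert(2.0)
    hoisted = converter.convert_hoisted(2.0)
    print(plain, hoisted)

If ``ratio`` were modified after construction, the hoisted version would silently return stale results, which is why an attribute should be listed in ``kernel_invariants`` only when it really does not change while the kernel runs.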