artiq/doc/manual/faq.rst

340 lines
29 KiB
ReStructuredText

FAQ (How do I...)
=================
use this documentation?
-----------------------
The content of this manual is arranged in rough reading order. If you start at the beginning and make your way through section by section, you should form a pretty good idea of how ARTIQ works and how to use it. Otherwise:
**If you are just starting out,** and would like to get ARTIQ set up on your computer and your core device, start with :doc:`installing`, :doc:`flashing`, and :doc:`configuring`, in that order.
**If you have a working ARTIQ setup** (or someone else has set it up for you), start with the tutorials: read :doc:`rtio`, then progress to :doc:`getting_started_core`, :doc:`getting_started_mgmt`, and :doc:`using_data_interfaces`. If your system is in a DRTIO configuration, :doc:`DRTIO and subkernels <using_drtio_subkernels>` will also be helpful.
Pages like :doc:`management_system` and :doc:`core_device` describe **specific components of the ARTIQ ecosystem** in more detail. If you want to understand more about device and dataset databases, for example, read the :doc:`environment` page; if you want to understand the ARTIQ Python dialect and everything it does or does not support, read the :doc:`compiler` page.
Reference pages, like :doc:`main_frontend_tools` and :doc:`core_drivers_reference`, contain the detailed documentation of the individual methods and command-line tools ARTIQ provides. They are heavily interlinked throughout the rest of the documentation: whenever a method, tool, or exception is mentioned by name, like :class:`~artiq.frontend.artiq_run`, :meth:`~artiq.language.core.now_mu`, or :exc:`~artiq.coredevice.exceptions.RTIOUnderflow`, it can normally be clicked on to directly access the reference material. Notice also that the online version of this manual is searchable; see the 'Search docs' bar at left.
.. _build-documentation:
build this documentation?
-------------------------
To generate this manual from source, you can use ``nix build`` directives, for example: ::
$ nix build git+https://github.com/m-labs/artiq.git\?ref=release-[number]#artiq-manual-html
Substitute ``artiq-manual-pdf`` to get the LaTeX PDF version. The results will be in ``result``.
The manual is written in `reStructured Text <https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html>`_; you can find the source files in the ARTIQ repository under ``doc/manual``. If you spot a mistake, a typo, or something that's out of date or missing -- in particular, if you want to add something to this FAQ -- feel free to clone the repository, edit the source RST files, and make a pull request with your version of an improvement. (If you're not a fan of or not familiar with command-line Git, both GitHub and Gitea support making edits and pull requests directly in the web interface; tutorial materials are easy to find online.) The second best thing is to open an issue to make M-Labs aware of the problem.
roll back to older versions of ARTIQ, or obtain it through other installation methods?
--------------------------------------------------------------------------------------
At all times, three versions of ARTIQ are actively supported by M-Labs, released through the beta, stable, and legacy channels. See :doc:`releases`.
If you are trying to rollback to stable or legacy, the process should be accordingly simple. See the respective :doc:`installing` page in the respective version of the manual. If you've previously used the version you are rolling back to, you can likely use the rollback methods described in :ref:`installing-upgrading`; otherwise you can always treat it as a fresh install. Remember that it will also be necessary to reflash core devices with corresponding legacy binaries.
Regarding pre-legacy releases, note that being actively supported simply means that M-Labs makes prebuilt packages and binaries for these versions available via the supported installation methods and through AFWS. Outdated versions aren't automatically built or offered over these channels, but their source code remains available in the Git repository, and you are free to use it or adapt it in accordance with the terms of the license, including building whatever packages you prefer. In general, though, newer releases of ARTIQ offer more features, more stability, better performance, and better support. The legacy release is supported simply as a convenience for users who haven't been able to upgrade yet. For normal purposes, it is recommended to use the current stable release of ARTIQ if at all possible, or the beta to gain access to new features and improvements that are still in development.
For more details, see also `Clarifications regarding the ARTIQ release model and AFWS <https://forum.m-labs.hk/d/823-clarifications-regarding-the-artiq-release-model-and-afws>`_.
.. tip::
If you're particularly concerned with being able to precisely reproduce older experiments, even when you've moved on to newer ARTIQ versions, upgrade carefully and make your own local backups to be able to rollback to older versions of your system. Make sure to keep copies of older firmware binaries in order to be able to reflash your hardware. Older versions of ARTIQ will always continue working if left untouched, and you won't need to worry about rebuilding from the source if you keep your own prebuilt versions around.
.. _faq-networking:
troubleshoot networking problems?
---------------------------------
Diagnosis aids:
- Can you ``ping`` the device?
- Is the Ethernet LED on?
- Is the ERROR LED on?
- Is there anything unusual recorded in :ref:`the UART log <connecting-UART>`?
Some things to consider:
- Is the ``core_addr`` field of your ``device_db.py`` set correctly?
- Did your device flash and boot successfully? Were the binaries generated for the correct board hardware version?
- Are your core device's IP address and networking configurations definitely set correctly? Check the UART log to confirm, and talk to your network administrator about what the correct choices are.
- Is your core device configured for an external reference clock? If so, it cannot function correctly without one. Is the external reference clock plugged in?
- Are Ethernet and (on Kasli only) SFP0 plugged in all the way? Are they working? Try different cables and SFP adapters; M-Labs tests with CAT6 cables, but lower categories should be supported too.
- Are your PC and your crate in the same subnet?
- Is some other device in your network already using the configured IP address? Turn off the core device and try pinging the configured IP address; if it responds, you have a culprit. One of the two will need a different networking configuration.
- Are there restrictions or issues in your router or subnet that are preventing the core device from connecting? It may help to try connecting the core device to your PC directly.
fix 'Mismatch between gateware and software versions'?
------------------------------------------------------
Either reflash your core device with a newer version of ARTIQ (see :doc:`flashing`) or update your software (see :ref:`installing-upgrading`), depending on which is out of date.
.. note::
You can check the specific versions you are using at any time by comparing the gateware version given in the core startup log and the output given by adding ``--version`` to any of the standard ARTIQ front-end commands. This is especially useful when e.g. seeking help in the forum or at the helpdesk, where your running ARTIQ version is often crucial information to diagnose a problem.
Minor version mismatches are common, even in stable ARTIQ versions, but should not cause any issues. The ARTIQ release system ensures breaking changes are strictly limited to new release versions, or to the beta branch (which explicitly makes no promises of stability.) Updates that *are* applied to the stable version are usually bug fixes, documentation improvements, or other quality-of-life changes. As long as gateware and software are using the same stable release version of ARTIQ, even if there is a minor mismatch, no warning will be displayed.
fix unreliable DRTIO master-satellite links?
--------------------------------------------
Inconsistent DRTIO connections, especially with odd or absent errors in the core logs, are often a symptom of overheating either in the master or satellite boards. Check the core device fans for failure or defects. Improve air circulation around the crate or attach additional fans to see if that improves or resolves the issue. In the long term, fan trays to be rack-mounted together with the crate are a clean solution to these kinds of problems.
add or remove EEM peripherals or DRTIO satellites?
--------------------------------------------------
Adding new real-time hardware to an ARTIQ system almost always means reflashing the core device; if you are adding new satellite core devices, they will have to be flashed as well. If you have obtained your upgrades from M-Labs or QUARTIQ, updated binaries and reflashing support will normally be offered to you directly. In any other case, track down your JSON system description file(s), bring them up to date with the updated state of your system, and see :doc:`building_developing`.
Once you have an updated set of binaries, reflash the core device, following the instructions in :doc:`flashing`. Be sure to update your device database before starting experimentation; run :mod:`~artiq.frontend.artiq_ddb_template` on your system description(s) to update the local devices, and copy over any aliases or entries for NDSP controllers you may have been using. Note that the device database is a Python file, and the generated file of local devices can also simply be imported into the final version, allowing for dynamic modifications, especially in complex systems that may have multiple device databases in use.
see command-line help?
----------------------
Like most if not almost all terminal utilities, ARTIQ commands, tools and applets print their help messages directly into the terminal and exit when run with the flag ``--help`` or ``-h``: ::
$ artiq_run -h
This is the simplest and most direct way of accessing the same usage and reference material that is replicated in this manual on the pages :doc:`main_frontend_tools` and :doc:`utilities`.
.. _faq-find-examples:
find ARTIQ examples?
--------------------
The official examples are stored in the ``examples`` folder of the ARTIQ package. You can find the location of the ARTIQ package on your machine with: ::
python3 -c "import artiq; print(artiq.__path__[0])"
Copy the ``examples`` folder from that path into your home or user directory, and start experimenting! (Note that some examples have dependencies not included with a standard ARTIQ install, like matplotlib and numba. To run those examples properly, make sure those modules are accessible.)
If you have progressed past this level and would like to see more in-depth code or real-life examples of how other groups have handled running experiments with ARTIQ, see the "Community code" directory on the M-labs `resources page <https://m-labs.hk/experiment-control/resources/>`_.
fix ``failed to connect to moninj`` in the dashboard?
-----------------------------------------------------
This and other similar messages almost always indicate that your device database lists controllers (for example, ``aqctl_moninj_proxy``) that either haven't been started or aren't reachable at the given host and port. See :ref:`mgmt-ctlmgr`, or simply run: ::
$ artiq_ctlmgr
to let the controller manager start the necessary controllers automatically.
fix ``address already in use`` when running ARTIQ commands?
-----------------------------------------------------------
A message like ``OSError: [Errno 98] error while attempting to bind on address ('127.0.0.1', 1067): [errno 98] address already in use`` indicates that the IP address and port number combination you're trying to use is already occupied by some other process. Often this simply means that the ARTIQ process you're trying to start is in fact already running. Note for example that trying to start a controller which is already being run by a controller manager will generally fail for this reason.
.. note::
ARTIQ management system communications, whether distributed or local, run over TCP/IP, using TCP port numbers to identify their destinations. Generally speaking, client processes like the dashboard don't require fixed ports of their own, since they can simply reach out to the master when they want to establish a connection. Running multiple dashboards will never cause a port conflict. On the other hand, server processes like the ARTIQ master have to be 'listening' at a fixed, open port in order to be able to receive incoming connections. For more details, look into `ports in computer networking <https://en.wikipedia.org/wiki/Port_(computer_networking)>`_.
Most management system processes belong to the second category, and are bound to one or several fixed communication ports while they're running. See also :doc:`default_network_ports`.
You can use the command ``netstat`` to list the ports currently in use on your system. To check the status of a specific port on Linux, try either of: ::
$ netstat -anp --inet | grep "<port-number>"
$ lsof -i:<port-number>
On Windows, you can list ports with: ::
$ netstat -ano -p TCP
Use your preferred method to search through the output; suitable commands will vary by environment (e.g. ``grep`` in an MSYS2 shell, ``Select-String`` in PowerShell, ``find`` in the Windows command line, etc.)
In all cases, if there are no results, the port isn't in use and should be free for new processes.
.. tip::
While it is possible to run, for example, two identical ARTIQ controllers on the same machine, they can't be bound to the same port numbers at the same time. If you're intentionally running multiple copies of the same ARTIQ processes, use the command-line ``--port`` options to set alternate ports for at least one of the two. See :doc:`main_frontend_tools` and :doc:`utilities` for exact flags to use. Controllers should have similar flags available and will also require updated :ref:`device database entries <ndsp-integration>`. Note that alternate ports must be consistent to be useful, e.g., a master and dashboard must have the same ``--port-notify`` set in order to communicate with each other!
Otherwise, either the running process must be stopped, or you'll have to set different port numbers for the process you're trying to start. In some cases it might happen that a process is no longer accessible or has become unresponsive but is still occupying its ports. The easiest way to free the ports is to kill the process manually. On Linux, you can use the ``kill`` command with ``lsof``: ::
$ kill $(lsof -t -i:<port-number>)
On Windows, use ``netstat`` again to identify the process ID, and then feed it into ``taskkill``, e.g.: ::
$ netstat -ano -p TCP
$ taskkill /F /PID <process-ID>
diagnose and fix sequence errors?
---------------------------------
Go through your code, keeping manual track of SED lanes. See the following example: ::
@kernel
def run(self):
self.core.reset()
with parallel:
self.ttl0.on() # lane0
self.ttl_sma.pulse(800*us) # lane1(rising) lane1(falling)
with sequential:
self.ttl1.on() # lane2
self.ttl2.on() # lane3
self.ttl3.on() # lane4
self.ttl4.on() # lane5
delay(800*us)
self.ttl1.off() # lane5
self.ttl2.off() # lane6
self.ttl3.off() # lane7
self.ttl4.off() # lane0
self.ttl0.off() # lane1 -> clashes with the falling edge of ttl_sma,
# which is already at +800us
In most cases, as in this one, it's relatively easy to rearrange the generation of events so that they will be better spread out across SED lanes without sacrificing actual functionality. One possible solution for the above sequence looks like: ::
@kernel
def run(self):
self.core.reset()
self.ttl0.on() # lane0
self.ttl_sma.on() # lane1
self.ttl1.on() # lane2
self.ttl2.on() # lane3
self.ttl3.on() # lane4
self.ttl4.on() # lane5
delay(800*us)
self.ttl1.off() # lane5
self.ttl2.off() # lane6
self.ttl3.off() # lane7
self.ttl4.off() # lane0 (no clash: new timestamp is higher than last)
self.ttl_sma.off() # lane1
self.ttl0.off() # lane2
In this case, the :meth:`~artiq.coredevice.ttl.TTLInOut.pulse` is split up into its component :meth:`~artiq.coredevice.ttl.TTLInOut.on` and :meth:`~artiq.coredevice.ttl.TTLInOut.off` so that events can be generated more linearly. It can also be worth keeping in mind that delaying by even a single coarse RTIO cycle between events avoids switching SED lanes at all; in contexts where perfect simultaneity is not a priority, this is an easy way to avoid sequencing issues. See again :ref:`sequence-errors`.
understand applet commands?
---------------------------
The 'Command' field contains the exact terminal command used to open and operate the applet. The default ``${artiq_applet}`` prefix simply translates to something to the effect of ``python -m artiq.applets.``, intended to be immediately followed by the applet module name. The options suffixed after the module name are the same used in the command line, and a list of them can be shown by using the standard command line ``-h`` help flag: ::
$ python -m artiq.applets.plot_xy -h
in any terminal.
organize datasets in folders?
-----------------------------
Use the dot (".") in dataset names to separate folders. The GUI will automatically create and delete folders in the dataset tree display.
organize applets in groups?
---------------------------
Create groups by left-clicking within the applet list and selecting 'New Group'. Move applets in and out of groups by dragging them with the mouse. To unselect an applet or a group, use CTRL+click.
organize experiment windows in the dashboard?
---------------------------------------------
Experiment windows can be organized by using the following hotkeys:
* CTRL+SHIFT+T to tile experiment windows
* CTRL+SHIFT+C to cascade experiment windows
The windows will be organized in the order they were last interacted with.
fix errors when restarting the management system after a crash?
---------------------------------------------------------------
On Windows in particular, abnormal shutdowns such as power outages or bluescreens can sometimes corrupt the organizational files used by the management system, resulting in errors to the tune of ``ValueError: source code string cannot contain null bytes`` when restarting. The easiest way to handle these problems is to delete the corrupted files and start from scratch.
If the master itself fails to start, it may be necessary to delete the dataset database or even ``last_rid.pyon`` to restart properly, but if the dashboard or browser fail, the problem is probably in the GUI configuration files, where the state of the GUI (arrangement of docks, applets, etc.) is kept. These files are backed up once whenever they are successfully loaded. Navigate to the user configuration directory (see :ref:`gui-config-files`) and look for a file suffixed ``.backup``. To restore the GUI, simply delete the corrupted configuration file and rename the backup to replace it.
create and use variable-length arrays in kernels?
-------------------------------------------------
You can't, in general; see the corresponding notes under :ref:`compiler-types`. ARTIQ kernels do not support heap allocation, meaning in particular that lists, arrays, and strings must be of constant size. One option is to preallocate everything, as mentioned on the Compiler page; another option is to chunk it and e.g. read 100 events per function call, push them upstream and retry until the gate time closes.
understand how best to send data between kernel and host?
---------------------------------------------------------
See also :ref:`basic-artiq-python`. Let's run down the options for kernel-host data transfer:
- Kernels can return single values directly. They *cannot* return lists, arrays or strings, because of the way these values are allocated, which prevents values of these types from outliving the kernel they are created in. This is still true when the values in question are wrapped in functions or objects, in which case they may be missed by lifetime tracking and accepted by the compiler, but will cause memory corruption when run.
- Kernels can freely make changes to attributes of objects shared with the host, including ``self``. However, these changes will be made to a kernel-owned copy of the object, which is only synchronized with the host copy when the kernel completes. This means that host-side operations executed during the runtime of the kernel, including RPCs, will be handling an unmodified version of the object, and modifications made by those operations will simply be overwritten when the kernel returns.
.. note::
Attribute writeback happens *once per kernel*, that is, if your experiment contains many separate kernels called from the host, modifications will be written back when each separate kernel completes. This is generally not suitable for data transfer, however, as new kernels are costly to create, and experiments often try to avoid doing so. It is also important to specify that kernels called *from* a kernel will not write back to the host upon completion. Attribute writeback is only executed upon return to the host.
- Kernels can interact with datasets, either as attributes (if :meth:`~artiq.language.environment.HasEnvironment.setattr_dataset` is used) or by RPC of the get and set methods (:meth:`~artiq.language.environment.HasEnvironment.get_dataset`, :meth:`~artiq.language.environment.HasEnvironment.set_dataset`, etc.). In this case note that, like certain other host-side methods, :meth:`~artiq.language.environment.HasEnvironment.get_dataset` will not actually be accepted by the compiler, because its return type is not specified. To call it as an RPC, simply wrap it in another function which *does* specify a return type. :meth:`~artiq.language.environment.HasEnvironment.set_dataset` can be similarly wrapped to make it asynchronous.
- Kernels can of course also call arbitrary RPCs. When sending data to the host, these can be asynchronous, and this is normally the recommended way of transferring data back to the host, resulting in a relatively minor amount of delay in the kernel. Keep in mind however that asynchronous RPCs may still block execution for some time if the arguments are very large or if many RPCs are submitted in close succession. When receiving data from the host, RPCs must be synchronous, which is still considerably faster than starting a new kernel. Note that if data is being both (asynchronously) sent and received, there is a small possibility of minor race conditions (i.e. retrieved data may not yet show updates sent in an earlier RPC).
Kernel attributes and data transfer remain somewhat of an open area of development. Many such developments are or will be implemented in `NAC3 <https://forum.m-labs.hk/d/392-nac3-new-artiq-compiler-3-prealpha-release>`_, the next-generation ARTIQ compiler. The overhead for starting new kernels, which is largely dominated by compile time, should be significantly reduced (NAC3 can be expected to complete compilations 6x - 30x faster than currently).
write part of my experiment as a coroutine/asyncio task/generator?
------------------------------------------------------------------
You cannot change the API that your experiment exposes: :meth:`~artiq.language.environment.HasEnvironment.build`, :meth:`~artiq.language.environment.Experiment.prepare`, :meth:`~artiq.language.environment.Experiment.run` and :meth:`~artiq.language.environment.Experiment.analyze` need to be regular functions, not generators or asyncio coroutines. That would make reusing your own code in sub-experiments difficult and fragile. You can however wrap your own generators/coroutines/tasks in regular functions that you then expose as part of the API.
determine the pyserial URL to connect to a device by its serial number?
-----------------------------------------------------------------------
You can list your system's serial devices and print their vendor/product id and serial number by running::
$ python3 -m serial.tools.list_ports -v
This will give you the ``/dev/ttyUSBxx`` (or ``COMxx`` for Windows) device names. The ``hwid:`` field gives you the string you can pass via the ``hwgrep://`` feature of pyserial `serial_for_url() <https://pythonhosted.org/pyserial/pyserial_api.html#serial.serial_for_url>`_ in order to open a serial device.
The preferred way to specify a serial device is to make use of the ``hwgrep://`` URL: it allows for selecting the serial device by its USB vendor ID, product
ID and/or serial number. These never change, unlike the device file name.
For instance, if you want to specify the Vendor/Product ID and the USB Serial Number, you can do: ::
$ -d "hwgrep://<VID>:<PID> SNR=<serial_number>"``.
run unit tests?
---------------
The unit tests assume that the Python environment has been set up in such a way that ``import artiq`` will import the code being tested, and that this is still true for any subprocess created. This is not the way setuptools operates as it adds the path to ARTIQ to ``sys.path`` which is not passed to subprocesses; as a result, running the tests via ``setup.py`` is not supported. The user must first install the package or set ``PYTHONPATH``, and then run the tests with e.g. ``python3 -m unittest discover`` in the ``artiq/test`` folder and ``lit .`` in the ``artiq/test/lit`` folder.
For the hardware-in-the-loop unit tests, set the ``ARTIQ_ROOT`` environment variable to the path to a device database containing the relevant devices.
The core device tests require the following TTL devices and connections:
* ``ttl_out``: any output-only TTL.
* ``ttl_out_serdes``: any output-only TTL that uses a SERDES (i.e. has a fine timestamp). Can be aliased to ``ttl_out``.
* ``loop_out``: any output-only TTL. Must be physically connected to ``loop_in``. Can be aliased to ``ttl_out``.
* ``loop_in``: any input-capable TTL. Must be physically connected to ``loop_out``.
* ``loop_clock_out``: a clock generator TTL. Must be physically connected to ``loop_clock_in``.
* ``loop_clock_in``: any input-capable TTL. Must be physically connected to ``loop_clock_out``.
If TTL devices are missing, the corresponding tests are skipped.
.. _gui-config-files:
find the dashboard and browser configuration files?
---------------------------------------------------
::
python -c "from artiq.tools import get_user_config_dir; print(get_user_config_dir())"
Additional Resources
====================
Other related documentation
---------------------------
- the `Sinara wiki <https://github.com/sinara-hw/meta/wiki>`_
- the `SiPyCo manual <https://m-labs.hk/artiq/sipyco-manual/>`_
- the `Migen manual <https://m-labs.hk/migen/manual/>`_
- in a pinch, the `M-labs internal docs <https://git.m-labs.hk/sinara-hw/assembly>`_
For more advanced questions, sometimes the `list of publications <https://m-labs.hk/experiment-control/publications/>`_ about experiments performed using ARTIQ may be interesting. See also the official M-Labs `resources <https://m-labs.hk/experiment-control/resources/>`_ page, especially the section on community code.
"Help, I've done my best and I can't get any further!"
------------------------------------------------------
- If you have an active M-Labs AFWS/support subscription, you can email helpdesk@ at any time for personalized assistance. Please include the following information:
- Your installed ARTIQ version (add ``--version`` to any of the standard ARTIQ commands)
- The variant name of your system (refer to the sticker on the crate if you aren't sure)
- The recent output of your core log, either through ``artiq_coremgmt`` (if you're able to contact your device by network), or over UART following :ref:`the guide here <connecting-UART>`
- How your problem happened, and what you've already tried to fix it
- Compare your materials with the examples; see also :ref:`finding ARTIQ examples <faq-find-examples>` above.
- Check the list of `active issues <https://github.com/m-labs/artiq/issues>`_ on the ARTIQ GitHub repository for possible known problems with ARTIQ. Search through the closed issues to see if your question or concern has been addressed before.
- Search the `M-Labs forum <https://forum.m-labs.hk/>`_ for similar problems, or make a post asking for help yourself.
- Look into the `Mattermost live chat <https://chat.m-labs.hk>`_ or the bridged IRC channel.
- Read the open source code and its docstrings and figure it out.
- If you're reasonably certain you've identified a bug, or if you'd like to suggest a feature that should be included in future ARTIQ releases, `file a GitHub issue <https://github.com/m-labs/artiq/issues/new/choose>`_ yourself, following one of the provided templates.
- In some odd cases, you may want to see the `mailing list archive <https://www.mail-archive.com/artiq@lists.m-labs.hk/>`_; the ARTIQ mailing list was shut down at the end of 2020 and was last regularly used during the time of ARTIQ-2 and 3, but for some older ARTIQ features, or to understand a development thought process, you may still find relevant information there.
In any situation, if you found the manual unclear or unhelpful, you might consider following the :ref:`directions for contribution <build-documentation>` and editing it to be more helpful for future readers.