57 Commits (daad3d263a9883e5e057b4c22a800c2e9639f47f)

Author SHA1 Message Date
Sebastien Bourdeauducq daad3d263a master: commit missing part of 7fd6dead8 2023-01-12 10:39:53 +08:00
Sebastien Bourdeauducq 73a4ef89ec scheduler: make asyncio loop a keyword-only argument, like in other asyncio APIs 2023-01-11 18:45:35 +08:00
Sebastien Bourdeauducq 6cfd1480a7 scheduler: support passing event loop 2023-01-10 12:26:24 +08:00
David Nadlinger 874d298ceb master/scheduler: Unbreak submitting from repository
This is a fix-up to commit 2a58981822.
2022-12-13 14:58:23 +00:00
Egor Savkin 2a58981822 Scheduler: replace relative path to absolute
Signed-off-by: Egor Savkin <es@m-labs.hk>
2022-12-09 21:43:36 +08:00
kk1050 7aa6104872 Add method to check if termination is requested (#811, #1932)
Co-authored-by: kk105 <kkl@m-kabs.hk>
2022-07-07 17:01:34 +08:00
kk1050 4ddd2739ee add log_tuples function (#1896)
Co-authored-by: kk105 <kkl@m-kabs.hk>
2022-06-06 18:41:46 +08:00
David Nadlinger 966ed5d013 master/scheduler: Fix priority/due date precedence order when waiting to prepare
See test case – previously, the highest-priority pending run would
be used to calculate the timeout, rather than the earliest one.

This probably managed to go undetected for that long as any unrelated
changes to the pipeline (e.g. new submissions, or experiments pausing)
would also cause _get_run() to be re-evaluated.
2020-06-19 23:45:52 +01:00
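
The precedence issue described in this commit can be illustrated with a minimal sketch. The names `PendingRun`, `next_wakeup_timeout` and `buggy_timeout` are hypothetical and are not taken from the ARTIQ source; the sketch only contrasts deriving the wait timeout from the earliest due date with deriving it from the highest-priority run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingRun:
    # Hypothetical stand-in for a scheduled run; not the ARTIQ Run class.
    rid: int
    priority: int
    due_date: Optional[float]  # absolute time in seconds; None means "as soon as possible"


def _remaining(run, now):
    # Time left until this run becomes due, clamped at zero.
    return 0.0 if run.due_date is None else max(run.due_date - now, 0.0)


def next_wakeup_timeout(runs, now):
    """Sleep time until the *earliest* pending run is due (None if no runs)."""
    if not runs:
        return None
    return min(_remaining(r, now) for r in runs)


def buggy_timeout(runs, now):
    """Behaviour the commit message describes as wrong: only the
    highest-priority run is consulted, so an earlier run can be overslept."""
    if not runs:
        return None
    top = max(runs, key=lambda r: r.priority)
    return _remaining(top, now)
```

With a high-priority run due late and a low-priority run due early, the two functions disagree, which is the situation the test case mentioned in the commit message is said to cover.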
David Nadlinger 7955b63b00 master: Always write results to HDF5 once run stage is reached
Previously, a significant risk of losing experimental results would
be associated with long-running experiments, as any stray exceptions
while run()ing the experiment – for instance, due to infrequent
network glitches or hardware reliability issues – would cause no
HDF5 file to be written. This was especially troublesome as long
experiments would suffer from a higher probability of unanticipated
failures, while at the same time being more costly to re-take in
terms of wall-clock time.

Unanticipated uncaught exceptions like that were enough of an issue
that several Oxford codebases had come up with their own half-baked
mitigation strategies, from swallowing all exceptions in run() by
convention, to always broadcasting all results to uniquely named
datasets such that the partial results could be recovered and written
to HDF5 by manually run recovery experiments.

This commit addresses the problem at its source, changing the worker
behaviour such that an HDF5 file is always written as soon as run()
starts.
2020-06-18 17:47:26 +01:00
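
The general pattern behind this change can be sketched as follows. This is not the ARTIQ worker code: `run_stage`, `FlakyExperiment` and the `write_results` callback are hypothetical names, and the callback merely stands in for whatever writes the HDF5 file. The sketch assumes that failures before run() should not produce a file, while anything after run() has started should.

```python
def run_stage(exp, write_results):
    """Run `exp.run()` and always hand the experiment to `write_results`,
    even if run() raises. The exception still propagates to the caller."""
    exp.prepare()  # a failure here happens before the run stage: nothing is written
    try:
        exp.run()
    finally:
        # Reached on success and on any exception raised inside run().
        write_results(exp)


class FlakyExperiment:
    """Hypothetical experiment that records partial data, then fails."""

    def __init__(self):
        self.results = {}

    def prepare(self):
        pass

    def run(self):
        self.results["partial"] = [1, 2, 3]
        raise RuntimeError("simulated network glitch")
```

Using `try`/`finally` rather than catching the exception keeps the failure visible to the caller while still preserving the partial results.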
Sebastien Bourdeauducq 4707aef45c split out artiq-comtools 2019-11-14 15:21:51 +08:00
Sebastien Bourdeauducq 3fd6962bd2 use sipyco (#585) 2019-11-10 15:55:17 +08:00
David Nadlinger 84b91ee8bd master/scheduler: Document Deleter semantics [nfc]
From looking at the code, it wasn't obvious to me that this is
supposed to handle multiple calls to delete(). This is the case,
however, when for instance Scheduler.delete()ing a run, which
will then also be deleted again from AnalyzeStage.
2019-05-14 22:37:16 +01:00
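
A minimal sketch of a delete operation that tolerates repeated calls is shown below. `IdempotentDeleter` and its `on_delete` callback are hypothetical and do not reproduce the ARTIQ `Deleter` class; the sketch only shows one way to make a second `delete()` for the same run a harmless no-op.

```python
class IdempotentDeleter:
    """Hypothetical helper: performs the deletion action at most once per RID."""

    def __init__(self, on_delete):
        self._on_delete = on_delete  # callback invoked the first time a RID is deleted
        self._deleted = set()

    def delete(self, rid):
        """Delete `rid`; return True if work was done, False if already deleted."""
        if rid in self._deleted:
            return False
        self._deleted.add(rid)
        self._on_delete(rid)
        return True
```

In this sketch, an explicit deletion followed by a later deletion from a pipeline stage would trigger the callback only once.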
David Nadlinger e24e893303 master/scheduler: Fix misleading indentation [nfc] 2019-01-20 19:45:47 +00:00
David Nadlinger c213ab13ba sync_struct: Notifier.{read -> raw_view}, factor out common dict update code [nfc] 2019-01-19 20:19:17 +00:00
Sebastien Bourdeauducq e4a631a3d7 scheduler: consider the pipeline flushed if everything has a lower priority than us. Closes #640 2017-05-22 18:43:59 +08:00
Sebastien Bourdeauducq 432c6b99e2 master: still save results when analyze fails. Closes #684 2017-03-27 17:57:02 +08:00
Sebastien Bourdeauducq 1908339d4e scheduler: default submission arguments, closes #577 2016-10-18 17:11:06 +08:00
Sebastien Bourdeauducq 69099691f7 doc: clarify usage of pause/check_pause, closes #571 2016-10-17 20:08:15 +08:00
Sebastien Bourdeauducq 03a69ec5b7 scheduler: add check_pause function 2016-06-27 14:37:29 +08:00
Sebastien Bourdeauducq 785691ab98 fix indentation 2016-02-29 21:32:48 +08:00
Sebastien Bourdeauducq 72a993afe0 master: cache last RID. Closes #234 2016-02-15 18:20:50 +01:00
Sebastien Bourdeauducq cc6b808bf8 master: finer control of worker exception reporting. Closes #233 2016-01-23 21:23:02 -05:00
whitequark be560dbc63 Commit missing parts of 13e65c2a. 2016-01-16 03:00:17 +00:00
whitequark 13e65c2a0a scheduler: make sure worker exceptions are not unexpectedly hidden. 2016-01-16 02:20:32 +00:00
Sebastien Bourdeauducq 5e14afde3e scheduler: use current (last scanned) repo revision instead of HEAD 2015-12-06 19:00:41 +08:00
Sebastien Bourdeauducq 2c77c80b4f master: expose more scheduler APIs to the experiments 2015-10-30 13:41:18 +08:00
Sebastien Bourdeauducq 828b48ad89 master/scheduler: reduce logging severity of worker exception backtraces to debug 2015-10-28 17:48:50 +08:00
Sebastien Bourdeauducq 9f04af63e6 scheduler: raise logging severity of errors 2015-10-14 16:02:22 +08:00
Sebastien Bourdeauducq 139072d402 Graceful experiment termination. Closes #76 2015-10-06 13:50:00 +08:00
Sebastien Bourdeauducq f552d62b69 use Python 3.5 coroutines 2015-10-03 19:28:57 +08:00
Sebastien Bourdeauducq cd3107ba75 do not use deprecated asyncio.JoinableQueue 2015-10-03 13:59:18 +08:00
Sebastien Bourdeauducq 06badd1dc1 scheduler: refactor, fix pipeline hazards 2015-08-10 21:58:11 +08:00
Sebastien Bourdeauducq 54d85efc2a master,gui: show Git commit messages in schedule 2015-08-08 11:08:04 +08:00
Sebastien Bourdeauducq 7ed8fe57fa Git support 2015-08-07 15:51:56 +08:00
Sebastien Bourdeauducq 96a5d73c81 worker: split build stage from prepare 2015-07-09 13:18:12 +02:00
Sebastien Bourdeauducq 9f9079589e gui: send monitor requests to core device 2015-06-05 14:52:41 +08:00
Sebastien Bourdeauducq aa242f7c66 scheduler: simplify priority policy
Remove overdueness. User must submit calibration experiments with higher priority values for them to take precedence.
2015-05-28 18:24:45 +08:00
Sebastien Bourdeauducq b0f8141018 scheduler: cancel flush when run is cancelled 2015-05-28 17:48:33 +08:00
Sebastien Bourdeauducq e752e57fa5 scheduler: do not duplicate 'run terminated' information 2015-05-28 17:37:08 +08:00
Sebastien Bourdeauducq 737f6d4485 scheduler: support pipeline flush 2015-05-28 17:20:58 +08:00
Sebastien Bourdeauducq fc449509b8 scheduler: pass priority to experiments 2015-05-24 20:37:47 +08:00
Sebastien Bourdeauducq a21373841c scheduler: catch worker exceptions in prepare and analyze stages 2015-05-24 20:23:49 +08:00
Sebastien Bourdeauducq d6ced1c780 scheduler: support priorities 2015-05-24 01:09:22 +08:00
Sebastien Bourdeauducq b74b8d5826 Scheduling TNG 2015-05-17 16:11:00 +08:00
Sebastien Bourdeauducq 43a05c783d worker: split write_results action 2015-03-11 19:06:46 +01:00
Sebastien Bourdeauducq d5795fd619 master: watchdog support
Introduces a watchdog context manager to use in the experiment code that
terminates the process with an error if it times out. The syntax is:

with self.scheduler.watchdog(20*s):
    ...

Watchdog timers are implemented by the master process (and the worker
communicates the necessary information about them) so that they can be
enforced even if the worker crashes. They can be nested arbitrarily.
During yields, all watchdog timers for the yielding worker are
suspended [TODO]. Setting up watchdogs is not supported in kernels;
however, a kernel can be called within watchdog contexts (and terminating
the worker will terminate the kernel [TODO]).

It is possible to implement a heartbeat mechanism using a watchdog, e.g.:

for i in range(...):
    with self.scheduler.watchdog(...):
        ...

Crashes/freezes within the iterator or the loop management would not be
detected, but they should be rare enough.
2015-03-11 16:43:14 +01:00
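
The nesting behaviour described in this commit can be sketched with a small in-process registry. This is an illustration only: `WatchdogRegistry`, its injected `clock` and the `expired()` method are hypothetical, and unlike the mechanism in the commit message the sketch does not involve a separate master process or terminate anything. It only tracks which of several nested deadlines have passed.

```python
import contextlib
import itertools


class WatchdogRegistry:
    """Hypothetical bookkeeping for nested watchdog deadlines."""

    def __init__(self, clock):
        self._clock = clock            # callable returning the current time in seconds
        self._deadlines = {}           # watchdog id -> absolute deadline
        self._ids = itertools.count()

    @contextlib.contextmanager
    def watchdog(self, timeout):
        wid = next(self._ids)
        self._deadlines[wid] = self._clock() + timeout
        try:
            yield wid
        finally:
            # Leaving the with-block disarms this watchdog, even on error.
            del self._deadlines[wid]

    def expired(self):
        """Return the ids of all armed watchdogs whose deadline has passed."""
        now = self._clock()
        return [wid for wid, deadline in self._deadlines.items() if deadline <= now]
```

A supervising process would poll something like `expired()` and act on a non-empty result; injecting the clock keeps the sketch testable without real delays.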
Sebastien Bourdeauducq f2134fa4b2 master,worker: split prepare/run/analyze 2015-03-09 23:34:09 +01:00
Sebastien Bourdeauducq 4c280d5fcc master: use a new worker process for each experiment 2015-03-09 16:22:41 +01:00
Sebastien Bourdeauducq 6601bebcfe master: make RIDs unique across restarts 2015-02-21 18:41:07 -07:00
Sebastien Bourdeauducq cc172699ea master: use RID + unit class name for HDF5 filenames 2015-02-20 14:11:55 -07:00