226 Commits

Author SHA1 Message Date
David Nadlinger 966ed5d013 master/scheduler: Fix priority/due date precedence order when waiting to prepare
See test case – previously, the highest-priority pending run would
be used to calculate the timeout, rather than the earliest one.

This probably managed to go undetected for so long as any unrelated
changes to the pipeline (e.g. new submissions, or experiments pausing)
would also cause _get_run() to be re-evaluated.
2020-06-19 23:45:52 +01:00
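The distinction described in 966ed5d013 can be sketched as follows. This is a minimal illustration, not the actual scheduler code: `PendingRun` and `run_or_timeout` are hypothetical names, and the sketch assumes that runnable runs are picked by priority while the wait timeout comes from the earliest due date among the pending runs.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class PendingRun:
    """Hypothetical stand-in for a pending scheduler run."""
    rid: int
    priority: int
    due_date: Optional[float]  # None means "runnable immediately"


def run_or_timeout(pending: List[PendingRun],
                   now: float) -> Union[PendingRun, float, None]:
    """Return the run to prepare now, else the time to wait, else None."""
    runnable = [r for r in pending
                if r.due_date is None or r.due_date <= now]
    if runnable:
        # Among runs that are already due, priority decides.
        return max(runnable, key=lambda r: r.priority)
    # Nothing is due yet: wait for the EARLIEST due date, regardless of
    # which pending run has the highest priority.
    return min((r.due_date - now for r in pending), default=None)
```

With a high-priority run due at t=100 and a low-priority run due at t=50, the sketch waits 50 seconds rather than 100, which matches the behaviour the commit message describes as the fix.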
David Nadlinger 7955b63b00 master: Always write results to HDF5 once run stage is reached
Previously, a significant risk of losing experimental results would
be associated with long-running experiments, as any stray exceptions
while run()ing the experiment – for instance, due to infrequent
network glitches or hardware reliability issues – would cause no
HDF5 file to be written. This was especially troublesome as long
experiments would suffer from a higher probability of unanticipated
failures, while at the same time being more costly to re-take in
terms of wall-clock time.

Unanticipated uncaught exceptions like that were enough of an issue
that several Oxford codebases had come up with their own half-baked
mitigation strategies, from swallowing all exceptions in run() by
convention, to always broadcasting all results to uniquely named
datasets such that the partial results could be recovered and written
to HDF5 by manually run recovery experiments.

This commit addresses the problem at its source, changing the worker
behaviour such that an HDF5 file is always written as soon as run()
starts.
2020-06-18 17:47:26 +01:00
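The idea behind 7955b63b00 can be illustrated with a `try`/`finally` block. This is only a sketch under stated assumptions: `run_then_archive` and the `write_results` callback are hypothetical names standing in for the worker's actual HDF5 writing code, which is not shown in this log.

```python
from typing import Callable


def run_then_archive(run: Callable[[], None],
                     write_results: Callable[[], None]) -> None:
    """Call run(), then write_results(), even if run() raised."""
    try:
        run()
    finally:
        # Executed on success and on failure alike, so any partial
        # results gathered before an exception are still archived.
        # The original exception (if any) continues to propagate.
        write_results()
```

The exception is deliberately not swallowed: the caller still sees the failure, but the results file has been written by the time it does.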
David Nadlinger d87042597a master/worker_impl: Factor out "completed" message sending [nfc]
Just reduces the visual complexity/potential for typos a bit, and
we already have put_exception_report().
2020-06-18 01:30:46 +01:00
Sebastien Bourdeauducq db13747279 fix device_db alias corner case bugs. Closes #1140 2019-11-14 16:22:45 +08:00
Sebastien Bourdeauducq 4707aef45c split out artiq-comtools 2019-11-14 15:21:51 +08:00
Sebastien Bourdeauducq 3fd6962bd2 use sipyco (#585) 2019-11-10 15:55:17 +08:00
Charles Baynham e50a6d5aaf worker_impl: ignore newline at start of experiment docstring 2019-09-20 22:10:49 +08:00
David Nadlinger 84b91ee8bd master/scheduler: Document Deleter semantics [nfc]
From looking at the code, it wasn't obvious to me that this is
supposed to handle multiple calls to delete(). This is the case,
however, when for instance Scheduler.delete()ing a run, which
will then also be deleted again from AnalyzeStage.
2019-05-14 22:37:16 +01:00
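The semantics documented in 84b91ee8bd, where repeated calls to delete() for the same run must be harmless, could look roughly like this. `IdempotentDeleter` is a hypothetical class written for illustration; it is not the scheduler's `Deleter`, and the real bookkeeping may differ.

```python
from typing import Dict, Set


class IdempotentDeleter:
    """Hypothetical deleter that tolerates repeated delete() calls."""

    def __init__(self, runs: Dict[int, object]) -> None:
        self._runs = runs            # maps RID -> run object
        self._deleted: Set[int] = set()

    def delete(self, rid: int) -> bool:
        """Delete a run; return False if it was already deleted."""
        if rid in self._deleted:
            # Second call, e.g. from a later pipeline stage: do nothing.
            return False
        self._deleted.add(rid)
        self._runs.pop(rid, None)
        return True
```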
Sebastien Bourdeauducq 93f4f31f45 devices.ctlmgr -> master.ctlmgr 2019-04-20 00:25:44 +08:00
Chris Ballance 8659c769cb master/language: add methods to set experiment pipeline/priority/flush defaults 2019-03-12 10:54:15 +01:00
David Nadlinger 01c3000ef3 master: Print offending key on HDF5 dataset type error
This helps debugging the cause of TypeErrors arising from types
not handled by the HDF5 serializer, as the backtrace doesn't
otherwise include any useful information.
2019-02-09 20:50:38 +00:00
David Nadlinger bf84226c7d language: Support appending to datasets 2019-02-09 20:50:38 +00:00
David Nadlinger 0dab7ecd73 master: Include RID in worker exception messages
This helps when debugging unexpected shutdown problems
after the fact.
2019-01-20 19:45:50 +00:00
David Nadlinger e24e893303 master/scheduler: Fix misleading indentation [nfc] 2019-01-20 19:45:47 +00:00
David Nadlinger c213ab13ba sync_struct: Notifier.{read -> raw_view}, factor out common dict update code [nfc] 2019-01-19 20:19:17 +00:00
Sebastien Bourdeauducq 9793632282 environment: rename 'save' in set_dataset to 'archive'. Closes #1171 2018-10-21 12:08:34 +08:00
David Nadlinger e3cfbfed06 master: Add minimal docstring to worker_impl [nfc] 2018-10-14 10:41:32 +08:00
David Nadlinger 64b9a377da master: Factor RIDCounter out into own module, explain worker_db module [nfc]
The docstrings are quite minimal still, but should already
help with navigating the different layers when getting
accustomed to the code base.

RIDCounter was moved to its own module, as it isn't really
related to the other classes (used from master only).
2018-10-14 10:41:32 +08:00
David Nadlinger 4641ddf002 master: Remove unused import [nfc] 2018-10-14 10:41:32 +08:00
Sebastien Bourdeauducq ea7f925852 Revert "worker_db: Only warn on repeated archive read if dataset changed"
Breaks numpy arrays.

This reverts commit 141fcaaa8a.
2018-07-13 10:41:06 +08:00
David Nadlinger 141fcaaa8a worker_db: Only warn on repeated archive read if dataset changed
In larger experiments, it is quite natural for the same dataset
to be read from multiple unrelated components. The only situation
where multiple reads from an archived dataset are problematic is
when the value actually changes between reads. Hence, this commit
restricts the warning to the latter situation.
2018-07-12 10:15:42 +08:00
Sebastien Bourdeauducq 9153c4d8a3 use tokenize.open() to open Python source files
Fixes encoding issues especially with device databases modified in obscure editors.
2018-07-07 17:04:56 +08:00
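For context on 9153c4d8a3: the standard library's `tokenize.open()` opens a file read-only in text mode, using the encoding that `tokenize.detect_encoding()` finds (BOM or PEP 263 coding cookie, otherwise UTF-8). A small sketch follows; the helper name `read_python_source` is hypothetical and not taken from the codebase.

```python
import tokenize


def read_python_source(path: str) -> str:
    """Read a Python source file using its declared encoding."""
    # tokenize.open() honours a BOM or a "# -*- coding: ... -*-" cookie,
    # unlike a plain open(), which relies on the locale's default.
    with tokenize.open(path) as f:
        return f.read()
```

A file saved as Latin-1 with a matching coding cookie is then decoded correctly, where a plain UTF-8 read would raise a `UnicodeDecodeError`.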
Sebastien Bourdeauducq 0b4d06c7a9 worker: keep sys.modules untouched until the end of examine() 2018-05-02 12:50:37 +08:00
Sebastien Bourdeauducq 8079aa6d20 worker: python docs recommend not replacing sys.modules 2018-05-02 12:48:50 +08:00
Sebastien Bourdeauducq 8c69d939fb worker: restore sys.modules in examine() (#976) 2018-05-02 12:32:35 +08:00
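The three `sys.modules` commits above (8c69d939fb, 8079aa6d20, 0b4d06c7a9) concern restoring the module table after examining an experiment file without replacing the dictionary object itself. One way to do this, shown here as a sketch and not as the worker's actual code, is to record the keys beforehand and delete only the newly added entries at the end. The name `preserved_modules` is hypothetical.

```python
import sys
from contextlib import contextmanager


@contextmanager
def preserved_modules():
    """Remove modules imported inside the block once it finishes."""
    before = set(sys.modules)
    try:
        # sys.modules is left untouched while the body runs.
        yield
    finally:
        # Mutate the existing dict in place instead of rebinding
        # sys.modules, which the Python docs advise against.
        for name in set(sys.modules) - before:
            del sys.modules[name]
```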
Robert Jördens dd6c48fed2 Merge branch 'master' into epoch_time 2017-08-03 12:55:01 +02:00
Chris Ballance cc289dd3a0 master: store run_time and start_time as doubles 2017-08-03 10:41:57 +01:00
Chris Ballance 223501f811 master: use epoch time for timestamps (closes #726) 2017-08-03 10:30:31 +01:00
Chris Ballance eabca1f311 master: correct example datestring in help 2017-08-03 10:12:52 +01:00
Chris Ballance 810bb69989 master: rotate logs at midnight, rather than on log size 2017-08-03 00:31:04 +01:00
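As an illustration of 810bb69989, time-based rotation is available in the standard library through `logging.handlers.TimedRotatingFileHandler`, in contrast to the size-based `RotatingFileHandler`. Whether the master uses exactly this handler configuration is an assumption; `make_midnight_handler` is a hypothetical helper.

```python
import logging
import logging.handlers


def make_midnight_handler(path: str) -> logging.Handler:
    """Create a file handler that rolls over at midnight."""
    # when="midnight" rotates once per day at local midnight,
    # independent of how large the log file has grown.
    return logging.handlers.TimedRotatingFileHandler(path, when="midnight")
```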
Sebastien Bourdeauducq e4a631a3d7 scheduler: consider the pipeline flushed if everything has a lower priority than us. Closes #640 2017-05-22 18:43:59 +08:00
Sebastien Bourdeauducq cd757c0f16 generate device database from executable python file 2017-05-18 23:14:55 +08:00
Chris Ballance 8ebb33c05c master: record time run() is called 2017-04-26 23:36:19 +08:00
Sebastien Bourdeauducq 432c6b99e2 master: still save results when analyze fails. Closes #684 2017-03-27 17:57:02 +08:00
Chris Ballance 639066c6d8 Add tooltips to experiment arguments 2017-02-03 17:53:40 +08:00
Sebastien Bourdeauducq 780d6d152c worker: fix handling of archive parameter during dataset get in examine 2017-01-07 16:20:17 +01:00
Sebastien Bourdeauducq 6aa13fbf25 master/worker_db: set default value for archive 2016-10-19 20:12:16 +08:00
Sebastien Bourdeauducq 5d184f8061 master: keep dataset manager consistent when set_dataset is called with contradictory attributes 2016-10-18 17:11:07 +08:00
Sebastien Bourdeauducq 69d96b0158 master: archive input datasets. Closes #587 2016-10-18 17:11:07 +08:00
Sebastien Bourdeauducq ed2624545f master: ensure same dataset is in broadcast and local when mutating 2016-10-18 17:11:07 +08:00
Sebastien Bourdeauducq 1908339d4e scheduler: default submission arguments, closes #577 2016-10-18 17:11:06 +08:00
Sebastien Bourdeauducq 69099691f7 doc: clarify usage of pause/check_pause, closes #571 2016-10-17 20:08:15 +08:00
Sebastien Bourdeauducq 387688354c master: optimize repository scan, closes #546 2016-09-09 19:19:01 +08:00
Sebastien Bourdeauducq e45c089428 master, dashboard: support applet requests from experiments 2016-09-05 00:53:44 +08:00
Sebastien Bourdeauducq 84f4725015 cache source on import of modules that may contain kernels. Closes #416 2016-08-06 12:01:49 +08:00
Robert Jördens 9ca27e6d7f worker_impl: style 2016-07-09 16:58:19 +02:00
Robert Jördens cfb9fb808c worker: also return DummyDevice from ExamineDeviceMgr 2016-07-09 16:53:28 +02:00
Robert Jördens 2a5a1f320f browser, worker: feed experiments dummy devices, closes #454
* just returning `None` as dummy device (like ExamineDeviceMgr)
is not explicit enough, certainly hard to debug
* introducing a special flag for the `build` action does not
seem the right place
2016-07-08 01:23:28 +02:00
Sebastien Bourdeauducq fdc25777da master/dataset_db: support keeping old persist flag 2016-07-03 12:19:01 +08:00
Sebastien Bourdeauducq 4c8a8357b0 worker: increase send_timeout (Windows can be really slow) 2016-07-03 12:18:34 +08:00