Previously, long-running experiments carried a significant risk of
losing results: any stray exception while run()ing the experiment –
for instance, due to an infrequent network glitch or a hardware
reliability issue – would cause no HDF5 file to be written at all.
This was especially troublesome as long experiments suffer from a
higher probability of unanticipated failures, while at the same time
being more costly to re-take in terms of wall-clock time.
Unanticipated uncaught exceptions like these were enough of an issue
that several Oxford codebases had come up with their own half-baked
mitigation strategies, from swallowing all exceptions in run() by
convention, to always broadcasting all results to uniquely named
datasets so that partial results could be recovered and written to
HDF5 by manually-run recovery experiments.
This commit addresses the problem at its source, changing the worker
behaviour such that an HDF5 file is always written as soon as run()
starts.
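A minimal sketch of the idea (hypothetical helper names, assuming
h5py; the actual worker code differs):

    import h5py

    def run_experiment(experiment, results_path):
        # Create the HDF5 file before run() starts, so a (possibly
        # partial) results file exists no matter how run() ends.
        with h5py.File(results_path, "w") as f:
            try:
                experiment.run()
            finally:
                # Write out whatever datasets were produced so far,
                # even if run() terminated with an exception.
                for name, value in experiment.get_datasets().items():
                    f[name] = value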
This only makes a difference on POSIX. It prevents subprocesses
from receiving the signals that the parent receives. For ctlmgr
and master, it cuts down on console spam (KeyboardInterrupt
tracebacks from all controllers) and ensures that the proper
termination sequence is followed.
This does not help if the parent gets SIGKILL (subprocesses
may linger).
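The gist of the change, in standard-library terms (the command line
is hypothetical; the actual launch code differs):

    import subprocess

    # start_new_session=True makes the child call setsid() before
    # exec, placing it in its own session and process group, so e.g.
    # a Ctrl-C (SIGINT) at the parent's terminal is no longer
    # delivered to it as well.
    proc = subprocess.Popen(["python", "controller.py"],
                            start_new_session=True)

    # The parent must now terminate the child explicitly as part of
    # its own shutdown sequence.
    proc.terminate()
    proc.wait()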
Introduces a watchdog context manager for use in experiment code; it
terminates the process with an error if the enclosed block times out.
The syntax is:
    with self.scheduler.watchdog(20*s):
        ...
Watchdog timers are implemented by the master process (the worker
communicates the necessary information about them) so that they can be
enforced even if the worker crashes. They can be nested arbitrarily.
During yields, all watchdog timers for the yielding worker are
suspended [TODO]. Setting up watchdogs is not supported in kernels;
however, a kernel can be called within watchdog contexts (and terminating
the worker will terminate the kernel [TODO]).
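A much-simplified sketch of the enforcement logic (hypothetical names,
no IPC; in reality the worker registers and unregisters watchdogs with
the master, which polls for expiry):

    import time
    from contextlib import contextmanager

    class WatchdogSet:
        # Held by the master for each worker; deadlines survive a
        # worker crash because they live in the master process.
        def __init__(self):
            self.deadlines = {}  # watchdog id -> absolute deadline
            self.next_id = 0

        def create(self, t):
            wid = self.next_id
            self.next_id += 1
            self.deadlines[wid] = time.monotonic() + t
            return wid

        def delete(self, wid):
            del self.deadlines[wid]

        def expired(self):
            # True as soon as any active (possibly nested) watchdog
            # passes its deadline; the master then kills the worker.
            now = time.monotonic()
            return any(now > d for d in self.deadlines.values())

    @contextmanager
    def watchdog(watchdog_set, t):
        # Worker side: register a deadline on entry, unregister on
        # exit. Nesting simply registers several deadlines at once.
        wid = watchdog_set.create(t)
        try:
            yield
        finally:
            watchdog_set.delete(wid)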
It is possible to implement a heartbeat mechanism using a watchdog, e.g.:
    for i in range(...):
        with self.scheduler.watchdog(...):
            ...
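For example (the iteration count, timeout and loop body are
hypothetical; s is the seconds unit as above):

    for i in range(100):
        with self.scheduler.watchdog(10*s):
            # if one iteration hangs for more than 10 s, the master
            # terminates the worker with an error
            self.do_measurement(i)  # hypothetical experiment step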
Crashes/freezes within the iterator or the loop management would not be
detected, but they should be rare enough.