18 Commits

Author SHA1 Message Date
96a5d73c81 worker: split build stage from prepare 2015-07-09 13:18:12 +02:00
c71fe29792 simplify unit system and use floats by default 2015-06-26 16:34:37 +02:00
a6a476593e worker: wait for process termination
This prevents stray SIGCHLDs from crashing the program e.g. if the asyncio event loop is closed before the process actually terminates.
2015-06-05 00:37:26 +08:00
c843c353d7 worker: remove useless process wait 2015-06-05 00:05:38 +08:00
Yann Sionneau
60bdf74137 tests: use try/finally to close event loop + wait for process to die after killing it 2015-06-04 13:40:13 +02:00
78f9268277 worker: add note about correct use of close() 2015-06-04 11:30:34 +08:00
fc449509b8 scheduler: pass priority to experiments 2015-05-24 20:37:47 +08:00
b74b8d5826 Scheduling TNG 2015-05-17 16:11:00 +08:00
43a05c783d worker: split write_results action 2015-03-11 19:06:46 +01:00
d5795fd619 master: watchdog support
Introduces a watchdog context manager to use in the experiment code that
terminates the process with an error if it times out. The syntax is:

with self.scheduler.watchdog(20*s):
    ...

Watchdog timers are implemented by the master process (and the worker
communicates the necessary information about them) so that they can be
enforced even if the worker crashes. They can be nested arbitrarily.
During yields, all watchdog timers for the yielding worker are
suspended [TODO]. Setting up watchdogs is not supported in kernels;
however, a kernel can be called within watchdog contexts (and terminating
the worker will terminate the kernel [TODO]).

It is possible to implement a heartbeat mechanism using a watchdog, e.g.:

for i in range(...):
    with self.scheduler.watchdog(...):
        ...

Crashes/freezes within the iterator or the loop management would not be
detected, but they should be rare enough.
2015-03-11 16:43:14 +01:00
f2134fa4b2 master,worker: split prepare/run/analyze 2015-03-09 23:34:09 +01:00
4c280d5fcc master: use a new worker process for each experiment 2015-03-09 16:22:41 +01:00
ec1d082730 remove timeout from run_params (to be replaced by a better mechanism) 2015-03-09 10:51:32 +01:00
cc172699ea master: use RID + unit class name for HDF5 filenames 2015-02-20 14:11:55 -07:00
4d21b78314 master,client,gui: factor timeout into run_params 2015-02-19 20:03:55 -07:00
c69c4d5ce9 master: expose scheduler API to experiments 2015-02-19 12:09:11 -07:00
3e22fe86b5 reorganize files as per discussion with Robert 2015-01-17 19:38:20 +08:00
070788a680 separate master modules 2015-01-14 12:16:49 +08:00
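
The watchdog commit (d5795fd619) describes nested context managers whose timers are tracked outside the worker. The following is only a minimal sketch of that idea, not the code from the commit: `WatchdogRegistry`, `Scheduler`, `FakeClock`, and `demo` are hypothetical names, the registry stands in for the master-side bookkeeping, and a fake clock replaces real time so the behaviour is deterministic. A real master would poll for expired timers and terminate the worker process; here the expired IDs are simply returned.

```python
import itertools
from contextlib import contextmanager


class FakeClock:
    """Hypothetical controllable clock (seconds), used instead of real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class WatchdogRegistry:
    """Hypothetical stand-in for the master-side record of watchdog timers."""

    def __init__(self, clock):
        self._clock = clock
        self._deadlines = {}           # watchdog id -> absolute deadline
        self._ids = itertools.count()

    def create(self, timeout):
        wid = next(self._ids)
        self._deadlines[wid] = self._clock() + timeout
        return wid

    def delete(self, wid):
        self._deadlines.pop(wid, None)

    def expired(self):
        """Return the ids of all watchdogs whose deadline has passed."""
        now = self._clock()
        return sorted(w for w, d in self._deadlines.items() if d <= now)


class Scheduler:
    """Hypothetical worker-side object exposing the watchdog context manager."""

    def __init__(self, registry):
        self._registry = registry

    @contextmanager
    def watchdog(self, timeout):
        wid = self._registry.create(timeout)
        try:
            yield wid
        finally:
            # Leaving the block disarms the timer, even on an exception.
            self._registry.delete(wid)


def demo():
    clock = FakeClock()
    registry = WatchdogRegistry(clock)
    scheduler = Scheduler(registry)
    with scheduler.watchdog(20):        # outer watchdog, id 0
        with scheduler.watchdog(5):     # nested watchdog, id 1
            clock.now = 6.0
            during_inner = registry.expired()
        clock.now = 25.0
        during_outer = registry.expired()
    after = registry.expired()
    return during_inner, during_outer, after
```

Because the deadlines live in the registry and not in the worker's own control flow, a process that owns the registry can still enforce them when the code inside the `with` block hangs, which is the motivation the commit message gives for implementing the timers in the master.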