idle kernel changes: run on startup, flashing new kernel after startup #276

Merged
sb10q merged 2 commits from srenblad/artiq-zynq:idle_kernel_rework_simple into master 2024-08-05 19:28:10 +08:00
1 changed files with 44 additions and 17 deletions

View File

@ -718,6 +718,26 @@ async fn handle_connection(
}
}
async fn load_and_run_idle_kernel(
buffer: &Vec<u8>,
control: &Rc<RefCell<kernel::Control>>,
up_destinations: &Rc<RefCell<[bool; drtio_routing::DEST_COUNT]>>,
aux_mutex: &Rc<Mutex<bool>>,
routing_table: &drtio_routing::RoutingTable,
timer: GlobalTimer,
) {
info!("Loading idle kernel");
let res = handle_flash_kernel(buffer, control, up_destinations, aux_mutex, routing_table, timer).await;
srenblad marked this conversation as resolved Outdated
Outdated
Review

formatting

formatting
match res {
Err(_) => warn!("error loading idle kernel"),
_ => (),
Outdated
Review

Why does this need to be aware of DRTIO?

DRTIO link status is to be handled manually by the user in idle kernels

Why does this need to be aware of DRTIO? DRTIO link status is to be handled manually by the user in idle kernels

Added in #279. Inspecting handle_flash_kernel reveals this Error indeed does not occur for idle kernels.

@mwojcik safe to remove?

Added in #279. Inspecting `handle_flash_kernel` reveals this Error indeed does not occur for idle kernels. @mwojcik safe to remove?

It was there to ensure that DRTIO was fully set up before a startup kernel would run. It could take a considerable while for all required destinations to start. With an idle kernel that can run on startup as well, I would recommend keeping it.

It was there to ensure that DRTIO was fully set up before a startup kernel would run. It could take a considerable while for all required destinations to start. With an idle kernel that can run on startup as well, I would recommend keeping it.
Outdated
Review

This is for the user to handle. That's why we have an API for it. The behavior hardcoded here of waiting for 1 second is messy and not applicable to all use cases.

This is for the user to handle. That's why we have an API for it. The behavior hardcoded here of waiting for 1 second is messy and not applicable to all use cases.

Since all subkernels are distributed before the kernel starts (to reduce load and latency when the kernels are actually running), that doesn't give the user an opportunity to act accordingly. However, since idle kernels will be retried and restarted until they can finally run, this check could be removed here.

Since all subkernels are distributed before the kernel starts (to reduce load and latency when the kernels are actually running), that doesn't give the user an opportunity to act accordingly. However, since idle kernels will be retried and restarted until they can finally run, this check could be removed here.
Outdated
Review

Idle subkernels and startup subkernels need to be flashed into their respective satellites. I think I mentioned this already.

Idle subkernels and startup subkernels need to be flashed into their respective satellites. I think I mentioned this already.

Oh then it would make perfect sense, but that's feasible only after artiq#2498 is implemented. So to get this PR merged before that, let's remove that part.

Oh then it would make perfect sense, but that's feasible only after [artiq#2498](https://github.com/m-labs/artiq/issues/2498) is implemented. So to get this PR merged before that, let's remove that part.
}
info!("Running idle kernel");
let _ = handle_run_kernel(None, control, up_destinations, aux_mutex, routing_table, timer)
srenblad marked this conversation as resolved Outdated

isn't there a way to reuse the idle kernel start-up code?

isn't there a way to reuse the idle kernel start-up code?

there might be, let me have a look!

there might be, let me have a look!
.await.map_err(|_| warn!("error running idle kernel"));
info!("Idle kernel terminated");
}
pub fn main(timer: GlobalTimer, cfg: Config) {
let net_addresses = net_settings::get_addresses(&cfg);
info!("network addresses: {}", net_addresses);
@ -808,8 +828,30 @@ pub fn main(timer: GlobalTimer, cfg: Config) {
mgmt::start(cfg);
task::spawn(async move {
let connection = Rc::new(Semaphore::new(1, 1));
let connection = Rc::new(Semaphore::new(0, 1));
let terminate = Rc::new(Semaphore::new(0, 1));
{
let control = control.clone();
let idle_kernel = idle_kernel.clone();
let connection = connection.clone();
let terminate = terminate.clone();
let up_destinations = up_destinations.clone();
let aux_mutex = aux_mutex.clone();
let routing_table = drtio_routing_table.clone();
task::spawn(async move {
let routing_table = routing_table.borrow();
select_biased! {
_ = (async {
if let Some(buffer) = &*idle_kernel {
load_and_run_idle_kernel(&buffer, &control, &up_destinations, &aux_mutex, &routing_table, timer).await;
}
}).fuse() => (),
_ = terminate.async_wait().fuse() => ()
}
connection.signal();
});
}
loop {
let mut stream = TcpStream::accept(1381, 0x10_000, 0x10_000).await.unwrap();
@ -837,22 +879,7 @@ pub fn main(timer: GlobalTimer, cfg: Config) {
.await
.map_err(|e| warn!("connection terminated: {}", e));
if let Some(buffer) = &*idle_kernel {
info!("Loading idle kernel");
let res = handle_flash_kernel(&buffer, &control, &up_destinations, &aux_mutex, &routing_table, timer)
.await;
match res {
#[cfg(has_drtio)]
Err(Error::DestinationDown) => {
let mut countdown = timer.countdown();
delay(&mut countdown, Milliseconds(500)).await;
}
Err(_) => warn!("error loading idle kernel"),
_ => (),
}
info!("Running idle kernel");
let _ = handle_run_kernel(None, &control, &up_destinations, &aux_mutex, &routing_table, timer)
.await.map_err(|_| warn!("error running idle kernel"));
info!("Idle kernel terminated");
load_and_run_idle_kernel(&buffer, &control, &up_destinations, &aux_mutex, &routing_table, timer).await;
}
}).fuse() => (),
_ = terminate.async_wait().fuse() => ()