netboot #100

Closed
pca006132 wants to merge 4 commits from pca006132/artiq-zynq:netboot into master

Implemented #96.

  • boot mode detected (JTAG/SD) by reading hardware register
  • if JTAG, enter netboot, if SD read boot.bin
  • read runtime and bitstream from boot.bin or the network in SZL (move init_gateware() functionality to SZL)
  • read IP/MAC in SZL with the same code as the runtime
  • use protocol compatible with artiq_netboot
  • add auto-retry feature to artiq_netboot
  • update HITL test script to use artiq_netboot
Implemented #96. - [x] boot mode detected (JTAG/SD) by reading hardware register - [x] if JTAG, enter netboot, if SD read boot.bin - [x] read runtime and bitstream from boot.bin or the network in SZL (move init_gateware() functionality to SZL) - [x] read IP/MAC in SZL with the same code as the runtime - [x] use protocol compatible with artiq_netboot - [ ] add auto-retry feature to artiq_netboot - [ ] update HITL test script to use artiq_netboot
sb10q reviewed 2020-08-31 16:36:50 +08:00
default.nix Outdated
@ -32,8 +32,10 @@ let
installPhase = ''
mkdir -p $out $out/nix-support
cp ../build/firmware.bin $out/firmware.bin

Call it runtime.bin for consistency.

Call it runtime.bin for consistency.
sb10q reviewed 2020-08-31 16:39:55 +08:00
@ -16,2 +17,4 @@
while core0_tx.is_none() {
core0_tx = CHANNEL_0TO1.lock().take();
// prevent deadlock...
// the compiler optimized this routine so fast that it would deadlock

Wait, why?

Wait, why?

I think the possible reason is due to the load-store exclusive used by the processor. The load-store exclusive would monitor for the cache line, and the store would fail if the cache lines was written.

Consider the following sequence:

core0: ldrex
; core0...
core1: ldrex
; core1...
core0: strex ; success as core1 have not yet modified the flag
core1: strex ; failed as core0 modified the flag
; repeat this sequence

With this sequence, core1 would be starved to death without every modifying the content of the mutex. Core0 would keep repeating the sequence as core1 haven't put the channels in the mutex.

Just my guess, not sure if this is what really happening in the hardware.

I think the possible reason is due to the load-store exclusive used by the processor. The load-store exclusive would monitor for the cache line, and the store would fail if the cache lines was written. Consider the following sequence: ``` core0: ldrex ; core0... core1: ldrex ; core1... core0: strex ; success as core1 have not yet modified the flag core1: strex ; failed as core0 modified the flag ; repeat this sequence ``` With this sequence, core1 would be starved to death without every modifying the content of the mutex. Core0 would keep repeating the sequence as core1 haven't put the channels in the mutex. Just my guess, not sure if this is what really happening in the hardware.
sb10q reviewed 2020-08-31 16:42:29 +08:00
@ -50,3 +50,3 @@
{
__heap0_start = .;
. += 0x800000;
. += 0x8000000;

Please move into separate commit.

Please move into separate commit.
sb10q reviewed 2020-08-31 16:45:49 +08:00
@ -96,0 +57,4 @@
ram::init_alloc_core0();
match slcr::RegisterBlock::unlocked(|slcr| slcr.boot_mode.read().boot_mode_pins()) {
slcr::BootModePins::Jtag => netboot::netboot(ddr.ptr()),

Maybe define a section in the linker script for the executable. Then both the start address and the maximum length can be obtained from there.

Maybe define a section in the linker script for the executable. Then both the start address and the maximum length can be obtained from there.
sb10q reviewed 2020-08-31 16:46:22 +08:00
@ -96,0 +73,4 @@
let runtime = cfg
.read("runtime")
.expect("Expected runtime in SD card for SD card boot mode");
assert!(runtime.len() < 0x800000, "runtime binary too large");

I suggest not crashing on those errors, but going into netboot mode.

I suggest not crashing on those errors, but going into netboot mode.
sb10q reviewed 2020-08-31 16:47:01 +08:00
@ -97,0 +82,4 @@
}
};
info!("Preparing for runtime execution!");

nitpick: remove the exclamation mark

nitpick: remove the exclamation mark
  • add auto-retry feature to artiq_netboot
  • update HITL test script to use artiq_netboot

Merging before this is done will break the tests, so let's merge all the relevant PRs at the same time.

> - [ ] add auto-retry feature to artiq_netboot > - [ ] update HITL test script to use artiq_netboot Merging before this is done will break the tests, so let's merge all the relevant PRs at the same time.
sb10q reviewed 2020-09-01 10:50:07 +08:00
local_run.sh Outdated
@ -25,2 +28,3 @@
fi
openocd -f zc706.cfg -c "$load_bitstream_cmd load_image ../build/firmware/armv7-none-eabihf/release/szl; resume 0; exit"
openocd -f zc706.cfg -c "load_image ../build/firmware/armv7-none-eabihf/debug/szl; resume 0; exit"
artiq_netboot $load_bitstream_cmd -f ../build/runtime.bin -b $board_host

No delay?

No delay?
sb10q reviewed 2020-09-01 10:51:27 +08:00
@ -18,0 +20,4 @@
// the compiler optimized this routine so fast that it would deadlock
for _ in 0..100 {
nop();
}

Please remove from the netboot PR, as well as optimizations. Let's do one thing at a time. Also this is a tricky issue that needs proper consideration, this nop fix isn't looking good.

Please remove from the netboot PR, as well as optimizations. Let's do one thing at a time. Also this is a tricky issue that needs proper consideration, this ``nop`` fix isn't looking good.
sb10q reviewed 2020-09-01 10:52:19 +08:00
@ -29,0 +35,4 @@
if (slot as usize) < (&__stack1_end as *const u32 as usize) {
panic!("Core1 stack overflow")
}
}

also split into separate PR

also split into separate PR
sb10q reviewed 2020-09-01 10:54:19 +08:00
@ -0,0 +141,4 @@
) -> Result<(), PlLoadingError> {
let size = locate_bitstream(file)?;
unsafe {
// align to 64 bytes

This wasn't done before?

This wasn't done before?

yes, I just found that the alignment is required.

perhaps the allocator would have some alignment by the smallest block size or something.

yes, I just found that the alignment is required. perhaps the allocator would have some alignment by the smallest block size or something.
sb10q reviewed 2020-09-01 10:56:02 +08:00
@ -75,0 +76,4 @@
}
};
info!("Loading gateware!");

remove exclamation mark

remove exclamation mark
sb10q reviewed 2020-09-01 11:05:24 +08:00
@ -75,0 +72,4 @@
let cfg = match Config::new() {
Ok(cfg) => cfg,
Err(_) => {
panic!("No config for SZL");

Go to netboot there as well

Go to netboot there as well
sb10q reviewed 2020-09-01 11:06:22 +08:00
@ -75,0 +80,4 @@
load_pl::load_bitstream_from_sd().unwrap();
info!("Loading runtime");
let runtime = cfg.read("runtime");

Why not use the same format as FSBL? We're already doing that for the bitstream. And FSBL compatibility is nice to have in practice, as #94 shows.

Why not use the same format as FSBL? We're already doing that for the bitstream. And FSBL compatibility is nice to have in practice, as https://git.m-labs.hk/M-Labs/artiq-zynq/issues/94 shows.

closing to split into multiple PRs.

closing to split into multiple PRs.
pca006132 closed this pull request 2020-09-01 13:54:11 +08:00
sb10q reviewed 2020-09-01 15:31:58 +08:00
@ -48,4 +44,2 @@
mod irq;
fn init_gateware() {
// Set up PS->PL clocks

This must stay

This must stay

Pull request closed

Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/artiq-zynq#100
There is no content yet.