running kernel in a loop freezes #40
Reference: M-Labs/artiq-zynq#40
This prints:
...and does not complete
Nothing suspicious in the Zynq log.
Note that the Zynq does not crash from this; interrupting the experiment with Ctrl-C and running it again without rebooting the Zynq will give the same result.
I've added some logs:
It seems that the comms module did not receive a LoadKernel request here...
Interrupting the experiment with Ctrl-C and running it again did not produce any log output related to shutting down the connection, etc.
Looks like a smoltcp/async problem. When this occurs, in the `proto_async.rs` `expect` function, `stream.recv` keeps calling the closure with `buf == [90, 90, 90]` even though the client had sent much more data (as seen in the Wireshark trace).
I had removed the RPC code when testing this (as RPCs tend to cause corruption and insane behavior), so this is probably an unrelated and regular bug.
Deleted the kernel loading and execution code completely from core1, behavior is exactly the same. This continues to look like a Bohrbug.
Seeing this repeatedly after the kernel gets stuck when enabling feature `log` for smoltcp:
Two more observations:
`window_len: 0` when the connection gets stuck
The issue happens when TCP RX packets were fragmented in an unfortunate way. We may only get a tiny fragment of the RX buffer in a `recv()` callback.
I am considering a revision of our `recv()` API and how we use it. Returning `Poll::Pending` from the callback is not useful due to this behavior.
To demonstrate, this resolves the problem for me:
On the other hand, let's have another look if smoltcp shouldn't be able to merge these fragments...
Ah, doesn't `TcpSocket::recv()` only return a single contiguous slice from the ring buffer? That is, don't you need to either keep a buffer in `TcpStream` to make the `TcpStream::recv` API work, or change clients to be able to accept a list of buffers?
In other words, seems like you'll always need to expect smoltcp's `recv()` to return incomplete data.
See e.g. https://github.com/smoltcp-rs/smoltcp/issues/359 for a discussion about changing the buffer data structure to avoid the extra copy in client code.
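To illustrate the contiguous-slice point, here is a minimal sketch of a toy ring buffer. The `Ring` type and its methods are hypothetical and written only for this comment; this is not smoltcp's implementation, just an assumption about the general shape of the behavior. When the unread data wraps around the end of the storage, the callback only sees the part up to the end, and a callback that refuses to consume it is offered the same short slice forever.

```rust
/// Toy fixed-capacity ring buffer (hypothetical; NOT smoltcp's code).
pub struct Ring {
    storage: Vec<u8>,
    read_at: usize, // index of the oldest unread byte
    len: usize,     // number of unread bytes
}

impl Ring {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Ring { storage: vec![0; capacity], read_at: 0, len: 0 }
    }

    /// Store as many bytes as fit, wrapping around the end of the storage.
    /// Returns the number of bytes stored.
    pub fn enqueue(&mut self, data: &[u8]) -> usize {
        let cap = self.storage.len();
        let n = data.len().min(cap - self.len);
        for (i, &b) in data[..n].iter().enumerate() {
            let idx = (self.read_at + self.len + i) % cap;
            self.storage[idx] = b;
        }
        self.len += n;
        n
    }

    /// Offer the longest *contiguous* unread slice to the closure.
    /// The closure returns how many bytes it consumed.
    pub fn recv<F: FnOnce(&[u8]) -> usize>(&mut self, f: F) -> usize {
        let cap = self.storage.len();
        // Unread data past the end of the storage is not visible in this call.
        let contiguous = self.len.min(cap - self.read_at);
        let slice = &self.storage[self.read_at..self.read_at + contiguous];
        let consumed = f(slice).min(contiguous);
        self.read_at = (self.read_at + consumed) % cap;
        self.len -= consumed;
        consumed
    }
}
```

With a capacity of 8, reading 6 bytes and then enqueueing 5 more leaves 2 bytes at the tail and 3 at the head. A callback that waits for all 5 and consumes nothing keeps being handed the same 2 bytes.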
@astro: Do you think you'll get around to finishing this soon? If not, I can have a look.
One way of solving this would be to mandate that consumers take at least one byte when there's any data available. It shouldn't be too hard to fix the runtime's socket I/O functions to ensure that AFAICT.
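A rough sketch of what "take at least one byte" could look like for a pattern-matching consumer. The `Expect` and `Step` names, and the `feed` signature, are made up for illustration and are not the actual `proto_async.rs` code; the pattern used in the usage note below is likewise arbitrary. The matcher keeps its progress between calls, so it can consume every fragment it is offered instead of waiting for the whole pattern to appear in one slice.

```rust
/// Outcome of feeding one fragment to the matcher.
#[derive(Debug, PartialEq)]
pub enum Step {
    NeedMore, // fragment consumed, pattern not complete yet
    Matched,  // the whole pattern has been seen
    Mismatch, // a byte differed from the pattern; progress was reset
}

/// Incremental matcher for a fixed byte pattern (hypothetical helper).
pub struct Expect<'a> {
    pattern: &'a [u8],
    matched: usize, // number of pattern bytes matched so far
}

impl<'a> Expect<'a> {
    pub fn new(pattern: &'a [u8]) -> Self {
        Expect { pattern, matched: 0 }
    }

    /// Consume bytes from `buf` until the pattern completes, a byte
    /// mismatches, or the fragment runs out. Returns (bytes consumed, status).
    /// While the pattern is incomplete, a non-empty `buf` always has at
    /// least one byte consumed, so the receive buffer keeps draining.
    pub fn feed(&mut self, buf: &[u8]) -> (usize, Step) {
        let mut consumed = 0;
        for &b in buf {
            if self.matched == self.pattern.len() {
                break; // leave bytes after the pattern for the next reader
            }
            consumed += 1;
            if b != self.pattern[self.matched] {
                self.matched = 0;
                return (consumed, Step::Mismatch);
            }
            self.matched += 1;
        }
        if self.matched == self.pattern.len() {
            (consumed, Step::Matched)
        } else {
            (consumed, Step::NeedMore)
        }
    }
}
```

For example, with the pattern `b"HELLO"`, feeding `b"HE"` consumes 2 bytes and reports `NeedMore`; feeding `b"LLO!!"` next consumes 3 bytes, reports `Matched`, and leaves `!!` in the buffer.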
@astro @pca006132 @dpn Thanks for finding out the root cause of the problem. I fixed it, now I can run 1000 iterations of a kernel in a loop without problem. If you think you have a better solution let me know.
Even if smoltcp were able to merge subsequent TCP segments, we may still end up receiving at the tail end of the ring buffer. The current approach seems best.