nal: Fix loops & socket handle management #7
Loading…
Reference in New Issue
No description provided.
Delete Branch "fix-nal"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
NetworkStack::connect()
:NetworkStack::update()
intoupdate()
(for controlling the clock) andpoll()
(for polling the smoltcp'sEthernetInterface
)update()
: Whenembedded_time::clock::Clock::try_now()
returns a timestamp earlier than the current timestamptime_ms
, it takes into consideration wheretry_now()
were non-sensical (e.g. the main application would reset the hardware cycle counter once during runtime as incortex-m-rtic
), and so advance time by the minimal amount (1ms for thesmoltcp::EthernetInterface::poll()
) instead of returning an errorupdate()
also now returns the amount of time advancedauto_time_update
; the main application is responsible for what valuesembedded_time::clock::Clock::try_now()
should returnClock::try_now()
until the hardware cycle counter is readyFix buffer reading/writing blocking leading to unhandled behaviour in minimq (issue was only relevant onFix read/write not pushing unused sockets (due to network errors) back to the stack83e946544c
)933687c2e4
nal::TcpStack::open()
is calledMqttClient::
read()
/write()
takes away the TCP socket handle (wrapped as anOption
) from itsRefCell
, and then callsnal::TcpStack::read()
/write()
; if NAL returnsnb::Error
, then the MQTT client will propagate and return the error, leavingNone
behindMqttClient::
socket_is_connected()
gets called (e.g. while polling the interface), it will detect that the socket handle isNone
, and attempt to callnal::TcpStack::open()
open()
pops a socket handle from the array (unused_handles
), we should ensure that any future calls ofopen()
will not pop from an empty array and returnNetworkError::NoSocket
open()
:MqttClient::poll()
, in whichsocket_is_connected()
is called, then as soon asMqttClient::read()
/write()
returns Err (e.g. due to network disconnection), the next iteration ofpoll()
must callopen()
, trying to pop from the arrayMqttClient
remains unable to replace the socket reference (e.g. byMqttClient::connect()
,MqttClient::poll()
would continue to loop and invokeopen()
until all socket handles have been popped from the arrayread()
andwrite()
.close()
from pushing duplicate handles for the samesmoltcp::socket::TcpSocket
smoltcp::socket::SocketHandle
implementsCopy
, so pushing the handle tousused_handles
does not actually take ownership of the handle but copies itopen()
andclose()
calls are imbalanced, then the program might start pushing the same socket multiple times; whileVec::pop()
automatically returns Err when the array is empty,Vec::push()
does not return Err when there are duplicates and so this should be handledExplanation
Currently, we primarily wish to use this driver to port the NGFW Booster to ENC424J600. Since quartiq's minimq is used, we want to provide an adapter in the ENC424J600 library for the network stack control required by minimq's MQTT client, such as connection to a broker at the client's instantiation, reconnection during polling, reading and writing to the data buffers. Therefore, we should aim to ensure that any call of the minimq functions would not lead to unexpected behaviour such as potential hanging and runtime errors.
As a proposed fix for hanging due to unsuccessful connection to the broker, the idea of adding a connection timeout is to let the function return at appropriate timing while still allowing some number of repeated attempts before relinquishing control over the main program. I am also allowing
connect()
to return the socket even if it is at TCP CLOSED state or if it still cannot connect to the broker before the timeout period. This was taking W5500's NAL as reference such that the main program or the MQTT client can try to re-connect in the future, instead of getting hanged by the network stack.In practice, the user might want to set the connection timeout to 0 such that
connect()
would not even try to take control over other important routines in the main application. For example, in the RTIC model both the MQTT client and a USB terminal are polling in parallel, and each of these tasks is monitored by the STM32 independent watchdog. This means each task is only allocated with a fixed time interval to return, and risking to loop would be undesirable.As a proposed fix for panicking due to blocking errors, the idea of replacing the blockingnb::Error
withnb::WouldBlock
can be seen as a temporary solution customised for minimq applications.WouldBlock
simply cannot give the most useful information about "what's wrong?" likeError
do.The proposed fix that enforces pushing the socket handle back to the array is mostly targeted on minimq applications, but I believe this also fits the general case because our read/write functionalities are already reporting network errors, which should signal the main application to restart connection on the socket.
Testing
This as been tested on NGFW Booster firmware with no occurence of any Ethernet-related errors or issues.
More on
nb::Error::WouldBlock
vsnb::Error::Other
: I just found there's an active PR on minimq which would eliminate the issue of panicking trying tounwrap()
aNone
, such that we can keep @occheung's original code to returnnb::Error::Other(NetworkError::ReadFailure)
andnb::Error::Other(NetworkError::WriteFailure)
. The portion of minimq's code concerned is through this link.I'm making a follow-up discussion with QUARTIQ's people over the comment threads of the said PR.In the minimq's PR I just mentioned, it also seems to prevent
MqttClient's
constructor from calling NAL'sconnect()
, which would keep looping until the connection is established. However, the PR doesn't change the way to handle failed re-connection attempts during the client's polling period, which means that my proposal of adding a time-out and/or returning a TCP CLOSED socket would still be helpful and might still be needed.29c8078ef9
tob372c37ae8
b372c37ae8
to2d3d4004e8
2d3d4004e8
toe7e08948d1
e7e08948d1
to99899e6657
I found a few small nitpicks. I do not have experience with minimq/rtic.
@ -53,0 +54,4 @@
interface: NetworkInterface<SPI, NSS>,
sockets: net::socket::SocketSet<'a>,
clock: IntClock,
connection_timeout_ms: u32,
Can you try to use an
embedded_time::duration
type wherever possible?I defined the connection timeout parameter as a
u32
because it is used to compare with the network stack struct'stime_ms
"time now", which is also stored asu32
. If I'm following @occheung 's logic correctly, we wanttime_ms
to stay asu32
rather than aembedded_time::duration::Milliseconds
because every time the Ethernet interface is busy polling, it has to borrow the "time now" value and conversion of the timestamp amongu32
, theembedded_time
duration type and thesmoltcp
duration type might consume more CPU cycles. Plus, I madeupdate()
return au32
representing the number of microseconds advanced, and so counting the total duration would be simpler.@ -149,0 +171,4 @@
{
let mut sockets = self.sockets.borrow_mut();
let internal_socket: &mut net::socket::TcpSocket = &mut *sockets.get(socket);
This isn't "internal" as in "some inner struct". I think it is more obvious to name this to what is behind this
SocketRef::T
, atcp_socket
.Perhaps @occheung meant that the
internal_socket
is the socket stored "internally" in thesmoltcp::socket::SocketSet
within theNetworkStack
, which returns the socket needed using thesocket
parameter passed to the method, which is actually asmoltcp::socket::SocketHandle
that was previously popped from the array of unused handles, done "externally" by callingopen()
?Anyway, since Rust doesn't do matching of names of method parameters between the trait and the implementation, in
66c3aa534f
I renamed thesocket
tohandle
andinternal_socket
tosocket
for clarity.@ -187,0 +220,4 @@
// "transient", so this function should keep waiting and let smoltcp poll
// (e.g. for handling echo reqeust/reply) at the same time.
timeout_ms += self.update()?;
self.poll()?;
While all the other steps are very nicely documented, this fairly important
self.poll()
disappears to my eyes. How about a pair of newlines around it?Right. Before,
poll()
was simply part ofupdate()
, and I decided to separate them simply for the sake of clarity.What do you think: is it good to keep them separated?
Smaller chunks with separated logic are always better.
In
232a08f110
, I added a pair of newlines, one before the comment line (that explains polling), and one afterself.poll()
. Would that be good?@astro Thanks a lot for reviewing this. I didn't update the PR description early enough before you pushed your review, and I just updated it to show some corrections.
The main thing I've changed is related to the minimq PR being merged to master recently, and thus has avoided the issue of not returning proper
Err()
because of theWouldBlock
issue.And with respect to this minimq commit, I've proposed closing the socket when
read()
/write()
returns error, since the MQTT client would now simply try toopen()
a new socket, without knowing that such a socket needs to be taken from a list of unusued socket handles in thesmoltcp
implementation of the TCP stack. And because of this, I've also proposed a way to avoid adding duplicate socket handles when callingclose()
. These changed are reflected on points 2 and 3 of the summary.nal: Fix blocking behavioursto nal: Fix loops & socket handle managementJust added
2eadb652ff
to deal with full TX buffer. We could checksmoltcp::socket::TcpSocket::can_send()
before doingsend_slice()
, but by just checking the number of bytes enqueued looks more straightforward thancan_send()
, which checks the state of the TX end on top of checking the buffer fullness.Testing with the Booster NGFW shows that it returns the
WriteFailure
error whenever the TCP connection has been closed, or the PHY linkup on the controller is down.@ -163,4 +194,3 @@
}
}
};
// Blocking connect
Please retain the wording blocking as it is exactly what this is.
Resolved in
232a08f110
by copying this original line of comment verbatim.ca2fea4ed4
to232a08f110