core0 memory leak #85
Labels
No Milestone
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: M-Labs/artiq-zynq#85
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
When core1 get restarted after core0 putting some messages in the channel, the channel would be reset and the messages are lost, so we cannot free them.
This can be solved by exposing an unsafe API to drive the read index forward and free the messages until it reaches the write index. This should be done before asking core1 to restart.
Can this be done before restarting core1, where it could still be reading from the queue?
I'd imagined we'd do all such maintenance (resetting queue state, in particular freeing memory) after resetting core1, where it would wait for core0 to signal a condition. The flow would look like this:
This way, any cache coherence/… weirdness during startup is neatly encapsulated away, and things are easy to reason about – during step 5, core0 is effectively a single-threaded program with exclusive access to all shared resources.
Currently we are not actually doing a core reset, we just trigger an interrupt and jump into the main function in core1. The cache maintainance operations are not needed, and we have cache coherency provided by the SCU.
And yes, maybe we should add some init routine to clean up the memory instead of modifying the queue while core1 is possibly doing some memory read.
My previous solution could cause race condition when we are dropping the memory while core1 is copying from it.
A simpler routine maybe:
Most are done now, we only have to add the startup mutex and do some cleanup in core0.
Ah, yes, I didn't yet have a look at the latest reset implementation changes.
Your routine is as I described, only making the "reset" of core0's flag explicit (acquiring the mutex) – it should definitely work.