Region-based Memory Management #18
The escape analysis in the existing ARTIQ compiler is both unsound (this could be fixed) and restrictive. The restriction comes from requiring everything to be stack-allocated: data used by both the caller and the callee must be allocated by the caller. Professor Lionel PARREAUX (@LPTK) proposed an alternative design based on region-based memory management. This issue gives an overview of the idea.
Resources:
Our current memory management scheme allocates in the current stack frame, and the stack frame, together with all values allocated in it, is freed once the function returns. Hence, one cannot allocate in the callee and return the value to the caller, as the allocated value would be dropped. Instead, we have to allocate a buffer in the caller and copy the data into it.
Region-based memory management is similar to the existing stack allocation; the major difference is that each allocation can choose which region to allocate into. Consider the following pseudocode (the syntax is a bit different from the one in the notion notebook):
The `foo` function takes a region as a parameter, allocates the list in that region, and returns the list to the caller. The caller can pass the region into `foo` and use the returned list without memory corruption, as the value is not dropped until the region is dropped. Note that we expect the user to write in an implicit syntax, which should be similar (if not identical) to the current ARTIQ syntax. The explicit syntax above is what we would get after region inference.
Each region is similar to a stack frame:

1. The region can only grow, not shrink.
2. Dropping the region drops all values allocated in it.
3. Regions are dropped according to a stack discipline: inner regions are dropped before outer regions.

To reduce memory overhead, one function can define multiple regions and drop some of them before the function returns. However, note that as each region can grow independently, we cannot use a 1D stack to implement the region stack; otherwise an outer region could grow and overlap with an inner region. Skipping the implementation details for now, we should be able to allocate and drop variables and regions in constant time, under the assumption that they are smaller than a certain fixed size. This is explained further at the end of this issue.
To avoid implicit memory leaks, our system would prevent dropping values allocated in an outer region from within a loop. This would be done by a separate static analysis phase rather than in the type system.
The behavior of functions would be analyzed via an effect system, which summarizes the regions that each function accesses, allocates into, or drops.
For the implementation of regions, we would implement each region as a list of fixed-size blocks and allocate variables within the blocks. When a block is full, we would allocate a new block for the region. This can be done in constant time by having a static buffer of blocks linked together in a list. However, one problem is allocating long arrays, which can be much larger than the fixed block size. There are currently a few ways to implement them:
Thanks for the nice summary of the proposal, @pca006132.
Another approach to handling large arrays is to go through one layer of indirection. If the array size does not fit into a region, allocate all the elements of the array in separate regions, and represent the array as a table specifying in which of these regions each index range is allocated.
For instance, assume region blocks (or let's call them region pages) are 16KiB, or 16,384 bytes. If we have 1GiB of memory (which is apparently the case on the considered hardware), then there are at most 2^16 such pages to reference, so page addresses can be encoded on 16 bits, or two bytes.
Arrays up to about 16,378 bytes can be stored in a simple way, with each element after the other, except possibly for one discontinuity (if the array overlaps between two pages). Before the elements, the array representation includes both the array size and the number of elements of the array that are on the same page, as opposed to the next one. If we store a pointer to the next page at the end of each page, we can easily access the following elements of the array that are located on the next page.
For bigger arrays, we could use an entire page as a table containing references to up to 8,192 other pages that hold the array elements. To know where the element at an index i is located, we divide i by 16,384 (using bit shift) and look up the page at this index in the table. We then use the remainder of the division to index the corresponding page. This method can accommodate arrays of up to 128MiB, which I assume would be enough for most applications.
Bigger arrays could probably be preallocated statically, and live for the duration of the program.
We want compatibility with FORTRAN (to be able to use linear algebra libraries), so this does not sound like a possibility.
Ah ok, I see.
In that case, the simplest solution is to force these arrays to be on the call stack (or on an auxiliary stack), as you say. This could be enforced in the language by having to declare and use them as `with` resources, and the type system could ensure that references to them do not leak.

Reference counting would not work, unless we choose to do a lot more work upon region deallocation, scanning the deallocated region for references whose counts need decrementing. But this would make region deallocation no longer constant-time, and memory management would then introduce hard-to-predict latencies.