There have been multiple instances where I had the need to iterate over
type variables, only to discover that the traversal order is arbitrary.
This commit fixes that by adding SortedMapping, which utilizes BTreeMap
internally to guarantee a traversal order. All instances of VarMap are
now refactored to use this to ensure that type variables are iterated in
the order of its variable ID, which should be monotonically incremented
by the unifier.
A lot of refactoring was performed, specifically with relaxing
expression codegen to return Option in case where ellipsis are used
within a subexpression.
All allocas for temporary objects are now placed in the beginning of the
function. Allocas for on-temporary objects are not modified because
these variables may appear in a loop and thus must be uniquely
allocated by different allocas.
Previously, the IR which sets up the call to the target function will
have its debug location pointing at the last argument of the function
call instead of the function call itself.
All allocas for temporary objects are now placed in the beginning of the
function. Allocas for on-temporary objects are not modified because
these variables may appear in a loop and thus must be uniquely
represented.
old_loop_target is only assigned if ctx.loop_target is overwritten,
meaning that if ctx.loop_target is never overwritten, ctx.loop_target
will always be overwritten to None.
We fix this by only restoring from old_loop_target if we previously
assigned to old_loop_target.
All parameters with a structure type in extern functions are marked as
`byref` instead of `byval`, as most ABIs require the first several
arguments to be passed in registers before spilling into the stack.
`byval` breaks this contract by explicitly requiring all arguments to be
passed in the stack, breaking interop with libraries written in other
languages.
In LLVM, i1 represents a 1-byte integer with a single valid bit; The
rest of the 7 upper bits are undefined. This causes problems when
using these variables in memory operations (e.g. memcpy/memmove as
needed by List slicing and assignment).
We fix this by treating all local boolean variables as i8 so that they
are well-defined for memory operations. Function ABIs will continue to
use i1, as memory operations cannot be directly performed on function
arguments or return types, instead they are always converted back into
local boolean variables (which are i8s anyways).
Fixes#315.
Previously, the final value of the target expression would be one after
the last element of the loop, which does not match Python's behavior.
This commit fixes this problem while also preserving the last assigned
value of the loop beyond the loop, matching Python's behavior.
The current default prefix is only derived from the instruction type,
which is not helpful during the comprehension of the IR. Changing to
anonymous names (e.g. %1) helps understand that the variable is only
needed as part of a larger (possibly named) expression.
Because it is unclear which variables are expressions and
subexpressions, all variables which are previously anonymous are named
using (1) the control flow statement if available, (2) the possible name
of the variable as inferred from the variable name in Rust, and (3) the
"addr" prefix to indicate that the values are pointers. These three
strings are joint together using '.', forming "for.i.addr" for instance.
Use store and load to handle if expression as the blocks might be changed when generating sub-expressions.
Reviewed-on: M-Labs/nac3#250
Co-authored-by: ychenfo <yc@m-labs.hk>
Co-committed-by: ychenfo <yc@m-labs.hk>
1. Function type variables should not include class type variables,
because they are not bound to the function.
2. Defer type variable constraint evaluation until we get all fields
definition.
Raise error when index out of range. Note that we use llvm.expect to
tell the optimizer that we expect not to raise an exception, so the
normal path performance would be better. If this assumption is violated,
the exception overhead might be slightly larger, but the percentage
increase in overhead should not be high since exception unwinding is
already pretty slow.
- No longer check if the statement will return. Instead, we check if
the current basic block is terminated, which is simpler and handles
exception/break/continue better.
- Use invoke statement when unwind is needed.
- Moved codegen for a block of statements into a separate function.
- Added `Exception` primitive type and some builtin exception types.
Note that all exception types share the same layout, and should
inherit from the base `Exception` type. There are some hacks in the
toplevel module for handling exception types, we should revisit and
fix them later.
- Added new primitive types to concrete type module, otherwise there
would be some weird type errors.
- Changed the representation of strings to CSlice<u8>, instead of
CString.
Behavior of parallel and sequential:
Each function call (indirectly, can be inside a sequential block) within a parallel
block will update the end variable to the maximum now_mu in the block.
Each function call directly inside a parallel block will reset the timeline after
execution. A parallel block within a sequential block (or not within any block) will
set the timeline to the max now_mu within the block (and the outer max now_mu will also
be updated).
Implementation: We track the start and end separately.
- If there is a start variable, it indicates that we are directly inside a
parallel block and we have to reset the timeline after every function call.
- If there is a end variable, it indicates that we are (indirectly) inside a
parallel block, and we should update the max end value.
Note: requires testing, it is difficult to inspect the output IR
Previously, we have to copy types from one unification table to another,
and make the table sendable. This requires cloning (processing) the
whole table 3 times per function call which is not efficient and uses
more memory than required when the unification table is large.
We now use a concrete type table to only copy the type we need. This
reduces the overhead as we only need to process the unification table
for once (when we do the function codegen), and reduces memory usage by
a bit (but not noticeable when the unification table is small, i.e. the
types are simple).