Unsigned Integer Support #213

Merged
sb10q merged 1 commits from uint into master 2024-08-17 17:37:19 +08:00
Collaborator

Add support for uint32 and uint64, with corresponding integer type conversions and irrt support.


One thing unsure about the u64 constant support:

Currently I attempted to change the parser output of constant integers from i64 to u64 instead of i128 or a (u64, sign). Performance-wise i64 would be better than using i128 or (u64, sign), but now the constant integer bound check will be not available, and the interpretation of types of constant integers will be entirely depending on the annotations like int64(..) or uint32(..) or no special annotation(which will be int32).

Should we use

  • the current way
  • or use i128 or (u64, sign) for a more comprehensive support for u64 constants
  • or do not support u64 constants directly since they seems to be less common? (they should still be able to be created by leveraging llvm constant optimization, for example to get u64::MAX:
a = uint64(1) << uint64(63)
print_uint64((a - uint64(1)) + a)

output ir under current optimization:

define void @__modinit__() local_unnamed_addr personality i32 (...)* @__nac3_personality {
init:
 tail call void @print_uint64(i64 -1)
 ret void
}

print:

print_uint64: 18446744073709551615
  • or some other ways?
Add support for `uint32` and `uint64`, with corresponding integer type conversions and irrt support. --- **One thing unsure about the `u64` constant support:** Currently I attempted to change the parser output of constant integers from `i64` to `u64` instead of `i128` or a `(u64, sign)`. Performance-wise `i64` would be better than using `i128` or `(u64, sign)`, but now the constant integer bound check will be not available, and the interpretation of types of constant integers will be entirely depending on the annotations like `int64(..)` or `uint32(..)` or no special annotation(which will be int32). Should we use - the current way - or use `i128` or `(u64, sign)` for a more comprehensive support for `u64` constants - or do not support `u64` constants directly since they seems to be less common? (they should still be able to be created by leveraging llvm constant optimization, for example to get `u64::MAX`: ```python a = uint64(1) << uint64(63) print_uint64((a - uint64(1)) + a) ``` output ir under current optimization: ```llvm define void @__modinit__() local_unnamed_addr personality i32 (...)* @__nac3_personality { init: tail call void @print_uint64(i64 -1) ret void } ``` print: ``` print_uint64: 18446744073709551615 ``` - or some other ways?
ychenfo force-pushed uint from 562054bca1 to ba7afc8928 2022-03-06 04:38:01 +08:00 Compare
Author
Collaborator

force-pushed to rebase on the latest master branch

force-pushed to rebase on the latest master branch
Owner

What about making the parser represent:

  • positive numbers that fit in u64 as Int(Some(u64)).
  • positive numbers that do not fit in u64 as Int(None).
  • negative numbers whose absolute value fits in u64 as USub(Int(Some(u64))).
  • negative numbers whose absolute value does not fit in u64 as USub(Int(None)).

This way we do not lose any useful information at the parsing stage, and I think the code structure of the compiler should be the simplest with this design.

Note that the compiler could still report an overflow e.g. when processing USub(Int(Some(u64))), or when converting Int(Some(u64)) into i64.

What about making the parser represent: * positive numbers that fit in ``u64`` as ``Int(Some(u64))``. * positive numbers that do not fit in ``u64`` as ``Int(None)``. * negative numbers whose absolute value fits in ``u64`` as ``USub(Int(Some(u64)))``. * negative numbers whose absolute value does not fit in ``u64`` as ``USub(Int(None))``. This way we do not lose any useful information at the parsing stage, and I think the code structure of the compiler should be the simplest with this design. Note that the compiler could still report an overflow e.g. when processing ``USub(Int(Some(u64)))``, or when converting ``Int(Some(u64))`` into ``i64``.
Author
Collaborator

What about making the parser represent:

  • positive numbers that fit in u64 as Int(Some(u64)).
  • positive numbers that do not fit in u64 as Int(None).
  • negative numbers whose absolute value fits in u64 as USub(Int(Some(u64))).
  • negative numbers whose absolute value does not fit in u64 as USub(Int(None)).

This way we do not lose any useful information at the parsing stage, and I think the code structure of the compiler should be the simplest with this design.

Note that the compiler could still report an overflow e.g. when processing USub(Int(Some(u64))), or when converting Int(Some(u64)) into i64.

Thanks for the suggestion! After thinking and trying it for some time I think this would be good, as long as we are careful to handle a deeper ast representing integer constants in type check and codegen when we are traversing the ast.

> What about making the parser represent: > * positive numbers that fit in ``u64`` as ``Int(Some(u64))``. > * positive numbers that do not fit in ``u64`` as ``Int(None)``. > * negative numbers whose absolute value fits in ``u64`` as ``USub(Int(Some(u64)))``. > * negative numbers whose absolute value does not fit in ``u64`` as ``USub(Int(None))``. > > This way we do not lose any useful information at the parsing stage, and I think the code structure of the compiler should be the simplest with this design. > > Note that the compiler could still report an overflow e.g. when processing ``USub(Int(Some(u64)))``, or when converting ``Int(Some(u64))`` into ``i64``. Thanks for the suggestion! After thinking and trying it for some time I think this would be good, as long as we are careful to handle a deeper ast representing integer constants in type check and codegen when we are traversing the ast.
ychenfo force-pushed uint from ba7afc8928 to e7df755435 2022-03-07 04:41:34 +08:00 Compare
Author
Collaborator

force-pushed to rebase on the latest master branch

force-pushed to rebase on the latest master branch
Author
Collaborator

as long as we are careful to handle a deeper ast representing integer constants in type check and codegen when we are traversing the ast.

Update: deeper ast for constant should be fine without too much modification, we can modify the parser to limit the depth of ast representing int constants to be smaller than 2, and then handle these in the type check and codegen. I will finish the detailed implementation tomorrow.

> as long as we are careful to handle a deeper ast representing integer constants in type check and codegen when we are traversing the ast. Update: deeper ast for constant should be fine without too much modification, we can modify the parser to limit the depth of ast representing int constants to be smaller than 2, and then handle these in the type check and codegen. I will finish the detailed implementation tomorrow.
Contributor

Why not just i128? I don't think the performance penalty would be that large: we basically never touch the integer until we turn them into constants, and we will figure out their type when we turn them into constants. The memory overhead is not large either, Option<u64> will likely occupy similar size due to alignment requirements in 64 bits machine.

I think i128 will be a lot simpler to handle without so many different cases.

Why not just i128? I don't think the performance penalty would be that large: we basically never touch the integer until we turn them into constants, and we will figure out their type when we turn them into constants. The memory overhead is not large either, `Option<u64>` will likely occupy similar size due to alignment requirements in 64 bits machine. I think i128 will be a lot simpler to handle without so many different cases.
Owner

While they are uncommon, there would be a problem with constants (including in non-NAC3 code) that do not fit in i128.

While they are uncommon, there would be a problem with constants (including in non-NAC3 code) that do not fit in i128.
Contributor

Indeed, a simple hack would be to encode those constants with a i128 value that is invalid in nac3 code, such as i128::MAX. Option<i128> would require a bit larger size.

Indeed, a simple hack would be to encode those constants with a i128 value that is invalid in nac3 code, such as `i128::MAX`. `Option<i128>` would require a bit larger size.
Contributor

Yes it should be fine.

Yes it should be fine.
Owner

we can modify the parser to limit the depth of ast representing int constants to be smaller than 2, and then handle these in the type check and codegen

I didn't realize there would be an issue there. The i128::MAX option seems ok.

> we can modify the parser to limit the depth of ast representing int constants to be smaller than 2, and then handle these in the type check and codegen I didn't realize there would be an issue there. The ``i128::MAX`` option seems ok.
Author
Collaborator

Thanks for the suggestions! I will revert the changes and use i128::MAX instead.

Thanks for the suggestions! I will revert the changes and use `i128::MAX` instead.
ychenfo force-pushed uint from e7df755435 to 50bc5ee361 2022-03-08 02:38:49 +08:00 Compare
sb10q reviewed 2022-03-08 08:40:56 +08:00
@ -316,6 +323,20 @@ pub fn set_primitives_magic_methods(store: &PrimitiveStore, unifier: &mut Unifie
impl_comparison(unifier, store, int64_t, int64_t);
impl_eq(unifier, store, int64_t);
for t in [uint32_t, uint64_t] {
Owner

Why not t in [int32_t, int64_t, uint32_t, uint64_t] and remove the two blocks above?

Why not ``t in [int32_t, int64_t, uint32_t, uint64_t]`` and remove the two blocks above?
Author
Collaborator

Oh ok I will merge the two blocks above into this loop

Oh ok I will merge the two blocks above into this loop
Author
Collaborator

Done

Done
ychenfo marked this conversation as resolved
ychenfo force-pushed uint from 50bc5ee361 to 3ddfd85bf0 2022-03-08 13:43:36 +08:00 Compare
sb10q reviewed 2022-03-08 16:10:04 +08:00
@ -112,7 +112,7 @@ impl fmt::Display for Tok {
use Tok::*;
match self {
Name { name } => write!(f, "'{}'", ast::get_str_from_ref(&ast::get_str_ref_lock(), *name)),
Int { value } => if let Some(value) = value { write!(f, "'{}'", value) } else { write!(f, "'#OFL#'") },
Owner

We may want to keep the special treatment of i128::MAX here.

We may want to keep the special treatment of ``i128::MAX`` here.
Author
Collaborator

updated to keep the special treatment of i128::MAX

updated to keep the special treatment of `i128::MAX`
ychenfo force-pushed uint from 3ddfd85bf0 to d848c2284e 2022-03-08 18:23:33 +08:00 Compare
sb10q merged commit d848c2284e into master 2022-03-08 18:29:30 +08:00
sb10q deleted branch uint 2022-03-08 18:29:30 +08:00
Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#213
No description provided.