implement scalar conversion functions #19

Closed
opened 2021-09-24 10:49:42 +08:00 by sb10q · 23 comments

They are ubiquituous in ARTIQ: int32(), int64(), float(), round().
We also probably want to special-case int64(round()).
Float to integer conversion that overflows or has other problems (NaN etc.) should raise an exception to match Numpy behavior.
bool() is not very important but it makes sense to implement it at the same time.

They are ubiquituous in ARTIQ: ``int32()``, ``int64()``, ``float()``, ``round()``. We also probably want to special-case ``int64(round())``. Float to integer conversion that overflows or has other problems (NaN etc.) should raise an exception to match Numpy behavior. ``bool()`` is not very important but it makes sense to implement it at the same time.
pca006132 was assigned by sb10q 2021-09-24 10:49:42 +08:00
sb10q added the
high-priority
label 2021-09-24 12:47:48 +08:00
Poster
Owner

Float to integer conversion that overflows or has other problems (NaN etc.) should raise an exception to match Numpy behavior.

Though using the saturating LLVM intrinsics would give us constant folding/propagation for free, which is an important feature - for example, many delays in ARTIQ code are fixed and specified in SI (e.g. 1e-6 seconds), and we want the conversion to machine units to happen at compilation time to improve runtime performance.

Otherwise LLVM cannot do it and we have to reimplement constant propagation ourselves.

@lriesebos @dhslichter @cjbe @bradbqc thoughts?

> Float to integer conversion that overflows or has other problems (NaN etc.) should raise an exception to match Numpy behavior. Though using the saturating LLVM intrinsics would give us constant folding/propagation for free, which is an important feature - for example, many delays in ARTIQ code are fixed and specified in SI (e.g. 1e-6 seconds), and we want the conversion to machine units to happen at compilation time to improve runtime performance. Otherwise LLVM cannot do it and we have to reimplement constant propagation ourselves. @lriesebos @dhslichter @cjbe @bradbqc thoughts?
Poster
Owner

Otherwise LLVM cannot do it and we have to reimplement constant propagation ourselves.

Or we can implement the whole function (with the overflow check) in LLVM IR and make sure that SCCP/DCE passes take care of the overflow check at compilation time.

> Otherwise LLVM cannot do it and we have to reimplement constant propagation ourselves. Or we can implement the whole function (with the overflow check) in LLVM IR and make sure that SCCP/DCE passes take care of the overflow check at compilation time.

using the saturating LLVM intrinsics would give us constant folding/propagation for free, which is an important feature

IMHO deviating from the Numpy behavior here for significantly better performance / ease of implementation is totally worth it - from the nature of the compiler there are already, and always will be, deviations, so the user already has to consider this "similar but different" from standard python/numpy behaviour.

> using the saturating LLVM intrinsics would give us constant folding/propagation for free, which is an important feature IMHO deviating from the Numpy behavior here for significantly better performance / ease of implementation is totally worth it - from the nature of the compiler there are already, and always will be, deviations, so the user already has to consider this "similar but different" from standard python/numpy behaviour.

I also think it is fine to deviate from numpy behavior in this case, especially with the mentioned benefits in mind.

Also note the following behavior regarding the overflow exception:

>>> np.int32(float(2**45))
0
>>> np.int32(float(2**65))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
I also think it is fine to deviate from numpy behavior in this case, especially with the mentioned benefits in mind. Also note the following behavior regarding the overflow exception: ```python >>> np.int32(float(2**45)) 0 >>> np.int32(float(2**65)) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C long ```
Poster
Owner

I suppose this is because it converts to int64 internally with overflow check, and then int64 -> int32 is allowed to overflow. Doesn't make much sense from a usability POV, so maybe we just shouldn't bother. The consistent saturating behavior of the LLVM intrinsic is actually more usable.

>>> np.int32(float(2**32))
0
>>> np.int32(float(2**32+1))
1
>>> np.int64(float(2**64))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
>>> np.int32(2**32)
0
I suppose this is because it converts to int64 internally with overflow check, and then int64 -> int32 is allowed to overflow. Doesn't make much sense from a usability POV, so maybe we just shouldn't bother. The consistent saturating behavior of the LLVM intrinsic is actually more usable. ``` >>> np.int32(float(2**32)) 0 >>> np.int32(float(2**32+1)) 1 >>> np.int64(float(2**64)) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C long >>> np.int32(2**32) 0 ```

Agreed that we don't need to match numpy behavior, as long as we are explicit in documenting what the behavior will be. We just want to make sure that this change in behavior doesn't cause silent failures of higher-level functions (especially built-in numpy-like kernel math functions) that might call these sorts of type cast operations.

Agreed that we don't need to match numpy behavior, as long as we are explicit in documenting what the behavior will be. We just want to make sure that this change in behavior doesn't cause silent failures of higher-level functions (especially built-in numpy-like kernel math functions) that might call these sorts of type cast operations.

Round is hard to deal with. From the documentation:

Return number rounded to ndigits precision after the decimal point. If ndigits is omitted or is None, it returns the nearest integer to its input.

Our type system does not allow such ad-hoc polymorphism. Special-case int64(round(x)) would also be ugly. The simplest way might be to allow the return type to be either float/int32/int64, and choose according to the type inference result. (This would be impossible to implement in the language but we can do it with custom codegen)

Example:

[pca006132@pca-yoga:~/code/rust/nac3/nac3standalone]$ cat demo/casts.py 
# definitions for quick prototype
# not actually extern, but custom codegen based on input type
@extern
def float(a: int32) -> float:
    pass

@extern
def int32(a: float) -> int32:
    pass

@extern
def round(a: float) -> Num:
    pass

def round_up(a: float) -> Num:
    return round(a + 0.5)

def run() -> int32:
    a = 0
    b = float(a)
    c = round(b + 0.51)
    output_int(c)

    output_int(int32(1.1 + round(b)))

    output_int(round_up(1.1))
    output_int(round_up(1.5))
    output_int(int32(round_up(1.5)))

    return 0

[pca006132@pca-yoga:~/code/rust/nac3/nac3standalone]$ cargo run --release demo/casts && clang -lm -Wall -o output demo/demo.c module*.o && ./output
    Finished release [optimized + debuginfo] target(s) in 0.03s
     Running `/home/pca006132/code/rust/nac3/target/release/nac3standalone demo/casts`
setup time: 0ms
parse time: 0ms
analysis time: 0ms
codegen time (including LLVM): 3ms
total time: 4ms
1
1
2
2
2

Num is a type variable that can range over int32/int64/float. Note that round to n decimal places is not implemented.

Round is hard to deal with. From the documentation: > Return number rounded to `ndigits` precision after the decimal point. If `ndigits` is omitted or is None, it returns the nearest integer to its input. Our type system does not allow such ad-hoc polymorphism. Special-case `int64(round(x))` would also be ugly. The simplest way might be to allow the return type to be either float/int32/int64, and choose according to the type inference result. (This would be impossible to implement in the language but we can do it with custom codegen) Example: ``` [pca006132@pca-yoga:~/code/rust/nac3/nac3standalone]$ cat demo/casts.py # definitions for quick prototype # not actually extern, but custom codegen based on input type @extern def float(a: int32) -> float: pass @extern def int32(a: float) -> int32: pass @extern def round(a: float) -> Num: pass def round_up(a: float) -> Num: return round(a + 0.5) def run() -> int32: a = 0 b = float(a) c = round(b + 0.51) output_int(c) output_int(int32(1.1 + round(b))) output_int(round_up(1.1)) output_int(round_up(1.5)) output_int(int32(round_up(1.5))) return 0 [pca006132@pca-yoga:~/code/rust/nac3/nac3standalone]$ cargo run --release demo/casts && clang -lm -Wall -o output demo/demo.c module*.o && ./output Finished release [optimized + debuginfo] target(s) in 0.03s Running `/home/pca006132/code/rust/nac3/target/release/nac3standalone demo/casts` setup time: 0ms parse time: 0ms analysis time: 0ms codegen time (including LLVM): 3ms total time: 4ms 1 1 2 2 2 ``` `Num` is a type variable that can range over int32/int64/float. Note that round to n decimal places is not implemented.
Poster
Owner

Just implement round(float) -> int32 and int64(round(float)) -> int64. The current compiler does not implement ndigits and nobody asked for it.

If the special case is so ugly, we can also just have another round64() function. Though it seems to me that the special case can be dealt with using a simple AST pattern matching that turns int64(round()) into round64() internally.

Just implement ``round(float) -> int32`` and ``int64(round(float)) -> int64``. The current compiler does not implement ``ndigits`` and nobody asked for it. If the special case is so ugly, we can also just have another ``round64()`` function. Though it seems to me that the special case can be dealt with using a simple AST pattern matching that turns ``int64(round())`` into ``round64()`` internally.
Poster
Owner

Is that special case substantially different from int64(numeric_constant)?

Is that special case substantially different from ``int64(numeric_constant)``?

Is that special case substantially different from int64(numeric_constant)?

Nope, but you would need a pretty long (and deep) pattern matching. Just wondering why do we need to add special case everywhere.

> Is that special case substantially different from ``int64(numeric_constant)``? Nope, but you would need a pretty long (and deep) pattern matching. Just wondering why do we need to add special case everywhere.
Poster
Owner

It's just syntactic sugar, nothing wrong with having this kind of special case IMHO. Implement a separate round64() function instead if you don't like it.

It's just syntactic sugar, nothing wrong with having this kind of special case IMHO. Implement a separate ``round64()`` function instead if you don't like it.

round64() would need a Python implementation too for proper simulation. from that standpoint, it would be better to limit the number of "builtin" functions and not add a new one.

for round() with ndigits, we do not have any desires for that and I do not think I can imagine any future use cases at this moment.

`round64()` would need a Python implementation too for proper simulation. from that standpoint, it would be better to limit the number of "builtin" functions and not add a new one. for `round()` with `ndigits`, we do not have any desires for that and I do not think I can imagine any future use cases at this moment.

Implemented in 1d2a32b140

int64(round(x)) is not yet implemented. A round64 is provided for this.

Implemented in 1d2a32b14034f00cb1b8734eadc9345c772eff74 `int64(round(x))` is not yet implemented. A `round64` is provided for this.

ok well if we stick with round64(), then we need to provide the Python implementation in the ARTIQ library. Not sure if I can still convince people of int64(round(x)), but that is still the preferred solution for me because it does not introduce a new builtin and is therefore immediately simulatable.

ok well if we stick with `round64()`, then we need to provide the Python implementation in the ARTIQ library. Not sure if I can still convince people of `int64(round(x))`, but that is still the preferred solution for me because it does not introduce a new builtin and is therefore immediately simulatable.

they are different things. round(x) has this implicit number of digits, and could reasonably be expected to produce a float, while round64(x) would be explicitly rounding and converting to an int64. Numpy has the function rint(x) that rounds to an integer, if people prefer that nomenclature.

Providing a Python implementation of round64() seems like a pretty simple thing to do?

they are different things. `round(x)` has this implicit number of digits, and could reasonably be expected to produce a float, while `round64(x)` would be explicitly rounding and converting to an `int64`. Numpy has the function `rint(x)` that rounds to an integer, if people prefer that nomenclature. Providing a Python implementation of `round64()` seems like a pretty simple thing to do?
Poster
Owner

they are different things. round(x) has this implicit number of digits, and could reasonably be expected to produce a float

No, round() without ndigits always produces an integer in Python.

> they are different things. round(x) has this implicit number of digits, and could reasonably be expected to produce a float No, ``round()`` without ``ndigits`` always produces an integer in Python.

No, round() without ndigits always produces an integer in Python.

Correct, but it can take the optional ndigits argument and produce a float if ndigits != 0. And it may be desirable to have round() produce a float for some use cases. My sentiment is that if one is making a rounding method that always rounds to an integer, it's better to name it something other than round() to reduce potential for confusion. Which is why I like the round64() idea.

> No, round() without ndigits always produces an integer in Python. Correct, but it can take the optional ndigits argument and produce a float if ndigits != 0. And it may be desirable to have `round()` produce a float for some use cases. My sentiment is that if one is making a rounding method that always rounds to an integer, it's better to name it something other than `round()` to reduce potential for confusion. Which is why I like the `round64()` idea.

Well, as mentioned earlier, ndigits does not seem to be on the list to be implemented, so that would mean round() will always produce an integer. int64(round()) also seems completely clear to me without introducing new functions that users have to learn. And even if ndigits would be implemented, that would not interfere with the int64(round()) construct, its semantics are still clear. So personally, I do not see the need to add a new function.

compiler wise, if having round64() would make a huge difference, then that is fine with me.

Well, as mentioned earlier, `ndigits` does not seem to be on the list to be implemented, so that would mean `round()` will always produce an integer. `int64(round())` also seems completely clear to me without introducing new functions that users have to learn. And even if `ndigits` would be implemented, that would not interfere with the `int64(round())` construct, its semantics are still clear. So personally, I do not see the need to add a new function. compiler wise, if having `round64()` would make a huge difference, then that is fine with me.

I guess my point is that Python users may interpret round() as the usual "makes a float or an int, depending" function, and that if we named it something different (e.g. rint() or round64()), then it would be more readily apparent to the casual user that it is different from the usual Python round(). I think one of the big problems that ARTIQ has is that people try things they expect to work a certain way in Python, and then discover that ARTIQ-Python is different. If we can reduce the chance for this confusion it's nice. However, it's not something I want to fight strongly for if it's not a big deal to others.

I guess my point is that Python users may interpret `round()` as the usual "makes a float or an int, depending" function, and that if we named it something different (e.g. `rint()` or `round64()`), then it would be more readily apparent to the casual user that it is different from the usual Python `round()`. I think one of the big problems that ARTIQ has is that people try things they expect to work a certain way in Python, and then discover that ARTIQ-Python is different. If we can reduce the chance for this confusion it's nice. However, it's not something I want to fight strongly for if it's not a big deal to others.
Poster
Owner

I think one of the big problems that ARTIQ has is that people try things they expect to work a certain way in Python, and then discover that ARTIQ-Python is different.

The behavior is exactly the same as round(), we just reject the ndigits argument if it is passed.
Most of the time people will not use it AFAICT and it will just work the same as Python. If the function is named differently on the other hand then there will be a small surprise.

> I think one of the big problems that ARTIQ has is that people try things they expect to work a certain way in Python, and then discover that ARTIQ-Python is different. The behavior is exactly the same as ``round()``, we just reject the ``ndigits`` argument if it is passed. Most of the time people will not use it AFAICT and it will just work the same as Python. If the function is named differently on the other hand then there will be a small surprise.
sb10q added
low-priority
and removed
high-priority
labels 2021-10-01 09:29:21 +08:00
Poster
Owner

int64(round()) also seems completely clear to me without introducing new functions that users have to learn.

They still have to learn that round() may overflow and that the trick is to add int64() around it. A purist may also think that the int64() fix isn't obvious since according to function call semantics it would only get the value after the overflow has already occurred.

Anyway enough bike-shedding. Summarizing the current status of this issue:

  • bool() is still missing.
  • int64(round()) vs. round64() I don't really mind either way. @pca006132 seems to have strong opinion about having round64() and it's less/simpler code, so let's keep it this way. Adding round64 to CPython is trivial.
> int64(round()) also seems completely clear to me without introducing new functions that users have to learn. They still have to learn that ``round()`` may overflow and that the trick is to add ``int64()`` around it. A purist may also think that the ``int64()`` fix isn't obvious since according to function call semantics it would only get the value after the overflow has already occurred. Anyway enough bike-shedding. Summarizing the current status of this issue: * ``bool()`` is still missing. * ``int64(round())`` vs. ``round64()`` I don't really mind either way. @pca006132 seems to have strong opinion about having ``round64()`` and it's less/simpler code, so let's keep it this way. Adding ``round64`` to CPython is trivial.

int64(round()) also seems completely clear to me without introducing new functions that users have to learn.

They still have to learn that round() may overflow and that the trick is to add int64() around it. A purist may also think that the int64() fix isn't obvious since according to function call semantics it would only get the value after the overflow has already occurred.

Anyway enough bike-shedding. Summarizing the current status of this issue:

  • bool() is still missing.
  • int64(round()) vs. round64() I don't really mind either way. @pca006132 seems to have strong opinion about having round64() and it's less/simpler code, so let's keep it this way. Adding round64 to CPython is trivial.

Actually, the idea I have in mind is to make round output either an int32 or an int64 depending on the usage of its result. Although there would be a quirk due to the inference rules, that 1 + round(1.0) can be inferred but round(1.0) + 1 cannot be inferred, as + is treated as a function and the lhs is the object.

> > int64(round()) also seems completely clear to me without introducing new functions that users have to learn. > > They still have to learn that ``round()`` may overflow and that the trick is to add ``int64()`` around it. A purist may also think that the ``int64()`` fix isn't obvious since according to function call semantics it would only get the value after the overflow has already occurred. > > Anyway enough bike-shedding. Summarizing the current status of this issue: > > * ``bool()`` is still missing. > * ``int64(round())`` vs. ``round64()`` I don't really mind either way. @pca006132 seems to have strong opinion about having ``round64()`` and it's less/simpler code, so let's keep it this way. Adding ``round64`` to CPython is trivial. Actually, the idea I have in mind is to make round output either an int32 or an int64 depending on the usage of its result. Although there would be a quirk due to the inference rules, that `1 + round(1.0)` can be inferred but `round(1.0) + 1` cannot be inferred, as + is treated as a function and the lhs is the object.
Poster
Owner

Just leave it as it is now.

Just leave it as it is now.
Sign in to join this conversation.
No Milestone
No Assignees
5 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#19
There is no content yet.