List Slice Support (#72) #140

Merged
sb10q merged 1 commits from list_slice into master 2022-01-13 18:53:48 +08:00
Collaborator

Getting list slices is now done.


Unsure: what is a good way to support dropping/extending part of the list via list slice assignment?

Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = list(range(10))
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a[2:5] = []
>>> a
[0, 1, 5, 6, 7, 8, 9]
>>> a[2:5] = list(range(5))
>>> a
[0, 1, 0, 1, 2, 3, 4, 8, 9]
>>> a[2:5:2] = []
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 0 to extended slice of size 2
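For reference, a sketch (in Rust, not code from this PR) of the slice-length rule behind that ValueError: an extended-slice assignment has to compare this length against the length of the right-hand side. It assumes start and stop have already been normalized and clamped to the list bounds.

```rust
/// Number of elements selected by an already-normalized slice (start, stop, step),
/// mirroring CPython's behaviour. For `a[2:5:2]` on a length-10 list this returns 2,
/// which is the "extended slice of size 2" in the ValueError above.
fn slice_len(start: i64, stop: i64, step: i64) -> i64 {
    assert!(step != 0, "slice step cannot be zero");
    if step > 0 {
        if stop > start { (stop - start + step - 1) / step } else { 0 }
    } else if start > stop {
        (start - stop - step - 1) / -step
    } else {
        0
    }
}

fn main() {
    assert_eq!(slice_len(2, 5, 1), 3); // a[2:5]
    assert_eq!(slice_len(2, 5, 2), 2); // a[2:5:2]
}
```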
ychenfo changed title from WIP: List Slice Support to WIP: List Slice Support (#72) 2021-12-18 04:54:11 +08:00
Contributor

Do we have to support extending the list by slice assignment? We have to reallocate the list in that case. @sb10q

Owner

Do we have to support extending the list by slice assignment?

No, not very important. Not sure it can be done without fancy memory management, which is for later - there are far more pressing issues at the moment.

Owner

Should we merge this then?
We need an error message if it attempts to extend the list though. Right now it seems it would just corrupt memory?

Contributor

I wonder if we need such a complicated implementation... Can't we implement a builtin function for this?

from typing import TypeVar

T = TypeVar('T')

def slice_assign(target: list[T], value: list[T]):
    # implementation here...
    pass

and call this builtin function when we have slice assignment? It seems to me that such complicated code should be put in a separate function, otherwise the generated code would be bloated and probably not efficient (in terms of i-cache...). It would also be much more human-readable this way.

Author
Collaborator

Can't we implement a builtin function for this?

Oh, I think I get what you mean. Should we do this on the Python side or on the Rust side?

For getting list slices, I propose we still do it on the Rust side, since doing it on the Python side would require users to allocate a list of the correct size prior to calling the builtin function, if I understand it correctly?

For list slice assignment, we can implement it on the Python side, and maybe hard-code codegen to call that builtin function implicitly, so that users can still use the Python list slice assignment syntax?

Contributor

Can't we implement a builtin function for this?

Oh I think I get what you mean. Should we do this on the python side or on the rust side?

For getting list slice, I propose we still do it on the rust side, since do that on python side would require users to allocate the list of correct size prior to calling the builtin function if I understand it correctly?

For list slice assignment, we can implement it in the python side, and maybe somehow hard code in codegen to call that builtin function implicitly so that users can still use the python list slice assignment syntax?

Yes I think we can do it like this.

Owner

Could it go inside the compiler just like the current "def __modinit__():\n base.{}({})" of nac3artiq?

Contributor

Could it go inside the compiler just like the current "def __modinit__():\n base.{}({})" of nac3artiq?

Maybe we can embed some Python files that contain things like this, similar to how we handle the linker script now?

Owner

As long as it is embedded in the nac3 library and doesn't create a package management mess, I'm fine with any method.

Author
Collaborator

Dropping part of the list is done. We still need to throw exceptions later when a slice assignment or access is out of bounds, when the step and the size mismatch, or when the step is 0.

I found that directly using Python to write such internal functions is a bit messy, as we need to parse them, type check them, and make another symbol resolver just to deal with them (like adding another small nac3standalone inside nac3core...), which might also reduce compilation speed. So I still emit the LLVM IR in Rust, and internally create some functions for these complex operations so that the output IR is not bloated. I am not fully confident that adding functions to the LLVM module this way (and their symbol names) is the most adequate approach, but it should be working. Please have a review, thanks!

Contributor

This still looks a bit too complicated.

  1. I think it suffices to generate two kinds of functions: one for elements of size 32 bits, and one for 64 bits. (Objects and tuples are stored as pointers, which are 32 bits on our boards IIRC; we should add a usize type to codegen later.)
  2. We should have a fast path for step size = 1, using memmove.
  3. Is the implementation correct when the src and dest overlap? Consider a[1:2] = a[0:1]; this implementation will do
a[1] = a[0]
a[2] = a[1] # = a[0]
Author
Collaborator

Is the implementation correct when the src and dest overlap?

Yes, getting a list slice like a[0:1] will always copy that sliced part according to Python semantics, and the llvm memmove intrinsic allows overlap? I can also optimize this later, but I think if there are overlaps we would need to do some allocation anyway. And we also do not support extending the list (only a[:] = a would be correct), so if I did not miss anything, I think currently the assignment RHS will not be the exact same list as the LHS?

We should have fast path for step size = 1, using memmove.

Thanks I see, will implement later.

I think it suffices to generate two kinds of function, one for elements with size 32 bits, and one for 64 bits. (objects and tuples are stored as pointers, which are 32 bits on our boards iirc. we should add a usize type to codegen later)

I think I get the point, but I am not sure how to examine the size (at compile time?) for now. I need to look into inkwell and llvm documentation later.


So it seems that we do indeed need to implement this directly in Rust, since we need things like memmove, dropping the list, and modifying the list's len behind the scenes?

Author
Collaborator

I am not sure how to do this... Or can we implement this in C or Rust and link it to our library?

Owner

or can we implement this in C or rust and link them to our library?

Output LLVM IR from rustc, use include_bytes!, and then read it with Module.parse_bitcode_from_buffer ?

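A minimal Rust sketch of this suggestion, assuming build.rs has already written the bitcode to $OUT_DIR/slice_assign.bc (the file name is illustrative; it is renamed to irrt.bc later in this PR):

```rust
use inkwell::context::Context;
use inkwell::memory_buffer::MemoryBuffer;
use inkwell::module::Module;

// Embed the bitcode produced by build.rs directly into the compiled library,
// so nac3artiq.so does not depend on any external file at run time.
fn load_embedded_bitcode(ctx: &Context) -> Module<'_> {
    const BITCODE: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/slice_assign.bc"));
    let buffer = MemoryBuffer::create_from_memory_range(BITCODE, "slice_assign");
    Module::parse_bitcode_from_buffer(&buffer, ctx)
        .expect("embedded bitcode should always parse")
}
```

This is the pattern the later commits in this PR end up using (see the review snippet further down showing MemoryBuffer::create_from_memory_range with include_bytes!).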
Author
Collaborator

or can we implement this in C or rust and link them to our library?

Output LLVM IR from rustc, use include_bytes!, and then read it with Module.parse_bitcode_from_buffer ?

Thanks for the suggestion! I will look into this, and maybe also refer to the CPython implementation (https://github.com/python/cpython/blob/main/Objects/listobject.c#L2977-L3138) to further confirm our implementation.

Contributor

Note that #146 changes the slice representation, so this PR should be merged after that and changed to use the correct representation.

and llvm memmove intrinsic allows overlap?

Yes. The ‘llvm.memmove.*’ intrinsics move a block of memory from the source location to the destination location. It is similar to the ‘llvm.memcpy’ intrinsic but allows the two memory locations to overlap.

I think if the there are overlaps we would need to do some alloc anyway..

No.

>>> a = [1, 2, 3]
>>> a[1:3] = a[0:2]
>>> a
[1, 1, 2]

I think I get the point, but I am not sure how to examine the size (at compile time?) for now. I need to look into inkwell and llvm documentation later.

You can now get the pointer size from the code generator struct after merging #146. You can use inkwell::types::BasicType::size_of to get the size of basic types (integers, floats, etc.), and if the result is none it means that it is a pointer.

So it seems that we indeed need to implement this in rust directly since we need things like memmove and dropping the list and modify the list's len behind the scene?

I think there are 2 cases.

  1. Slice assignment where both the src and dest have step 1. You can just emit 2 memmove calls: one for copying the content, and one for shifting the remaining elements in the dest array. Note that you have to check whether dest.len >= src.len (see the sketch after this list).
  2. Otherwise, allocate a temporary array to store the src slice, and copy the temporary array's content to the dest slice with the appropriate step. You can implement this in 2 functions (32/64-bit element size) and decide which one to call based on the element size. You can use stacksave and stackrestore to reclaim the stack space occupied by the temporary array after copying its content, to reduce stack memory usage.
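A hedged sketch of case 1 above, written against a plain Rust Vec rather than the generated LLVM IR (the function name and the i32 element type are illustrative): the two copies correspond to the two memmove calls, and the final truncate mirrors the dest.len >= src.len requirement, i.e. the list never grows.

```rust
// Assign `src` to `dest[dest_start..dest_end]` with step 1, shrinking the list
// if fewer elements are provided than are being replaced.
fn slice_assign_step1(dest: &mut Vec<i32>, dest_start: usize, dest_end: usize, src: &[i32]) {
    assert!(dest_end - dest_start >= src.len(), "extending the list is not supported");
    let new_len = dest.len() - (dest_end - dest_start) + src.len();
    // first "memmove": copy the new content into the destination slice
    dest[dest_start..dest_start + src.len()].copy_from_slice(src);
    // second "memmove": shift the tail of the list down to close the gap
    dest.copy_within(dest_end.., dest_start + src.len());
    dest.truncate(new_len);
}

fn main() {
    let mut a: Vec<i32> = (0..10).collect();
    slice_assign_step1(&mut a, 2, 5, &[]); // a[2:5] = []
    assert_eq!(a, vec![0, 1, 5, 6, 7, 8, 9]);
}
```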
ychenfo force-pushed list_slice from 8986da22d3 to 1f71eacbd4 2021-12-29 05:56:26 +08:00 Compare
Author
Collaborator

We may need to document that we do not support a[:None:-1] (maybe we can after implementing the Option type?), which is equivalent to a[::-1], which is NOT equivalent to a[:0:-1] since the latter would exclude 0.

Contributor

We may need to document that we do not support a[:None:-1] (maybe we can after implementing the Option type?), which is equivalent to a[::-1], which is NOT equivalent to a[:0:-1] since the latter would exclude 0.

(I didn't even know you could write something like this)

Owner

Neither did I and I suspect none of our users need it. We can omit it.

ychenfo force-pushed list_slice from e10ed55914 to ec43d7df7e 2021-12-30 05:19:04 +08:00 Compare
ychenfo force-pushed list_slice from ec43d7df7e to e00ddfbc68 2022-01-03 07:37:28 +08:00 Compare
Author
Collaborator

I ended up still using C to implement the slice assignment function, because, if I did not miss something, we need VLAs or explicit stack alloca for slice assignment, and in Rust these two features are not easily available (rfc#618, rfc#1909). It can be done for sure, but only with some external crates (e.g. the alloca crate, which does not seem very popular) that still call some wrapper C code.

An issue is that I cannot seem to find a way to make clang emit bitcode without target information in our case, so when compiling for non-host targets there will be some invalid-flag warnings about the LLVM function attributes. This should be fine since they seem to be ignored, but I am not fully confident whether this will cause any problems later... I will look into how to delete all these function attributes in inkwell.

'x86-64' is not a recognized processor for this target (ignoring processor)
'generic' is not a recognized processor for this target (ignoring processor)
'+cx8' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+mmx' is not a recognized feature for this target (ignoring feature)
'+sse' is not a recognized feature for this target (ignoring feature)
'+sse2' is not a recognized feature for this target (ignoring feature)
'+x87' is not a recognized feature for this target (ignoring feature)
'generic' is not a recognized processor for this target (ignoring processor)
'x86-64' is not a recognized processor for this target (ignoring processor)
'generic' is not a recognized processor for this target (ignoring processor)
'+cx8' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+mmx' is not a recognized feature for this target (ignoring feature)
'+sse' is not a recognized feature for this target (ignoring feature)
'+sse2' is not a recognized feature for this target (ignoring feature)
'+x87' is not a recognized feature for this target (ignoring feature)
'generic' is not a recognized processor for this target (ignoring processor)
ychenfo changed title from WIP: List Slice Support (#72) to List Slice Support (#72) 2022-01-03 08:02:10 +08:00
sb10q reviewed 2022-01-03 08:33:36 +08:00
sb10q reviewed 2022-01-03 08:36:26 +08:00
@ -87,3 +96,1 @@
"trunc",
)
.into(),
codegen_callback: Some(Arc::new(GenCall::new(Box::new(|ctx, _, fun, args| {
Owner

Is this changing more than the indentation?

ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-03 08:36:33 +08:00
@ -135,3 +144,1 @@
"zext",
)
.into(),
codegen_callback: Some(Arc::new(GenCall::new(Box::new(|ctx, _, fun, args| {
Owner

Is this changing more than the indentation?

ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-03 08:38:38 +08:00
@ -626,0 +642,4 @@
) -> FunctionValue<'ctx> {
let out_dir = env!("OUT_DIR");
let bitcode_path = format!("{}/slice_assign.bc", out_dir);
let new_mod = Module::parse_bitcode_from_path(bitcode_path, ctx.ctx).unwrap();
Owner

This will break with packaging and create a mess that nobody wants to solve. I had told you several times to use include_bytes! and Module.parse_bitcode_from_buffer.

ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-03 08:40:56 +08:00
@ -0,0 +1,380 @@
use indoc::indoc;
Owner

Can this go in a separate PR? This one is already very large and hard to review.

ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-03 08:43:08 +08:00
@ -437,3 +438,3 @@
args.insert(0, FuncArg { name: "self".into(), ty: obj.0, default_value: None });
}
let params = args.iter().map(|arg| ctx.get_llvm_type(generator, arg.ty).into()).collect_vec();
let params =
Owner

Please put all pure style changes into a separate PR.

ychenfo marked this conversation as resolved
Owner

I will look into how to delete all these function attributes in inkwell.

That doesn't sound very nice... Are you sure there isn't a way to make Clang output generic LLVM bitcode?

clang -target wasm32 seems to at least remove many CPU-specific options. Note that you need to use llvmPackages_13.clang-unwrapped for -target to work.

Author
Collaborator

This will break with packaging and create a mess that nobody wants to solve. I had told you several times to use include_bytes! and Module.parse_bitcode_from_buffer.

Oh OK, I will use include_bytes! and parse_bitcode_from_buffer. Sorry, I think I previously did not fully get the point... I thought it was only about outputting the LLVM bitcode to the appropriate cargo OUT_DIR directory. Just curious, may I ask why include_bytes! and parse_bitcode_from_buffer are better, and what package management mess there would be if parsing from a path? Thanks!

Is this changing more than the indentation?
Please put all pure style changes into a separate PR.

OK, I will put the style changes into a separate PR. Sorry, I enabled auto-format this time, so the changes got melded together.

Can this go in a separate PR? This one is already very large and hard to review.

OK, I will also put the tests into a separate PR after this one.

Owner

and what package management mess will there be if using parse from path?

If nac3artiq.so depends on some external files:

  1. they need to be copied along with it in an appropriate location when installing the software - not always easy with things like Python setuptools.
  2. nac3artiq.so needs to be able to locate them, which likely requires more calls into CPython and complications, since I don't think a .so/.dll is able to find its own location especially in a cross-platform way (and the mimalloc/musl experience has taught me that the Linux dynamic linker is garbage, involvement with it is better avoided).

It is much simpler to just embed all the required data into the nac3artiq.so binary.

Author
Collaborator

and what package management mess will there be if using parse from path?

If nac3artiq.so depends on some external files:

  1. they need to be copied along with it in an appropriate location when installing the software - not always easy with things like Python setuptools.
  2. nac3artiq.so needs to be able to locate them, which likely requires more calls into CPython and complications, since I don't think a .so/.dll is able to find its own location especially not in a cross-platform way (and the mimalloc/musl experience has taught me that the Linux dynamic linker is garbage).

It is much simpler to just embed all the required data into the nac3artiq.so binary.

Thanks very much! Now I get the point!

ychenfo force-pushed list_slice from e00ddfbc68 to d494b6a3cc 2022-01-04 01:57:52 +08:00 Compare
ychenfo force-pushed list_slice from d494b6a3cc to a323a8475b 2022-01-04 03:45:38 +08:00 Compare
Author
Collaborator

I will look into how to delete all these function attributes in inkwell.

That doesn't sound very nice... Are you sure there isn't a way to make Clang output generic LLVM bitcode?

clang -target wasm32 seems to at least remove many CPU-specific options. Note that you need to use llvmPackages_13.clang-unwrapped for -target to work.

If I did not miss anything, I think I have confirmed that there is currently no way to make clang or rustc output target-independent LLVM bitcode (see https://llvm.org/docs/FAQ.html#can-i-compile-c-or-c-code-to-platform-independent-llvm-bitcode and https://users.rust-lang.org/t/cross-platform-rust-to-llvm-ir-and-is-llvm-ir-is-portable/7767).

Deleting function attributes in inkwell is indeed also not good... there does not seem to be a way to iterate over all the attributes of a function in inkwell.

Specifying -target wasm32 also makes LLVM refuse to work on the rv32g and rv32ima targets:

'generic' is not a recognized processor for this target (ignoring processor)
'generic' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: CPU 'generic' is not supported. Use generic-rv32
Aborted

One way I can currently think of to work around this problem: refactor to make nac3core target-aware (just add another function to the generator trait to return the ISA?), then compile slice_assign.c for all possible targets in build.rs, and choose the correct version of the bitcode during codegen according to the ISA information.

ychenfo force-pushed list_slice from a323a8475b to 5e545c229b 2022-01-04 03:50:38 +08:00 Compare
Owner

Specifying -target wasm32 also makes llvm refuse to work on target rv32g and rv32ima:

You just pass it to that one clang invocation in build.rs without touching the rest of LLVM. There is no rv32 involved there.

sb10q reviewed 2022-01-04 07:57:00 +08:00
@ -0,0 +13,4 @@
"-Wno-implicit-function-declaration",
"-o",
];
let host_compile_ok = Command::new("clang-13")
Owner

Just use "clang" to make changing the version easier (only flake.nix needs to be edited).

Just use "clang" to make changing the version easier (only flake.nix needs to be edited).
Author
Collaborator

So is it OK to use a patch like the one below, so that we can call the unwrapped clang (instead of clang-13) to output wasm32 LLVM bitcode here?

diff --git a/flake.nix b/flake.nix
index 5a51737..c388a95 100644
--- a/flake.nix
+++ b/flake.nix
@@ -157,6 +157,7 @@
               export PATH=$PATH:`pwd`/llvm-cfg
 
               export CARGO_TARGET_X86_64_PC_WINDOWS_GNU_RUSTFLAGS="-C link-arg=-lz -C link-arg=-luuid -C link-arg=-lole32 -C link-arg=-lmcfgthread"
+              export UNWRAPPED_CLANG=${pkgs.llvmPackages_13.clang-unwrapped}/bin/clang
               '';
             cargoBuildFlags = [ "--package" "nac3artiq" ];
             doCheck = false;
@@ -185,6 +186,7 @@
           clippy
           (packages.x86_64-linux.python3-mimalloc.withPackages(ps: [ ps.numpy ]))
         ];
+        UNWRAPPED_CLANG = "${pkgs.llvmPackages_13.clang-unwrapped}/bin/clang";
       };
 
       hydraJobs = {
diff --git a/nac3core/build.rs b/nac3core/build.rs
index ab347c0..c737ee4 100644
--- a/nac3core/build.rs
+++ b/nac3core/build.rs
@@ -8,7 +8,9 @@ use std::{
 fn main() {
     println!("cargo:rerun-if-changed=src/toplevel/slice_assign.c");
     let out_dir = env::var("OUT_DIR").unwrap();
+    let unwrapped_clang = env::var("UNWRAPPED_CLANG").unwrap();
     const FLAG: &[&str] = &[
+        "--target=wasm32",
         "src/toplevel/slice_assign.c",
         "-O3",
         "-emit-llvm",
@@ -19,7 +21,7 @@ fn main() {
         "-o",
         "-",
     ];
-    let output = Command::new("clang")
+    let output = Command::new(unwrapped_clang)
         .args(FLAG)
         .output()
         .map(|o| {
ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-04 07:58:02 +08:00
@ -0,0 +28,4 @@
.unwrap()
.success();
// TODO: confirm flags
let cortexa9_compile_ok = Command::new("clang-13")
Owner

They probably all produce the same bitcode in the part that we care about, so this is getting too complicated. IMO the wasm32 output should be generic enough for our purposes. Can inkwell load it without warning?

ychenfo marked this conversation as resolved
Author
Collaborator

They probably all produce the same bitcode in the part that we care about, so this is getting too complicated. IMO the wasm32 output should be generic enough for our purposes. Can inkwell load it without warning?

Specifying -target wasm32 also makes llvm refuse to work on target rv32g and rv32ima:

You just pass it to that one clang invokation in build.rs without touching the rest of LLVM. There is no rv32 involved there.

I mean, if I just do this in build.rs, without touching any other part of LLVM:

clang-13 --target=wasm32 ./slice_assign.c -emit-llvm -S

and then, in nac3artiq, set target to rv32g in device_db.py, modify ./demo.py to make it generate slice assignment code, and run python ./demo.py, LLVM will give this error (not a warning):

'generic' is not a recognized processor for this target (ignoring processor)
'generic' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: CPU 'generic' is not supported. Use generic-rv32
Aborted
Owner

Oh I see, you mean that it doesn't like the included raw bitcode when targeting rv32.

Author
Collaborator

Oh I see, you mean that it doesn't like the included raw bitcode when targeting rv32.

Yes.

Owner

OK. What about something like:

clang --target=wasm32 ./slice_assign.c -emit-llvm -S -o - | sed s/\"target-cpu\"=\"generic\"// | llvm-as -o slice_assign.bc

Not pretty but at least the consequences of this new LLVM disappointment stay close to their source, and it will be easy to fix later if the LLVM developers ever implement a generic bitcode output.

Owner

We should probably remove those lines as well:

target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32"

and perhaps even the entire attributes or most of it:

attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" }
Owner

In fact we really just want the functions...

clang --target=wasm32 ./slice_assign.c -emit-llvm -S -o - | awk "/define/,/}/" | llvm-as -o slice_assign.bc
Author
Collaborator

In fact we really just want the functions...

clang --target=wasm32 ./slice_assign.c -emit-llvm -S -o - | awk "/define/,/}/" | llvm-as -o slice_assign.bc

Thanks for the idea, it works! Implemented in the new commit.

Owner

I think including only the parts we want (the function definitions) instead of removing the others might be more robust. That awk command does that. Later LLVM versions could add other unwanted attributes or information.

ychenfo force-pushed list_slice from 52581a213b to b1635c7ab9 2022-01-05 06:52:22 +08:00 Compare
Author
Collaborator

I think including only the parts we want (the function definitions) instead of removing the others might be more robust. That awk command does that. Later LLVM versions could add other unwanted attributes or information.

Thanks! Since there are also metadata and attributes inside the function bodies, I still use a regex in Rust to filter out the unwanted parts in the new commit.

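Roughly what that Rust-side filtering can look like (a sketch under the assumption that clang's textual IR output is available as a string; these are not the exact regexes from the commit): keep only the define ... { ... } blocks and strip the #N attribute-group references, which are optional in LLVM IR.

```rust
use regex::Regex;

// Keep only the function definitions from clang's textual IR output and drop
// the `#N` attribute-group references, so no target-specific attribute groups
// or metadata declarations survive into the embedded bitcode. Metadata
// references inside the bodies could be handled with further regexes.
fn filter_ir(ir: &str) -> String {
    // (?ms): `^`/`$` match line boundaries and `.` matches newlines,
    // so each match is one whole `define ... { ... }` block.
    let define_block = Regex::new(r"(?ms)^define .*?^\}$").unwrap();
    let attr_ref = Regex::new(r" #\d+").unwrap();
    define_block
        .find_iter(ir)
        .map(|m| attr_ref.replace_all(m.as_str(), "").into_owned())
        .collect::<Vec<_>>()
        .join("\n\n")
}
```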
sb10q reviewed 2022-01-05 08:26:41 +08:00
@ -0,0 +42,4 @@
.unwrap()
.replace_all(&filtered_output, "");
println!("{}", filtered_output);
Owner

I don't think this should be here. AFAIK build.rs is only supposed to print cargo commands.

Author
Collaborator

Oh sure, I can delete this. But I think it should be OK, since cargo just ignores any printed lines in build scripts that do not start with cargo:.

ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-05 08:29:22 +08:00
@ -623,6 +642,24 @@ pub fn get_builtins(primitives: &mut (PrimitiveStore, Unifier)) -> BuiltinInfo {
)
}
fn load_fn_from_bit_code<'ctx, 'a>(
Owner

Rename to load_irrt

ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-05 08:29:35 +08:00
@ -626,0 +647,4 @@
fun: &str,
) -> FunctionValue<'ctx> {
let bitcode_buf = MemoryBuffer::create_from_memory_range(
include_bytes!(concat!(env!("OUT_DIR"), "/slice_assign.bc")),
Owner

Rename to irrt.bc

Author
Collaborator

Ok I will do the renaming.

ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-05 08:30:45 +08:00
@ -11,0 +26,4 @@
pub const IRRT_LEN_ID: usize = 1;
pub const IRRT_SLICE_ASSIGN_8_ID: usize = 2;
pub const IRRT_SLICE_ASSIGN_32_ID: usize = 3;
pub const IRRT_SLICE_ASSIGN_64_ID: usize = 4;
Owner

Isn't there an idiomatic way to combine this list of integer constants with IRRT_SYMBOL_TABLE? This looks like C programming to me.

Author
Collaborator

Oh, I just found associated constants (https://doc.rust-lang.org/reference/items/associated-items.html#associated-constants); they should be useful to make things here prettier.

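For illustration, one idiomatic shape this could take (a hypothetical sketch; the variant and symbol names below are placeholders, not the identifiers eventually committed): an enum whose variants stand for the IRRT helpers, with each symbol name attached right next to its variant, plus an associated constant where a plain index is still needed.

```rust
// Each IRRT helper is a variant; the LLVM symbol name lives right next to it,
// so the index constants and the symbol table cannot drift apart.
#[derive(Clone, Copy)]
enum IrrtSymbol {
    Len,
    SliceAssign8,
    SliceAssign32,
    SliceAssign64,
}

impl IrrtSymbol {
    // Associated constant, usable where a bare integer constant used to be.
    const COUNT: usize = 4;

    // Placeholder symbol names for illustration only.
    fn symbol_name(self) -> &'static str {
        match self {
            IrrtSymbol::Len => "__nac3_irrt_len",
            IrrtSymbol::SliceAssign8 => "__nac3_irrt_slice_assign_8",
            IrrtSymbol::SliceAssign32 => "__nac3_irrt_slice_assign_32",
            IrrtSymbol::SliceAssign64 => "__nac3_irrt_slice_assign_64",
        }
    }
}
```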
ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-05 08:34:41 +08:00
@ -0,0 +8,4 @@
# define MAX(a, b) (a > b ? a : b)
# define MIN(a, b) (a > b ? b : a)
int32_t __nac3_irrt_slice_index_bound(int32_t i, int32_t const len) {
Owner

It's equivalent but you usually see const int and not int const.

Author
Collaborator

Ok I will change to const int

ychenfo marked this conversation as resolved
Owner

Since there are metadata and attributes inside the function body

Even with wasm32 target?

Contributor

TBH, do we really need 700+ lines for this feature? It doesn't make sense to me... I thought it would be relatively simple to implement those functions with (multiple) memmoves.

Contributor

I think I will have a look at how to implement it later, after the exception allocation stuff.

Author
Collaborator

Since there are metadata and attributes inside the function body

Even with wasm32 target?

Yes, the attached file (14 KiB) is the output from running the unwrapped clang:

clang --target=wasm32 ./tmp.c -O3 -S -emit-llvm

There is still a fair amount of attributes and metadata.

Author
Collaborator

tbh do we really need 700+ lines for this feature? it doesn't make sense to me... I thought it would be relatively simple to implement those functions in (multiple) memmove.

Sure, the core is just ~250 LOC (including C and Rust); the rest is mainly about:

  • slice index handling (the pseudo-code for handling list slice indices in the comment already takes about 50 LOC)
  • fast paths (for a[:] = b[:], directly calling the list slice runtime function; for a[:] = [...], dropping the list on the RHS after assignment via stackrestore; a fast path for list.len == 0; etc.)

I would not say the current code is entirely free of repetitive boilerplate, but as far as I can see, I cannot find a way to fundamentally reduce the code size further... The CPython implementation also takes more than ~350 lines of C code (in Objects/listobject.c).

ychenfo force-pushed list_slice from cebf5b16a2 to d4969d2ee2 2022-01-05 17:55:01 +08:00 Compare
ychenfo force-pushed list_slice from d4969d2ee2 to 1cbddebc6d 2022-01-05 19:04:10 +08:00 Compare
ychenfo force-pushed list_slice from 1cbddebc6d to 0bb799a72b 2022-01-05 20:21:58 +08:00 Compare
sb10q reviewed 2022-01-05 21:49:44 +08:00
@ -375,2 +422,4 @@
}
// return true if we take this fast path, false if we do not
fn slice_assign_fast_path<'ctx, 'a, G: CodeGenerator>(
Owner

Do we need this "fast path"?
I'm also a bit taken aback by the size and complexity of this PR.

Author
Collaborator

Without this fast path, all of the list slice functionality still works.

This fast path handles these 2 cases:

  1. a[::] = b[::] (both LHS and RHS are list slices)
  2. a[::] = [..] (LHS is a list slice and RHS is a new list).

For case 1, without the fast path it is equivalent to tmp = b[::]; a[::] = tmp, whereas with the fast path, b is only copied if b == a and their ranges overlap. And if b is copied, the memory taken by the temporary array is freed after we exit the slice-assign IRRT function.

For case 2, without the fast path it is tmp = [..]; a[::] = tmp, where the stack memory taken by tmp is not freed before exiting the current function, which is bad if we do this in a loop. The fast path calls stacksave and stackrestore to free the memory taken by the tmp array immediately after the assignment is done.


This fast path could indeed be simpler, though, by always allocating the tmp array and not optimizing the cases where we do not actually need to do stacksave, alloca tmparr, stackrestore.

Contributor

Actually, I think we don't need the fast path now. I think the length calculation is already pretty fast for these cases, as we don't have to do division, and the actual assignment operation already does a memcpy for the fast case.

Author
Collaborator

Removed in the new commits. This fast path indeed only optimized a small number of common cases anyway (not even things like a[:], b[:] = c[:], [..], due to the assignment unpacking).

ychenfo marked this conversation as resolved
Contributor

I just had a quick look at this. One problem I see is the use of a condition, zext-ing it, and doing a multiplication to implement something like c if cond else 0. Please use the select instruction for this, which generates better IR and is easier to optimize.

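A minimal sketch of that suggestion using inkwell, assuming the inkwell API of the LLVM 13 era used by this repository, where builder methods return values directly (newer inkwell releases wrap them in Result); the helper name is illustrative:

```rust
use inkwell::builder::Builder;
use inkwell::values::{BasicValueEnum, IntValue};
use inkwell::IntPredicate;

// Emit `len if len > 0 else 0` as a single `select`, instead of comparing,
// zext-ing the i1, and multiplying.
fn emit_clamp_to_zero<'ctx>(
    builder: &Builder<'ctx>,
    len: IntValue<'ctx>,
    zero: IntValue<'ctx>,
) -> BasicValueEnum<'ctx> {
    let is_positive = builder.build_int_compare(IntPredicate::SGT, len, zero, "is_positive");
    builder.build_select(is_positive, len, zero, "clamped")
}
```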
Author
Collaborator

I just have a quick look at this. One problem I see is the use of conditions, zext it and do multiplication to implement something like c if cond else 0. Please use the select instruction for this, which would generate better IR and easier to optimize.

Oh, I was not aware of the select instruction before; will implement this, thanks!

pca006132 reviewed 2022-01-06 17:11:12 +08:00
@ -0,0 +8,4 @@
# define MAX(a, b) (a > b ? a : b)
# define MIN(a, b) (a > b ? b : a)
int32_t __nac3_irrt_slice_index_bound(int32_t i, const int32_t len) {
Contributor

For the slice index bound, I think you do not need to bound it like this; just allow it to make an invalid access. We will fix this later when we implement exceptions. With this, you can just have a Rust helper function to help you generate the access (something like icmp and select). Using a Rust helper function also allows us to modify this and add exceptions later.

Author
Collaborator

The reason I do this is that Python allows slice indices to be out of bounds:

Python 3.9.6 (default, Jun 28 2021, 08:57:49)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = list(range(10))
>>> a[5:100]
[5, 6, 7, 8, 9]

I intentionally made the C functions handle only correct input, leaving the parts where we may need to raise exceptions to the Rust side.

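For reference, the Python rule being reproduced here, sketched in Rust for a positive step (illustrative only, not the C helper from this PR): negative indices have the length added first, and the result is then clamped into [0, len] instead of raising an error.

```rust
// Clamp a slice bound the way Python does for a positive step: negative
// indices count from the end, and anything out of range is clamped rather
// than rejected, so `a[5:100]` on a 10-element list becomes `a[5:10]`.
fn clamp_slice_bound(i: i64, len: i64) -> i64 {
    let i = if i < 0 { i + len } else { i };
    i.clamp(0, len)
}

fn main() {
    assert_eq!(clamp_slice_bound(100, 10), 10); // a[5:100] -> stop = 10
    assert_eq!(clamp_slice_bound(-3, 10), 7);   // a[-3:]   -> start = 7
    assert_eq!(clamp_slice_bound(-42, 10), 0);  // far-negative index clamps to 0
}
```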
Contributor

Interesting, I never knew they allowed this...

pca006132 marked this conversation as resolved
Contributor

A simple way to reduce the code size: check for common patterns...

--- a/nac3core/src/codegen/stmt.rs
+++ b/nac3core/src/codegen/stmt.rs
@@ -111,14 +111,8 @@ pub fn gen_assign<'ctx, 'a, G: CodeGenerator>(
                         .unwrap()
                         .to_basic_value_enum(ctx, generator)
                         .into_pointer_value();
-                    let (start, end, step) = handle_slice_indices(
-                        lower.as_ref().map(|i| i.as_ref()),
-                        upper.as_ref().map(|i| i.as_ref()),
-                        step.as_ref().map(|i| i.as_ref()),
-                        ctx,
-                        generator,
-                        ls,
-                    );
+                    let (start, end, step) =
+                        handle_slice_indices(lower, upper, step, ctx, generator, ls);
                     let value = value.to_basic_value_enum(ctx, generator).into_pointer_value();
                     let ty = if let TypeEnum::TList { ty } =
                         &*ctx.unifier.get_ty(target.custom.unwrap())
@@ -127,7 +121,7 @@ pub fn gen_assign<'ctx, 'a, G: CodeGenerator>(
                     } else {
                         unreachable!()
                     };
-                    let src_ind = handle_slice_indices(None, None, None, ctx, generator, value);
+                    let src_ind = handle_slice_indices(&None, &None, &None, ctx, generator, value);
                     list_slice_assignment(
                         ctx,
                         generator.get_size_type(ctx.ctx),
@@ -451,20 +445,10 @@ fn slice_assign_fast_path<'ctx, 'a, G: CodeGenerator>(
                                 .to_basic_value_enum(ctx, generator)
                                 .into_pointer_value();
                             let (d_s, d_e, d_step) = handle_slice_indices(
-                                dst_start.as_ref().map(|i| i.as_ref()),
-                                dst_end.as_ref().map(|i| i.as_ref()),
-                                dst_step.as_ref().map(|i| i.as_ref()),
-                                ctx,
-                                generator,
-                                dst,
+                                dst_start, dst_end, dst_step, ctx, generator, dst,
                             );
                             let (s_s, s_e, s_step) = handle_slice_indices(
-                                src_start.as_ref().map(|i| i.as_ref()),
-                                src_end.as_ref().map(|i| i.as_ref()),
-                                src_step.as_ref().map(|i| i.as_ref()),
-                                ctx,
-                                generator,
-                                src,
+                                src_start, src_end, src_step, ctx, generator, src,
                             );
                             let ty = if let TypeEnum::TList { ty } =
                                 &*ctx.unifier.get_ty(t_v.custom.unwrap())
diff --git a/nac3core/src/toplevel/builtins.rs b/nac3core/src/toplevel/builtins.rs
index a50ff8e..014283a 100644
--- a/nac3core/src/toplevel/builtins.rs
+++ b/nac3core/src/toplevel/builtins.rs
@@ -695,7 +695,7 @@ pub fn calculate_len_for_slice_range<'ctx, 'a>(
 
 /// NOTE: the output value of the end index of this function should be compared ***inclusively***,
 /// because python allows `a[2::-1]`, whose semantic is `[a[2], a[1], a[0]]`, which is equivalent to
-/// NO numeric slice in python. 
+/// NO numeric slice in python.
 ///
 /// equivalent code:
 /// ```pseudo_code
@@ -741,16 +741,19 @@ pub fn calculate_len_for_slice_range<'ctx, 'a>(
 ///                     else:
 ///                         e + 1
 ///             ,step
-///         )       
+///         )
 /// ```
 pub fn handle_slice_indices<'a, 'ctx, G: CodeGenerator>(
-    start: Option<&Expr<Option<Type>>>,
-    end: Option<&Expr<Option<Type>>>,
-    step: Option<&Expr<Option<Type>>>,
+    start: &Option<Box<Expr<Option<Type>>>>,
+    end: &Option<Box<Expr<Option<Type>>>>,
+    step: &Option<Box<Expr<Option<Type>>>>,
     ctx: &mut CodeGenContext<'ctx, 'a>,
     generator: &mut G,
     list: PointerValue<'ctx>,
 ) -> (IntValue<'ctx>, IntValue<'ctx>, IntValue<'ctx>) {
+    let start = start.as_ref().map(|i| i.as_ref());
+    let end = end.as_ref().map(|i| i.as_ref());
+    let step = step.as_ref().map(|i| i.as_ref());
     // TODO: throw exception when step is 0
     let int32 = ctx.ctx.i32_type();
     let zero = int32.const_zero();
Simple way to reduce the code size: check for common pattern... ```diff --- a/nac3core/src/codegen/stmt.rs +++ b/nac3core/src/codegen/stmt.rs @@ -111,14 +111,8 @@ pub fn gen_assign<'ctx, 'a, G: CodeGenerator>( .unwrap() .to_basic_value_enum(ctx, generator) .into_pointer_value(); - let (start, end, step) = handle_slice_indices( - lower.as_ref().map(|i| i.as_ref()), - upper.as_ref().map(|i| i.as_ref()), - step.as_ref().map(|i| i.as_ref()), - ctx, - generator, - ls, - ); + let (start, end, step) = + handle_slice_indices(lower, upper, step, ctx, generator, ls); let value = value.to_basic_value_enum(ctx, generator).into_pointer_value(); let ty = if let TypeEnum::TList { ty } = &*ctx.unifier.get_ty(target.custom.unwrap()) @@ -127,7 +121,7 @@ pub fn gen_assign<'ctx, 'a, G: CodeGenerator>( } else { unreachable!() }; - let src_ind = handle_slice_indices(None, None, None, ctx, generator, value); + let src_ind = handle_slice_indices(&None, &None, &None, ctx, generator, value); list_slice_assignment( ctx, generator.get_size_type(ctx.ctx), @@ -451,20 +445,10 @@ fn slice_assign_fast_path<'ctx, 'a, G: CodeGenerator>( .to_basic_value_enum(ctx, generator) .into_pointer_value(); let (d_s, d_e, d_step) = handle_slice_indices( - dst_start.as_ref().map(|i| i.as_ref()), - dst_end.as_ref().map(|i| i.as_ref()), - dst_step.as_ref().map(|i| i.as_ref()), - ctx, - generator, - dst, + dst_start, dst_end, dst_step, ctx, generator, dst, ); let (s_s, s_e, s_step) = handle_slice_indices( - src_start.as_ref().map(|i| i.as_ref()), - src_end.as_ref().map(|i| i.as_ref()), - src_step.as_ref().map(|i| i.as_ref()), - ctx, - generator, - src, + src_start, src_end, src_step, ctx, generator, src, ); let ty = if let TypeEnum::TList { ty } = &*ctx.unifier.get_ty(t_v.custom.unwrap()) diff --git a/nac3core/src/toplevel/builtins.rs b/nac3core/src/toplevel/builtins.rs index a50ff8e..014283a 100644 --- a/nac3core/src/toplevel/builtins.rs +++ b/nac3core/src/toplevel/builtins.rs @@ -695,7 +695,7 @@ pub fn calculate_len_for_slice_range<'ctx, 'a>( /// NOTE: the output value of the end index of this function should be compared ***inclusively***, /// because python allows `a[2::-1]`, whose semantic is `[a[2], a[1], a[0]]`, which is equivalent to -/// NO numeric slice in python. +/// NO numeric slice in python. /// /// equivalent code: /// ```pseudo_code @@ -741,16 +741,19 @@ pub fn calculate_len_for_slice_range<'ctx, 'a>( /// else: /// e + 1 /// ,step -/// ) +/// ) /// ``` pub fn handle_slice_indices<'a, 'ctx, G: CodeGenerator>( - start: Option<&Expr<Option<Type>>>, - end: Option<&Expr<Option<Type>>>, - step: Option<&Expr<Option<Type>>>, + start: &Option<Box<Expr<Option<Type>>>>, + end: &Option<Box<Expr<Option<Type>>>>, + step: &Option<Box<Expr<Option<Type>>>>, ctx: &mut CodeGenContext<'ctx, 'a>, generator: &mut G, list: PointerValue<'ctx>, ) -> (IntValue<'ctx>, IntValue<'ctx>, IntValue<'ctx>) { + let start = start.as_ref().map(|i| i.as_ref()); + let end = end.as_ref().map(|i| i.as_ref()); + let step = step.as_ref().map(|i| i.as_ref()); // TODO: throw exception when step is 0 let int32 = ctx.ctx.i32_type(); let zero = int32.const_zero(); ```
pca006132 reviewed 2022-01-06 17:32:18 +08:00
@ -0,0 +113,4 @@
(*dest_arr_len - dest_end - 1) * sizeof(T) \
); \
*dest_arr_len = *dest_arr_len - (dest_end - dest_ind) - 1; \
} \
Contributor

Should we have a stackrestore to remove that alloca?

Should we have a stackrestore to remove that alloca?
Author
Collaborator

In C I also could not find a way to call stacksave and stackrestore manually... But I think this should be fine here, since this temp array will be freed when exiting this slice assignment function? Also, later in this function we do not make another function call, and most of the operations inside the function after this alloca need it.

So currently the only place where I call stacksave and stackrestore is in that fast path.

In C I also could not find a way to call `stacksave` and `stackrestore` manually... But I think this should be fine here, since this temp array will be freed when exiting this slice assignment function? Also, later in this function we do not make another function call, and most of the operations inside the function after this alloca need it. So currently the only place where I call `stacksave` and `stackrestore` is in that fast path.
Contributor

Sorry, I forgot that this is a separate function call. If so, it should be fine. I guess your fast path can be removed too.

Sorry, I forgot that this is a separate function call. If so, it should be fine. I guess your fast path can be removed too.
pca006132 marked this conversation as resolved
pca006132 reviewed 2022-01-06 17:35:16 +08:00
@ -0,0 +93,4 @@
); \
if (need_alloca) { \
T *tmp = alloca(src_arr_len * sizeof(T)); \
__builtin_memmove(tmp, src_arr, src_arr_len * sizeof(T)); \
Contributor

You should use memcpy, which is usually faster than memmove. memmove should only be used when the buffers overlap, i.e. when you shift the remaining elements in the list.

You should use `memcpy`, which is usually faster than `memmove`. `memmove` should only be used when the buffers overlap, i.e. when you shift the remaining elements in the list.
Author
Collaborator

changed to memcpy in the new commit

changed to memcpy in the new commit
Owner

Is memcpy really faster? I thought memcpy was just a relic from 1970.

Is memcpy really faster? I thought memcpy was just a relic from 1970.
Author
Collaborator

I think it should be a bit faster: both __builtin_memmove and __builtin_memcpy are translated into LLVM intrinsics, and the LLVM memmove intrinsic allows the buffers to overlap while memcpy does not. So memmove might have to do more work to guarantee correctness.

I think it should be a bit faster: both `__builtin_memmove` and `__builtin_memcpy` are translated into LLVM intrinsics, and the LLVM memmove intrinsic allows the buffers to overlap while memcpy does not. So memmove might have to do more work to guarantee correctness.
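A small illustration of the distinction in plain Rust (not the IRRT code): `std::ptr::copy` is the memmove analogue and is what the overlapping tail shift needs, while `std::ptr::copy_nonoverlapping` is the memcpy analogue and is enough for copying into a fresh temporary buffer:

```rust
fn main() {
    let mut v = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

    // Shift elements 5..10 down to index 2: source and destination overlap,
    // so this is the memmove-like case (std::ptr::copy tolerates overlap).
    unsafe {
        let base = v.as_mut_ptr();
        std::ptr::copy(base.add(5), base.add(2), 5);
    }
    assert_eq!(&v[..7], &[0, 1, 5, 6, 7, 8, 9]);

    // Copy into a separate temporary buffer: no overlap is possible, so the
    // memcpy-like copy_nonoverlapping is enough and gives the optimizer more freedom.
    let mut tmp = [0; 5];
    unsafe {
        std::ptr::copy_nonoverlapping(v.as_ptr(), tmp.as_mut_ptr(), 5);
    }
    assert_eq!(tmp, [0, 1, 5, 6, 7]);
}
```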
ychenfo marked this conversation as resolved
pca006132 reviewed 2022-01-06 17:36:42 +08:00
@ -0,0 +54,4 @@
/* if dest_arr_len == 0, do nothing since we do not support extending list */ \
if (*dest_arr_len == 0) return; \
/* if src_arr_len == 0 and need drop */ \
if (src_arr_len == 0) { \
Contributor

I guess you can make this a special case of the next case. Just ignore the step if src_len == 0 and avoid the first memcpy but keep the second.

I guess you can make this a special case of the next case. Just ignore the step if `src_len == 0` and avoid the first memcpy but keep the second.
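To make the suggestion concrete, here is a hedged sketch of the contiguous (step == 1) case in plain Rust (illustration only, not the IRRT macro): dropping a range is just slice assignment with an empty source, so the element-copy step becomes a no-op while the tail shift and length update are kept.

```rust
// Sketch only: slice assignment for a contiguous range, no list extension supported.
fn assign_contiguous_slice<T: Copy>(list: &mut Vec<T>, lo: usize, hi: usize, src: &[T]) {
    assert!(lo <= hi && hi <= list.len());
    assert!(src.len() <= hi - lo, "extending the list is not supported");
    // step 1: copy the replacement elements (a no-op when src is empty)
    list[lo..lo + src.len()].copy_from_slice(src);
    // step 2: shift the tail left over the now-unused gap and shrink the length
    let gap = (hi - lo) - src.len();
    if gap > 0 {
        list.copy_within(hi.., lo + src.len());
        list.truncate(list.len() - gap);
    }
}

fn main() {
    let mut a: Vec<i32> = (0..10).collect();
    assign_contiguous_slice(&mut a, 2, 5, &[]);       // a[2:5] = []
    assert_eq!(a, [0, 1, 5, 6, 7, 8, 9]);
    assign_contiguous_slice(&mut a, 1, 4, &[42, 43]); // a[1:4] = [42, 43]
    assert_eq!(a, [0, 42, 43, 7, 8, 9]);
}
```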
Author
Collaborator

handled in the new commits

handled in the new commits
ychenfo marked this conversation as resolved
pca006132 reviewed 2022-01-06 17:40:02 +08:00
@ -699,0 +946,4 @@
ctx.builder
.build_bitcast(
unsafe { ctx.builder.build_gep(dest_arr, &[zero, one], "dest_arr") },
int32.ptr_type(AddressSpace::Generic),
Contributor

Well, this is actually unsafe. We may have a u64 length pointer on 64-bit targets... Maybe you can make a temporary integer, take its address, and store the result back? I don't think we will ever do something that big, so it should be fine. I think LLVM should be able to optimize this for 32-bit targets.

Well, this is actually unsafe. We may have a u64 length pointer on 64-bit targets... Maybe you can make a temporary integer, take its address, and store the result back? I don't think we will ever do something that big, so it should be fine. I think LLVM should be able to optimize this for 32-bit targets.
Author
Collaborator

Ah thanks, I see. Then I think I also need to change the code to modify the list length on the Rust side.

Ah thanks, I see. Then I think I also need to change the code to modify the list length on the Rust side.
ychenfo marked this conversation as resolved
ychenfo force-pushed list_slice from 0bb799a72b to 7b9da1835c 2022-01-06 19:17:41 +08:00 Compare
Author
Collaborator

Simple way to reduce the code size: check for common pattern...

Ah thanks for pointing this out! Indeed I should have noticed this one.

> Simple way to reduce the code size: check for common pattern... Ah thanks for pointing this out! Indeed I should have noticed this one.
ychenfo force-pushed list_slice from 7b9da1835c to 94216fb1ab 2022-01-06 21:36:13 +08:00 Compare
ychenfo force-pushed list_slice from 0b1c3df091 to 8e16ecab5f 2022-01-07 02:37:24 +08:00 Compare
Author
Collaborator

I have tentatively moved all the IR runtime library related code to nac3core/codegen/irrt/; hopefully this is better.

I have tentatively moved all the IR runtime library related code to `nac3core/codegen/irrt/`; hopefully this is better.
ychenfo force-pushed list_slice from 8e16ecab5f to a227d99516 2022-01-07 02:46:42 +08:00 Compare
ychenfo force-pushed list_slice from a227d99516 to 5edec788b2 2022-01-07 05:26:26 +08:00 Compare
ychenfo force-pushed list_slice from 5edec788b2 to 96a2881a08 2022-01-09 19:59:32 +08:00 Compare
Author
Collaborator

rebased on the current master branch

rebased on the current master branch
sb10q reviewed 2022-01-09 22:11:51 +08:00
@ -6,0 +14,4 @@
use nac3parser::ast::Expr;
pub struct IrrtSymbols;
impl IrrtSymbols {
Owner

Not sure if there's much point in this structure vs. just using the names directly. I had removed it.

Not sure if there's much point in this structure vs. just using the names directly. I had removed it.
Author
Collaborator

Oh, as the number of names here grows, and since these names still appear repeatedly in several places later in the code (especially when adding inline attributes, and in the list slice assignment function when checking the bit width), I personally thought it might be cleaner to add this back...

Oh, as the number of names here grows, and since these names still appear repeatedly in several places later in the code (especially when adding inline attributes, and in the list slice assignment function when checking the bit width), I personally thought it might be cleaner to add this back...
Owner

The IrrtSymbols::... constants would also repeatedly appear...

The ``IrrtSymbols::...`` constants would also repeatedly appear...
Author
Collaborator

Ok I see, I will delete this struct and just use their names directly.

Ok I see, I will delete this struct and just use their names directly.
ychenfo marked this conversation as resolved
Contributor

Just wondering: do we really need the IRRT prefix? The symbol names are super long, and I think the __nac3 prefix is enough.

Just wondering: do we really need the IRRT prefix? The symbol names are super long, and I think the `__nac3` prefix is enough.
Author
Collaborator

Just wondering: do we really need the IRRT prefix? The symbol names are super long, and I think the __nac3 prefix is enough.

They might indeed be enough. After all, I think the purpose of the prefix is just to 1. mark that these are functions that nac3 calls internally to perform some operations, and 2. avoid colliding with other symbol names that users would use. I am not sure... would long symbol names be a potential problem?

> Just wondering: do we really need the IRRT prefix? The symbol names are super long, and I think the `__nac3` prefix is enough. They might indeed be enough. After all, I think the purpose of the prefix is just to 1. mark that these are functions that nac3 calls internally to perform some operations, and 2. avoid colliding with other symbol names that users would use. I am not sure... would long symbol names be a potential problem?
Owner

They're just more typing for the developer. The toolchain doesn't have issues with long symbols; by C++ name mangler standards these aren't even remotely long.

They're just more typing for the developer. The toolchain doesn't have issues with long symbols; by C++ name mangler standards these aren't even remotely long.
Author
Collaborator

They're just more typing for the developer. The toolchain doesn't have issues with long symbols; by C++ name mangler standards these aren't even remotely long.

Ok I see, so should I just change the prefix to __nac3 only?

> They're just more typing for the developer. The toolchain doesn't have issues with long symbols; by C++ name mangler standards these aren't even remotely long. Ok I see, so should I just change the prefix to `__nac3` only?
Owner

Yes

Yes
ychenfo force-pushed list_slice from 96a2881a08 to 23db915046 2022-01-10 16:33:16 +08:00 Compare
Author
Collaborator

Renamed the prefix to __nac3_ and removed the IrrtSymbols struct in the new commit.

Renamed the prefix to `__nac3_` and removed the IrrtSymbols struct in the new commit.
sb10q reviewed 2022-01-13 12:49:23 +08:00
@ -46,0 +273,4 @@
int32.into(),
int32.into(),
elem_ptr_type.into(),
int32.into(),
Owner

Add comments with the parameter names. It's hard to follow this long list.

Add comments with the parameter names. It's hard to follow this long list.
ychenfo marked this conversation as resolved
sb10q reviewed 2022-01-13 12:54:20 +08:00
@ -46,0 +165,4 @@
ctx.builder
.build_select(
ctx.builder.build_and(
ctx.builder.build_int_compare(EQ, s, length, "s_eq_len"),
Owner

Better use IntPredicate::EQ here, those short constants are a bit confusing.

Better use ``IntPredicate::EQ`` here, those short constants are a bit confusing.
ychenfo marked this conversation as resolved
Owner

We really should add some unit tests (probably using the LLVM JIT or bitcode interpreter to run the compiler's output). It seems there are lots of corner cases and it's not obvious from reading the code that they are all handled correctly. That could go into a separate PR though.

We really should add some unit tests (probably using the LLVM JIT or bitcode interpreter to run the compiler's output). It seems there are lots of corner cases and it's not obvious from reading the code that they are all handled correctly. That could go into a separate PR though.
Author
Collaborator

We really should add some unit tests (probably using the LLVM JIT or bitcode interpreter to run the compiler's output). It seems there are lots of corner cases and it's not obvious from reading the code that they are all handled correctly. That could go into a separate PR though.

Yes, all the code in this PR is tested using a full-stack test I wrote in nac3standalone (though I do not think that is ideal, as the test just uses a lot of Commands to compile and run programs and compare the printed output). I will port these tests into codegen/irrt using the LLVM JIT or bitcode interpreter in a new PR.

> We really should add some unit tests (probably using the LLVM JIT or bitcode interpreter to run the compiler's output). It seems there are lots of corner cases and it's not obvious from reading the code that they are all handled correctly. That could go into a separate PR though. Yes, all the code in this PR is tested using a full-stack test I wrote in nac3standalone (though I do not think that is ideal, as the test just uses a lot of `Command`s to compile and run programs and compare the printed output). I will port these tests into codegen/irrt using the LLVM JIT or bitcode interpreter in a new PR.
ychenfo force-pushed list_slice from 23db915046 to 276f55b1d4 2022-01-13 15:07:45 +08:00 Compare
Author
Collaborator

Updated to use IntPredicate, added comments, and rebased on the latest master branch.

Updated to use `IntPredicate`, added comments, and rebased on the latest master branch.
ychenfo force-pushed list_slice from 276f55b1d4 to 4976e89ae2 2022-01-13 16:53:52 +08:00 Compare
Author
Collaborator

fixed some typos

fixed some typos
sb10q reviewed 2022-01-13 17:19:44 +08:00
@ -48,0 +241,4 @@
let int64_ptr = ctx.ctx.i64_type().ptr_type(AddressSpace::Generic);
let fun_symbol = if let BasicTypeEnum::IntType(ty) = ty {
match ty.get_bit_width() {
w if w < 32 => "__nac3_list_slice_assign_uint8_t",
Owner

When would we use that?

When would we use that?
Author
Collaborator

We can have a list of booleans, which are of width 1 and represented by a byte on our supported targets?

I tried to make clang output i1, but _ExtInt requires a bit width >= 2.

Maybe it would be better to change the condition here to exactly w == 1, or maybe w <= 8, since I think we do not have any other type represented by a bit width smaller than 32.

We can have a list of booleans, which are of width 1 and represented by a byte on our supported targets? I tried to make clang output `i1`, but `_ExtInt` requires a bit width >= 2. Maybe it would be better to change the condition here to exactly `w == 1`, or maybe `w <= 8`, since I think we do not have any other type represented by a bit width smaller than 32.
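For illustration, a hedged sketch of the dispatch being discussed (plain Rust; only the uint8_t symbol name is taken from the diff above, the wider variants and the exact cut-offs are assumptions):

```rust
// Sketch only: choose the IRRT slice-assignment symbol from the element bit width.
// A bool element is i1 in LLVM but stored as one byte, so it takes the 8-bit path.
fn slice_assign_symbol(bit_width: u32) -> &'static str {
    match bit_width {
        1 | 8 => "__nac3_list_slice_assign_uint8_t", // bool and 8-bit ints
        32 => "__nac3_list_slice_assign_uint32_t",   // assumed variant
        64 => "__nac3_list_slice_assign_uint64_t",   // assumed variant
        w => unreachable!("unsupported element bit width {}", w),
    }
}

fn main() {
    assert_eq!(slice_assign_symbol(1), "__nac3_list_slice_assign_uint8_t");
    assert_eq!(slice_assign_symbol(32), "__nac3_list_slice_assign_uint32_t");
}
```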
sb10q merged commit 4976e89ae2 into master 2022-01-13 18:53:48 +08:00