NDArray with strides + NDArrayObject + Models + Exceptions in IRRT. #506

Closed
lyken wants to merge 51 commits from ndstrides-intro into ndstrides
Collaborator

This PR primarily adds strides to NAC3's ndarray definition. It merges into the separate branch ndstrides, which is in sync with master, because the refactoring of nac3core with Model<'ctx> and *Object<'ctx> is incomplete (see below).

Solves #397, #411, and partially solves #278.

To help in implementing ndarray with strides, the following things are also introduced:

  1. Adding the Model<'ctx> abstraction - a thin layer of Rust type hints over LLVM values. It also has mechanisms that help define new LLVM struct types with very little boilerplate. LLVM 15 opaque pointers have also been accounted for. Shortly there will be a PR that refactors nac3core with Models.
  2. Adding Object - an LLVM value with related typechecker types in a single struct. This is used to organize programming interfaces. This PR primarily works with NDArrayObject<'ctx>, but there are a few others including {Tuple,List,Any}Object to help with interfacing. Shortly there will be a PR that refactors nac3core with Objects.
  3. About IRRT:
    • IRRT is now a multi-file source tree, a .clang-format is added.
    • IRRT has mechanisms to throw exceptions on either 32-bit or 64-bit targets. The location details are set within the C++ source code itself using CPP magic macros. The exception IDs are initialized at link time.
    • IRRT can do debug assertions if compiled with -DIRRT_DEBUG_ASSERT in nac3core/build.rs. Debug assertions throw exceptions.

Overview on changes/additions about NDArrays

  1. Slicing an ndarray no longer copies ndarray->data. It only creates a new ndarray with different strides that points to the same ndarray->data, which is very cheap. This is only possible with the addition of strides to NAC3's ndarray definition. This is how NumPy implements ndarrays too.
    1. ... and np.newaxis have been implemented, but they are only there to help with implementing other aspects of ndarray with strides. You cannot write my_array[..., none] in NAC3 Python because the type inferencer/code generator cannot understand it. I encountered the following problems when trying to implement this in NAC3:
    2. ... does not have a concrete type. #486. Probably requires a hack on the typechecker.
    3. Consider my_array[..., none] again; none has type <class 'Option'>, but NumPy wants a NoneType; and consider @portable.
  2. Reimplemented np_array() + Proper assertions when the input is a list (e.g., raise an exception when the list has inhomogeneous dimensions).
  3. Added np_strides(), np_shape(), and np_size(). These were initially used for debugging but they have been implemented as real NAC3 functions anyway. Note that in NumPy, np.strides() is not an actual function, but <ndarray>.strides is used instead.
  4. Reimplemented np_reshape() + More concise checks for when there is an unknown dimension. Implemented in IRRT.
    • np_reshape() can reshape an ndarray without making a copy of the data, under certain conditions.
    • (Partially) solves issue: #278 (ndarray: Implement reshaping). See below for why this is "partial".
    • NOTE: In NumPy, np.reshape(<ndarray>) may or may not make a copy, depending on whether a reshape is "possible" without one. Currently, NAC3 simply checks NDArrayObject::is_c_contiguous() to decide whether to make a copy or to reshape in place by adjusting the ndarray stride values, but this criterion is incomplete.
  5. Added np_broadcast_to().
  6. Reimplemented np_transpose().
    • np_transpose can transpose an ndarray without making a copy of the data.
    • NOTE: The logic for handling arbitrary <axes> is implemented in IRRT, but not in typechecker and code generator yet.
  7. Added general ndarray subscript assignment with referential integrity.
    • For example, you can do funky things like np_transpose(np_transpose(my_array)[::3])[0, 2:100] = 1.0 and my_array would update. This is only possible with ndarray with strides.
    • Solves #411 (Implement subscript-assignment for NDArray) in a different way than described in the issue.
  8. Added general matrix multiplication.
    • 1D @ 2D, 2D @ 1D, and stacking are now supported.
    • Solves #397 (ndarray 1D matrix multiplication).
  9. Everything else that works with ndarray has been reimplemented to work with ndarray with strides.
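
To make the slicing, transposing, and subscript-assignment points above concrete, here is a minimal pure-Python sketch of a strided view. It is not NAC3's or IRRT's implementation: `StridedView`, `slice_axis`, and `from_rows` are hypothetical names, and strides are counted in elements rather than bytes to keep it short.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StridedView:
    """Toy ndarray: a shared flat buffer plus offset, shape, and strides.

    Hypothetical illustration only. Strides are in elements, not bytes.
    """

    data: List[float]  # shared backing buffer; views never copy it
    offset: int
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]

    def _flat_index(self, index: Tuple[int, ...]) -> int:
        # Position in the flat buffer = offset + sum(index[i] * strides[i]).
        return self.offset + sum(i * s for i, s in zip(index, self.strides))

    def __getitem__(self, index: Tuple[int, ...]) -> float:
        return self.data[self._flat_index(index)]

    def __setitem__(self, index: Tuple[int, ...], value: float) -> None:
        # Writes go straight to the shared buffer, so every view sees them.
        self.data[self._flat_index(index)] = value

    def slice_axis(self, axis: int, start: int, stop: int, step: int) -> "StridedView":
        """View of start:stop:step along one axis (non-negative start, positive step)."""
        length = len(range(start, min(stop, self.shape[axis]), step))
        shape = list(self.shape)
        strides = list(self.strides)
        shape[axis] = length
        strides[axis] = self.strides[axis] * step
        offset = self.offset + start * self.strides[axis]
        return StridedView(self.data, offset, tuple(shape), tuple(strides))

    def transpose(self) -> "StridedView":
        # Reversing shape and strides transposes without touching the data.
        return StridedView(self.data, self.offset, self.shape[::-1], self.strides[::-1])


def from_rows(rows: List[List[float]]) -> StridedView:
    """Build a C-contiguous 2D view from a rectangular list of rows."""
    n_cols = len(rows[0])
    flat = [x for row in rows for x in row]
    return StridedView(flat, 0, (len(rows), n_cols), (n_cols, 1))


a = from_rows([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]])
v = a.transpose().slice_axis(1, 0, 4, 3)  # analogous to a.T[:, ::3]
v[(2, 1)] = -1.0  # lands on row 3, column 2 of the original array
```

Because `v` shares `a.data`, the assignment through the transposed and sliced view is visible through `a` as well, which is the referential integrity that item 7 describes.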

Other notes

  1. On np_size()'s function signature: Is the function signature correct? I have encountered issues where if I directly use self.primitives.ndarray as the type for a, self.primitives.ndarray's dtype and ndims are substituted permanently. See commit "core/ndstrides: implement np_size()".
  2. On matmul signature: Is the typechecker type correct? I have encountered the same issue as with np_size() when the other_ty of impl_matmul is simply set to ndarray_unsized_t. See commit "core/ndstrides: implement general matmul".
lyken added 51 commits 2024-08-26 12:07:32 +08:00
Achieved through defining all the needed Exception ID constants at link
time.

Secondly, since `Exception` is `size_t` dependent, `__nac3_raise()`
takes an opaque pointer to `Exception`.
A small abstraction to simplify implementations.
NDArray with strides.
Needed for implementing other ndarray utils.
Needed for implementing general ndarray indexing.

Currently the IRRT slice and range have nothing to do with NAC3's slice
and range.
The name `NDIndex` is used in later commits.
The functionality for `...` and `np.newaxis` is there in IRRT, but there
is no implementation of them for @kernel Python expressions because of
#486.
Needed for implementing np_array()
It also checks for inconsistent dimensions if the input is a list.
e.g., rejecting `[[1.0, 2.0], [3.0]]`.
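
A minimal sketch of the kind of check described above, written in plain Python rather than taken from the PR. `infer_shape` is a hypothetical helper that derives a shape from nested lists and raises when sibling sublists disagree.

```python
def infer_shape(value):
    """Return the shape of a nested list, or raise if it is ragged.

    Hypothetical helper for illustration; not the NAC3 implementation.
    """
    if not isinstance(value, list):
        return ()  # a scalar contributes no dimensions
    if len(value) == 0:
        return (0,)
    first = infer_shape(value[0])
    for item in value[1:]:
        # Every sibling must have exactly the same shape as the first one.
        if infer_shape(item) != first:
            raise ValueError("list has inhomogeneous dimensions")
    return (len(value),) + first


ok_shape = infer_shape([[1.0, 2.0], [3.0, 4.0]])

try:
    infer_shape([[1.0, 2.0], [3.0]])
    ragged_rejected = False
except ValueError:
    ragged_rejected = True
```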
These functions are not important, but they are handy for debugging.

`np.strides()` is not an actual NumPy function, but `ndarray.strides` is used.
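
For reference, the strides of a C-contiguous array follow directly from its shape and item size. The sketch below uses byte strides, as NumPy's `ndarray.strides` does. The function names are hypothetical, and whether NAC3's `np_strides()` reports the same units is an assumption here.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array with the given shape."""
    strides = [0] * len(shape)
    step = itemsize
    # The last axis is densest; each earlier axis steps over all later ones.
    for axis in range(len(shape) - 1, -1, -1):
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)


def is_c_contiguous(shape, strides, itemsize):
    """Check whether the given strides describe a C-contiguous layout."""
    if 0 in shape:
        return True  # an empty array is trivially contiguous
    expected = c_contiguous_strides(shape, itemsize)
    # Strides of length-1 axes never affect addressing, so they are ignored.
    return all(dim == 1 or s == e for dim, s, e in zip(shape, strides, expected))


row_major = c_contiguous_strides((2, 3), 8)          # 2x3 array of 8-byte floats
transposed_ok = is_c_contiguous((3, 2), (8, 24), 8)  # transposed view of the above
```

A check of this kind is what a reshape can use to decide between reusing the data with new strides and making a copy.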
The IRRT implementation knows how to handle axes. But the argument is
not in NAC3 yet.
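
Handling an arbitrary axes argument amounts to permuting shape and strides together. The following is a hedged Python sketch of that idea, not the IRRT code. `transpose_axes` is a hypothetical name.

```python
def transpose_axes(shape, strides, axes=None):
    """Permute shape and strides; no data is moved.

    With axes=None the axis order is reversed, as in a plain transpose.
    """
    ndims = len(shape)
    if axes is None:
        axes = tuple(reversed(range(ndims)))
    # Allow negative axes, then require a true permutation of 0..ndims-1.
    normalized = [a + ndims if a < 0 else a for a in axes]
    if sorted(normalized) != list(range(ndims)):
        raise ValueError("axes must be a permutation of the array's dimensions")
    new_shape = tuple(shape[a] for a in normalized)
    new_strides = tuple(strides[a] for a in normalized)
    return new_shape, new_strides


swapped = transpose_axes((2, 3, 4), (96, 32, 8), axes=(1, 0, 2))
reversed_all = transpose_axes((2, 3, 4), (96, 32, 8))
```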
Currently this is used to interop with nalgebra.
Print their shapes and exhaustively print all contents.
Nothing depends on the old ndarray implementation now.
New type vars are introduced when programming new ndarray functions.
lyken requested review from derppening 2024-08-26 12:07:38 +08:00
Author
Collaborator

Closing, to be broken down into smaller PRs.

lyken closed this pull request 2024-08-26 14:37:11 +08:00

Reference: M-Labs/nac3#506