implement numpy arrays, math functions, and matrix math #149

Open
opened 2021-12-23 16:11:50 +08:00 by sb10q · 14 comments
same as legacy compiler https://github.com/m-labs/artiq/issues/1485 https://git.m-labs.hk/M-Labs/artiq-zynq/issues/64
sb10q added the
future-extension
label 2021-12-23 16:11:50 +08:00

We also have to implement the numpy array type.

We also have to implement the numpy array type.
sb10q changed title from implement numpy matrix math and math functions to implement numpy arrays, math functions, and matrix math 2021-12-23 19:14:29 +08:00
sb10q removed the
future-extension
label 2022-04-04 14:33:26 +08:00
dpn was assigned by sb10q 2022-04-04 14:33:31 +08:00
sb10q added this to the Beta milestone 2022-04-04 14:33:33 +08:00

Do we want to try to implement this as a library, or hard-code an array[element_type, num_dims] type into the compiler like before? The first option would involve adding enough in terms of template value parameters, function overloading, and operator overloading to implement something like a C++ array library (Eigen, …), which is probably a non-trivial undertaking. Just implementing the array type and simple numpy function broadcasting in the compiler like in artiq.compiler definitely works, but is maybe a bit inflexible.

Do we want to try to implement this as a library, or hard-code an `array[element_type, num_dims]` type into the compiler like before? The first option would involve adding enough in terms of template value parameters, function overloading, and operator overloading to implement something like a C++ array library (Eigen, …), which is probably a non-trivial undertaking. Just implementing the array type and simple numpy function broadcasting in the compiler like in `artiq.compiler` definitely works, but is maybe a bit inflexible.

I guess we probably have to hardcode the array type in the compiler? Not sure how should we handle num_dims otherwise.

I guess we probably have to hardcode the array type in the compiler? Not sure how should we handle `num_dims` otherwise.

But for function implementations we might be able to implement it as a library.

But for function implementations we might be able to implement it as a library.

Not sure how should we handle num_dims otherwise.

It's easily possible to implement template value parameters if you don't need to do global type inference for them (as long as you have function templates).

But yes, it seems like adding an array type is overall less complexity and effort, given that at least the nac3artiq layer thens needs to explicitly handle the conversion from host NumPy arrays to whatever array type anyway.

> Not sure how should we handle num_dims otherwise. It's easily possible to implement template value parameters if you don't need to do global type inference for them (as long as you have function templates). But yes, it seems like adding an array type is overall less complexity and effort, given that at least the nac3artiq layer thens needs to explicitly handle the conversion from host NumPy arrays to whatever array type anyway.

With the general move towards "proper" Python type annotations in NAC3, it would be nice if we could choose a syntax in line with host NumPy, such that users can use static analysis tools, etc.

Unfortunately, this is very much still brewing in upstream NumPy, and what seems like a tentative proposal won't be finalised until PEP646 is in (Python 3.11) and has trickled down into mypy, etc.: https://github.com/numpy/numpy/issues/16544. Then again, much of that complexity comes from trying to encode the actual shape, not just its rank (which adds considerable complexity).

I guess what we could do is to represent our type as ndarray[num_dims, elem_type] in anticipation that this will probably be compatible with whatever rank-only syntax they choose.

There is also the separate https://github.com/ramonhagenaars/nptyping.

With the general move towards "proper" Python type annotations in NAC3, it would be nice if we could choose a syntax in line with host NumPy, such that users can use static analysis tools, etc. Unfortunately, this is very much still brewing in upstream NumPy, and what seems like a tentative proposal won't be finalised until PEP646 is in (Python 3.11) and has trickled down into mypy, etc.: https://github.com/numpy/numpy/issues/16544. Then again, much of that complexity comes from trying to encode the actual shape, not just its rank (which adds considerable complexity). I guess what we could do is to represent our type as `ndarray[num_dims, elem_type]` in anticipation that this will probably be compatible with whatever rank-only syntax they choose. There is also the separate https://github.com/ramonhagenaars/nptyping.

A subscriptable np.ndarray has been merged into NumPy https://github.com/numpy/numpy/pull/19879, although the extra np.dtype wrapper seems undesirable, and I'm not sure what form the shape type is supposed to take (some comments indicate that they haven't figured this out yet; perhaps this is even https://github.com/numpy/numpy/issues/16544). There is also numpy.typing.NDArray, but that doesn't allow specifying a shape type at all.

A subscriptable `np.ndarray` has been merged into NumPy https://github.com/numpy/numpy/pull/19879, although the extra `np.dtype` wrapper seems undesirable, and I'm not sure what form the shape type is supposed to take (some comments indicate that they haven't figured this out yet; perhaps this is even https://github.com/numpy/numpy/issues/16544). There is also `numpy.typing.NDArray`, but that doesn't allow specifying a shape type at all.
Collaborator

@sb10q Revisiting this problem, I currently have a design dilemma.

Right now, the builtin NDArray[dtype] is represented as type { usize, usize*, dtype* }. This has the benefit of allowing multidimensional arrays to be represented by a flattened 1D representation, but type checking for indexing is to be solved.

Consider the following example:

n = numpy.zeros((2, 3))
n[0]	 # should be NDArray
n[0][0]	 # should be float

Since the shape information is a runtime property, it will probably be difficult (if not impossible) to tell the number of dimensions, and hence, the type of the expression after each index operation.

The alternative would be to use naive type nested, where a (2, 3)-shaped NDArray would be represented as NDArray[NDArray[dtype]], but if this is directly translated into LLVM it would introduce overhead due to the increased number of indirections.

A middle-ground solution would be to decouple the type from its memory representation, e.g. use NDArray[NDArray[dtype]] in the type system but use a flattened 1D array in its in-memory representation. Special lowering will need to be performed for getting and/or setting individual rows/elements, but otherwise should work. The only issue is we will lose the ability to declare NDArray types with an unknown number of dimensions, but that probably would not be necessary?

Let me know what you think about this.

@sb10q Revisiting this problem, I currently have a design dilemma. Right now, the builtin `NDArray[dtype]` is represented as `type { usize, usize*, dtype* }`. This has the benefit of allowing multidimensional arrays to be represented by a flattened 1D representation, but type checking for indexing is to be solved. Consider the following example: ```py n = numpy.zeros((2, 3)) n[0] # should be NDArray n[0][0] # should be float ``` Since the shape information is a runtime property, it will probably be difficult (if not impossible) to tell the number of dimensions, and hence, the type of the expression after each index operation. The alternative would be to use naive type nested, where a `(2, 3)`-shaped NDArray would be represented as `NDArray[NDArray[dtype]]`, but if this is directly translated into LLVM it would introduce overhead due to the increased number of indirections. A middle-ground solution would be to decouple the type from its memory representation, e.g. use `NDArray[NDArray[dtype]]` in the type system but use a flattened 1D array in its in-memory representation. Special lowering will need to be performed for getting and/or setting individual rows/elements, but otherwise should work. The only issue is we will lose the ability to declare `NDArray` types with an unknown number of dimensions, but that probably would not be necessary? Let me know what you think about this.

Did you discard requiring the number of dimensions to be known at compile-time (like in the current compiler) for some particular reason?

This struck me as the most suitable tradeoff between flexibility and runtime performance/indexing ergonomics.

There is some prior art in this domain in the form of all the linear-algebra-related libraries for C++, D and so on, for instance Eigen.

Did you discard requiring the number of dimensions to be known at compile-time (like in the current compiler) for some particular reason? This struck me as the most suitable tradeoff between flexibility and runtime performance/indexing ergonomics. There is some prior art in this domain in the form of all the linear-algebra-related libraries for C++, D and so on, for instance [Eigen](https://eigen.tuxfamily.org).

In actual scientific code using NumPy/SciPy, reshaping between arrays with different numbers of dimensions is reasonably common. While IIRC not currently supported in artiq.compiler, staying with NumPy's model of a flat data buffer + whatever extra indexing metadata made a lot of sense. It also helps with being able to do strided/multidimensional/… slicing in a zero-copy way.

In actual scientific code using NumPy/SciPy, reshaping between arrays with different numbers of dimensions is reasonably common. While IIRC not currently supported in `artiq.compiler`, staying with NumPy's model of a flat data buffer + whatever extra indexing metadata made a lot of sense. It also helps with being able to do strided/multidimensional/… slicing in a zero-copy way.
Collaborator

After discussing with @sb10q a few days ago, we decided to embed the NDArray number of dimensions into the type itself while keeping the in-memory representation the same.

Effectively the array would be represented as NDArray[dtype, ndims] in the type system, where ndims is a const-generic argument specifying the number of dimensions. The only drawback of this approach AFAICT is some checks will need to be performed at runtime, e.g. checking the number of dimensions in shape against the number of declared ndims in the type parameter for array creation or reshape functions, e.g.

np.zeros[float, 2]([2, 2])  # need to ensure that ndims=2 matches len([2, 2])
After discussing with @sb10q a few days ago, we decided to embed the NDArray number of dimensions into the type itself while keeping the in-memory representation the same. Effectively the array would be represented as `NDArray[dtype, ndims]` in the type system, where `ndims` is a const-generic argument specifying the number of dimensions. The only drawback of this approach AFAICT is some checks will need to be performed at runtime, e.g. checking the number of dimensions in `shape` against the number of declared `ndims` in the type parameter for array creation or reshape functions, e.g. ```py np.zeros[float, 2]([2, 2]) # need to ensure that ndims=2 matches len([2, 2]) ```

Effectively the array would be represented as NDArray[dtype, ndims] in the type system, where ndims is a const-generic argument specifying the number of dimensions.

Yep, this then matches my artiq.compiler implementation, and also what I had sketched out for NAC3 a few years ago.

while keeping the in-memory representation the same.

By "the same", presumably you mean (dtype*, (size_t, size_t, …)), where the tuple of lengths is ndims in size? That sounds good; we can always add support for explicit strides if we want to support multi-dimensional slicing in the future.

> Effectively the array would be represented as NDArray[dtype, ndims] in the type system, where ndims is a const-generic argument specifying the number of dimensions. Yep, this then matches my `artiq.compiler` implementation, and also what I had sketched out for NAC3 a few years ago. > while keeping the in-memory representation the same. By "the same", presumably you mean `(dtype*, (size_t, size_t, …))`, where the tuple of lengths is `ndims` in size? That sounds good; we can always add support for explicit strides if we want to support multi-dimensional slicing in the future.

Effectively the array would be represented as NDArray[dtype, ndims] in the type system, where ndims is a const-generic argument specifying the number of dimensions.

Yep, this then matches my artiq.compiler implementation, and also what I had sketched out for NAC3 a few years ago.

while keeping the in-memory representation the same.

By "the same", presumably you mean (dtype*, (size_t, size_t, …)), where the tuple of lengths is ndims in size?

> Effectively the array would be represented as NDArray[dtype, ndims] in the type system, where ndims is a const-generic argument specifying the number of dimensions. Yep, this then matches my `artiq.compiler` implementation, and also what I had sketched out for NAC3 a few years ago. > while keeping the in-memory representation the same. By "the same", presumably you mean `(dtype*, (size_t, size_t, …))`, where the tuple of lengths is `ndims` in size?
dpn closed this issue 2023-12-01 00:25:33 +08:00
dpn reopened this issue 2023-12-01 11:09:09 +08:00

(apologies, clicked the "Comment and Close" button by mistake)

(apologies, clicked the "Comment and Close" button by mistake)
Sign in to join this conversation.
No Milestone
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#149
There is no content yet.