Fix RPC of ndarrays from host to device #527

Merged
sb10q merged 2 commits from fix/ndarray-rpc into master 2024-08-29 16:26:58 +08:00
Collaborator

Thanks @lyken for the help debugging this issue.

Kernel used to test this functionality:

from numpy import int32, ndarray, arange as np_arange
import numpy as np

from artiq.experiment import *
from artiq.coredevice.core import Core

@nac3
class Test(EnvExperiment):
    core: KernelInvariant[Core]

    def build(self):
        self.setattr_device("core")

    @rpc
    def get_ndarray_1d(self) -> ndarray[int32, 1]:
        return np_arange(4, dtype=int32)

    @rpc
    def get_ndarray_2d(self) -> ndarray[int32, 2]:
        return np_arange(4, dtype=int32).reshape((2, 2))

    @kernel
    def run(self):
        print_rpc(self.get_ndarray_1d())
        print_rpc(self.get_ndarray_2d())

Fixes #496.

Thanks @lyken for the help debugging this issue. Kernel used to test this functionality: ```py from numpy import int32, ndarray, arange as np_arange import numpy as np from artiq.experiment import * from artiq.coredevice.core import Core @nac3 class Test(EnvExperiment): core: KernelInvariant[Core] def build(self): self.setattr_device("core") @rpc def get_ndarray_1d(self) -> ndarray[int32, 1]: return np_arange(4, dtype=int32) @rpc def get_ndarray_2d(self) -> ndarray[int32, 2]: return np_arange(4, dtype=int32).reshape((2, 2)) @kernel def run(self): print_rpc(self.get_ndarray_1d()) print_rpc(self.get_ndarray_2d()) ``` Fixes #496.
derppening added 1 commit 2024-08-29 15:23:26 +08:00
derppening force-pushed fix/ndarray-rpc from 006e60bc50 to f98d91c3c6 2024-08-29 15:25:18 +08:00 Compare
derppening requested review from sb10q 2024-08-29 15:25:30 +08:00
derppening requested review from lyken 2024-08-29 15:25:30 +08:00
lyken approved these changes 2024-08-29 15:48:36 +08:00
@ -531,0 +673,4 @@
let ndarray_nbytes = ctx
.build_call_or_invoke(
rpc_recv,
&[buffer.base_ptr(ctx, generator).into()], // Reads [i32; ndims]. NOTE: We are allocated [size_t; ndims].
Collaborator

We should delete/change my comment here. I think it is not [i32; ndims]. Also I messed up the sentence after NOTE:.

Tag::Array(it, num_dims) => {
    consume_value!(*mut (), |buffer| {
        // Deserialize length along each dimension and compute total number of
        // elements.
        let mut total_len: usize = 1;
        for _ in 0..num_dims {
            let len = reader.read_u32()? as usize;
            total_len *= len;
            consume_value!(usize, |ptr| *ptr = len ) // <----- not i32
        }

        // Allocate backing storage for elements; deserialize them.
        let elt_tag = it.clone().next().expect("truncated tag");
        *buffer = alloc(elt_tag.size() * total_len)?;
        recv_elements(reader, elt_tag, total_len, *buffer, alloc)
    })
}
We should delete/change my comment here. I think it is not `[i32; ndims]`. Also I messed up the sentence after `NOTE:`. ```rust Tag::Array(it, num_dims) => { consume_value!(*mut (), |buffer| { // Deserialize length along each dimension and compute total number of // elements. let mut total_len: usize = 1; for _ in 0..num_dims { let len = reader.read_u32()? as usize; total_len *= len; consume_value!(usize, |ptr| *ptr = len ) // <----- not i32 } // Allocate backing storage for elements; deserialize them. let elt_tag = it.clone().next().expect("truncated tag"); *buffer = alloc(elt_tag.size() * total_len)?; recv_elements(reader, elt_tag, total_len, *buffer, alloc) }) } ```
Author
Collaborator

Updated.

Updated.
@ -531,0 +680,4 @@
.unwrap();
// debug_assert(ndarray_nbytes > 0)
if ctx.registry.llvm_options.opt_level == OptimizationLevel::None {
Collaborator

Unrelated: I think there are inconsistencies with how we insert assertion tests in NAC3.

Sometimes we use cfg!(debug_assertions) and sometimes we are testing against OptimizationLevel::None. It might be necessary to standardize this approach in the future, say with a purpose-defined function like CodeGenContext::should_add_assertions() -> bool.

Unrelated: I think there are inconsistencies with how we insert assertion tests in NAC3. Sometimes we use `cfg!(debug_assertions)` and sometimes we are testing against `OptimizationLevel::None`. It might be necessary to standardize this approach in the future, say with a purpose-defined function like `CodeGenContext::should_add_assertions() -> bool`.
Author
Collaborator

IMO debug_assertions are for assertions that can be checked at compile-time (think function contract violations in Rust), whereas OptimizationNone are for assertions that can only be checked at runtime...

I should write some documentation on this.

IMO `debug_assertions` are for assertions that can be checked at compile-time (think function contract violations in Rust), whereas `OptimizationNone` are for assertions that can only be checked at runtime... I should write some documentation on this.
@ -531,0 +716,4 @@
// `ndarray.shape` must be initialized beforehand in this implementation
// (for ndarray.create_data() to know how many elements to allocate)
let num_elements =
call_ndarray_calc_size(generator, ctx, &ndarray.dim_sizes(), (None, None));
Collaborator

Probably some super insignificant details: I used ndarray_nbytes / itemsize to derive the num_elements for ndarray.create_data because it is computationally less expensive. Doing it like this is a little bit less efficient but I am not sure if this is concerning.

Also, it is possible to simply do let ndarray_data = ctx.builder.build_alloca(ndarray_nbytes, i8_type) and just set ndarray->data = ndarray_data.

Unless LLVM just figures everything out and optimizes this.

Either way, probably not a concern for now.

Probably some super insignificant details: I used `ndarray_nbytes / itemsize` to derive the `num_elements` for `ndarray.create_data` because it is computationally less expensive. Doing it like this is a little bit less efficient but I am not sure if this is concerning. Also, it is possible to simply do `let ndarray_data = ctx.builder.build_alloca(ndarray_nbytes, i8_type)` and just set `ndarray->data = ndarray_data`. Unless LLVM just figures everything out and optimizes this. Either way, probably not a concern for now.
Author
Collaborator

Yeah, but using alloca with type T guarantees that objects are properly aligned. I think let's get the correct implementation out there first, then we can think about how to optimize it further.

Yeah, but using `alloca` with type `T` guarantees that objects are properly aligned. I think let's get the correct implementation out there first, then we can think about how to optimize it further.
@ -531,0 +767,4 @@
let alloc_ptr =
ctx.builder.build_array_alloca(llvm_pi8, alloc_size, "rpc.alloc").unwrap();
let alloc_ptr =
ctx.builder.build_pointer_cast(alloc_ptr, llvm_pi8, "rpc.alloc.ptr").unwrap();
Collaborator

Redundant pointer cast I think.

Redundant pointer cast I think.
Author
Collaborator

Not redundant; First alloc_ptr is i8** (note that we are alloca-ing i8*), second one casts it back to i8*.

Not redundant; First `alloc_ptr` is `i8**` (note that we are `alloca`-ing `i8*`), second one casts it back to `i8*`.
Collaborator

Oh

Oh
Collaborator

Wait actually, why do we allocate [i8*; alloc_size] here? I thought alloc_size is the number of bytes to allocate on the kernel's end.

Wait actually, why do we allocate `[i8*; alloc_size]` here? I thought `alloc_size` is the number of *bytes* to allocate on the kernel's end.
Author
Collaborator

Ah, it's for 4-byte alignment. And it's overallocating 4 times the buffer size as well...

Let me fix that.

Ah, it's for 4-byte alignment. And it's overallocating 4 times the buffer size as well... Let me fix that.
Author
Collaborator

We now allocate alloc_size by allocating round_up(alloc_size, sizeof(T)) / sizeof(T) objects of T, which should also address any alignment issues.

We now allocate `alloc_size` by allocating `round_up(alloc_size, sizeof(T)) / sizeof(T)` objects of `T`, which should also address any alignment issues.
derppening force-pushed fix/ndarray-rpc from f98d91c3c6 to a59c26aa99 2024-08-29 16:08:56 +08:00 Compare
sb10q merged commit a59c26aa99 into master 2024-08-29 16:26:58 +08:00
sb10q deleted branch fix/ndarray-rpc 2024-08-29 16:26:58 +08:00
Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#527
No description provided.