Added more tests and use normal rpc instead of async rpc.
Async RPC does not represent the real throughput which is limited by the
hardware and the network. Normal RPC which requires a response from the
remote is closer to real usecases.
Even though the code already used non-greedy wildcards before,
it would not find the shortest match, as earlier match starts
would still take precedence.
This could possibly be sped up a bit in CPython by doing
everything inside re using lookahead-assertion trickery, but the
current code is already imperceptibly fast for hundreds of
choices.
This generates rather more code than necessary, but has
the advantage of automatically handling incomplete
multi-dimensional subscripts which still leave arrays
behind.
Lists and arrays no longer have the same representation all
the way through codegen, as used to be the case.
This could/should be made more efficient later, eliding the
temporary copies.
Left generic transpose (shape order inversion) for now, as that
would be less ugly if we implement forwarding to Python function
bodies for array function implementations.
Needs a runtime test case.
LLVM will take care of optimising the loops. This was still
unnecessarily painful; implementing generics and implementing
this in ARTIQ Python looks very attractive right now.
Relies on the runtime to provide the necessary
(libm-compatible) functions.
The test is nifty, but a bit brittle; if this breaks in the
future because of optimizer changes, do not hesitate to convert
this into a more pedestrian test case.
So far, this is not exposed to the user beyond implicit conversions.
Note that all the implicit conversions, such as triggered by adding
arrays of mismatching types, or dividing integer arrays, are currently
emitted in a maximally inefficient way, where a temporary copy is first
made for the type conversion. The conversions would more sensibly be
implemented during the per-element operations to save on the extra
copies, but the current behaviour fell out of the rest of the IR
generator structure without extra changes.
Matches NumPy. Slicing a TList reallocates, this doesn't; offsetting
couldn't be handled in the IR without introducing new semantics
(the Alloc kludge; could/should be made its own IR type).