* master: (23 commits)
RELEASE_NOTES: update
pipistrello: add some inputs
Remove last vestiges of nist_qc1.
Fully drop AD9858 and kc705-nist_qc1 support (closes#576).
coredevice.dds: reimplement fully in ARTIQ Python.
compiler: unbreak casts to int32/int64.
analyses.constness: fix false positive on x[...].
inferencer: significantly improve the op-assignment diagnostic.
Fix tests.
Move mu_to_seconds, seconds_to_mu to Core.
artiq_devtool: don't crash on invalid utf-8.
artiq_devtool: detect a race condition during connect.
llvm_ir_generator: handle no-op coercions.
conda: use development version of migen/misoc
Revert accidentally committed code.
Revert "gateware: increase RTIO FIFO sizes for NIST_CLOCK. Closes#623"
analyses.invariant_detection: implement (#622).
Fix whitespace.
coredevice.dds: work around the round(numpy.float64()) snafu.
coredevice.dds: update from obsolete int(width=) syntax (fixes#621).
...
Before this commit, it displayed incorrect output if an error
appeared on 2nd run and beyond, and displayed messages for trying
to do "num32 -= num64" that made very little sense.
See also commit feed91d; that commit fixed the test_rpc_timing test,
but caused frequent hangs elsewhere, which were also caused by buggy
Nagle implementation. Just disable this entirely, as with our
explicit buffering it provides no benefit anyway.
This brings mean RPC time from ~45ms to ~2ms.
The cause of the slowness without buffering is, primarily, that lwip
is severely pessimized by small writes, whether with Nagle on or off.
(In fact, disabling Nagle makes it function *better* on many small
writes, which begs the question of what's the point of having Nagle
there in the first place.) In practical terms, the slowness appears
only when writing a 4-byte buffer (the synchronization segment);
writing buffers of other sizes does not trigger the problem.
This all is extremely confusing and the fix is partly palliative,
but since it seems to work reliably and we're migrating off lwip
I think it is unwise to spend any more time debugging this.
It takes ~4ms to print an empty log line because of how slow
the UART is. This makes the log timestamps useless for debugging
performance problems.
After this commit, it takes ~75us to print an empty log line instead,
which pessimizes test_rpc_timing by less than 2ms with tracing
enabled.
This enables constant propagation optimisations, as verified by
the included test case. This is only a first stop-gap measure, though;
we should support optimisation based on kernel invariants on a more
fine-grained level.