Merge pull request #208 from vertigo-designs/feature/io-docs

Adding documentation, updating DAC output timing
Ryan Summers 2021-01-18 13:54:56 +01:00 committed by GitHub
commit d447501c47
5 changed files with 231 additions and 61 deletions

View File

@ -14,6 +14,9 @@ break DefaultHandler
break HardFault
break rust_begin_unwind
source ../../PyCortexMDebug/cmdebug/svd_gdb.py
svd_load ~/Downloads/STM32H743x.svd
load
# tbreak cortex_m_rt::reset_handler
monitor reset halt

View File

@ -1,22 +1,77 @@
///! Stabilizer ADC management interface
///!
///! The Stabilizer ADCs utilize three DMA channels each: one to trigger sampling, one to collect
///! samples, and one to clear the EOT flag between samples. The SPI interfaces are configured
///! for receiver-only operation. A timer channel is
///! configured to generate a DMA write into the SPI CR1 register, which initiates a SPI transfer and
///! results in a single ADC sample read for both channels. A separate timer channel is configured to
///! occur immediately before the trigger channel, which initiates a write to the IFCR (flag-clear)
///! register to clear the EOT flag, which allows for a new transmission to be generated by the
///! trigger channel.
///! # Design
///!
///! In order to read multiple samples without interrupting the CPU, a separate DMA transfer is
///! configured to read from each of the ADC SPI RX FIFOs. Due to the design of the SPI peripheral,
///! these DMA transfers stall when no data is available in the FIFO. Thus, the DMA transfer only
///! completes after all samples have been read. When this occurs, a CPU interrupt is generated so
///! that software can process the acquired samples from both ADCs. Only one of the ADC DMA streams
///! is configured to generate an interrupt to handle both transfers, so it is necessary to ensure
///! both transfers are completed before reading the data. This is usually not significant for
///! busy-waiting because the transfers should complete at approximately the same time.
///! Stabilizer ADCs are connected to the MCU via a simplex, SPI-compatible interface. The ADCs
///! require a setup conversion time after asserting the CSn (convert) signal to generate the ADC
///! code from the sampled level. Once the setup time has elapsed, the ADC data is clocked out of
///! MISO. The internal setup time is managed by the SPI peripheral via a CSn setup time parameter
///! during SPI configuration, which allows offloading the management of the setup time to hardware.
///!
///! Because of the SPI-compatibility of the ADCs, a single SPI peripheral + DMA is used to automate
///! the collection of multiple ADC samples without requiring processing by the CPU, which reduces
///! overhead and provides the CPU with more time for processing-intensive tasks, like DSP.
///!
///! The automation of sample collection utilizes three DMA streams, the SPI peripheral, and two
///! timer compare channels for each ADC. One timer comparison channel is configured to generate a
///! comparison event every time the timer is equal to a specific value. Each comparison then
///! generates a DMA transfer event to write into the SPI CR1 register to initiate the transfer.
///! This allows the SPI interface to periodically read a single sample. The other timer comparison
///! channel is configured to generate a comparison event slightly before the first (~10 timer
///! cycles). This channel triggers a separate DMA stream to clear the EOT flag within the SPI
///! peripheral. The EOT flag must be cleared after each transfer or the SPI peripheral will not
///! properly complete the single conversion. Thus, by using two DMA streams and timer comparison
///! channels, the SPI can regularly acquire ADC samples.
///!
///! In order to collect the acquired ADC samples into a RAM buffer, a final DMA transfer is
///! configured to read from the SPI RX FIFO into RAM. The request for this transfer is connected to
///! the SPI RX data signal, so the SPI peripheral will request to move data into RAM whenever it is
///! available. When enough samples have been collected, a transfer-complete interrupt is generated
///! and the ADC samples are available for processing.
///!
///! The SPI peripheral internally has an 8- or 16-byte TX and RX FIFO, which corresponds to a 4- or
///! 8-sample buffer for incoming ADC samples. During the handling of the DMA transfer completion,
///! there is a small window where buffers are swapped over where it's possible that a sample could
///! be lost. In order to avoid this, the SPI RX FIFO is effectively used as a "sample overflow"
///! region and can buffer a number of samples until the next DMA transfer is configured. If a DMA
///! transfer is still not set in time, the SPI peripheral will generate an input-overrun interrupt.
///! This interrupt then serves as a means of detecting if samples have been lost, which will occur
///! whenever data processing takes longer than the collection period.
///!
///!
///! ## Starting Data Collection
///!
///! Because the DMA data collection is automated via timer count comparisons and DMA transfers, the
///! ADCs can be initialized and configured, but will not begin sampling the external ADCs until the
///! sampling timer is enabled. As such, the sampling timer should be enabled after all
///! initialization has completed and immediately before the embedded processing loop begins.
///!
///!
///! ## Batch Sizing
///!
///! The ADCs collect a group of N samples, which is referred to as a batch. The size of the batch
///! is configured by the user at compile-time to allow for a custom-tailored implementation. Larger
///! batch sizes generally provide for lower overhead and more processing time per sample, but come
///! at the expense of increased input -> output latency.
///!
///!
///! # Note
///!
///! While there are two ADCs, only a single ADC is configured to generate transfer-complete
///! interrupts. This is done because it is assumed that the ADCs will always be sampled
///! simultaneously. If only a single ADC is used, it must always be ADC0, as ADC1 will not generate
///! transfer-complete interrupts.
///!
///! There is a very small amount of latency between sampling of ADCs due to bus matrix priority. As
///! such, one of the ADCs will be sampled marginally earlier than the other because the DMA
///! requests are generated simultaneously. This can be avoided by providing a known offset to the
///! sample DMA requests, which can be done by setting e.g. ADC0's comparison to a counter
///! value of 0 and ADC1's comparison to a counter value of 1.
///!
///! In this implementation, single buffer mode DMA transfers are used because the SPI RX FIFO can
///! be used as a means to both detect and buffer ADC samples during the buffer swap-over. Because
///! of this, double-buffered mode does not offer any advantages over single-buffered mode (unless
///! double-buffered mode offers less overhead due to the DMA disable/enable procedure).
use super::{
hal, timers, DMAReq, DmaConfig, MemoryToPeripheral, PeripheralToMemory,
Priority, TargetAddress, Transfer, SAMPLE_BUFFER_SIZE,
@ -191,7 +246,7 @@ macro_rules! adc_input {
// Generate DMA events when an output compare of the timer hits the specified
// value.
trigger_channel.listen_dma();
trigger_channel.to_output_compare(2);
trigger_channel.to_output_compare(2 + $index);
// The trigger stream constantly writes to the SPI CR1 using a static word
// (which is a static value to enable the SPI transfer). Thus, neither the
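The staggered compare values in this diff can be sketched as plain arithmetic. This is a minimal model, assuming only the `2 + $index` (ADC trigger) and `4 + $index` (DAC update) offsets visible in the hunks; the function names are hypothetical and no hardware is touched. The doc comments give 0/1 and 2/3 as example offsets, while the code uses the values below.

```rust
/// Compare value for the ADC trigger channel of ADC `index`.
/// Mirrors `trigger_channel.to_output_compare(2 + $index)`; the name is hypothetical.
const fn adc_trigger_compare(index: u32) -> u32 {
    2 + index
}

/// Compare value for the DAC update channel of DAC `index`.
/// Mirrors `trigger_channel.to_output_compare(4 + $index)`; the name is hypothetical.
const fn dac_update_compare(index: u32) -> u32 {
    4 + index
}

fn main() {
    // Collect every DMA request with the timer tick at which it fires.
    let mut events: Vec<(u32, String)> = Vec::new();
    for index in 0..2 {
        events.push((adc_trigger_compare(index), format!("ADC{} trigger", index)));
        events.push((dac_update_compare(index), format!("DAC{} update", index)));
    }

    // Sorting by tick shows that no two requests share a timer count.
    events.sort();
    for (tick, name) in &events {
        println!("tick {}: {}", tick, name);
    }
}
```

Because every request gets its own tick, the order in which the streams hit the bus is fixed by the timer rather than by DMA arbitration.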

View File

@ -1,8 +1,55 @@
///! Stabilizer DAC management interface
///!
///! The Stabilizer DACs utilize a DMA channel to generate output updates. A timer channel is
///! configured to generate a DMA write into the SPI TXFIFO, which initiates a SPI transfer and
///! results in a DAC update for both channels.
///! # Design
///!
///! Stabilizer DACs are connected to the MCU via a simplex, SPI-compatible interface. Each DAC
///! accepts a 16-bit output code.
///!
///! In order to maximize CPU processing time, the DAC code updates are offloaded to hardware using
///! a timer compare channel, DMA stream, and the DAC SPI interface.
///!
///! The timer comparison channel is configured to generate a DMA request whenever the comparison
///! occurs. Thus, whenever a comparison happens, a single DAC code can be written to the output. By
///! configuring a DMA stream for a number of successive DAC codes, hardware can regularly update
///! the DAC without requiring the CPU.
///!
///! In order to ensure alignment between the ADC sample batches and DAC output code batches, a DAC
///! output batch is always exactly 3 batches after the ADC batch that generated it.
///!
///! The DMA transfer for the DAC output codes utilizes a double-buffer mode to avoid losing any
///! transfer events generated by the timer (for example, when 2 update cycles occur before the DMA
///! transfer completion is handled). In this mode, by the time DMA swaps buffers, there is always a valid buffer in the
///! "next-transfer" double-buffer location for the DMA transfer. Once a transfer completes,
///! software then has exactly one batch duration to fill the next buffer before its
///! transfer begins. If software does not meet this deadline, old data will be repeatedly generated
///! on the output and output will be shifted by one batch.
///!
///! ## Multiple Samples to Single DAC Codes
///!
///! For some applications, it may be desirable to generate a single DAC code from multiple ADC
///! samples. In order to maintain timing characteristics between ADC samples and DAC code outputs,
///! applications are required to generate one DAC code for each ADC sample. To accommodate mapping
///! multiple inputs to a single output, the output code can be repeated a number of times in the
///! output buffer corresponding with the number of input samples that were used to generate it.
///!
///!
///! # Note
///!
///! There is a very small amount of latency between updating the two DACs due to bus matrix
///! priority. As such, one of the DACs will be updated marginally earlier than the other because
///! the DMA requests are generated simultaneously. This can be avoided by providing a known offset
///! to the DMA requests, which can be done by setting e.g. DAC0's comparison to a
///! counter value of 2 and DAC1's comparison to a counter value of 3. This will have the effect of
///! generating the DAC updates with a known latency of 1 timer tick to each other and prevent the
///! DMAs from racing for the bus. As implemented, the natural priority of the DMA channels
///! arbitrates which transfer occurs first.
///!
///!
///! # Limitations
///!
///! While double-buffered mode is used for DMA to avoid lost DAC-update events, there is no check
///! for re-use of a previously provided DAC output buffer. It is assumed that the DMA request is
///! served promptly after the transfer completes.
use super::{
hal, timers, DMAReq, DmaConfig, MemoryToPeripheral, TargetAddress,
Transfer, SAMPLE_BUFFER_SIZE,
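The "Multiple Samples to Single DAC Codes" section above can be illustrated with a short sketch. It assumes samples and codes are plain `u16` values and uses a simple average as the stand-in for the application's processing; the function name `decimate_and_hold` and the averaging step are hypothetical, not part of this change.

```rust
/// Reduce each group of `group` input samples to one code and repeat that code
/// `group` times, so the output still holds one DAC code per ADC sample.
/// The averaging is only a placeholder for real DSP.
fn decimate_and_hold(adc: &[u16], dac: &mut [u16], group: usize) {
    assert!(group > 0, "group size must be non-zero");
    assert_eq!(adc.len(), dac.len(), "one output code per input sample");
    assert_eq!(adc.len() % group, 0, "batch must divide evenly into groups");

    for (inputs, outputs) in adc.chunks(group).zip(dac.chunks_mut(group)) {
        // Sum in u32 so a group of u16 samples cannot overflow.
        let sum: u32 = inputs.iter().map(|&sample| u32::from(sample)).sum();
        let code = (sum / group as u32) as u16;
        // Repeat the single code once for every sample that produced it.
        outputs.fill(code);
    }
}

fn main() {
    let adc = [0u16, 2, 4, 6, 10, 20, 30, 40];
    let mut dac = [0u16; 8];
    decimate_and_hold(&adc, &mut dac, 4);
    println!("{:?}", dac);
}
```

Keeping the output buffer the same length as the input batch preserves the one-code-per-sample timing that the DMA stream expects.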
@ -13,8 +60,8 @@ use super::{
// processed). Note that the contents of AXI SRAM are uninitialized, so the buffer contents on
// startup are undefined. The dimensions are `DAC_BUF[dac_index][ping_pong_index][sample_index]`.
#[link_section = ".axisram.buffers"]
static mut DAC_BUF: [[[u16; SAMPLE_BUFFER_SIZE]; 2]; 2] =
[[[0; SAMPLE_BUFFER_SIZE]; 2]; 2];
static mut DAC_BUF: [[[u16; SAMPLE_BUFFER_SIZE]; 3]; 2] =
[[[0; SAMPLE_BUFFER_SIZE]; 3]; 2];
macro_rules! dac_output {
($name:ident, $index:literal, $data_stream:ident,
@ -32,6 +79,16 @@ macro_rules! dac_output {
) -> Self {
Self { _channel, spi }
}
/// Start the SPI and begin operating in a DMA-driven transfer mode.
pub fn start_dma(&mut self) {
// Allow the SPI FIFOs to operate using only DMA data channels.
self.spi.enable_dma_tx();
// Enable SPI and start it in infinite transaction mode.
self.spi.inner().cr1.modify(|_, w| w.spe().set_bit());
self.spi.inner().cr1.modify(|_, w| w.cstart().started());
}
}
// Note(unsafe): This is safe because the DMA request line is logically owned by this module.
@ -60,7 +117,6 @@ macro_rules! dac_output {
MemoryToPeripheral,
&'static mut [u16; SAMPLE_BUFFER_SIZE],
>,
first_transfer: bool,
}
impl $name {
@ -78,11 +134,12 @@ macro_rules! dac_output {
// Generate DMA events when the timer output compare matches the value configured
// below.
trigger_channel.listen_dma();
trigger_channel.to_output_compare(0);
trigger_channel.to_output_compare(4 + $index);
// The stream constantly writes to the TX FIFO to write new update codes.
let trigger_config = DmaConfig::default()
.memory_increment(true)
.double_buffer(true)
.peripheral_increment(false);
// Listen for any potential SPI error signals, which may indicate that we are not generating
@ -90,64 +147,53 @@ macro_rules! dac_output {
let mut spi = spi.disable();
spi.listen(hal::spi::Event::Error);
// Allow the SPI FIFOs to operate using only DMA data channels.
spi.enable_dma_tx();
// Enable SPI and start it in infinite transaction mode.
spi.inner().cr1.modify(|_, w| w.spe().set_bit());
spi.inner().cr1.modify(|_, w| w.cstart().started());
// AXISRAM is uninitialized. As such, we manually zero-initialize it here before
// starting the transfer.
// Note(unsafe): We currently own all DAC_BUF[index] buffers and are not using them
// elsewhere, so it is safe to access them here.
for buf in unsafe { DAC_BUF[$index].iter_mut() } {
for byte in buf.iter_mut() {
*byte = 0;
}
}
// Construct the trigger stream to write from memory to the peripheral.
let transfer: Transfer<_, _, MemoryToPeripheral, _> =
let mut transfer: Transfer<_, _, MemoryToPeripheral, _> =
Transfer::init(
stream,
$spi::new(trigger_channel, spi),
// Note(unsafe): This buffer is only used once and provided for the DMA transfer.
unsafe { &mut DAC_BUF[$index][0] },
None,
// Note(unsafe): This buffer is only used once and provided for the DMA transfer.
unsafe { Some(&mut DAC_BUF[$index][1]) },
trigger_config,
);
transfer.start(|spi| spi.start_dma());
Self {
transfer,
// Note(unsafe): This buffer is only used once and provided for the next DMA transfer.
next_buffer: unsafe { Some(&mut DAC_BUF[$index][1]) },
first_transfer: true,
next_buffer: unsafe { Some(&mut DAC_BUF[$index][2]) },
}
}
/// Acquire the next output buffer to populate it with DAC codes.
pub fn acquire_buffer(
&mut self,
) -> &'static mut [u16; SAMPLE_BUFFER_SIZE] {
self.next_buffer.take().unwrap()
}
pub fn acquire_buffer(&mut self) -> &mut [u16; SAMPLE_BUFFER_SIZE] {
// Note: If a device hangs up, check that this conditional is passing correctly, as
// there are no time-out checks here in the interest of execution speed.
while !self.transfer.get_transfer_complete_flag() {}
/// Enqueue the next buffer for transmission to the DAC.
///
/// # Args
/// * `next_buffer` - The next data to write to the DAC.
pub fn release_buffer(
&mut self,
next_buffer: &'static mut [u16; SAMPLE_BUFFER_SIZE],
) {
// If the last transfer was not complete, we didn't write all our previous DAC codes.
// Wait for all the DAC codes to get written as well.
if self.first_transfer {
self.first_transfer = false
} else {
// Note: If a device hangs up, check that this conditional is passing correctly, as
// there are no time-out checks here in the interest of execution speed.
while !self.transfer.get_transfer_complete_flag() {}
}
let next_buffer = self.next_buffer.take().unwrap();
// Start the next transfer.
self.transfer.clear_interrupts();
let (prev_buffer, _, _) =
self.transfer.next_transfer(next_buffer).unwrap();
// .unwrap_none() https://github.com/rust-lang/rust/issues/62633
self.next_buffer.replace(prev_buffer);
self.next_buffer.as_mut().unwrap()
}
}
};
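The buffer hand-off performed by the new `acquire_buffer` can be modeled in software. This is a sketch under assumptions: the indices stand in for `DAC_BUF[$index][0..3]`, the `Rotation` type is hypothetical, and it assumes that in double-buffer mode `next_transfer` returns the buffer whose transfer just completed.

```rust
/// Software model of the three-buffer rotation. Two buffers belong to the DMA
/// (the active one and the queued one) and one belongs to software.
struct Rotation {
    dma_active: usize,
    dma_queued: usize,
    software: usize,
}

impl Rotation {
    /// Initial state from the diff: buffers 0 and 1 go to the DMA transfer,
    /// buffer 2 is held in `next_buffer`.
    fn new() -> Self {
        Rotation { dma_active: 0, dma_queued: 1, software: 2 }
    }

    /// Model one transfer-complete event followed by `acquire_buffer`.
    /// Returns the index of the buffer that software fills next.
    fn acquire(&mut self) -> usize {
        let finished = self.dma_active;
        // Hardware has already switched to the queued buffer.
        self.dma_active = self.dma_queued;
        // Software hands over the buffer it filled during the last batch.
        self.dma_queued = self.software;
        // The completed buffer comes back to software for refilling.
        self.software = finished;
        self.software
    }
}

fn main() {
    let mut rotation = Rotation::new();
    for batch in 0..6 {
        println!("batch {}: software fills buffer {}", batch, rotation.acquire());
    }
}
```

In this model each buffer that software fills is first queued and only then becomes the active transfer, which matches the doc comment's statement that there is always a valid buffer in the "next-transfer" slot.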

View File

@ -949,6 +949,22 @@ const APP: () = {
}
}
/// Main DSP processing routine for Stabilizer.
///
/// # Note
/// Processing time for the DSP application code is bounded by the following constraints:
///
/// DSP application code starts after the ADC has generated a batch of samples and must be
/// completed by the time the next batch of ADC samples has been acquired (plus the FIFO buffer
/// time). If this constraint is not met, firmware will panic due to an ADC input overrun.
///
/// The DSP application code must also fill out the next DAC output buffer in time such that the
/// DAC can switch to it when it has completed the current buffer. If this constraint is not met
/// it's possible that old DAC codes will be generated on the output and the output samples will
/// be delayed by 1 batch.
///
/// Because the ADC and DAC operate at the same rate, these two constraints actually implement
/// the same time bounds; meeting one also means the other is met.
#[task(binds=DMA1_STR4, resources=[pounder_stamper, adcs, dacs, iir_state, iir_ch, dds_output, input_stamper], priority=2)]
fn process(c: process::Context) {
if let Some(stamper) = c.resources.pounder_stamper {
@ -960,6 +976,7 @@ const APP: () = {
c.resources.adcs.0.acquire_buffer(),
c.resources.adcs.1.acquire_buffer(),
];
let dac_samples = [
c.resources.dacs.0.acquire_buffer(),
c.resources.dacs.1.acquire_buffer(),
@ -993,10 +1010,6 @@ const APP: () = {
builder.write_profile();
}
let [dac0, dac1] = dac_samples;
c.resources.dacs.0.release_buffer(dac0);
c.resources.dacs.1.release_buffer(dac1);
}
#[idle(resources=[net_interface, pounder, mac_addr, eth_mac, iir_state, iir_ch, afes])]
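The processing-time bound described in the `process` doc comment can be written as a small calculation. This is a rough sketch: the function name, the 2 µs sample period, and the batch size of 8 are hypothetical values chosen for illustration, and the FIFO depth of 4 samples is taken from the ADC documentation in this change ("4- or 8-sample buffer").

```rust
use std::time::Duration;

/// Upper bound on how long a single `process` call may take before an ADC
/// input overrun: one batch period plus the samples the SPI RX FIFO can absorb.
fn processing_deadline(
    sample_period: Duration,
    batch_size: u32,
    fifo_samples: u32,
) -> Duration {
    sample_period * (batch_size + fifo_samples)
}

fn main() {
    // Hypothetical numbers: 2 us per sample, 8 samples per batch, 4-sample FIFO.
    let deadline = processing_deadline(Duration::from_micros(2), 8, 4);
    println!("single-iteration deadline: {:?}", deadline);
}
```

The FIFO term only gives slack for an occasional slow iteration. On average, processing still has to finish within one batch period, or the FIFO eventually fills and samples are lost.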

View File

@ -1,4 +1,57 @@
///! The DdsOutput is used as an output stream to the pounder DDS.
///!
///! # Design
///!
///! The DDS stream interface is a means of quickly updating pounder DDS (direct digital synthesis)
///! outputs of the AD9959 DDS chip. The DDS communicates via a quad-SPI interface and a single
///! IO-update output pin.
///!
///! In order to update the DDS interface, the frequency tuning word, amplitude control word, and
///! the phase offset word for a channel can be modified to change the frequency, amplitude, or
///! phase on any of the 4 available output channels. Changes do not propagate to DDS outputs until
///! the IO-update pin is toggled high to activate the new configurations. This allows multiple
///! channels or parameters to be updated first and then take effect simultaneously.
///!
///! In this implementation, the phase, frequency, or amplitude can be updated for any single
///! collection of outputs simultaneously. This is done by serializing the register writes to the
///! DDS into a single buffer of data and then writing the data over QSPI to the DDS.
///!
///! In order to minimize software overhead, data is written directly into the QSPI output FIFO. In
///! order to accomplish this most efficiently, serialized data is written as 32-bit words to
///! minimize the number of bus cycles necessary to write to the peripheral FIFO. A consequence of
///! this is that additional unneeded register writes may be appended to align a transfer to 32-bit
///! word sizes.
///!
///! In order to pulse the IO-update signal, the high-resolution timer output is used. The timer is
///! configured to assert the IO-update signal after a predefined delay and then de-assert the
///! signal after a predefined assertion duration. This allows for the actual QSPI transfer and
///! IO-update toggle to be completed asynchronously to the rest of software processing - that is,
///! software can schedule the DDS updates and then continue data processing. DDS updates then take
///! place in the future when the IO-update is toggled by hardware.
///!
///!
///! # Limitations
///!
///! The QSPI output FIFO is used as an intermediate buffer for holding pending QSPI writes. Because
///! of this, the implementation only supports up to 16 serialized bytes (the QSPI FIFO is 4 32-bit
///! words wide) in a single update.
///!
///! There is currently no synchronization between completion of the QSPI data write and the
///! IO-update signal. It is currently assumed that the QSPI transfer will always complete within a
///! predefined delay (the pre-programmed IO-update timer delay).
///!
///!
///! # Future Improvement
///!
///! In the future, it would be possible to utilize a DMA transfer to complete the QSPI transfer.
///! Once the QSPI transfer completes, it could trigger the IO-update timer to start, so that the
///! IO-update completes automatically and asynchronously. This would allow for arbitrary profile
///! sizes and ensure that IO-update is in sync with the QSPI transfer.
///!
///! Currently, serialization is performed on each processing cycle. If there is a
///! compile-time-known register update sequence needed for the application, the serialization
///! process can be done once and then register values can be written into a pre-computed serialized
///! buffer to avoid the software overhead of much of the serialization process.
use super::QspiInterface;
use crate::hrtimer::HighResTimerE;
use ad9959::{Channel, DdsConfig, ProfileSerializer};
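The 32-bit alignment and 16-byte limit described in the DDS documentation can be sketched as follows. This is an illustration only: `pack_words` is a hypothetical helper, zero bytes stand in for the "additional unneeded register writes" that the real serializer appends, and little-endian byte order is an assumption.

```rust
/// Capacity stated in the documentation: four 32-bit words.
const QSPI_FIFO_BYTES: usize = 16;

/// Pack serialized bytes into 32-bit words for FIFO writes.
/// Returns the word array and the number of words used, or `None` if the
/// data does not fit in the FIFO. Zero padding is a placeholder for the
/// extra register writes used for alignment in the real implementation.
fn pack_words(bytes: &[u8]) -> Option<([u32; 4], usize)> {
    if bytes.len() > QSPI_FIFO_BYTES {
        return None;
    }

    let mut words = [0u32; 4];
    // Round the byte count up to a whole number of words.
    let word_count = (bytes.len() + 3) / 4;
    for (word, chunk) in words.iter_mut().zip(bytes.chunks(4)) {
        let mut padded = [0u8; 4];
        padded[..chunk.len()].copy_from_slice(chunk);
        *word = u32::from_le_bytes(padded);
    }
    Some((words, word_count))
}

fn main() {
    match pack_words(&[1, 2, 3, 4, 5]) {
        Some((words, count)) => println!("{} words: {:08x?}", count, &words[..count]),
        None => println!("update too large for the QSPI FIFO"),
    }
}
```

Writing whole words keeps the number of bus cycles per update low, at the cost of a few padding bytes whenever the serialized length is not a multiple of four.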