diff --git a/openocd.gdb b/openocd.gdb index e903a33..48040b1 100644 --- a/openocd.gdb +++ b/openocd.gdb @@ -14,6 +14,9 @@ break DefaultHandler break HardFault break rust_begin_unwind +source ../../PyCortexMDebug/cmdebug/svd_gdb.py +svd_load ~/Downloads/STM32H743x.svd + load # tbreak cortex_m_rt::reset_handler monitor reset halt diff --git a/src/adc.rs b/src/adc.rs index 6c1bf34..48792d9 100644 --- a/src/adc.rs +++ b/src/adc.rs @@ -1,22 +1,77 @@ ///! Stabilizer ADC management interface ///! -///! The Stabilizer ADCs utilize three DMA channels each: one to trigger sampling, one to collect -///! samples, and one to clear the EOT flag betwen samples. The SPI interfaces are configured -///! for receiver-only operation. A timer channel is -///! configured to generate a DMA write into the SPI CR1 register, which initiates a SPI transfer and -///! results in a single ADC sample read for both channels. A separate timer channel is configured to -///! occur immediately before the trigger channel, which initiates a write to the IFCR (flag-clear) -///! register to clear the EOT flag, which allows for a new transmission to be generated by the -///! trigger channel. +///! # Design ///! -///! In order to read multiple samples without interrupting the CPU, a separate DMA transfer is -///! configured to read from each of the ADC SPI RX FIFOs. Due to the design of the SPI peripheral, -///! these DMA transfers stall when no data is available in the FIFO. Thus, the DMA transfer only -///! completes after all samples have been read. When this occurs, a CPU interrupt is generated so -///! that software can process the acquired samples from both ADCs. Only one of the ADC DMA streams -///! is configured to generate an interrupt to handle both transfers, so it is necessary to ensure -///! both transfers are completed before reading the data. This is usually not significant for -///! busy-waiting because the transfers should complete at approximately the same time. +///! 
Stabilizer ADCs are connected to the MCU via a simplex, SPI-compatible interface. The ADCs +///! require a setup conversion time after asserting the CSn (convert) signal to generate the ADC +///! code from the sampled level. Once the setup time has elapsed, the ADC data is clocked out of +///! MISO. The internal setup time is managed by the SPI peripheral via a CSn setup time parameter +///! during SPI configuration, which allows offloading the management of the setup time to hardware. +///! +///! Because of the SPI-compatibility of the ADCs, a single SPI peripheral + DMA is used to automate +///! the collection of multiple ADC samples without requiring processing by the CPU, which reduces +///! overhead and provides the CPU with more time for processing-intensive tasks, like DSP. +///! +///! The automation of sample collection utilizes three DMA streams, the SPI peripheral, and two +///! timer compare channels for each ADC. One timer comparison channel is configured to generate a +///! comparison event every time the timer is equal to a specific value. Each comparison then +///! generates a DMA transfer event to write into the SPI CR1 register to initiate the transfer. +///! This allows the SPI interface to periodically read a single sample. The other timer comparison +///! channel is configured to generate a comparison event slightly before the first (~10 timer +///! cycles). This channel triggers a separate DMA stream to clear the EOT flag within the SPI +///! peripheral. The EOT flag must be cleared after each transfer or the SPI peripheral will not +///! properly complete the single conversion. Thus, by using two DMA streams and timer comparison +///! channels, the SPI can regularly acquire ADC samples. +///! +///! In order to collect the acquired ADC samples into a RAM buffer, a final DMA transfer is +///! configured to read from the SPI RX FIFO into RAM. The request for this transfer is connected to +///! 
the SPI RX data signal, so the SPI peripheral will request to move data into RAM whenever it is +///! available. When enough samples have been collected, a transfer-complete interrupt is generated +///! and the ADC samples are available for processing. +///! +///! The SPI peripheral internally has an 8- or 16-byte TX and RX FIFO, which corresponds to a 4- or +///! 8-sample buffer for incoming ADC samples. During the handling of the DMA transfer completion, +///! there is a small window while buffers are swapped over during which a sample could +///! be lost. In order to avoid this, the SPI RX FIFO is effectively used as a "sample overflow" +///! region and can buffer a number of samples until the next DMA transfer is configured. If a DMA +///! transfer is still not configured in time, the SPI peripheral will generate an input-overrun interrupt. +///! This interrupt then serves as a means of detecting if samples have been lost, which will occur +///! whenever data processing takes longer than the collection period. +///! +///! +///! ## Starting Data Collection +///! +///! Because the DMA data collection is automated via timer count comparisons and DMA transfers, the +///! ADCs can be initialized and configured, but will not begin sampling the external ADCs until the +///! sampling timer is enabled. As such, the sampling timer should be enabled after all +///! initialization has completed and immediately before the embedded processing loop begins. +///! +///! +///! ## Batch Sizing +///! +///! The ADCs collect a group of N samples, which is referred to as a batch. The size of the batch +///! is configured by the user at compile-time to allow for a custom-tailored implementation. Larger +///! batch sizes generally provide for lower overhead and more processing time per sample, but come +///! at the expense of increased input -> output latency. +///! +///! +///! # Note +///! +///! 
While there are two ADCs, only a single ADC is configured to generate transfer-complete +///! interrupts. This is done because it is assumed that the ADCs will always be sampled +///! simultaneously. If only a single ADC is used, it must always be ADC0, as ADC1 will not generate +///! transfer-complete interrupts. +///! +///! There is a very small amount of latency between sampling of ADCs due to bus matrix priority. As +///! such, one of the ADCs will be sampled marginally earlier than the other because the DMA +///! requests are generated simultaneously. This can be avoided by providing a known offset to the +///! sample DMA requests, which can be accomplished by setting e.g. ADC0's comparison to a counter +///! value of 0 and ADC1's comparison to a counter value of 1. +///! +///! In this implementation, single buffer mode DMA transfers are used because the SPI RX FIFO can +///! be used as a means to both detect and buffer ADC samples during the buffer swap-over. Because +///! of this, double-buffered mode does not offer any advantages over single-buffered mode (unless +///! double-buffered mode offers less overhead due to the DMA disable/enable procedure). use super::{ hal, timers, DMAReq, DmaConfig, MemoryToPeripheral, PeripheralToMemory, Priority, TargetAddress, Transfer, SAMPLE_BUFFER_SIZE, @@ -191,7 +246,7 @@ macro_rules! adc_input { // Generate DMA events when an output compare of the timer hits the specified // value. trigger_channel.listen_dma(); - trigger_channel.to_output_compare(2); + trigger_channel.to_output_compare(2 + $index); // The trigger stream constantly writes to the SPI CR1 using a static word // (which is a static value to enable the SPI transfer). Thus, neither the diff --git a/src/dac.rs b/src/dac.rs index 8c93f74..a22e5da 100644 --- a/src/dac.rs +++ b/src/dac.rs @@ -1,8 +1,55 @@ ///! Stabilizer DAC management interface ///! -///! The Stabilizer DAC utilize a DMA channel to generate output updates. A timer channel is -///! 
configured to generate a DMA write into the SPI TXFIFO, which initiates a SPI transfer and -///! results in DAC update for both channels. +///! # Design +///! +///! Stabilizer DACs are connected to the MCU via a simplex, SPI-compatible interface. Each DAC +///! accepts a 16-bit output code. +///! +///! In order to maximize CPU processing time, the DAC code updates are offloaded to hardware using +///! a timer compare channel, DMA stream, and the DAC SPI interface. +///! +///! The timer comparison channel is configured to generate a DMA request whenever the comparison +///! occurs. Thus, whenever a comparison happens, a single DAC code can be written to the output. By +///! configuring a DMA stream for a number of successive DAC codes, hardware can regularly update +///! the DAC without requiring the CPU. +///! +///! In order to ensure alignment between the ADC sample batches and DAC output code batches, a DAC +///! output batch is always exactly 3 batches after the ADC batch that generated it. +///! +///! The DMA transfer for the DAC output codes utilizes a double-buffer mode to avoid losing any +///! transfer events generated by the timer (for example, when 2 update cycles occur before the DMA +///! transfer completion is handled). In this mode, by the time DMA swaps buffers, there is always a valid buffer in the +///! "next-transfer" double-buffer location for the DMA transfer. Once a transfer completes, +///! software then has exactly one batch duration to fill the next buffer before its +///! transfer begins. If software does not meet this deadline, old data will be repeatedly generated +///! on the output and output will be shifted by one batch. +///! +///! ## Multiple Samples to Single DAC Codes +///! +///! For some applications, it may be desirable to generate a single DAC code from multiple ADC +///! samples. In order to maintain timing characteristics between ADC samples and DAC code outputs, +///! 
applications are required to generate one DAC code for each ADC sample. To accommodate mapping +///! multiple inputs to a single output, the output code can be repeated a number of times in the +///! output buffer corresponding with the number of input samples that were used to generate it. +///! +///! +///! # Note +///! +///! There is a very small amount of latency between updating the two DACs due to bus matrix +///! priority. As such, one of the DACs will be updated marginally earlier than the other because +///! the DMA requests are generated simultaneously. This can be avoided by providing a known offset +///! to other DMA requests, which can be accomplished by setting e.g. DAC0's comparison to a +///! counter value of 2 and DAC1's comparison to a counter value of 3. This will have the effect of +///! generating the DAC updates with a known latency of 1 timer tick to each other and prevent the +///! DMAs from racing for the bus. As implemented, the DMA channels utilize natural priority of the +///! DMA channels to arbitrate which transfer occurs first. +///! +///! +///! # Limitations +///! +///! While double-buffered mode is used for DMA to avoid lost DAC-update events, there is no check +///! for re-use of a previously provided DAC output buffer. It is assumed that the DMA request is +///! served promptly after the transfer completes. use super::{ hal, timers, DMAReq, DmaConfig, MemoryToPeripheral, TargetAddress, Transfer, SAMPLE_BUFFER_SIZE, @@ -13,8 +60,8 @@ use super::{ // processed). Note that the contents of AXI SRAM is uninitialized, so the buffer contents on // startup are undefined. The dimensions are `ADC_BUF[adc_index][ping_pong_index][sample_index]`. #[link_section = ".axisram.buffers"] -static mut DAC_BUF: [[[u16; SAMPLE_BUFFER_SIZE]; 2]; 2] = - [[[0; SAMPLE_BUFFER_SIZE]; 2]; 2]; +static mut DAC_BUF: [[[u16; SAMPLE_BUFFER_SIZE]; 3]; 2] = + [[[0; SAMPLE_BUFFER_SIZE]; 3]; 2]; macro_rules! 
dac_output { ($name:ident, $index:literal, $data_stream:ident, @@ -32,6 +79,16 @@ macro_rules! dac_output { ) -> Self { Self { _channel, spi } } + + /// Start the SPI and begin operating in a DMA-driven transfer mode. + pub fn start_dma(&mut self) { + // Allow the SPI FIFOs to operate using only DMA data channels. + self.spi.enable_dma_tx(); + + // Enable SPI and start it in infinite transaction mode. + self.spi.inner().cr1.modify(|_, w| w.spe().set_bit()); + self.spi.inner().cr1.modify(|_, w| w.cstart().started()); + } } // Note(unsafe): This is safe because the DMA request line is logically owned by this module. @@ -60,7 +117,6 @@ macro_rules! dac_output { MemoryToPeripheral, &'static mut [u16; SAMPLE_BUFFER_SIZE], >, - first_transfer: bool, } impl $name { @@ -78,11 +134,12 @@ macro_rules! dac_output { // Generate DMA events when an output compare of the timer hitting zero (timer roll over) // occurs. trigger_channel.listen_dma(); - trigger_channel.to_output_compare(0); + trigger_channel.to_output_compare(4 + $index); // The stream constantly writes to the TX FIFO to write new update codes. let trigger_config = DmaConfig::default() .memory_increment(true) + .double_buffer(true) .peripheral_increment(false); // Listen for any potential SPI error signals, which may indicate that we are not generating @@ -90,64 +147,53 @@ macro_rules! dac_output { let mut spi = spi.disable(); spi.listen(hal::spi::Event::Error); - // Allow the SPI FIFOs to operate using only DMA data channels. - spi.enable_dma_tx(); - - // Enable SPI and start it in infinite transaction mode. - spi.inner().cr1.modify(|_, w| w.spe().set_bit()); - spi.inner().cr1.modify(|_, w| w.cstart().started()); + // AXISRAM is uninitialized. As such, we manually zero-initialize it here before + // starting the transfer. + // Note(unsafe): We currently own all DAC_BUF[index] buffers and are not using them + // elsewhere, so it is safe to access them here. 
+ for buf in unsafe { DAC_BUF[$index].iter_mut() } { + for byte in buf.iter_mut() { + *byte = 0; + } + } // Construct the trigger stream to write from memory to the peripheral. - let transfer: Transfer<_, _, MemoryToPeripheral, _> = + let mut transfer: Transfer<_, _, MemoryToPeripheral, _> = Transfer::init( stream, $spi::new(trigger_channel, spi), // Note(unsafe): This buffer is only used once and provided for the DMA transfer. unsafe { &mut DAC_BUF[$index][0] }, - None, + // Note(unsafe): This buffer is only used once and provided for the DMA transfer. + unsafe { Some(&mut DAC_BUF[$index][1]) }, trigger_config, ); + transfer.start(|spi| spi.start_dma()); + Self { transfer, // Note(unsafe): This buffer is only used once and provided for the next DMA transfer. - next_buffer: unsafe { Some(&mut DAC_BUF[$index][1]) }, - first_transfer: true, + next_buffer: unsafe { Some(&mut DAC_BUF[$index][2]) }, } } /// Acquire the next output buffer to populate it with DAC codes. - pub fn acquire_buffer( - &mut self, - ) -> &'static mut [u16; SAMPLE_BUFFER_SIZE] { - self.next_buffer.take().unwrap() - } + pub fn acquire_buffer(&mut self) -> &mut [u16; SAMPLE_BUFFER_SIZE] { + // Note: If a device hangs up, check that this conditional is passing correctly, as + // there are no time-out checks here in the interest of execution speed. + while !self.transfer.get_transfer_complete_flag() {} - /// Enqueue the next buffer for transmission to the DAC. - /// - /// # Args - /// * `data` - The next data to write to the DAC. - pub fn release_buffer( - &mut self, - next_buffer: &'static mut [u16; SAMPLE_BUFFER_SIZE], - ) { - // If the last transfer was not complete, we didn't write all our previous DAC codes. - // Wait for all the DAC codes to get written as well. - if self.first_transfer { - self.first_transfer = false - } else { - // Note: If a device hangs up, check that this conditional is passing correctly, as - // there is no time-out checks here in the interest of execution speed. 
- while !self.transfer.get_transfer_complete_flag() {} - } + let next_buffer = self.next_buffer.take().unwrap(); // Start the next transfer. - self.transfer.clear_interrupts(); let (prev_buffer, _, _) = self.transfer.next_transfer(next_buffer).unwrap(); // .unwrap_none() https://github.com/rust-lang/rust/issues/62633 self.next_buffer.replace(prev_buffer); + + self.next_buffer.as_mut().unwrap() } } }; diff --git a/src/main.rs b/src/main.rs index 5732474..eaf3979 100644 --- a/src/main.rs +++ b/src/main.rs @@ -977,6 +977,22 @@ const APP: () = { } } + /// Main DSP processing routine for Stabilizer. + /// + /// # Note + /// Processing time for the DSP application code is bounded by the following constraints: + /// + /// DSP application code starts after the ADC has generated a batch of samples and must be + /// completed by the time the next batch of ADC samples has been acquired (plus the FIFO buffer + /// time). If this constraint is not met, firmware will panic due to an ADC input overrun. + /// + /// The DSP application code must also fill out the next DAC output buffer in time such that the + /// DAC can switch to it when it has completed the current buffer. If this constraint is not met + /// it's possible that old DAC codes will be generated on the output and the output samples will + /// be delayed by 1 batch. + /// + /// Because the ADC and DAC operate at the same rate, these two constraints actually impose + /// the same time bounds; meeting one means the other is also met. 
#[task(binds=DMA1_STR4, resources=[pounder_stamper, adcs, dacs, iir_state, iir_ch, dds_output, input_stamper, timestamp_handler, iir_lockin, iir_state_lockin], priority=2)] fn process(c: process::Context) { if let Some(stamper) = c.resources.pounder_stamper { @@ -988,6 +1004,7 @@ const APP: () = { c.resources.adcs.0.acquire_buffer(), c.resources.adcs.1.acquire_buffer(), ]; + let dac_samples = [ c.resources.dacs.0.acquire_buffer(), c.resources.dacs.1.acquire_buffer(), @@ -1053,9 +1070,6 @@ const APP: () = { builder.write_profile(); } - - c.resources.dacs.0.release_buffer(dac0); - c.resources.dacs.1.release_buffer(dac1); } #[idle(resources=[net_interface, pounder, mac_addr, eth_mac, iir_state, iir_ch, afes])] diff --git a/src/pounder/dds_output.rs b/src/pounder/dds_output.rs index 418eb65..36399f2 100644 --- a/src/pounder/dds_output.rs +++ b/src/pounder/dds_output.rs @@ -1,4 +1,57 @@ ///! The DdsOutput is used as an output stream to the pounder DDS. +///! +///! # Design +///! +///! The DDS stream interface is a means of quickly updating pounder DDS (direct digital synthesis) +///! outputs of the AD9959 DDS chip. The DDS communicates via a quad-SPI interface and a single +///! IO-update output pin. +///! +///! In order to update the DDS interface, the frequency tuning word, amplitude control word, and +///! the phase offset word for a channel can be modified to change the frequency, amplitude, or +///! phase on any of the 4 available output channels. Changes do not propagate to DDS outputs until +///! the IO-update pin is toggled high to activate the new configurations. This allows multiple +///! channels or parameters to be updated and then effects can take place simultaneously. +///! +///! In this implementation, the phase, frequency, or amplitude can be updated for any single +///! collection of outputs simultaneously. This is done by serializing the register writes to the +///! DDS into a single buffer of data and then writing the data over QSPI to the DDS. +///! 
+///! In order to minimize software overhead, data is written directly into the QSPI output FIFO. In +///! order to accomplish this most efficiently, serialized data is written as 32-bit words to +///! minimize the number of bus cycles necessary to write to the peripheral FIFO. A consequence of +///! this is that additional unneeded register writes may be appended to align a transfer to 32-bit +///! word sizes. +///! +///! In order to pulse the IO-update signal, the high-resolution timer output is used. The timer is +///! configured to assert the IO-update signal after a predefined delay and then de-assert the +///! signal after a predefined assertion duration. This allows for the actual QSPI transfer and +///! IO-update toggle to be completed asynchronously to the rest of software processing - that is, +///! software can schedule the DDS updates and then continue data processing. DDS updates then take +///! place in the future when the IO-update is toggled by hardware. +///! +///! +///! # Limitations +///! +///! The QSPI output FIFO is used as an intermediate buffer for holding pending QSPI writes. Because +///! of this, the implementation only supports up to 16 serialized bytes (the QSPI FIFO is 4 32-bit +///! words wide) in a single update. +///! +///! There is currently no synchronization between completion of the QSPI data write and the +///! IO-update signal. It is currently assumed that the QSPI transfer will always complete within a +///! predefined delay (the pre-programmed IO-update timer delay). +///! +///! +///! # Future Improvement +///! +///! In the future, it would be possible to utilize a DMA transfer to complete the QSPI transfer. +///! Once the QSPI transfer completed, this could trigger the IO-update timer to start to +///! asynchronously complete IO-update automatically. This would allow for arbitrary profile sizes +///! and ensure that IO-update was in-sync with the QSPI transfer. +///! +///! 
Currently, serialization is performed on each processing cycle. If there is a +///! compile-time-known register update sequence needed for the application, the serialization +///! process can be done once and then register values can be written into a pre-computed serialized +///! buffer to avoid the software overhead of much of the serialization process. use super::QspiInterface; use crate::hrtimer::HighResTimerE; use ad9959::{Channel, DdsConfig, ProfileSerializer};