18. DMA Use Policy

DMA (Direct Memory Access) is a hardware feature of most modern microcontrollers that transfers data between peripherals and RAM (or between blocks of RAM) without using CPU time to do so, permitting the CPU to get on with other things. Such transfers can be time-consuming, so DMA effectively provides another thread of execution whose entire purpose is to speed things up.

Whether DMA is used is entirely dependent on the system architect and the efficiency needs of the system.

When DMA is going to be implemented, the programmer should first get the data transfer working CPU-driven, with KNOWN data, before implementing DMA. Reason: the DMA controllers on PIC MCUs have undocumented (or poorly documented) buffer alignment constraints, and unexpected (and sometimes undocumented) interactions with certain peripherals, none of which can be debugged unless you already know what the transferred data is supposed to contain.
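A minimal sketch of that first step, assuming a CPU-driven PMP master read on a PIC32 (the PMP itself is presumed already configured; rxBuf and XFER_LEN are illustrative names):

    #include <xc.h>
    #include <stdint.h>

    #define XFER_LEN 32u
    static uint8_t rxBuf[XFER_LEN];

    /* CPU-driven PMP read. PMDIN sits behind a 1-element read FIFO:
       each read returns the PREVIOUS datum and starts the next bus
       cycle, so the very first read must be discarded. */
    void pmp_read_known(void)
    {
        (void)PMDIN;                      /* dummy read primes the FIFO */
        for (unsigned i = 0; i < XFER_LEN; i++) {
            while (PMMODEbits.BUSY) ;     /* wait for the bus cycle     */
            rxBuf[i] = (uint8_t)PMDIN;    /* valid datum; starts next   */
        }
    }

With the far side sending a KNOWN string, any missing, shifted, or byte-lane-scrambled data is obvious the moment you inspect rxBuf.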

Example: With the PMP (Parallel Master Port) peripheral, the first CPU-driven read from PMDIN returns junk, because CPU-driven PMP reads go through a 1-element FIFO. However, when using DMA, you start a PMP DMA transfer by setting CFORCE = 1, and the DMA either knows to discard the first datum read or has a way around the 1-element FIFO! If you instead start the block transfer with a PMDIN read, the first valid read gets lost! Apparently, the DMA is discarding it.
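A minimal sketch of the two starting methods, assuming PIC32 DMA channel 0 has already been configured for PMDIN-to-RAM cell transfers (the channel number is illustrative):

    /* Right: let the DMA controller itself perform the first read.    */
    DCH0CONbits.CHEN = 1;          /* enable the channel                */
    DCH0ECONbits.CFORCE = 1;       /* force the first cell transfer;
                                      the DMA copes with the read FIFO  */

    /* Wrong: priming the FIFO by hand, as CPU-driven code must.
       With DMA running, the first VALID datum gets discarded, so the
       destination buffer comes up one byte short.                      */
    /* (void)PMDIN; */             /* do NOT do this before DMA         */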

Thus, KNOWN data in the transfer, like “Now is the time for all good men”, helped isolate this problem, because what actually arrived at the destination was “ow is the time for all good men”!

Example: Hidden booby trap: When implementing PMP DMA, I expected already-tested code from the PIC32MZ to work just fine, and that assumption was not a bad one. The one difference, unsuspected because CPU-driven PMP doesn’t care: if PMMODEbits.MODE16 == 1, CPU-driven PMP transfers 16 bits, but we only use the lower 8 bits, so there is no behavioral difference (the PMP module here lives in an environment with mixed bus widths, so we had been just leaving it set to 16 bits). However, the NAND chip NOW has an 8-bit bus, whereas it had a 16-bit bus before, when it was running with the PIC32MZ. And surprise, surprise: with MODE16 == 1 but a DMA cell size of 1 (meaning 1 byte), the DMA WILL NOT continue past a single cell transfer, despite the IRQ being set up correctly to trigger re-starting the cell transfer. What made it work was setting MODE16 = 0; then the whole block transfer (2112 reads, the NAND chip’s page size) completed successfully.
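A minimal sketch of the working configuration, assuming DMA channel 0 with the PMP interrupt already selected as its cell-start IRQ via DCH0ECONbits.CHSIRQ (pageBuf and NAND_PAGE_SIZE are illustrative names; KVA_TO_PA converts the virtual addresses to the physical addresses the DMA requires):

    #include <xc.h>
    #include <sys/kmem.h>                /* KVA_TO_PA()                 */
    #include <stdint.h>

    #define NAND_PAGE_SIZE 2112u         /* 2048 data + 64 spare bytes  */
    static uint8_t pageBuf[NAND_PAGE_SIZE];

    void nand_page_read_dma(void)
    {
        PMMODEbits.MODE16 = 0;           /* 8-bit bus width MUST match
                                            the 1-byte cell size, or the
                                            DMA stalls after one cell   */
        DCH0SSA  = KVA_TO_PA(&PMDIN);    /* source: PMP read register   */
        DCH0DSA  = KVA_TO_PA(pageBuf);   /* destination: page buffer    */
        DCH0SSIZ = 1;                    /* 1-byte source               */
        DCH0DSIZ = NAND_PAGE_SIZE;       /* whole page at destination   */
        DCH0CSIZ = 1;                    /* 1 byte per cell transfer    */
        DCH0ECONbits.SIRQEN = 1;         /* PMP IRQ restarts each cell  */
        DCH0CONbits.CHEN = 1;            /* enable the channel          */
        DCH0ECONbits.CFORCE = 1;         /* force the first cell        */
    }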

Example: A uint32_t array used as a NAND page buffer is, by default, aligned on 4 bytes. The DMA documentation indicates there are, literally, NO alignment constraints: when the cell transfer size is 1, the DMA supposedly operates just fine with only byte alignment. This is covered IN DETAIL in a whole section about DMA memory alignment. To my dismay, it is not true. A buffer whose physical start address was 0x0000CAAC worked fine with CPU-driven 1-byte transfers. The KNOWN DATA was:

0x000000DB, 0x000000DC, 0x000000DD, 0x000000DE, 0x000000DF, etc.

But the DMA transfer gave us:

0xDB000000, 0xDC000000, 0xDD000000, 0xDE000000, 0xDF000000, etc.

See the difference?

When the buffer was aligned on 16 bytes (new address 0x0000CAB0), we got

0x000000DB, 0x000000DC, 0x000000DD, 0x000000DE, 0x000000DF, etc.

as expected.
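One way to guarantee that alignment at compile time is a GCC-style attribute, which XC32 supports (the buffer name is illustrative):

    /* Force 16-byte alignment so the DMA's undocumented alignment
       requirement is met wherever the linker places the buffer.       */
    static uint32_t nandPageBuf[2112 / sizeof(uint32_t)]
        __attribute__((aligned(16)));

This records the requirement in the code itself, rather than relying on where the linker happens to place the array.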