11.5 Adder
Part of the DSP module is a 40-bit adder that can store its result in either of the two 40-bit accumulators. Activating the saturation logic is optional. The adder is required for accumulating the partial sums. Adding or subtracting the partial sums is performed automatically as part of the DSP instructions; no additional code is required, which allows extremely short signal-processing times.
The MAC instruction from the previous example illustrates the significance of an independent hardware adder.
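To make the role of the optional saturation logic concrete, the following is a minimal C sketch of a 40-bit addition with and without saturation. It is a software model only: the names acc40_add, ACC40_MAX and ACC40_MIN are hypothetical, and clamping at the full 40-bit limits is an assumption made for illustration (the device's actual saturation modes are not described in this section).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits of a signed 40-bit value held in a 64-bit integer (assumed model). */
#define ACC40_MAX ((int64_t)0x7FFFFFFFFFLL)   /*  2^39 - 1 */
#define ACC40_MIN (-ACC40_MAX - 1)            /* -2^39     */

/* Add `term` to a 40-bit accumulator value.
 * saturate == true : clamp the result to the 40-bit range.
 * saturate == false: wrap around to 40 bits (two's complement). */
int64_t acc40_add(int64_t acc, int64_t term, bool saturate)
{
    /* Precondition: both inputs already fit in 40 bits. */
    assert(acc >= ACC40_MIN && acc <= ACC40_MAX);
    assert(term >= ACC40_MIN && term <= ACC40_MAX);

    int64_t sum = acc + term;   /* cannot overflow 64 bits for 40-bit inputs */

    if (saturate) {
        if (sum > ACC40_MAX) return ACC40_MAX;
        if (sum < ACC40_MIN) return ACC40_MIN;
        return sum;
    }

    /* Wrap: keep the low 40 bits, then sign-extend from bit 39. */
    uint64_t low = (uint64_t)sum & 0xFFFFFFFFFFULL;
    int64_t wrapped = (int64_t)low;
    if (low & 0x8000000000ULL) {
        wrapped -= ((int64_t)1 << 40);
    }
    return wrapped;
}
```

With saturation enabled, adding 1 to the largest positive value leaves it unchanged; with saturation disabled, the same addition wraps around to the most negative value.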
Example:
MAC W4*W6, A, [W8]+=2, W4, [W10]+=2, W6
The instruction of this example:
- Multiplies the values in the registers W4 and W6,
- Adds the result of the multiplication to the value in the accumulator A,
- Loads the next array element from the address in the X space pointed to by the register W8 into the register W4,
- Increments the register W8 after the read, so that it points to the next array element in the X space,
- Loads the next array element from the address in the Y space pointed to by the register W10 into the register W6,
- Increments the register W10 after the read, so that it points to the next array element in the Y space.
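The steps listed above can be sketched as a small C model. This is an illustration only, not the hardware implementation: the type mac_state and the function mac_step are hypothetical names, integer (not fractional) multiplication is assumed, saturation is ignored, and the steps that the hardware performs within one cycle are written out sequentially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the registers used by the MAC example. */
typedef struct {
    int64_t acc;          /* stands in for accumulator A          */
    int16_t w4;           /* first multiplier operand             */
    int16_t w6;           /* second multiplier operand            */
    const int16_t *w8;    /* pointer into the array in X space    */
    const int16_t *w10;   /* pointer into the array in Y space    */
} mac_state;

/* One execution of MAC W4*W6, A, [W8]+=2, W4, [W10]+=2, W6. */
void mac_step(mac_state *s)
{
    assert(s != NULL && s->w8 != NULL && s->w10 != NULL);

    /* Multiply W4 by W6 and add the product to the accumulator. */
    s->acc += (int32_t)s->w4 * (int32_t)s->w6;

    /* Prefetch the next X element into W4, then advance W8.
     * One int16_t element corresponds to the "+=2" byte increment. */
    s->w4 = *s->w8;
    s->w8 += 1;

    /* Prefetch the next Y element into W6, then advance W10. */
    s->w6 = *s->w10;
    s->w10 += 1;
}
```

Note that each step multiplies the operands fetched by the previous step, which is why the first pair of elements has to be loaded into W4 and W6 before the first MAC is executed.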
It is important that this DSP instruction is executed in one instruction cycle! This means that the whole algorithm for calculating the sum of products consists of loading the arrays into the memory, configuring the DSP module (format, positions of the arrays, etc.) and then executing the above instruction the required number of times.
Example:
CLR A
REPEAT #20
MAC W4*W6, A, [W8]+=2, W4, [W10]+=2, W6
The above program executes the MAC instruction 21 times (REPEAT #20 means that the next instruction is executed 20+1 times). If, before the program is executed, the first array elements have been loaded into the registers W4 and W6 and the addresses of the elements that follow them in the data memory or extended data memory (PSV management) have been loaded into the registers W8 and W10, then, upon completion of the REPEAT loop, the accumulator will contain the sum of products of the first 21 elements of the two arrays.
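The effect of the CLR/REPEAT/MAC sequence can be modelled in C as follows. The function repeat_mac is a hypothetical name used for illustration; integer multiplication without saturation is assumed, and the prefetching through W4 and W6 is folded into plain array indexing. The loop body runs count + 1 times, so the caller has to supply count + 1 elements in each array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of:  CLR A / REPEAT #count / MAC ...
 * The repeated instruction is executed count + 1 times. */
int64_t repeat_mac(const int16_t *x, const int16_t *y, unsigned count)
{
    assert(x != NULL && y != NULL);

    int64_t acc = 0;                          /* CLR A */
    for (unsigned i = 0; i <= count; i++) {   /* count + 1 iterations */
        acc += (int32_t)x[i] * (int32_t)y[i]; /* one MAC execution */
    }
    return acc;
}
```

For example, repeat_mac(x, y, 20) accumulates 21 products, matching the REPEAT #20 in the program above.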
This section of the code occupies 3 locations in the program memory and takes 23 instruction cycles (1 for CLR, 1 for REPEAT and 21 for MAC). If the device clock is 80MHz (20MHz instruction clock), one instruction cycle lasts 50ns and the program is executed in 23*50 = 1150ns!
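The timing estimate can be reproduced with a few lines of C. This is a sketch under the assumptions used in the text: one cycle each for CLR and REPEAT, one cycle per MAC execution, and a known instruction clock. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction cycles for CLR + REPEAT + n single-cycle MAC executions. */
uint32_t mac_block_cycles(uint32_t mac_executions)
{
    return 2u + mac_executions;
}

/* Execution time in nanoseconds for a given instruction clock in Hz. */
double cycles_to_ns(uint32_t cycles, double instruction_clock_hz)
{
    assert(instruction_clock_hz > 0.0);
    return (double)cycles * 1e9 / instruction_clock_hz;
}
```

With 21 MAC executions (REPEAT #20) and a 20MHz instruction clock, mac_block_cycles(21) gives 23 cycles and cycles_to_ns(23, 20e6) gives 1150ns.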
It should be noted that this would not be possible without an independent adder that can operate simultaneously with the multiplier and the other parts of the DSP module. Without it, the operations could not be carried out in parallel and the execution of one DSP instruction would last at least one clock cycle longer.
Two 40-bit accumulators are available for saving the partial sums: accumulator A and accumulator B (ACCA and ACCB). The accumulators are mapped in the data memory and occupy 3 memory locations each. The addresses and distribution of the accumulator bits are given at the end of this chapter.
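As an illustration of how a 40-bit value can occupy three 16-bit memory locations, the following C sketch splits an accumulator value into three words and joins them again. The layout used here (bits 15..0, bits 31..16, and bits 39..32 in the low byte of the third word) is an assumption made for the example; the actual bit distribution is the one given at the end of this chapter. The names acc_words, acc40_split and acc40_join are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of one 40-bit accumulator in three 16-bit locations. */
typedef struct {
    uint16_t low;    /* bits 15..0                         */
    uint16_t high;   /* bits 31..16                        */
    uint16_t upper;  /* bits 39..32, kept in the low byte  */
} acc_words;

/* Split a signed 40-bit value into three 16-bit words. */
acc_words acc40_split(int64_t acc)
{
    /* Precondition: the value fits in 40 bits. */
    assert(acc >= -((int64_t)1 << 39) && acc < ((int64_t)1 << 39));

    uint64_t bits = (uint64_t)acc & 0xFFFFFFFFFFULL;
    acc_words w;
    w.low   = (uint16_t)(bits & 0xFFFFu);
    w.high  = (uint16_t)((bits >> 16) & 0xFFFFu);
    w.upper = (uint16_t)((bits >> 32) & 0xFFu);
    return w;
}

/* Rebuild the signed 40-bit value from the three words. */
int64_t acc40_join(acc_words w)
{
    uint64_t bits = ((uint64_t)(w.upper & 0xFFu) << 32)
                  | ((uint64_t)w.high << 16)
                  | (uint64_t)w.low;
    int64_t value = (int64_t)bits;
    if (bits & 0x8000000000ULL) {      /* bit 39 set: negative value */
        value -= ((int64_t)1 << 40);
    }
    return value;
}
```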