Microprocessor Design/Multiply and Divide Blocks
Multiply and Divide Problems
Multiplication and division operations are significantly more complicated than addition or subtraction operations. This additional complexity leads to more hardware, more complicated hardware, and longer processing time.
In hardware, multiplication and division are commonly performed as a series of sequential additions (or subtractions) and shifts. For this reason, it is imperative that we have efficient adders and shifters at our disposal.
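As an illustration of multiplication by sequential additions and shifts, the following is a minimal software sketch of unsigned shift-and-add multiplication. The function name and the `width` parameter are assumptions made for this example; real hardware would hold the operands in fixed-width registers rather than Python integers.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit values using only adds and shifts."""
    product = 0
    for _ in range(width):
        # If the lowest multiplier bit is set, add the shifted multiplicand.
        if multiplier & 1:
            product += multiplicand
        # Move to the next bit position: shift the multiplicand left
        # and the multiplier right.
        multiplicand <<= 1
        multiplier >>= 1
    return product


print(shift_add_multiply(13, 11))
```

Each loop iteration corresponds to one adder pass and one shift, which is why the speed of these units dominates the cost of a sequential multiplier.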
Multipliers and dividers are composed of shifters and adders. It is typically not possible, or not desirable, to reuse the main adder and shifter of the ALU, so a microprocessor will typically contain several adders and shifters: a primary set in the ALU for addition and subtraction, and additional ones embedded in the multiplication and division units. This is another good reason why our adders and shifters need to be small and fast.
Multiplication Algorithms
Booth's Algorithm
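As a rough illustration, radix-2 Booth recoding scans the two's-complement multiplier one bit at a time, subtracting the shifted multiplicand at the start of a run of 1s and adding it back just past the end of the run. The sketch below is a software model only; the function name and `width` parameter are assumptions for this example, and Python's unbounded integers stand in for the fixed-width product register a real design would use.

```python
def booth_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Signed multiply using radix-2 Booth recoding of a `width`-bit multiplier."""
    mask = (1 << width) - 1
    m_bits = multiplier & mask      # two's-complement encoding of the multiplier
    product = 0
    prev = 0                        # implicit bit to the right of bit 0
    for i in range(width):
        cur = (m_bits >> i) & 1
        if cur == 1 and prev == 0:
            # Start of a run of 1s: subtract the shifted multiplicand.
            product -= multiplicand << i
        elif cur == 0 and prev == 1:
            # Just past the end of a run of 1s: add the shifted multiplicand.
            product += multiplicand << i
        prev = cur
    return product


print(booth_multiply(7, -3))
```

A long run of identical multiplier bits costs no additions at all in this scheme, only shifts.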
Cascaded Multiplication
Wallace tree
The Wallace tree, a specialized structure for performing multiplication, has been called one of the most important advances in computing.[1]
A Wallace tree built from many identical 3:2 compressors (also known as full adders), such as the TI 74x275 chip or the TI 74x183 chip, is one popular way to implement single-cycle multiplication. The datasheets for the TI 74x261 and 74x284 describe some practical details of implementing multiplication with a Wallace tree. The Dadda multiplier uses the same 3:2 compressors in a slightly more efficient arrangement.
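The role of the 3:2 compressor can be sketched in software. The model below is a simplification: it applies the compressor to whole partial-product rows at once (a carry-save adder tree) rather than grouping individual bit columns as a true Wallace or Dadda layout would. The function names and the `width` parameter are assumptions made for this example.

```python
def compress_3_2(a: int, b: int, c: int):
    """Bitwise 3:2 compressor (a row of full adders).

    Returns (sum_bits, carry_bits) such that a + b + c == sum_bits + carry_bits.
    """
    sum_bits = a ^ b ^ c
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1
    return sum_bits, carry_bits


def carry_save_multiply(x: int, y: int, width: int = 8) -> int:
    """Unsigned multiply: reduce partial products with 3:2 compressors."""
    # One partial-product row per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    # Each layer turns every group of three rows into two rows.
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            next_rows.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full:])   # leftover rows pass through unchanged
        rows = next_rows
    # A single carry-propagating addition combines the final rows.
    return sum(rows)


print(carry_save_multiply(13, 11))
```

No carry propagates sideways inside a compressor layer, so the depth of the tree grows only logarithmically with the number of partial products; the one slow carry-propagating addition is deferred to the very end.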
Division Algorithm
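One simple way to divide using only shifts, comparisons, and subtractions is the bit-serial shift-and-subtract scheme, which is closely related to restoring division. The following is a minimal sketch for unsigned operands; the function name and the `width` parameter are assumptions made for this example, and it is not meant to represent any particular processor's divider.

```python
def shift_subtract_divide(dividend: int, divisor: int, width: int = 8):
    """Unsigned division of a `width`-bit dividend; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    # Process the dividend one bit at a time, most significant bit first.
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # If the divisor fits, subtract it and record a 1 in the quotient.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(shift_subtract_divide(200, 7))
```

Unlike multiplication, each step depends on the result of the previous subtraction, which is one reason division is usually slower than multiplication.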
Multiply and Accumulate
Multiply and accumulate (MAC) operations perform a multiplication and an addition in a single instruction. For instance, the instruction:
MAC A, B, C
would perform the operation:
A = A + (B × C)
This is valuable for math-intensive processors, such as graphics processors and DSPs.
MAC operations tend to have a long critical path, so if your processor's clock period already accommodates one, it can probably accommodate other complicated arithmetic operations as well.
In a processor with an accumulator architecture, MAC operations will use the accumulator as the destination register, so the instruction:
MAC B, C
will perform the operation:
ACC = ACC + (B × C)
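To show why MAC is valuable in DSP-style code, the sketch below models the accumulator form in software and uses it to compute a dot product, the inner loop of many filters and matrix operations. The `mac` and `dot_product` helpers are hypothetical names chosen for this example, not instructions from any particular instruction set.

```python
def mac(acc, b, c):
    """Model of one MAC instruction: ACC = ACC + (B x C)."""
    return acc + b * c


def dot_product(xs, ys):
    """Dot product computed as a chain of MAC operations on one accumulator."""
    acc = 0
    for b, c in zip(xs, ys):
        acc = mac(acc, b, c)    # one MAC per pair of elements
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))
```

With a hardware MAC, each loop iteration needs one arithmetic instruction instead of a separate multiply and add.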
Fused Multiply-Add
A fused multiply-add operation is a floating-point operation that is similar to the MAC. However, in the fused operation, the floating-point values are not rounded between the multiply and the add; the result is rounded only once, after the add. For more information about floating-point rounding, see Floating Point.
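The effect of rounding once instead of twice can be seen in a small experiment. The sketch below emulates a fused multiply-add by computing the exact result with Python's `fractions.Fraction` and rounding a single time when converting back to a float. The function names are assumptions made for this example, and the emulation is only a model of the behaviour, not how hardware implements it.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Round after the multiply, then round again after the add."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: exact arithmetic, one final rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Operands chosen so that the low-order bits of the product are lost
# when the product is rounded to double precision.
a = 1.0 + 2.0**-30
b = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)

print(unfused_multiply_add(a, b, c))   # the small term is rounded away
print(fused_multiply_add(a, b, c))     # the small term survives
```

Here the exact product is 1 + 2^-29 + 2^-60. Rounding it to double precision discards the 2^-60 term, so the unfused result is zero, while the fused result keeps that term.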
1. DTACK Grounded, The Journal of Simple 68000/16081 Systems, Issue #29, March 1984, p. 6.