x86 Assembly/MMX
MMX is a supplemental instruction set that Intel announced in 1996 and first shipped in 1997 with the Pentium MMX. Most of the new instructions are "single instruction, multiple data" (SIMD), meaning that a single instruction works on multiple pieces of data in parallel.
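MMX has a few drawbacks, though: an individual MMX instruction is not necessarily faster than a regular arithmetic instruction (the gain comes from processing several values at once), and the Floating Point Unit (FPU) can't be used while the MMX registers are in use. MMX also offers saturation arithmetic, which behaves differently from the arithmetic of the regular registers.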
Saturation Arithmetic
In an 8-bit grayscale picture, 255 is the value for pure white, and 0 is the value for pure black. In a regular 8-bit register (AL, BL, CL ...), if we add one to white, we get black! This is because the regular registers "roll over" to the next value. MMX gets around this with a technique called "saturation arithmetic", which is selected by the instruction used. In saturation arithmetic, a value never rolls over: it stops at the largest or smallest value that fits. This means that with the unsigned saturating byte instructions, we have the following equations:
255 + 100 = 255
200 + 100 = 255
0 - 100 = 0
99 - 100 = 0
This may seem counter-intuitive at first to people who are used to their registers rolling over, but it makes sense in some situations: if we try to make white brighter, it shouldn't become black.
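The difference between rolling over and saturating can be sketched in plain C. This is only a model of the arithmetic on a single unsigned byte, not MMX code, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Ordinary unsigned arithmetic: the result wraps around modulo 256. */
uint8_t wrap_add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturating add: clamp to 255 instead of wrapping. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Unsigned saturating subtract: clamp to 0 instead of wrapping. */
uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```

With these helpers, `wrap_add_u8(255, 1)` gives 0 (white becomes black), while `sat_add_u8(255, 100)` stays at 255 and `sat_sub_u8(99, 100)` stays at 0.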
Single Instruction Multiple Data (SIMD) Instructions
The MMX registers are 64 bits wide, but each can be treated as a pack of smaller values:
- 2 32-bit values
- 4 16-bit values
- 8 8-bit values
MMX has no 64-bit addition or subtraction, so the MMX registers cannot easily be used for 64-bit arithmetic. Let's say that four of the byte lanes of an MMX register hold the values 10, 25, 128, 255. We have them arranged as such:
MM0: | 10 | 25 | 128 | 255 |
And we do the following pseudo code operation, which adds 10 to every byte using unsigned saturation:
MM0 + 10
We would get the following result:
MM0: | 10+10 | 25+10 | 128+10 | 255+10 | = | 20 | 35 | 138 | 255 |
Remember that our arithmetic "saturates" in the last box, so the value doesn't go over 255.
In this example, a single MMX instruction performs 4 additions where the regular registers would need 4 separate instructions. With all 8 byte lanes of the register in use, a single instruction performs 8 additions.
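To make the lane-by-lane behaviour concrete, here is a small C model of a packed unsigned saturating byte addition. It is a sketch of what the operation computes, not of how the hardware does it. The function name is hypothetical, and lane 0 is assumed to be the least significant byte.

```c
#include <stdint.h>

/* Hypothetical model of a packed unsigned saturating byte add:
   the 64-bit value is treated as eight independent 8-bit lanes,
   with lane 0 in the least significant byte. */
uint64_t packed_add_sat_u8(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned x = (unsigned)((a >> (8 * lane)) & 0xFF);
        unsigned y = (unsigned)((b >> (8 * lane)) & 0xFF);
        unsigned sum = x + y;
        if (sum > 255)
            sum = 255;          /* saturate this lane only */
        /* no carry ever leaves the lane */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}
```

Packing the bytes 10, 25, 128, 255 as `0x0A1980FF` and adding `0x0A0A0A0A` yields `0x14238AFF`, that is 20, 35, 138, 255, matching the table above.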
MMX Registers
There are 8 64-bit MMX registers. To avoid having to add new registers, they were made to overlap with the FPU stack registers: each MMX register occupies the low 64 bits of one 80-bit FPU register. This means that MMX instructions and FPU instructions cannot be freely mixed. MMX registers are addressed directly by name, and do not need to be accessed by pushing and popping in the way the FPU registers are.
MM7 MM6 MM5 MM4 MM3 MM2 MM1 MM0
Each MMX register is aliased to the physical FPU register with the same number (MM0 to R0, MM1 to R1, and so on), regardless of where the top of the FPU stack currently points.
Executing an MMX instruction marks every FPU register as in use, so floating point instructions that follow may fail or produce wrong results. To make the FPU usable again, you must end every block of MMX code with the emms instruction, which marks the FPU registers as empty.
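From C, the same pattern can be sketched with the compiler intrinsics in `<mmintrin.h>`, where `_mm_adds_pu8` corresponds to PADDUSB and `_mm_empty` to EMMS. The helper names below are hypothetical, and the code assumes a GCC- or Clang-style compiler that defines `__MMX__` on x86; on other targets it falls back to a plain loop.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>   /* MMX intrinsics (assumed: GCC or Clang on x86) */
#endif

/* Hypothetical helper: out[i] = min(a[i] + b[i], 255) for eight bytes. */
void add8_saturating(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
#if defined(__MMX__)
    __m64 va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = _mm_adds_pu8(va, vb);      /* packed unsigned saturating add */
    memcpy(out, &vr, sizeof vr);
    _mm_empty();                    /* EMMS: hand the registers back to the FPU */
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 8; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        out[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
#endif
}

/* Floating point code that runs only after the MMX section has ended. */
double mean_of_sums(const uint8_t a[8], const uint8_t b[8])
{
    uint8_t sums[8];
    double total = 0.0;
    add8_saturating(a, b, sums);
    for (int i = 0; i < 8; i++)
        total += sums[i];
    return total / 8.0;
}
```

Ending the MMX section inside the helper keeps the rule local: callers such as `mean_of_sums` can use floating point without knowing that MMX was involved. On 64-bit targets the compiler usually does floating point in SSE registers, so the call matters most for code that uses the x87 FPU.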
The following is a program for GNU AS and GCC, written for 32-bit x86 with arguments passed on the stack, which copies 8 bytes from one variable to another and prints the result.
Assembler portion
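    .globl copy_memory8
    .type copy_memory8, @function
copy_memory8:
    pushl %ebp
    mov %esp, %ebp
    mov 8(%ebp), %eax       # eax = first argument (source pointer)
    movq (%eax), %mm0       # load 8 bytes from the source into MM0
    mov 12(%ebp), %eax      # eax = second argument (destination pointer)
    movq %mm0, (%eax)       # store the 8 bytes to the destination
    popl %ebp
    emms                    # end the MMX section so the FPU is usable again
    ret
    .size copy_memory8,.-copy_memory8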
C portion
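#include <stdio.h>

/* Implemented in the assembler portion: copies 8 bytes from a to b. */
void copy_memory8(void *a, void *b);

int main(void) {
    long long b = 0x0fffffff00000000;
    long long c = 0x00000000ffffffff;
    printf("%lld == %lld\n", b, c);   /* before the copy: the values differ */
    copy_memory8(&b, &c);             /* copy the 8 bytes of b into c */
    printf("%lld == %lld\n", b, c);   /* after the copy: both hold b's value */
    return 0;
}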
MMX Instruction Set
Several suffixes are used to indicate what data size the instruction operates on:
- B: Byte (8 bits)
- W: Word (16 bits)
- D: Double word (32 bits)
- Q: Quad word (64 bits)
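The kind of arithmetic is also signified by the suffix: US for unsigned saturation and S for signed saturation. Instructions with neither suffix use ordinary wraparound arithmetic.
For example, PSUBUSB subtracts unsigned bytes with saturation, while PSUBSW subtracts signed words with saturation.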
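The effect of these suffixes can be sketched in C for a single byte lane. The three functions below are hypothetical models of what PADDB, PADDUSB and PADDSB compute per byte, assuming that US selects unsigned saturation, S selects signed saturation, and no suffix means wraparound.

```c
#include <stdint.h>

/* PADDB: no saturation suffix, so the byte wraps around. */
uint8_t model_paddb(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* PADDUSB: unsigned saturation, result clamped to 0..255. */
uint8_t model_paddusb(uint8_t a, uint8_t b)
{
    int sum = a + b;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* PADDSB: signed saturation, result clamped to -128..127. */
int8_t model_paddsb(int8_t a, int8_t b)
{
    int sum = a + b;
    if (sum > 127)
        sum = 127;
    if (sum < -128)
        sum = -128;
    return (int8_t)sum;
}
```

The same bit pattern can give different results depending on the suffix: adding 1 to 0x7F gives 0x80 with the wraparound and unsigned saturating models, but stays at 127 with the signed saturating model, because 128 does not fit in a signed byte.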
MMX defined over 40 new instructions, listed below.
EMMS, MOVD, MOVQ, PACKSSDW, PACKSSWB, PACKUSWB, PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW, PAND, PANDN, PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW, PMADDWD, PMULHW, PMULLW, POR, PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW, PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW, PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD, PXOR