
The C++ version in Example 14-15(c) places a __declspec(align(16)) directive before each variable to guarantee that it is properly aligned in memory. If these directives are missing, the program will not function, because the memory operands of the SSE instructions used here (such as MOVAPS) must be aligned on 16-byte (octalword) boundaries. This final version executes about 4½ times faster than Example 14-15(b).
EXAMPLE 14-15(c)
void FindXC()
{
    //floating-point example using C++ with the inline assembler
    __declspec(align(16)) float f[4] = {-300,-200,-100,0};
    __declspec(align(16)) float pi[4];
    __declspec(align(16)) float caps[4] = {1.0E-6, 1.0E-6, 1.0E-6, 1.0E-6};
    __declspec(align(16)) float incr[4] = {400, 400, 400, 400};
    __declspec(align(16)) float Xc[400];
    _asm
    {
        fldpi                           ;form 2 pi
        fadd st,st(0)
        fst pi                          ;copy 2 pi into all four elements of pi
        fst pi+4
        fst pi+8
        fstp pi+12
        movaps xmm0,oword ptr pi
        movaps xmm1,oword ptr incr
        movaps xmm3,oword ptr f
        mulps xmm0,oword ptr caps       ;2 pi C
        mov ecx,0                       ;byte offset into Xc
    LOOP1:
        movaps xmm2,xmm3
        addps xmm2,xmm1                 ;next four frequencies
        movaps xmm3,xmm2                ;save them for the next pass
        mulps xmm2,xmm0                 ;2 pi f C
        rcpps xmm2,xmm2                 ;reciprocal
        movaps oword ptr Xc[ecx],xmm2   ;store four results
        add ecx,16
        cmp ecx,400                     ;stop after 400 bytes (100 results)
        jnz LOOP1
    }
}
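To check the results of Example 14-15(c), a plain scalar version can be useful. The following is a minimal sketch, not part of the original example: the names is_aligned16 and reactance_reference are hypothetical, and the standard C++ alignas(16) specifier is assumed here as a portable stand-in for __declspec(align(16)). It computes Xc = 1/(2πfC) for the same 100 frequencies (100 Hz to 10,000 Hz in 100 Hz steps) with C = 1.0E-6.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical scalar reference: Xc = 1 / (2 * pi * f * C)
// for f = 100, 200, ..., 10000 Hz and C = 1.0E-6 F.
inline std::array<float, 100> reactance_reference() {
    // alignas(16) is assumed as the portable counterpart of
    // __declspec(align(16)) used in the listing.
    alignas(16) std::array<float, 100> xc{};
    assert(is_aligned16(xc.data()));

    const double two_pi_c = 2.0 * std::acos(-1.0) * 1.0e-6;  // 2 pi C
    for (std::size_t i = 0; i < xc.size(); ++i) {
        const double f = 100.0 * static_cast<double>(i + 1);  // frequency in Hz
        xc[i] = static_cast<float>(1.0 / (two_pi_c * f));
    }
    return xc;
}
```

Because RCPPS returns only an approximate reciprocal (roughly 12 bits of precision), values produced by the SSE loop should be expected to agree with this reference to about three or four significant digits rather than exactly.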
Converting from BCD to 7-Segment Code. One simple application that uses a lookup table is BCD to 7-segment code conversion. Example 8-26 illustrates a lookup table that contains the 7-segment codes for the numbers 0 to 9. These codes are used with the 7-segment display pictured in Figure 8-5. This 7-segment display uses active high (logic 1) inputs to light a segment. The lookup table code (array temp1) is arranged so that the a segment is in bit position 0 and the g segment is in bit position 6. Bit position 7 is 0 in this example, but it can be used for displaying a decimal point, if required.
Example 8-26
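unsigned char CasciiDlg::LookUp(unsigned char temp)
{
    //7-segment codes for the digits 0 to 9
    char temp1[] = {0x3f, 6, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 7, 0x7f, 0x6f};
    _asm
    {
        lea ebx,temp1   ;address the lookup table
        mov al,temp     ;get the BCD digit
        xlat            ;replace AL with temp1[AL]
        mov temp,al     ;save the 7-segment code
    }
    return temp;
}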
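The same lookup can be sketched in portable C++ without the inline assembler. This is only an illustration under stated assumptions: the names kSevenSegment, lookup_seven_segment, and segment_lit are hypothetical and do not appear in Example 8-26, and the range check is an addition, since XLAT itself performs none.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Same codes as temp1 in Example 8-26:
// bit 0 = segment a, ..., bit 6 = segment g, bit 7 = 0.
constexpr std::array<std::uint8_t, 10> kSevenSegment = {
    0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f};

// Hypothetical portable equivalent of LookUp: index the table by the BCD digit.
inline std::uint8_t lookup_seven_segment(std::uint8_t bcd) {
    assert(bcd <= 9);  // added check; the XLAT version does not validate its input
    return kSevenSegment[bcd];
}

// Hypothetical helper: true if the given segment (0 = a ... 6 = g)
// is lit in an active-high 7-segment code.
inline bool segment_lit(std::uint8_t code, unsigned segment) {
    return ((code >> segment) & 1u) != 0;
}
```

For the digit 1, for instance, the table entry is 0x06, so only segments b and c (bit positions 1 and 2) are lit.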

64-bit Extension Technology. At the time of this writing, Intel has announced its 64-bit extension technology for the Intel 32-bit architecture family, but has yet to announce the release of a microprocessor that supports it. The instruction set and architecture are backward compatible to the 8086, which means that the instructions and register set have remained compatible. What has changed is that the register set is stretched to 64 bits in width, in place of the current 32-bit-wide registers. Refer to Figure 19-10 for the programming model of the Pentium 4 in 64-bit mode.
