Excerpts from

The Intel Microprocessors: 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium, Pentium Pro Processor, Pentium II, Pentium III, and Pentium 4 - Architecture, Programming, and Interfacing, Seventh Edition

© 2005 by Barry B. Brey

SSE Unit (Example 1)

The C++ version in Example 14-15(c) uses a __declspec(align(16)) directive before each variable to make certain that it is aligned properly in memory. If these directives are missing, the program will not function, because SSE memory operands accessed with instructions such as MOVAPS must be aligned on 16-byte (octalword) boundaries. This final version executes about 4½ times faster than Example 14-15(b).
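The alignment requirement can be checked from portable C++. The sketch below is an illustration, not part of the book's listing: it assumes a C++11 compiler and uses the standard alignas specifier in place of the Microsoft-specific __declspec(align(16)). The helper names is_aligned and aligned_array_is_ok are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when p sits on an address that is a multiple of `boundary` bytes.
inline bool is_aligned(const void* p, std::size_t boundary) {
    assert(boundary != 0);
    return reinterpret_cast<std::uintptr_t>(p) % boundary == 0;
}

// Hypothetical helper: declares an array the way FindXC does, but with the
// standard alignas specifier (assumed equivalent here to __declspec(align(16))),
// and reports whether it meets the 16-byte requirement of MOVAPS.
inline bool aligned_array_is_ok() {
    alignas(16) float caps[4] = {1.0E-6f, 1.0E-6f, 1.0E-6f, 1.0E-6f};
    return is_aligned(caps, 16);
}
```

A MOVAPS with a memory operand that fails this check raises a general-protection fault, which is why the listing cannot simply drop the directives.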

EXAMPLE 14-15(c)

void FindXC()
{
	//floating-point example using C++ with the inline assembler

	__declspec(align(16)) float f[4] = {-300,-200,-100,0};
	__declspec(align(16)) float pi[4];
	__declspec(align(16)) float caps[4] = {1.0E-6, 1.0E-6, 1.0E-6, 1.0E-6};
	__declspec(align(16)) float incr[4] = {400, 400, 400, 400};
	__declspec(align(16)) float Xc[400];
	_asm
	{
		fldpi						;form 2 pi
		fadd   st,st(0)
		fst    pi
		fst    pi+4
		fst    pi+8
		fstp   pi+12
		movaps xmm0,oword ptr pi
		movaps xmm1,oword ptr incr
		movaps xmm3,oword ptr f
		mulps  xmm0,oword ptr caps			;2 pi C
		mov    ecx,0
LOOP1:
		movaps xmm2,xmm3
		addps  xmm2,xmm1
		movaps xmm3,xmm2
		mulps  xmm2,xmm0
		rcpps  xmm2,xmm2				;reciprocal
		movaps oword ptr Xc[ecx],xmm2
		add    ecx,16					
		cmp    ecx,400
		jnz    LOOP1
	}
}
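For comparison, here is a scalar sketch of what the SSE loop appears to compute: the capacitive reactance Xc = 1/(2πfC) for C = 1.0E-6 at frequencies of 100, 200, ..., 10,000. As written, the loop stores 25 groups of four values, so 100 results are produced. The function name find_xc_reference and its parameters are assumptions made for illustration; they do not appear in the book.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference for FindXC: Xc[i] = 1 / (2 * pi * f * C),
// with f = step * (i + 1), mirroring the frequencies the SSE loop walks through.
std::vector<float> find_xc_reference(std::size_t count = 100,
                                     float c = 1.0E-6f,
                                     float step = 100.0f) {
    assert(c > 0.0f && step > 0.0f);
    const float two_pi_c = 2.0f * 3.14159265358979f * c;  // same 2*pi*C factor held in xmm0
    std::vector<float> xc(count);
    for (std::size_t i = 0; i < count; ++i) {
        const float f = step * static_cast<float>(i + 1);  // 100, 200, 300, ...
        xc[i] = 1.0f / (two_pi_c * f);                      // exact division, not RCPPS
    }
    return xc;
}
```

RCPPS returns an approximate reciprocal (roughly 12 bits of precision), so the values from the SSE version will differ slightly from this exact-division reference.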

7-segment code (Example 2)

Converting from BCD to 7-Segment Code. One simple application that uses a lookup table is BCD to 7-segment code conversion. Example 8-26 illustrates a lookup table that contains the 7-segment codes for the numbers 0 to 9. These codes are used with the 7-segment display pictured in Figure 8-5. This 7-segment display uses active high (logic 1) inputs to light a segment. The lookup table code (array temp1) is arranged so that the a segment is in bit position 0 and the g segment is in bit position 6. Bit position 7 is 0 in this example, but it can be used for displaying a decimal point, if required.

Example 8-26

unsigned char CasciiDlg::LookUp(unsigned char temp)
{
	char temp1[] = {0x3f, 6, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 7, 0x7f, 0x6f};
	_asm
	{
		lea  ebx,temp1
		mov  al,temp
		xlat
		mov  temp,al
	}
	return temp;
}
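XLAT replaces AL with the byte at address EBX + AL, which amounts to an array index. The following portable sketch shows the same lookup without inline assembly. The names kSevenSegment, look_up, and segment_lit are hypothetical, and the range check is an addition that the original listing does not perform.

```cpp
#include <cassert>
#include <cstdint>

// Segment codes for digits 0-9, copied from Example 8-26:
// bit 0 = segment a, bit 1 = segment b, ..., bit 6 = segment g.
static const std::uint8_t kSevenSegment[10] = {
    0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f};

// Table lookup equivalent to the LEA/MOV/XLAT sequence.
inline std::uint8_t look_up(std::uint8_t bcd) {
    assert(bcd <= 9);  // added guard: the table has only ten entries
    return kSevenSegment[bcd];
}

// Returns true when the given segment ('a' through 'g') is lit for the digit.
inline bool segment_lit(std::uint8_t bcd, char segment) {
    assert(segment >= 'a' && segment <= 'g');
    return ((look_up(bcd) >> (segment - 'a')) & 1u) != 0;
}
```

For instance, the digit 1 maps to code 6, which lights only segments b and c.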

EMT-64 Technology (Example 3)

64-bit Extension Technology. At the time of this writing, Intel has announced its 64-bit extension technology for the Intel 32-bit architecture family, but has yet to announce the release of a microprocessor that supports it. The instruction set and architecture are backward compatible with the 8086, which means that the instructions and register set have remained compatible. What has changed is that the register set is stretched to 64 bits in width in place of the current 32-bit-wide registers. Refer to Figure 19-10 for the programming model of the Pentium 4 in 64-bit mode.
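The practical effect of the wider registers can be sketched with fixed-width integer types. This is only an illustration of 32-bit versus 64-bit arithmetic in C++; it does not model the register file itself, and the function names add32 and add64 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit sum wraps around once the result no longer fits in 32 bits.
inline std::uint32_t add32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + b);
}

// The same sum carried out at 64 bits keeps the carry into bit 32.
inline std::uint64_t add64(std::uint64_t a, std::uint64_t b) {
    return a + b;
}

// Small self-check contrasting the two widths.
inline void check_widths() {
    assert(add32(0xFFFFFFFFu, 1u) == 0u);
    assert(add64(0xFFFFFFFFull, 1ull) == 0x100000000ull);
}
```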
