Intel Pentium 4 2.2 GHz and AthlonXP 2000+ in the test: The battle of the titans

Advanced Dynamic Execution

To avoid mispredictions when fetching the next instructions, Intel has added several advanced features to the 20-stage pipeline. These improvements are bundled in the 'Advanced Dynamic Execution' engine. Its purpose is to prevent the pipeline from being filled with the wrong instructions, or from stalling because the processor mispredicted a branch in the program, either of which would slow the Pentium 4 down enormously. Compared to its predecessor, Intel has improved the branch prediction algorithm and enlarged the branch target buffer. Branches that have already been processed are recorded in the 'Branch Target Buffer', so the processor can consult them when the same branch comes up again. The buffer has grown to 4 KB on the Pentium 4; for comparison, the Pentium III had only 512 bytes. Given the longer pipeline, enlarging the buffer was unavoidable. The prediction is said to have improved by a third over the Pentium III. We cannot measure these figures ourselves, so we depend on Intel's information, which should perhaps be treated with a little caution. All in all, the Pentium 4 is said to predict correctly about 95% of the time.
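To make the idea of branch prediction more tangible, here is a minimal sketch of a textbook scheme, the 2-bit saturating counter. It is not Intel's algorithm, whose details are not given here; the function name, the table size and the example trace are all assumptions chosen for illustration.

```python
def simulate_two_bit_predictor(outcomes, table_size=16):
    """Return the fraction of correctly predicted branches.

    outcomes: list of (branch_address, taken) pairs.
    table_size: number of counters (an arbitrary, illustrative value).
    """
    # One 2-bit counter per table slot: 0-1 predict "not taken", 2-3 predict "taken".
    counters = [1] * table_size
    correct = 0
    total = 0
    for address, taken in outcomes:
        index = address % table_size
        predicted_taken = counters[index] >= 2
        if predicted_taken == taken:
            correct += 1
        total += 1
        # Saturating update: move towards 3 if taken, towards 0 if not.
        if taken:
            counters[index] = min(3, counters[index] + 1)
        else:
            counters[index] = max(0, counters[index] - 1)
    return correct / total if total else 0.0


# Hypothetical loop branch: taken nine times, then not taken once, repeated ten times.
loop_trace = [(0x40, i % 10 != 9) for i in range(100)]
accuracy = simulate_two_bit_predictor(loop_trace)
print(f"{accuracy:.0%}")  # prints 89%
```

Even this simple scheme gets a regular loop right most of the time; it only misses at the loop exit and on the very first pass. The longer the pipeline, the more work is thrown away on each of those misses, which is why the prediction rate matters so much for a 20-stage design.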

Trace Cache

Some of you will probably have wondered about the L1 cache figures when we looked at the processors. Now we want to shed some light on them.

Unlike its predecessors and the competition's CPUs, the Pentium 4 does not rely on an L1 cache that holds both data and instructions. Its L1 cache stores only data and no longer any instructions. Instructions are now held exclusively in the trace cache.

To understand the trace cache, one must first be familiar with how a processor works. Until now, instructions were stored in the part of the L1 cache reserved for them. Today's processors no longer compute directly with the old x86 instructions, but still accept them to ensure software compatibility. These instructions must first be decoded. The decoded results are not kept afterwards, however; only the x86 instructions remain. This is exactly where the trace cache comes in. It no longer stores the x86 instructions, but the decoded micro-operations (micro-ops). This is an enormous advantage when the same instructions are used over and over again. Because previously only the x86 instructions were stored and not the decoded micro-ops, they had to be decoded into micro-ops again every time they were reused. The Pentium 4 can reuse the already decoded x86 instructions, i.e. the micro-ops, and thus saves the time-consuming process of decoding them again.
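The benefit of keeping decoded results can be sketched in a few lines. The following is only a toy model under stated assumptions: the class `TraceCacheSketch`, the function `toy_decoder` and the two-micro-ops-per-instruction rule are invented for illustration, and the sketch ignores capacity limits and the way a real trace cache stores sequences along the execution path.

```python
class TraceCacheSketch:
    """Toy model: remembers decoded micro-ops per instruction address."""

    def __init__(self, decoder):
        self._decoder = decoder   # function: x86 instruction text -> list of micro-ops
        self._decoded = {}        # address -> previously decoded micro-ops
        self.decode_calls = 0     # how often the (slow) decoder actually ran

    def fetch(self, address, x86_instruction):
        # Decode only on the first encounter; afterwards reuse the stored micro-ops.
        if address not in self._decoded:
            self.decode_calls += 1
            self._decoded[address] = self._decoder(x86_instruction)
        return self._decoded[address]


def toy_decoder(x86_instruction):
    # Hypothetical decoding: pretend every instruction splits into two micro-ops.
    return [f"{x86_instruction} / uop0", f"{x86_instruction} / uop1"]


# A small loop body of three instructions, executed 1000 times.
loop_body = [(0x100, "mov eax, [esi]"), (0x102, "add eax, ebx"), (0x104, "jnz 0x100")]
cache = TraceCacheSketch(toy_decoder)
for _ in range(1000):
    for address, instruction in loop_body:
        cache.fetch(address, instruction)

print(cache.decode_calls)  # prints 3: the decoder ran once per distinct instruction
```

Without the stored micro-ops, the same loop would have invoked the decoder 3,000 times instead of three.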

Unfortunately, Intel no longer specifies the size of the trace cache in KB, but only as the maximum number of micro-ops it can hold. That is actually the more sensible measure, but it is harder to compare. On the Pentium 4, the trace cache can hold 12,000 previously decoded micro-ops.

Rapid Execution Engine

Intel also takes a new approach with the ALUs (Arithmetic Logic Units), which handle simple integer calculations. The Pentium 4 has a total of three ALUs, but only one of them is conventional. The other two run at twice the CPU clock. On a 2.2 GHz Pentium 4, two of the three ALUs therefore run at 4.4 GHz. This doubled clock brings two significant improvements: both the time for a computation itself and the ALU latency are significantly reduced. As a result, the time needed to accept a new instruction is already below 0.4 ns on a 1.4 GHz Pentium 4. A 1 GHz Pentium III still needs 1 ns for this.
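The figures above follow directly from the relationship between clock frequency and cycle time. A short calculation, with the helper name `cycle_time_ns` chosen purely for illustration, reproduces them:

```python
def cycle_time_ns(clock_ghz, multiplier=1):
    """Cycle time in nanoseconds for a unit running at clock_ghz * multiplier."""
    return 1.0 / (clock_ghz * multiplier)


p4_1400 = cycle_time_ns(1.4, multiplier=2)  # double-pumped ALU at 2.8 GHz
p4_2200 = cycle_time_ns(2.2, multiplier=2)  # double-pumped ALU at 4.4 GHz
p3_1000 = cycle_time_ns(1.0)                # conventional ALU at 1 GHz

print(f"Pentium 4 1.4 GHz: {p4_1400:.3f} ns")   # 0.357 ns, below the 0.4 ns mark
print(f"Pentium 4 2.2 GHz: {p4_2200:.3f} ns")   # 0.227 ns
print(f"Pentium III 1 GHz: {p3_1000:.3f} ns")   # 1.000 ns
```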

The Pentium 4 should clearly set itself apart from its competitors in integer calculations, provided the innovations that read so well on paper all work the way they should. Of course, the two ALUs running at twice the CPU clock have no effect on the FPU performance (MMX, SSE) of the Pentium 4, which brings us to the next topic: SSE2.
