Intel Pentium 4 2.2 GHz and AthlonXP 2000+ in the test: The battle of the titans


NetBurst Architecture

First we turn to the umbrella term for the new technologies in Intel's Pentium 4: the NetBurst architecture.


To meet the demands of the rapidly growing processor market, not least the demand for ever higher speed, Intel customarily develops a completely new processor architecture every 3 to 5 years. The last architecture Intel introduced was the P6 microarchitecture, first used in the Pentium Pro in 1995. Just above the 1 GHz mark, however, this architecture reached its limits, and Intel was forced to develop a new one: the NetBurst architecture. The following block diagram and die shot give a brief overview of the individual building blocks of the new architecture, which we discuss in more detail below.

Pentium 4 Design
Pentium 4 architecture

But first, a brief look at the markings Intel prints on the Pentium 4. They give the processor clock, the cache size, the front-side bus, the core voltage and the country of manufacture. The following graphic explains the rest:

Pentium 4 processor ID

Hyper Pipelined Technology

The new pipeline of the Pentium 4 is certainly the most fundamental and, in many respects, the most far-reaching change compared to its predecessors. With 20 stages, the pipeline is twice as long as that of the Pentium III. The extended pipeline has the enormous advantage that it can work on a large number of instructions at once, and it is what makes the high clock rates of the Pentium 4 possible in the first place. While the Pentium III with its 10-stage pipeline topped out at around one gigahertz in 0.18 µm technology, the Pentium 4 started at 1.4 GHz, although it too was still manufactured in 0.18 µm at the time.

But first we should go into more detail about the 20-stage pipeline.

Pentium 4 Pipeline

Since most readers probably cannot make much of the term “pipeline” or what it means, let's first look at the basics of serial processing without a pipeline and of a staged pipeline. To keep things simple, we assume five individual steps that must be carried out to fully complete an operation. In simplified form, the five steps are:

  • Prefetch: The next instruction is loaded from the cache.
  • Decode 1: The x86 instruction is decoded into micro-ops.
  • Decode 2: If further data are required for the calculation, they are loaded now.
  • Execute: The instruction is executed.
  • Writeback: The result is stored in the cache or system memory.

A CPU without a pipeline processes these steps one by one and does not start on the next instruction until the current one has been completely executed. In our example, four of the five units are therefore always idle and only one is working. A CPU with a pipeline prevents exactly this. All five units work at all times, so no unit sits idle unnecessarily. As soon as an instruction has left a unit, the next one is loaded into it immediately. So there are always five instructions being processed at the same time.
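The difference can be sketched with a small cycle count. This is a minimal model, not a description of any real CPU: it assumes every stage takes exactly one clock cycle and nothing ever stalls, and the function names are ours.

```python
# The five simplified steps from the list above.
STAGES = ["Prefetch", "Decode 1", "Decode 2", "Execute", "Writeback"]


def cycles_unpipelined(num_instructions, num_stages=len(STAGES)):
    """Each instruction runs through all stages before the next one starts."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions, num_stages=len(STAGES)):
    """The first instruction needs num_stages cycles, every further one adds a cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def occupancy(num_instructions, num_stages=len(STAGES)):
    """Per cycle, list which instruction index sits in each stage (None = idle)."""
    table = []
    for cycle in range(cycles_pipelined(num_instructions, num_stages)):
        row = []
        for stage in range(num_stages):
            instr = cycle - stage  # instruction i enters stage s in cycle i + s
            row.append(instr if 0 <= instr < num_instructions else None)
        table.append(row)
    return table


if __name__ == "__main__":
    print(cycles_unpipelined(100))  # 500 cycles without a pipeline
    print(cycles_pipelined(100))    # 104 cycles with a 5-stage pipeline
    for cycle, row in enumerate(occupancy(7)):
        print(cycle, row)
```

From the fifth cycle onward, every stage in the printed table holds an instruction until the program runs out, which is the "five instructions at the same time" described above.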

With a 20-stage pipeline, such as the Pentium 4 has, 20 instructions are being processed at the same time and all stages are constantly busy. No delay may therefore occur in any stage; otherwise all the other stages have to wait and work slows down considerably. But we will come back to that later.
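How a single delay holds up everything behind it can be illustrated with a simplified in-order model. The assumptions are ours: one cycle per stage, instructions never overtake each other, and the `stalls` mapping with its example values is purely hypothetical.

```python
def total_cycles(num_instructions, num_stages, stalls=None):
    """Cycles until the last instruction leaves the last stage.

    stalls maps (instruction_index, stage_index) to extra cycles that the
    instruction spends in that stage (hypothetical values for illustration).
    """
    stalls = stalls or {}
    # prev[s] = cycle in which the previous instruction finished stage s
    prev = [0] * num_stages
    for i in range(num_instructions):
        cur = []
        ready = 0  # cycle in which this instruction finished its previous stage
        for s in range(num_stages):
            # A stage can start only when the instruction has arrived
            # and the instruction ahead of it has left the stage.
            start = max(ready, prev[s])
            ready = start + 1 + stalls.get((i, s), 0)
            cur.append(ready)
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(total_cycles(100, 20))                 # 119 cycles, no delays
    print(total_cycles(100, 20, {(10, 15): 10}))  # 129 cycles, one 10-cycle delay
```

In this model a 10-cycle delay in one stage pushes back every later instruction by the same 10 cycles, because nothing behind the delayed instruction can move ahead.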

The higher clock rates of the Pentium 4 are achieved mainly because the individual steps of each instruction have been divided into smaller parts, each of which can be processed faster than the complete step.

Of course, a 20-stage pipeline also has its disadvantages. An instruction now has to pass through more stages than before until it is finally processed, and this increases the total number of cycles an operation needs to complete. This is exactly why a Pentium 4 at 1 GHz would be slower than a Pentium III at 1 GHz with software that is not specially designed for the Pentium 4! Thanks to its 10-stage pipeline, the Pentium III can finish the individual operations faster than the Pentium 4 at the same clock frequency. On the other hand, with only a 10-stage pipeline, the Pentium III cannot come close to the clock rates of a Pentium 4. This is probably also why Intel launched the Pentium 4 at 1.4 GHz: so that it would not be slower than its Pentium III counterpart at its debut.
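A rough calculation makes the trade-off visible. The sketch below assumes one cycle per stage and that each pipeline refill (for example after a delay that empties the pipeline) costs as many extra cycles as the initial fill; the instruction count and the number of refills are invented for illustration and are not measurements.

```python
def run_time_ns(instructions, stages, clock_ghz, refills=0):
    """Idealized run time in nanoseconds.

    Cycles = one per instruction, plus (stages - 1) cycles to fill the
    pipeline at the start and again after every refill (an assumption).
    """
    cycles = instructions + (1 + refills) * (stages - 1)
    return cycles / clock_ghz  # at 1 GHz one cycle lasts 1 ns


if __name__ == "__main__":
    # Hypothetical workload: 1000 instructions, 50 pipeline refills.
    ten_stage_1ghz = run_time_ns(1000, stages=10, clock_ghz=1.0, refills=50)
    twenty_stage_1ghz = run_time_ns(1000, stages=20, clock_ghz=1.0, refills=50)
    twenty_stage_14ghz = run_time_ns(1000, stages=20, clock_ghz=1.4, refills=50)
    print(ten_stage_1ghz)      # 1459.0 ns
    print(twenty_stage_1ghz)   # 1969.0 ns
    print(round(twenty_stage_14ghz, 1))
```

Under these assumptions the 20-stage design loses clearly at the same clock and only pulls ahead once its clock is raised, which matches the argument above in spirit; real processors differ in many other respects.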

This point is extremely important for understanding the results that follow, because it also explains why a Pentium 4 is not faster than an Athlon XP at the same clock rate: the Athlon XP uses a mix of 10- and 15-stage pipelines. That is why we always find it a bit questionable when a Pentium 4 is compared with an Athlon XP at the same clock rate. You should know in advance that in this case the Athlon XP should have the edge in most applications simply because of its architecture. But it must also be clear to everyone that an Athlon XP with its current architecture cannot reach the clock rates of a Pentium 4.

Intel and AMD are currently taking two different paths. Because of its architecture, Intel has to clock its Pentium 4 significantly higher in order to process the same number of operations in the same time as an Athlon XP at a lower clock rate. AMD relies on more performance per clock cycle, Intel on more performance through higher clock rates.
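The two approaches can be put into one idealized formula: throughput is roughly work per cycle times cycles per second. The sketch below uses that simplification to estimate how much more work per clock the lower-clocked chip needs to draw level. The 1.667 GHz figure for the AthlonXP 2000+ is our assumption (the model number is a rating, not the clock rate, and the actual clock is not stated in this section), and no real per-clock figures are implied.

```python
def instructions_per_ns(ipc, clock_ghz):
    """Idealized throughput: instructions per cycle times cycles per nanosecond."""
    return ipc * clock_ghz


def break_even_ipc_ratio(fast_clock_ghz, slow_clock_ghz):
    """Factor by which the slower-clocked chip must do more per cycle to tie."""
    return fast_clock_ghz / slow_clock_ghz


P4_CLOCK_GHZ = 2.2
ATHLON_CLOCK_GHZ = 1.667  # assumption: real clock of the AthlonXP 2000+

ratio = break_even_ipc_ratio(P4_CLOCK_GHZ, ATHLON_CLOCK_GHZ)

if __name__ == "__main__":
    print(f"Per-clock advantage needed for a tie: {ratio:.2f}x")
```

In this simplified picture, the Athlon XP would need to get roughly a third more work done per clock cycle to match the 2.2 GHz Pentium 4; whether it does depends on the application.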
