Preview of Matrox Parhelia: DirectX 9 not quite fulfilled
Parhelia in detail
And now let us get to the heart of the matter, the internal structure of the Parhelia. Without further ado, we first let the numbers speak for themselves ...
- 350MHz chip clock
- ~ 300MHz DDR RAM clock
- 0.15µ manufacturing process
- ~ 80 million transistors
- 256Bit DDR memory interface
- ~ 19.2GB/s memory transfer rate
- 1.4GPixel/s Pixel fill rate
- 5.6GTexel/s Texel fill rate
- 4 pipelines with 4 TMUs (Texture Mapping Unit) each
- 8xAGP support (downward compatible)
- 4 Vertexshader 2.0
- Pixelshader 1.3 (!)
- Dot3 & Environment Mapped Bump-Mapping Support
- Hard-wired DX8 vertex shader for matrix blending and skinning
- 40Bit internal rendering accuracy (10Bit each for R, G, B and Alpha)
- 2 400MHz UltraSharp (TM) RAMDACs
- DVD & HDTV support with 10Bit precision
- max. Resolution 2048x1536 @ 85Hz
- 2 internal TMDS transmitters with up to 1920x1024x32Bit each
- 2 completely independent monitor controllers
- Triple Head with up to 3840x1024x32Bit for Surround Gaming
- Adaptive fragment supersampling FSAA (up to 16x)
- n-patch support with adaptive tessellation (LOD adjustment) and displacement mapping
- GlyphAntialiasing for font sharpening
- DirectX 8 and OpenGL 1.3 support with the option of some DirectX9 features
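The derived figures in this list follow directly from the clocks and unit counts. As a quick plausibility check, here is a minimal sketch that recomputes them; the function names are our own, and the memory clock is the approximate value quoted above, so the results are theoretical peaks only.

```python
# Recompute the headline throughput figures from the spec list.
# All inputs come from the list above; the helper names are hypothetical.

CHIP_CLOCK_MHZ = 350       # chip clock
MEM_CLOCK_MHZ = 300        # approximate DDR RAM clock
BUS_WIDTH_BITS = 256       # DDR memory interface width
PIPELINES = 4              # rendering pipelines
TMUS_PER_PIPELINE = 4      # texture mapping units per pipeline


def pixel_fill_rate_gpixel(clock_mhz, pipelines):
    """One pixel per pipeline per clock, in GPixel/s."""
    return clock_mhz * pipelines / 1000


def texel_fill_rate_gtexel(clock_mhz, pipelines, tmus_per_pipeline):
    """One texel per TMU per clock, in GTexel/s."""
    return clock_mhz * pipelines * tmus_per_pipeline / 1000


def memory_bandwidth_gb(mem_clock_mhz, bus_width_bits):
    """DDR moves data twice per clock over the full bus width, in GB/s."""
    bytes_per_transfer = bus_width_bits // 8
    mega_transfers_per_second = mem_clock_mhz * 2
    return bytes_per_transfer * mega_transfers_per_second / 1000


if __name__ == "__main__":
    print(pixel_fill_rate_gpixel(CHIP_CLOCK_MHZ, PIPELINES))
    print(texel_fill_rate_gtexel(CHIP_CLOCK_MHZ, PIPELINES, TMUS_PER_PIPELINE))
    print(memory_bandwidth_gb(MEM_CLOCK_MHZ, BUS_WIDTH_BITS))
```

The three values match the 1.4GPixel/s, 5.6GTexel/s and ~19.2GB/s quoted in the list.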
On the one hand, the obvious lack of pixel shader 2.0 is interesting. Why this is so will certainly remain Matrox's secret for the time being. Maybe they just wanted to announce the chip quickly, to have at least a head start on the competition at the paper launch, or they are planning an improved version of the Parhelia for the time when DirectX 9 has become established.
It is also possible that Microsoft had not yet supplied the final specifications for pixel shader 2.0 during development, since DirectX 9 has already been delayed more than once.
Furthermore, it amazed us that such a silicon monster, with 80 million transistors and a 350MHz target clock, is still to be manufactured in 0.15µ. nVidia already seems to be operating close to the limit with 63 million transistors at 300MHz, not to mention the costs caused by the huge chip area.
Pixel and vertex shaders
As already mentioned in the specifications, the Parhelia will have no fewer than four vertex shaders, supported by a DX8 blending and skinning unit, probably a holdover from the G550 that keeps it software-compatible with Matrox's own HeadCasting engine.
What matters most for DirectX 9 is the programmability of the vertex shaders, with programs that can consist of up to 256 instructions written in a C-like high-level language. Matrox shows no weakness here.
After the normal rendering units with 4 TMUs each come the pixel shaders, which must be present once per pipeline. What is interesting, however, is the concept of being able to combine these from four 5-stage units into two 10-stage units if required. This approach is known from early chips with single-pass multitexturing, which could likewise combine their pipelines in order to enable multitexturing. The problem that arises here as well is that more complex pixel shader programs with more than 5 instructions are effectively processed at only half the performance.
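To make the halving concrete, here is a rough sketch of the pairing scheme. It assumes four pipelines with five pixel shader stages each that pair up into two ten-stage pipelines, one pixel per active pipeline per clock, and no other bottlenecks; the names and the model itself are our simplification, not Matrox's documented behaviour.

```python
# Simplified model of pixel shader pipeline pairing (our assumption, see text).

PIPELINES = 4               # rendering pipelines, from the spec list
STAGES_PER_PIPELINE = 5     # assumed pixel shader stages per pipeline
CHIP_CLOCK_MHZ = 350        # chip clock, from the spec list


def effective_pipelines(instructions):
    """Return how many pipelines remain usable for a shader of this length."""
    if instructions < 1:
        raise ValueError("a shader program needs at least one instruction")
    if instructions <= STAGES_PER_PIPELINE:
        return PIPELINES                # every pipeline works on its own
    if instructions <= 2 * STAGES_PER_PIPELINE:
        return PIPELINES // 2           # two pipelines are chained into one
    raise ValueError("longer programs are not covered by this sketch")


def peak_pixel_rate_mpixel(instructions):
    """Theoretical peak in MPixel/s: one pixel per active pipeline per clock."""
    return effective_pipelines(instructions) * CHIP_CLOCK_MHZ


if __name__ == "__main__":
    for length in (4, 5, 6, 10):
        print(length, effective_pipelines(length), peak_pixel_rate_mpixel(length))
```

In this model a sixth instruction is enough to drop the theoretical peak from 1400 to 700 MPixel/s, which is why the share of longer shader programs in future titles matters so much.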
To what extent this will have a negative effect, i.e. how heavily developers will rely on these longer pixel shader programs, cannot be said with any certainty at the moment.
It is certain, however, that Matrox has built in neither the pixel shader 2.0 required for DirectX 9 nor the long-known pixel shader 1.4, which is already included in the ATi Radeon 8500 and which can hardly hope for wider adoption. What is mainly missing is the ability to use dependent texture accesses and more than 4 texture fetches per rendering pass. We have already speculated a little about why Matrox decided this way, but we cannot say anything definite about it yet, unless Matrox, for the time being, sees the Parhelia only as a feasibility study for a subsequent, fully DirectX 9 compatible product.
Here once more is the memory interface in direct comparison with those of the current competition.
Here too it is clear that Matrox relies on brute force, because no bandwidth-saving measures, such as ATi's HyperZ or nVidia's LMA, can be identified from the feature list and the block diagram. Of course, it remains to be seen whether Matrox has done itself a disservice here despite the immense raw power, but given how effective the tricks from ATi and (at least in this case) especially nVidia are, it does not seem to have been a very smart decision.
Let us remember that nVidia gained performance, in part very clearly, simply by splitting the 128-bit DDR memory controller into four small 32-bit controllers. This leads to the conclusion that there is some truth in the claim that most of the bandwidth is wasted by underutilized memory accesses, and with an interface twice as wide this wasted percentage would certainly not decrease, rather the opposite.
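The argument can be illustrated with a back-of-the-envelope model. It assumes that a memory controller always moves at least one full clock's worth of DDR data (two transfers over its whole width) and that the chip asks for only a small amount of useful data per access. The 8-byte request size is a purely hypothetical figure, and real burst lengths and caches are ignored.

```python
# Rough model of wasted bandwidth on wide memory controllers.
# Granularity and request size are illustrative assumptions, not measured data.


def bytes_per_clock(controller_width_bits):
    """Minimum data moved per clock: DDR performs two transfers per clock."""
    return controller_width_bits // 8 * 2


def utilization(request_bytes, controller_width_bits):
    """Fraction of the transferred bytes that the request actually needed."""
    granule = bytes_per_clock(controller_width_bits)
    granules_needed = -(-request_bytes // granule)   # ceiling division
    return request_bytes / (granules_needed * granule)


if __name__ == "__main__":
    request = 8  # hypothetical small access, in bytes
    for width in (32, 128, 256):
        print(width, utilization(request, width))
```

Under these assumptions the 8-byte access fully uses a 32-bit controller, uses a quarter of a monolithic 128-bit controller, and only an eighth of a monolithic 256-bit one, which is the direction of the effect described above.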
In light of the 80 million transistors, which seem almost spartan for a GPU of this complexity, it is also doubtful that Matrox will be able to compensate for this deficiency with large on-chip caches, for which there would hardly be enough space. Update: Contrary to our original claim, one single point does indicate the existence of an efficient memory controller. It can be found in the block diagram, in the green bar above the quadruple vertex shader, and is labelled '8 way parallel DMA engine'. Let's hope for Matrox's sake that our first judgment was a little too hasty!