Asus V8440 TD and V8460 Ultra TD in the test: Two 'Ti'tans among themselves

LMA-II

The LMA-II consists of six individual components, each of which can help to minimize excessive bandwidth consumption caused by texture operations. In the days of the GeForce3, by contrast, there were only three of these elements, all of which can also be found in the LMA-II in an 'improved' form: the Cross-Bar Memory Controller, Z-Occlusion Culling and Lossless Z-Compression. New additions are a collection of caches called QuadCache, an Auto PreCharge for the local graphics RAM and a Fast Z-Clear. Below are a few words about each component.

Cross-Bar Memory Controller (CBCM): Almost every conventional graphics card (including even the Radeon 8500) has a single memory controller that performs the various necessary memory accesses, usually at 128 bits (DDR). In and of itself that is a good thing, but there is a catch. Every time an access needs to transfer less than 128 bits (DDR), the excess capacity of the controller is simply wasted.

The nV25 (like the nV20 before it), on the other hand, uses a four-way memory controller whose units each work independently at 32 bits (DDR). These units, which can also be combined to 128 bits (DDR) if required, enable a much more efficient use of the available memory bandwidth, since the full 128 bits (DDR) are often not needed for an access. In the ideal case (accesses of less than 33 bits (DDR)), up to 75% of the bandwidth can be saved, which is then available for other tasks.
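The 75% figure can be reproduced with a small sketch. It assumes, purely for illustration, that every access is rounded up to a whole multiple of the controller width; the function names are hypothetical and do not come from nVidia.

```python
import math

def bits_moved(request_bits, granularity_bits):
    """Bits actually transferred when a request is rounded up
    to whole units of the controller width (simplifying assumption)."""
    return math.ceil(request_bits / granularity_bits) * granularity_bits

def crossbar_saving(request_bits):
    """Fraction of bandwidth saved by four 32-bit units
    compared with a single 128-bit controller."""
    monolithic = bits_moved(request_bits, 128)
    crossbar = bits_moved(request_bits, 32)
    return 1 - crossbar / monolithic

for size in (32, 64, 128):
    print(size, "bits requested, saving:", crossbar_saving(size))
```

Under this model a 32-bit access saves 75%, a 64-bit access 50%, and a full 128-bit access nothing, which matches the best case quoted above.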

QuadCache: Primitives (basic geometric elements), vertex data, textures and pixels already being processed each have their own cache, optimized in size and structure. These caches hold frequently required data that would otherwise have to be laboriously loaded from the graphics memory. The effect is comparable to the first- and second-level caches of a CPU, without which even a modern processor beyond 1.5 GHz would be slowed down to the performance level of a Pentium 133.

According to nVidia, these caches are ideally adapted to their respective requirement profile, which is also necessary because they are integrated into the chip, where every unnecessary transistor costs real money.

Z-Occlusion Culling: How can you recognize in advance that certain pixels, or ideally entire areas, will not be visible later? The solution is actually quite obvious. Since you do not see a jumble of overlapping objects on the screen, there must be some value that determines how far away a given object is from the viewer. This value, the depth information, is called the Z value, and it is stored, unsurprisingly, in the Z-buffer. If you check before rendering whether a certain pixel is already covered by another, you can of course save yourself the further calculation, which in turn greatly benefits the available bandwidth.
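The per-pixel check described above can be sketched as follows. This is a minimal software illustration, not nVidia's hardware logic; the depth convention (smaller Z means closer, buffer cleared to a 'far' value) and all names are assumptions.

```python
def render(fragments, width, height, far=1.0):
    """Count how many fragments actually need shading when the
    Z-buffer is consulted before any further work is done."""
    z_buffer = [[far] * width for _ in range(height)]
    shaded = 0
    for x, y, z in fragments:
        if z >= z_buffer[y][x]:
            continue  # already covered by a closer pixel: skip the work
        z_buffer[y][x] = z  # visible so far: record its depth
        shaded += 1         # stands in for texturing and shading
    return shaded

# A near wall (z = 0.2) drawn first, then a far object (z = 0.8) behind it.
near_wall = [(x, y, 0.2) for y in range(4) for x in range(4)]
far_object = [(x, y, 0.8) for y in range(4) for x in range(4)]
print(render(near_wall + far_object, 4, 4))
```

In this example only the 16 fragments of the wall are shaded, and the 16 hidden ones are rejected before they cost any texture bandwidth. If the same objects arrive back to front, nothing is rejected, which is why the benefit depends on drawing order.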

A related technique is the application of this procedure to a complete region of the screen content, the so-called occlusion query. However, this currently still has to be triggered by software, i.e. the respective program must ask the graphics chip to check a region. In principle, this method is of course much more efficient, but it requires specially designed software.

Lossless Z-Compression: By this, nVidia means compression of the depth information to be written (Z-buffer), with a compression ratio of 4:1, as with the GeForce3. Unfortunately, we do not yet have any more detailed information on this feature, so for now nVidia's claim that the data traffic to and from the Z-buffer can be reduced by a factor of 4 in this way should be treated with caution.

Auto PreCharge: This feature is used to precharge the RAM after an access to one area of the memory and before an access to another area, so that it is ready to accept new data as quickly as possible. The main memory built into the PC likewise has to wait a certain number of clock cycles before new data can be sent. In general, the higher the clock frequency of the RAM, the higher the number of these cycles. nVidia speaks of up to 10 clock cycles that can be necessary until one memory area is closed and the next is opened and ready for data transfer. Auto PreCharge is supposed to use a prediction calculated in the chip to activate areas that may become the target of a data transfer in the near future (in the microsecond range). These are then 'pre-charged' on suspicion, so that the waiting time can be reduced to the 'normal' latency of 2-3 cycles.
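How much such a prediction helps depends on how often it is right. The following sketch estimates the average waiting time from the cycle counts quoted above; the hit rate is a purely hypothetical parameter, since nVidia gives no figure for it, and 3 cycles is taken as the upper end of the 'normal' latency.

```python
def average_wait_cycles(hit_rate, miss_cycles=10, hit_cycles=3):
    """Expected wait per switch to a new memory area.
    hit_rate: assumed fraction of switches predicted correctly."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles

for rate in (0.0, 0.5, 1.0):
    print(rate, average_wait_cycles(rate))
```

With no correct predictions the full 10 cycles remain, while a perfect predictor would bring the wait down to the normal latency.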

Fast Z-Clear: We already mentioned the Z-buffer earlier. So that there is no chaos on the screen, the values in it must of course be exact. Remnants of a pixel from the last fully rendered frame can falsify the result, and pixel errors would be the consequence. For this reason, each value in the Z-buffer must be set to zero before each new frame is calculated. In principle, this costs as much bandwidth as any other access. For a single frame at 1024x768 with 32 bits, 3.1MB of bandwidth would be lost for this Z-buffer reset alone (multiplied by the frame rate, assuming 30 frames/s, about 95MB/s), bandwidth that could just as well be used elsewhere.

With Fast Z-Clear, the Z-buffer entries are no longer overwritten individually with zero, but are set to the initial value in one go, not unlike pressing a reset button, except that only the area of the graphics memory in which the Z-buffer is located is cleared.
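The bandwidth figures quoted for a conventional Z-buffer reset can be checked with a few lines of arithmetic. The function name is made up for this sketch, and the resolution, depth and frame rate are the ones from the example in the text.

```python
def z_clear_cost(width, height, bytes_per_value, frames_per_second):
    """Bytes written to reset every Z value once per frame,
    and the resulting bytes per second."""
    per_frame = width * height * bytes_per_value
    return per_frame, per_frame * frames_per_second

per_frame, per_second = z_clear_cost(1024, 768, 4, 30)
print(per_frame / 1e6, "MB per frame")
print(per_second / 1e6, "MB per second")
```

This gives about 3.1 MB per frame and roughly 94 MB per second, in line with the rounded figures in the text.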

In this context, it is also interesting that the GeForce3 has another feature which also served to reduce bandwidth, in this case by reducing the geometry data.

We are talking about the so-called higher order surfaces (HOS). Many of you may have come across them in the form of ATI's TruForm. Essentially, nVidia's approach to HOS was to no longer push complex curved surfaces across the AGP bus as a vast number of triangles, but to describe them using just a complicated mathematical formula. Necessary changes were then made by changing the variables of the formula. Unfortunately, this feature remained a phantom throughout its life. It does appear in the official product description for the GeForce3, but as of Detonator driver version 20.80 it is no longer available, and nVidia now seems to have removed it from the feature list completely, although it is still available in hardware via patched drivers.

Accuview FSAA

Accuview is dedicated to the removal of so-called aliasing artifacts. These artifacts arise mainly at the edges of objects and polygons because, for example, a slanted edge can only be represented with pixels as a fine staircase. If you increase the number of pixels significantly, this staircase effect is normally reduced. The really annoying part of these artifacts, however, is caused mainly by movement: the arrangement of the steps can change very often, almost constantly jumping around, and the result is a flicker effect that, because of the movement, remains clearly visible even at very high resolutions such as 1600x1200.

Staircase effects

There are different methods to combat these aliasing effects, summarized under the term anti-aliasing.

The simplest type of AA is to render each individual frame into a frame buffer with a higher resolution than the one currently set and to scale the result down to the original resolution at the end. At a resolution of 800x600, for example, a virtual frame buffer of 1600x1200 pixels would be set up and then scaled back down to 800x600. The artifacts would thus be at the level of a much higher resolution. However, since with this method, called supersampling, almost the entire rendering process is carried out at the higher resolution, including all texture accesses etc., there is an extreme loss of performance.
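The principle of supersampling can be sketched in a few lines: render at twice the resolution in each direction, then average each 2x2 block into one output pixel. The slanted test edge and the simple box filter are assumptions chosen for illustration, not a description of any particular chip.

```python
def render_edge(width, height):
    """Rasterise a slanted edge: a pixel is white (1.0) if its
    centre lies below the line y = 0.3 * x, otherwise black (0.0)."""
    return [[1.0 if (y + 0.5) > 0.3 * (x + 0.5) else 0.0
             for x in range(width)]
            for y in range(height)]

def downsample_2x(image):
    """Average each 2x2 block of the large image into one pixel."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1] +
              image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(width)]
            for y in range(height)]

aliased = render_edge(8, 6)                        # hard steps only
supersampled = downsample_2x(render_edge(16, 12))  # 4x the pixel work

for row in supersampled:
    print(" ".join(f"{value:.2f}" for value in row))
```

The directly rendered image contains only pure black and white, while the supersampled one gains intermediate grey values along the edge. The price is four times as many rendered pixels, which is the performance loss described above.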

Since this method was de facto unusable, nVidia tried to introduce a new procedure in the consumer sector, in order to make anti-aliasing usable and establish it as a must-have feature, provided the performance was still sufficient. This was marketed under the name "Quincunx" anti-aliasing and was based on the multisampling process plus a downstream filter. Multisampling means that the edges are smoothed at least as well as with supersampling, but the textures are not treated by this method. In direct comparison, the textures of the multisampled image therefore do not look quite as sharp as supersampled ones, and flickering within textures is not removed either.

The GeForce3 already offered dedicated hardware for this purpose, so that the additional 'samples' could be calculated at full (GPU) performance. Apparently for this reason, the fill rate of the chips has been quoted since the GeForce3 not in gigatexels/s, but in anti-aliased samples per second.

Accuview is also based on this technology, but includes additional options that promise increased image quality and higher performance at the same time.

Sampling positions

The quincunx pattern, named after the arrangement of the five pips on a die, is retained; however, together with the additionally calculated samples, it is offset by half a pixel. This offset is beneficial for edge smoothing, since the additionally generated samples now lie 50% closer, in both the x and y directions, to an imaginary slanted edge and thus to the real position of the pixel, and the anti-aliasing effect increases thanks to the more exact approximation. This is particularly noticeable on slightly inclined, almost horizontal lines.
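A five-sample filter of this kind can be sketched as a weighted average of one centre sample and four corner samples. The weights used here (1/2 for the centre, 1/8 for each corner) are the values commonly cited for Quincunx and are an assumption in this sketch; the article itself gives no weights, and the half-pixel offset of Accuview is not modelled.

```python
# Assumed weights, not taken from the article: centre 1/2, each corner 1/8.
CENTER_WEIGHT = 0.5
CORNER_WEIGHT = 0.125

def quincunx_filter(center, corners):
    """Blend one centre sample with its four diagonal neighbours."""
    if len(corners) != 4:
        raise ValueError("exactly four corner samples are expected")
    return CENTER_WEIGHT * center + CORNER_WEIGHT * sum(corners)

# A black pixel on an edge with two white diagonal neighbours.
print(quincunx_filter(0.0, [1.0, 1.0, 0.0, 0.0]))
```

Because the weights sum to 1, a uniformly coloured area is left unchanged, while a pixel on an edge takes on part of its neighbours' brightness.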

Furthermore, when performing Accuview anti-aliasing, one complete access to the frame buffer is saved. However, this is not achieved, as was speculated in the rumor mill and in our preview, by combining the multisample buffers into a frame buffer in the RAMDAC.

What is certain, however, is that there will be a new FSAA mode, designated '4xS'. The 'S' stands for staggered and is intended to ensure increased fidelity in textures by reading out 50% more texture values and including them in the final calculation. As has since become apparent, 2x2 multisampling is combined with 1x2 supersampling, so that the coveted texture smoothing effect is achieved as well.

Furthermore, this time nVidia wants to emphasize, i.e. to market, the positive effect of anisotropic filtering. This is a type of filtering that, unlike conventional bilinear and trilinear filtering, does not work with the same strength in all directions, but follows the viewing angle, which in 3D games mostly points into the depth of the monitor, and which, depending on the filtering level set, uses more than just the neighboring 4 pixels for filtering.
