ATi Radeon HD 3850 in the test: Another price-performance hit?

Technical data

| | Radeon HD 3870 | Radeon HD 3850 | Radeon HD 2600 XT | GeForce 8800 GTS (320 MB) | GeForce 8600 GTS |
|---|---|---|---|---|---|
| Chip | RV670 | RV670 | RV630 | G80 | G84 |
| Transistors | approx. 666 million | approx. 666 million | approx. 390 million | approx. 681 million | approx. 289 million |
| Production | 55 nm | 55 nm | 65 nm | 90 nm | 80 nm |
| Chip clock | 775 MHz | 670 MHz | 800 MHz | 500 MHz | 675 MHz |
| Shader clock | 775 MHz | 670 MHz | 800 MHz | 1200 MHz | 1450 MHz |
| Pixel pipelines | X | X | X | X | X |
| Shader units (MADD) | 64 (5D) | 64 (5D) | 24 (5D) | 96 (1D) | 32 (1D) |
| FLOPs (MADD/ADD) | 496 GFLOPS | 429 GFLOPS | 192 GFLOPS | 346 GFLOPS * | 139 GFLOPS * |
| ROPs | 16 | 16 | 4 | 20 | 8 |
| Pixel fill rate | 12400 MPix/s | 10720 MPix/s | 3200 MPix/s | 10000 MPix/s | 5400 MPix/s |
| TMUs | 16 | 16 | 8 | 48 | 16 |
| TAUs | 32 | 32 | 16 | 24 | 16 |
| Texel fill rate | 12400 MTex/s | 10720 MTex/s | 6400 MTex/s | 24000 MTex/s | 10800 MTex/s |
| Vertex shaders | X | X | X | X | X |
| Unified shaders in hardware | ✓ | ✓ | ✓ | ✓ | ✓ |
| Shader model | SM 4.1 | SM 4.1 | SM 4 | SM 4 | SM 4 |
| Geometry shader | ✓ | ✓ | ✓ | ✓ | ✓ |
| Memory amount | 512 MB GDDR4 | 256 MB GDDR3 | 256 MB GDDR3 (256 MB GDDR4) | 640 MB GDDR3 (320 MB GDDR3) | 256 MB GDDR3 |
| Memory clock | 1125 MHz | 830 MHz | 700 MHz (1100 MHz) | 800 MHz | 1000 MHz |
| Memory interface | 256 bit | 256 bit | 128 bit | 320 bit | 128 bit |
| Memory bandwidth | 72000 MB/s | 53120 MB/s | 22400 MB/s (35200 MB/s) | 64000 MB/s | 32000 MB/s |

The ATi Radeon HD 3850 is based on a fully enabled RV670 GPU, which is manufactured by TSMC on the modern 55 nm process, contains about 666 million transistors and has 64 5D vector shader units. In theory, each 5D unit can split its work in a 1:1:1:1:1 ratio (and then behaves like five scalar shaders), but only if the operations are completely independent of one another. If they depend on each other, some ALUs have to wait for the results of the others and sit idle. The thread scheduler tries to prevent this by giving the idle ALUs other tasks, but this requires a great deal of driver optimization. On the RV670, each ALU can perform one MADD (multiply-add) operation per cycle. In addition, the ATi Radeon HD 3850 has 16 texture units, which can texture 16 pixels and address 32 pixels per cycle. This ratio, the opposite of that on Nvidia's G80 chip, is said to be particularly advantageous in Direct3D 10 applications.
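The GFLOPS figures in the table follow directly from the unit counts and clock rates given here. The short sketch below reproduces them, assuming that one MADD counts as two floating-point operations. The function names and the `lanes_filled` parameter are illustrative and do not come from ATi; the second function is a deliberately crude model of what happens when dependent operations leave some of the five lanes idle.

```python
# Peak shader throughput from the figures in the text (a sketch, not a benchmark).
# Assumption: one MADD counts as 2 floating-point operations (multiply + add).
FLOPS_PER_MADD = 2


def peak_gflops(vector_units: int, lanes_per_unit: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOPS if every ALU lane issues one MADD per cycle."""
    alus = vector_units * lanes_per_unit
    return alus * FLOPS_PER_MADD * clock_mhz / 1000.0


def gflops_with_lanes_filled(vector_units: int, lanes_filled: int, clock_mhz: float) -> float:
    """Hypothetical simplification: only `lanes_filled` of the 5 lanes do useful work."""
    if not 0 <= lanes_filled <= 5:
        raise ValueError("lanes_filled must be between 0 and 5")
    return peak_gflops(vector_units, lanes_filled, clock_mhz)


if __name__ == "__main__":
    print(round(peak_gflops(64, 5, 670)))  # Radeon HD 3850 -> 429
    print(round(peak_gflops(64, 5, 775)))  # Radeon HD 3870 -> 496
    # Worst case in this simplified model: fully dependent code, one lane busy.
    print(gflops_with_lanes_filled(64, 1, 670))
```

In this simplified model, a shader that keeps only one lane per unit busy reaches a fifth of the peak, which is why the scheduler and driver optimizations mentioned above matter so much.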

The number of ROPs is also 16. They can carry out 32 Z operations per cycle (visibility checks that determine whether a pixel needs to be rendered at all). As on its predecessor, the R600 used on the Radeon HD 2900 XT, the RV670 performs the multisampling resolve in the shader units, even for simple anti-aliasing modes with the box filter. Normally this is the job of the ROPs, but that does not work on the RV670. Whether this is intentional on ATi's part or a mistake in the chip design will probably remain a secret forever. The GPU runs at 670 MHz on the Radeon HD 3850. The memory interface is 256 bits wide, and the memory is clocked at 830 MHz. The ATi reference design carries 256 MB, made up of eight 32-megabyte memory chips. If desired, a board manufacturer can double the memory to 512 MB by using 64-megabyte chips. The memory controller on the RV670 consists of four 64-bit channels.
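The fill rate and bandwidth entries in the table can be checked with the same kind of arithmetic. The sketch below assumes that each ROP or TMU processes one pixel or texel per clock and that GDDR3/GDDR4 memory transfers data twice per clock (double data rate). The function names are illustrative only.

```python
# Fill rate and memory bandwidth from the figures in the text (a sketch).

def fill_rate(units: int, clock_mhz: int) -> int:
    """MPix/s (ROPs) or MTex/s (TMUs), assuming one result per unit per clock."""
    return units * clock_mhz


def memory_bandwidth_mb_s(bus_width_bits: int, memory_clock_mhz: int,
                          transfers_per_clock: int = 2) -> int:
    """Bandwidth in MB/s; transfers_per_clock=2 models double data rate memory."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * memory_clock_mhz * transfers_per_clock


if __name__ == "__main__":
    # Radeon HD 3850: 16 ROPs and 16 TMUs at 670 MHz, 256-bit bus at 830 MHz
    print(fill_rate(16, 670))               # 10720 (pixel and texel fill rate)
    print(memory_bandwidth_mb_s(256, 830))  # 53120 MB/s
    # Memory layout on the reference design
    print(8 * 32)  # eight 32 MB chips -> 256 MB
    print(4 * 64)  # four 64-bit channels -> 256-bit interface
```

The same functions give 72000 MB/s for the Radeon HD 3870 (256 bit, 1125 MHz) and 64000 MB/s for the GeForce 8800 GTS (320 bit, 800 MHz), matching the table.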

The Radeon HD 3850 supports the Direct3D 10.1 standard, which will be introduced in Windows Vista with Service Pack 1. The 'PowerPlay' technology, familiar from notebook chips, keeps power consumption low by reducing clock rates and voltages both in idle mode and under load, and by deactivating chip parts that are not required. The Unified Video Decoder (UVD) is also on board the HD 3850 and handles the acceleration of HD video. The graphics card is PCIe 2.0 compatible.


* The GFLOPS figures we give for the G80 graphics cards correspond to the theoretical maximum, which is reached only if all ALUs can use the full capacity of both the MADD and the MUL units. In practice, this is almost never the case on a G80. While the MADD can be used entirely for 'general shading', the second MUL usually has other tasks: it handles perspective correction or works as an attribute interpolator or special function unit (SFU). With ForceWare 158.19 (and its Windows Vista counterpart), the second MUL can also be used for general shading, but apparently not fully, since the 'special functions' still have to be carried out. The real GFLOPS figures are therefore below the theoretical maximum.
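To put rough numbers on this footnote, the sketch below computes the theoretical maximum (MADD plus MUL, three floating-point operations per ALU per cycle) next to a MADD-only figure. The `mul_usable_fraction` parameter is a hypothetical knob for illustration; the text does not say how much of the second MUL is actually available, so only the two end points are meaningful.

```python
# G80-style scalar ALUs: one MADD (2 FLOPs) plus a second MUL (1 FLOP) per cycle.
# Assumption: `mul_usable_fraction` is an illustrative parameter, not a measured value.

def g80_gflops(scalar_alus: int, shader_clock_mhz: float,
               mul_usable_fraction: float = 1.0) -> float:
    """GFLOPS for a given share of the second MUL available to general shading."""
    if not 0.0 <= mul_usable_fraction <= 1.0:
        raise ValueError("mul_usable_fraction must be between 0 and 1")
    flops_per_cycle = 2.0 + 1.0 * mul_usable_fraction
    return scalar_alus * flops_per_cycle * shader_clock_mhz / 1000.0


if __name__ == "__main__":
    # GeForce 8800 GTS: 96 ALUs at 1200 MHz
    print(round(g80_gflops(96, 1200)))       # theoretical maximum, 346 in the table
    print(round(g80_gflops(96, 1200, 0.0)))  # MADD only, second MUL fully occupied
    # GeForce 8600 GTS: 32 ALUs at 1450 MHz
    print(round(g80_gflops(32, 1450)))       # theoretical maximum, 139 in the table
```

Under these assumptions the real figure for the GeForce 8800 GTS would lie somewhere between roughly 230 GFLOPS (MADD only) and the 346 GFLOPS maximum.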
