NVIDIA GeForce GTX 280 – fast and hot
By: Anton Rachko, Vladimir Romanchenko
In an official press release today, NVIDIA announced the new GeForce GTX 200 family of graphics cards, GeForce GTX 280 and GeForce GTX 260, based on the second generation of its unified visual computing architecture. The first sample card built on the GeForce GTX 280 chip has already visited our test lab, and we are eager to share our first test results and our impressions of NVIDIA's new architecture with readers.
Before moving on to the graphs and findings, we describe the architecture of the GeForce GTX 200 family of graphics chips, NVIDIA's new and updated technologies, and a number of new initiatives first announced in today's review. For the impatient who can't resist scrolling down to the conclusions, we'd like to point out the following: this time NVIDIA has announced not only an updated architecture but, in a sense, a new philosophy of graphics architecture with far-reaching consequences.
First, the technical details. As a logical extension of the GeForce 8 and GeForce 9 series, which represented NVIDIA's first generation of the unified visual computing architecture, the new GeForce GTX 200 family products are built on the second generation of this architecture.
NVIDIA's GeForce GTX 280 and 260 graphics processors are the largest and most complex graphics chips known so far: 1.4 billion transistors each! The most powerful model, GeForce GTX 280, offers 240 shader processors, 80 texture processors, and support for up to 1 GB of video memory. See the following table for detailed specifications of the GeForce GTX 280 and GeForce GTX 260 chips.
Specifications of NVIDIA GeForce GTX 280 and GTX 260

| Graphics core | GTX 280 | GTX 260 |
|---|---|---|
| Process technology | 65 nm | 65 nm |
| Number of transistors | 1.4 billion | 1.4 billion |
| Graphics clock (including the dispatcher, texture and ROP units) | 602 MHz | 576 MHz |
| Processor unit clock | 1296 MHz | 1242 MHz |
| Processor units | 240 | 192 |
| Memory clock (frequency / data rate) | 1107 MHz / 2214 MHz | 999 MHz / 1998 MHz |
| Memory interface width | 512-bit | 448-bit |
| Memory bandwidth | 141.7 GB/s | 111.9 GB/s |
| Memory capacity | 1 GB | 896 MB |
| ROP units | 32 | 28 |
| Texture filtering units | 80 | 64 |
| Texture filtering rate | 48.2 GTexels/s | 36.9 GTexels/s |
| HDCP support | Yes | Yes |
| HDMI support | Yes (DVI-HDMI adapter) | Yes (DVI-HDMI adapter) |
| Outputs | 2 x Dual-Link DVI-I, 1 x 7-pin HDTV | 2 x Dual-Link DVI-I, 1 x 7-pin HDTV |
| RAMDAC | 400 MHz | 400 MHz |
| Bus | PCI Express 2.0 | PCI Express 2.0 |
| Form factor | Dual-slot | Dual-slot |
| Power connectors | 1 x 8-pin, 1 x 6-pin | 2 x 6-pin |
| Maximum power consumption | 236 W | 182 W |
| Maximum GPU temperature | 105°C | 105°C |
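The bandwidth and texture-rate rows of the specification table above follow directly from the other rows. The sketch below reproduces them, assuming decimal units (1 GB/s = 10^9 bytes/s) and one filtered texel per texture filtering unit per graphics clock; the function names are illustrative only.

```python
# Derive memory bandwidth and texture filtering rate from the table rows.

def memory_bandwidth_gbs(bus_width_bits, effective_rate_mhz):
    """Bytes per transfer times effective transfers per second, in GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mhz * 1e6 / 1e9

def texel_rate_gtexels(filter_units, core_clock_mhz):
    """Assumes one texel per filtering unit per graphics clock, in GTexels/s."""
    return filter_units * core_clock_mhz * 1e6 / 1e9

cards = {
    # name: (bus width in bits, effective memory MHz, texture units, graphics MHz)
    "GTX 280": (512, 2214, 80, 602),
    "GTX 260": (448, 1998, 64, 576),
}

for name, (bus, mem_mhz, tmus, core_mhz) in cards.items():
    bw = memory_bandwidth_gbs(bus, mem_mhz)
    tex = texel_rate_gtexels(tmus, core_mhz)
    print(f"{name}: {bw:.1f} GB/s, {tex:.1f} GTexels/s")
# GTX 280: 141.7 GB/s, 48.2 GTexels/s
# GTX 260: 111.9 GB/s, 36.9 GTexels/s
```

Both computed pairs match the table, which is a quick way to check that the flattened figures were transcribed consistently.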

In fact, the modern graphics core of the GeForce GTX 200 family can be viewed as a universal chip that supports two different modes: graphics and computing. The architecture of GeForce 8 and 9 family chips is represented as a Scalable Processor Array (SPA). The architecture of GeForce GTX 200 family chips is based on an updated and improved SPA, which comprises a number of Texture Processing Clusters (TPCs) in graphics mode, or Thread Processing Clusters in parallel computing mode. Each TPC in turn comprises an array of Streaming Multiprocessors (SMs), each containing eight processor cores, also referred to as Streaming Processors or Thread Processors. Each TPC also includes texture filtering units, which handle texturing in graphics mode and are also used for various filtering operations in computing mode.
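The TPC, SM and core hierarchy above multiplies out to the unit counts in the specification table. The sketch below assumes the commonly cited GT200 configuration, which this article does not spell out: three SMs per TPC, eight cores per SM, eight texture filtering units per TPC, with ten TPCs enabled on the GTX 280 and eight on the GTX 260. The class name is hypothetical.

```python
# Hypothetical model of the GT200 shader hierarchy: TPC -> SM -> core.
# All per-cluster figures below are assumptions, not taken from this article.
from dataclasses import dataclass

@dataclass(frozen=True)
class GT200Config:
    tpcs: int                  # enabled Texture/Thread Processing Clusters
    sms_per_tpc: int = 3       # Streaming Multiprocessors per TPC (assumed)
    cores_per_sm: int = 8      # processor cores per SM
    tex_units_per_tpc: int = 8 # texture filtering units per TPC (assumed)

    @property
    def sms(self):
        return self.tpcs * self.sms_per_tpc

    @property
    def cores(self):
        return self.sms * self.cores_per_sm

    @property
    def texture_units(self):
        return self.tpcs * self.tex_units_per_tpc

gtx280 = GT200Config(tpcs=10)
gtx260 = GT200Config(tpcs=8)
print(gtx280.sms, gtx280.cores, gtx280.texture_units)  # 30 240 80
print(gtx260.sms, gtx260.cores, gtx260.texture_units)  # 24 192 64
```

Under these assumptions the GTX 260 is simply the same chip with two TPCs disabled, which explains why its core and texture-unit counts drop by the same 20%.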
Below is a block diagram of the GeForce GTX 280 in traditional graphics mode.
When the chip switches to computing mode, the hardware thread dispatcher (at the top of the diagram) manages the threads of the TPCs.
Here is a closer look at a TPC: each SM has its own shared memory, through which the processor cores of that SM can exchange data with one another without accessing the external memory subsystem.
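A classic use of the per-SM memory described above (called shared memory in CUDA terminology) is a block-level tree reduction, in which the cores of one SM repeatedly combine pairs of partial results. The sketch below is only a sequential CPU-side model of that pattern: the Python list stands in for shared memory, and each loop iteration stands for one parallel step followed by a barrier (`__syncthreads()` in CUDA). The function name and the power-of-two block size are assumptions for illustration.

```python
# Sequential model of a shared-memory tree reduction inside one SM.

def block_reduce_sum(values):
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "block size must be a power of two"
    shared = list(values)  # load phase: each "thread" writes one element
    stride = n // 2
    while stride > 0:
        # Threads with tid < stride add their partner's element. The list
        # comprehension reads only values written before the previous
        # barrier, just as the real kernel would.
        shared[:stride] = [shared[tid] + shared[tid + stride] for tid in range(stride)]
        stride //= 2  # barrier, then half as many active threads
    return shared[0]

print(block_reduce_sum([float(i) for i in range(256)]))  # 32640.0
```

A 256-element block finishes in log2(256) = 8 steps, and none of the intermediate sums ever leaves the SM, which is the point of keeping them out of external memory.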
Thus, NVIDIA's unified shader and compute architecture uses two different computational models: MIMD (multiple instruction, multiple data) at the TPC level, and SIMT (single instruction, multiple thread), an advanced form of SIMD (single instruction, multiple data), for computations within each SM.
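The SIMT model described above can be illustrated with a toy warp, assuming NVIDIA's warp size of 32 threads. It is a simplification, not a description of the actual hardware mechanism: one instruction stream drives every lane, and when lanes take different sides of a branch, both paths are issued in turn under a per-lane active mask. The Collatz step is just an arbitrary branchy function chosen for the example.

```python
# Toy model of SIMT branch divergence within one 32-thread warp.
WARP_SIZE = 32

def simt_collatz_step(xs):
    """Apply one Collatz step to each lane of a warp.

    Returns (results, paths), where paths is the number of branch paths
    the warp had to issue: 1 if all lanes agree, 2 if they diverge.
    """
    assert len(xs) == WARP_SIZE
    mask = [x % 2 == 0 for x in xs]  # branch condition, evaluated per lane
    out = list(xs)
    paths = 0
    if any(mask):                    # "if" path, lanes where mask is True
        out = [x // 2 if m else o for x, m, o in zip(xs, mask, out)]
        paths += 1
    if not all(mask):                # "else" path, under the inverted mask
        out = [o if m else 3 * x + 1 for x, m, o in zip(xs, mask, out)]
        paths += 1
    return out, paths

uniform = [2 * i for i in range(WARP_SIZE)]  # all lanes even: no divergence
mixed = list(range(WARP_SIZE))               # even/odd lanes: divergence
print(simt_collatz_step(uniform)[1])  # 1
print(simt_collatz_step(mixed)[1])    # 2
```

Each thread still computes its own correct result, but a divergent warp pays for both paths; that is the practical difference between SIMT and fully independent MIMD threads.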
In terms of general specifications, the GeForce GTX 200 family offers the following advantages over previous generations of chips:
- Three times as many threads processed per unit time
- New design of the instruction scheduler, with texture processing efficiency raised by 20%
- 512-bit memory interface (vs. 384-bit in the previous generation)
- Optimized Z-buffer processing and compression for better performance at high screen resolutions
- Architectural improvements for faster shadow rendering
- Full-speed frame buffer blending (vs. half speed in the 8800 GTX)
- Doubled command buffer for higher computational performance
- Doubled number of registers for faster execution of long and complex shaders
- Double-precision (64-bit) floating-point computation compliant with the IEEE 754R standard
- Hardware support for 10-bit color (DisplayPort interface only)
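The floating-point precision item in the list above matters most for scientific computing. The sketch below shows why, using Python's `struct` module to round a value to IEEE 754 single precision (Python floats are double precision). A 32-bit float has a 24-bit significand, so above 2^24 it can no longer represent every integer, and a single-precision running sum stalls. The helper name `to_float32` is illustrative.

```python
# Single vs double precision: the 24-bit significand limit of float32.
import struct

def to_float32(x):
    """Round a Python float (IEEE 754 double) to single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

big = 2.0 ** 24 + 1       # 16777217
print(big)                # 16777217.0, exact in double precision
print(to_float32(big))    # 16777216.0, rounded in single precision

# Adding 1.0 a thousand times to 2**24, rounding to float32 after each step.
acc32 = to_float32(2.0 ** 24)
for _ in range(1000):
    acc32 = to_float32(acc32 + 1.0)
acc64 = 2.0 ** 24 + 1000
print(acc32, acc64)       # 16777216.0 16778216.0
```

Every single-precision addition rounds back to 2^24, so the thousand increments are lost entirely; in double precision the same sum is exact.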
The main features of the new chips are as follows:
- Support for NVIDIA PhysX
- Support for Microsoft DirectX 10, Shader Model 4.0
- Support for NVIDIA CUDA
- Support for PCI Express 2.0
- Support for GigaThread
- NVIDIA Lumenex engine
- 128-bit floating-point (HDR) rendering
- Support for OpenGL 2.1
- Support for dual Dual-Link DVI
- Support for NVIDIA PureVideo HD
- Support for NVIDIA HybridPower
Note that the GeForce GTX 200 family does not support DirectX 10.1. One reason is that, when developing the new family, NVIDIA decided to concentrate not on DirectX 10.1, which is still in little demand, but on improving the architecture and performance of its chips.
NVIDIA PhysX, built on a package of physics algorithms, is a powerful engine for real-time physics computations. PhysX is currently supported in over 150 games. Combined with a powerful GPU, the PhysX engine substantially increases physics computing power, especially for explosions that scatter dust and debris, characters with complex facial expressions, new types of weapons with fantastic effects, realistically worn and torn fabric, and fog and smoke that flow dynamically around objects.
Another equally important novelty is a set of new power-saving modes. Thanks to the 65-nm process technology and new circuitry, power consumption can be controlled more flexibly and dynamically. For instance, GeForce GTX 200 family chips consume only about 25 W in standby or 2D mode and about 35 W during Blu-ray playback, while at full 3D load the TDP does not exceed 236 W. The GeForce GTX 200 chip can even be switched off completely thanks to HybridPower technology when used with motherboards based on HybridPower-capable nForce chipsets with integrated graphics (e.g., nForce 780a or 790i); low-intensity graphics workloads are then handled by the GPU integrated into the motherboard. In addition, GeForce GTX 200 family GPUs include dedicated power-management units that shut down blocks of the graphics processor not in use at the moment.

Users can build systems with two or three GeForce GTX 200 family cards in SLI mode using motherboards based on the appropriate nForce chipsets. For traditional two-card SLI, NVIDIA claims a 60-90% performance gain in games; 3-way SLI is meant to deliver the highest frame rates at the highest screen resolutions.
Another innovation is support for the new DisplayPort interface, with resolutions above 2560 x 1600 and 10-bit color (previous generations of GeForce graphics supported 10-bit internal data processing but could output only 8-bit-per-component RGB color).
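The gain from 10-bit color output mentioned above is easy to quantify: each extra bit per component doubles the number of levels per channel, and the total palette is the per-channel level count cubed for the three RGB components. The function names below are illustrative.

```python
# Levels per channel and total colors for 8-bit vs 10-bit RGB output.

def levels_per_channel(bits):
    return 2 ** bits

def rgb_colors(bits):
    return levels_per_channel(bits) ** 3  # three components: R, G, B

for bits in (8, 10):
    print(f"{bits}-bit: {levels_per_channel(bits)} levels/channel, "
          f"{rgb_colors(bits):,} colors")
# 8-bit: 256 levels/channel, 16,777,216 colors
# 10-bit: 1024 levels/channel, 1,073,741,824 colors
```

Four times as many steps per channel mainly means smoother gradients with less visible banding, rather than new hues.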
With the announcement of the GeForce GTX 200 family, NVIDIA proposes an entirely different view of the roles of the CPU and the GPU in a modern, balanced desktop system. In the opinion of NVIDIA's specialists, such an optimized PC, based on the concept of heterogeneous computing (i.e., processing a stream of diverse tasks of different types), offers a much more balanced architecture and substantially greater computing capabilities. In practice, this means pairing a CPU of comparatively moderate performance with the most powerful graphics, or even an SLI system, to attain peak performance in the most demanding games, 3D, and media applications.
In short, the concept is this: in a modern system the CPU handles service functions, while the burden of demanding computations falls on the graphics system. Roughly the same conclusions (albeit more nuanced and backed by numbers) follow from our series of reviews investigating how performance depends on the key system components; see the reviews "CPU-boundedness of the video system. Part I - Analysis"; "CPU-boundedness of the video system. Part II - Effect of the CPU cache and the speed of the RAM"; "Bot-dependence, or why 3D games need a powerful CPU"; and "CPU-boundedness of the video system. Transition range. 'Critical' point of the CPU clock speed".
Heavy computation on modern graphics cards is no longer a novelty, but it is with the arrival of the GeForce GTX 200 family that NVIDIA expects a substantial rise in interest in CUDA technology.
CUDA (Compute Unified Device Architecture) is a computing architecture for solving complex tasks in the consumer, business, and technical spheres, that is, in any data-intensive application, using NVIDIA graphics processors. From the CUDA point of view, the new GeForce GTX 280 chip is nothing more than a powerful multi-core processor (with hundreds of cores!) for parallel computing.
As stated above, the graphics core of the GeForce GTX 200 family can be viewed as a chip supporting both graphics and computing modes. In computing mode, the GeForce GTX 280 becomes a programmable multiprocessor with 240 cores and 1 GB of dedicated memory, a kind of dedicated supercomputer with a peak performance just under one teraflop. This speeds up applications that parallelize well over data, such as video encoding and scientific computations, many times over.
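The "just under one teraflop" figure can be reconstructed from the specification table. The sketch assumes the way peak throughput was commonly counted for this chip, which the article does not state: each core issues a multiply-add (2 flops) plus a separate multiply (1 flop) per processor-unit clock. Counting the multiply-add alone gives a more conservative number.

```python
# Peak single-precision throughput from core count and processor clock.
# flops_per_clock = 3 assumes dual-issue MAD + MUL (assumption, see above).

def peak_gflops(cores, shader_mhz, flops_per_clock=3):
    return cores * shader_mhz * flops_per_clock / 1000

print(f"GTX 280: {peak_gflops(240, 1296):.1f} GFLOPS")               # 933.1
print(f"GTX 260: {peak_gflops(192, 1242):.1f} GFLOPS")               # 715.4
print(f"GTX 280, MAD only: {peak_gflops(240, 1296, 2):.1f} GFLOPS")  # 622.1
```

These are theoretical peaks; real applications reach them only when the work parallelizes well and memory bandwidth does not become the bottleneck.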
GeForce 8 and 9 family graphics processors were the first on the market to support CUDA; over 70 million of them have been sold so far, and interest in the CUDA project keeps growing. Details of the project and the files needed to get started are available here. As an example, the screenshots below show the computational speedups obtained by independent CUDA users.
To sum up our brief look at the architectural and technological improvements in NVIDIA's new generation of graphics processors, let's highlight the major points. The second generation of the unified visual computing architecture implemented in the GeForce GTX 200 family is a substantial step forward compared with the previous GeForce 8 and 9 generations.
Compared with the previous leader, GeForce 8800 GTX, the new flagship GeForce GTX 280 offers 1.88 times as many processor cores; can process about 2.5 times as many threads per chip; has a register file twice as large and supports double-precision floating-point computation; supports 1 GB of memory over a 512-bit interface; has a more efficient instruction dispatcher and improved communication among the chip's units; and offers an improved Z-buffer and compression module, support for 10-bit color, and more.
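The ratios quoted above can be checked with simple arithmetic. The per-SM figures used here are assumptions drawn from commonly published specifications, not from this article: the 8800 GTX has 16 SMs of 8 cores with up to 768 resident threads and 8192 registers per SM, while the GTX 280 has 30 SMs of 8 cores with up to 1024 resident threads and 16384 registers per SM.

```python
# Ratios behind the GeForce GTX 280 vs GeForce 8800 GTX comparison.
# All per-SM figures are assumptions (see the lead-in).
cores_8800, cores_280 = 16 * 8, 30 * 8              # 128 vs 240
threads_8800, threads_280 = 16 * 768, 30 * 1024     # 12288 vs 30720
regs_per_sm_8800, regs_per_sm_280 = 8192, 16384

print(f"cores:            {cores_280 / cores_8800:.2f}x")              # 1.88x
print(f"resident threads: {threads_280 / threads_8800:.1f}x")          # 2.5x
print(f"registers per SM: {regs_per_sm_280 / regs_per_sm_8800:.0f}x")  # 2x
```

The 2.5x thread figure comes from two factors multiplied together: nearly twice as many SMs, each able to hold a third more threads.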
For the first time, a new generation of GeForce chips, the GeForce GTX 200, is positioned from the outset not only as a powerful 3D graphics accelerator but also as a serious solution for parallel computing.
GeForce GTX 280 cards with 1 GB of memory are expected to reach retail at about $649, and GeForce GTX 260 products with 896 MB at about $449 (or even $399). We will soon be able to check how the recommended prices match actual retail prices: by all accounts, the GeForce GTX 200 family announcement is far from a "paper launch", since many NVIDIA partners have already announced products based on these chips, and they should appear on store shelves in the near future.
We now move on to describing the first GeForce GTX 280 card to arrive at our test lab, and to the test results.