Intel QX9650 (Penryn): first tests
Date: 12.11.2007
As 2007 draws to a close, Intel has prepared quite a nice surprise for users: a new series of CPUs manufactured on the 45-nm process technology. A change of process technology is just the right moment to update the structure of the CPU core. The thing is, no core (whether video, CPU, chipset, audio processor, etc.) is perfect: it has errors and flaws, as well as capabilities left undisclosed for various reasons. The user does not see the errors (except for very serious ones, of which there have been none for three or four years), because they can be worked around at the level of the chipset and the motherboard's BIOS. Once the news of Intel's migration to the 45-nm process became public, everyone started waiting for a renewal of the Conroe core. And Intel has lived up to the expectations: it presented the new Penryn family of processors, which includes the 4-core Yorkfield and the 2-core Wolfdale.
The Conroe core is the most advanced core to date, and processors based on it easily leave AMD's competitors behind. So Intel's position is quite understandable: the company decided not to make fundamental changes to the Core architecture and limited itself to modifications. So what have the engineers at Intel added or modified? First, division operations (both integer and floating-point) have been substantially accelerated. The modified division unit has been dubbed Fast Radix-16 (in the Core family, the corresponding unit was Radix-4). As a result, the new unit processes 4 bits per cycle instead of 2. Traditionally, programmers avoid division operations as relatively slow and prefer to replace them with multiplication, and compilers follow suit. But in any case, faster division benefits overall CPU performance. Square-root operations, on the other hand, are not so easy to avoid, and that is just where Penryn is much faster than Conroe.
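To put the Radix-4 versus Radix-16 difference into numbers, here is a minimal sketch that counts only the iteration steps of a divider that retires a fixed number of quotient bits per cycle. The function name is ours, and real hardware adds fixed setup overhead and may finish early on small operands, so treat the figures as a rough model rather than Intel's published latencies.

```python
import math


def divider_iterations(quotient_bits: int, radix: int) -> int:
    """Iteration steps when each step retires log2(radix) quotient bits.

    Assumption: radix is a power of two and there is no early exit.
    """
    bits_per_step = radix.bit_length() - 1  # 4 -> 2 bits, 16 -> 4 bits
    return math.ceil(quotient_bits / bits_per_step)


if __name__ == "__main__":
    for bits in (32, 64):
        r4 = divider_iterations(bits, 4)
        r16 = divider_iterations(bits, 16)
        print(f"{bits}-bit quotient: radix-4 {r4} steps, radix-16 {r16} steps")
```

In this simplified model, the iteration count is halved for any operand width, which is where the "substantially accelerated" division comes from.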
In Penryn, the unit that executes streaming (SIMD) instructions has been seriously reworked. Engineers at Intel made that move because the new processors offer the additional SSE4.1 instruction set. The biggest changes have been made to the shuffle block, which rearranges data within 128-bit registers. Now operations like packing, unpacking, shifting of packed values, and insertion are performed on the respective register in a single cycle. The reworked block has been dubbed the Super Shuffle Engine, and its use gives an almost twofold performance gain when executing streaming instructions. The new SSE4.1 set includes 47 new instructions, which make programmers' lives much easier when developing software that processes streaming data. Such jobs include video and audio encoding, scientific problems, and 3D graphics.
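For readers who have not worked with packed instructions, the sketch below models in plain Python what two SSE4.1 instructions compute: PMOVZXBW (unpacking bytes into 16-bit words with zero extension) and PBLENDW (selecting words from two registers by an 8-bit mask). This is only a behavioral model of our own, with registers represented as Python sequences. It says nothing about timing, and it is not how such code would be written in practice (that would be compiler intrinsics or assembly).

```python
from typing import List


def pmovzxbw(src: bytes) -> List[int]:
    """Model of PMOVZXBW: zero-extend the low 8 bytes of a 128-bit
    register into eight 16-bit words."""
    assert len(src) == 16, "a 128-bit register holds 16 bytes"
    return [b for b in src[:8]]


def pblendw(dst: List[int], src: List[int], imm8: int) -> List[int]:
    """Model of PBLENDW: word i is taken from src if bit i of imm8 is set,
    otherwise the word from dst is kept."""
    assert len(dst) == len(src) == 8, "a 128-bit register holds 8 words"
    return [src[i] if (imm8 >> i) & 1 else dst[i] for i in range(8)]


if __name__ == "__main__":
    pixels = bytes(range(200, 216))        # 16 packed 8-bit values
    words = pmovzxbw(pixels)               # low 8 values, now word-sized
    blended = pblendw([0] * 8, words, 0b00001111)
    print(words)
    print(blended)
```

On the hardware, each of these is a single instruction working on a whole register at once, which is exactly the kind of operation the reworked shuffle block is meant to speed up.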
Now let's try to sort out what all this gives the common user. The regular home user will not sense any difference in speed between Core and Penryn processors. Penryn does work a bit faster thanks to its more "mature" architecture, but in regular non-optimized software the difference will be no more than a few percent. Optimized software is a different matter. First comes optimization for multiple cores. Where it is present, the performance gain of a 4-core processor over a single-core processor (running at the same frequency and with the same architecture) may range from 200% to 400%! Optimization for SSE4.1 instructions gives Penryn a 30% advantage over Core at the same frequency.
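One way to see why the multi-core gain spans such a wide range is Amdahl's law, which we bring in here as our own framing (the figures above are not derived from it). The sketch assumes a program in which a given fraction of the work parallelizes perfectly and the rest runs on one core. The fractions in the loop are illustrative, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.5, 2 / 3, 0.9, 1.0):   # assumed, illustrative values
        gain = amdahl_speedup(fraction, cores=4)
        print(f"parallel share {fraction:.2f}: {gain:.2f}x on 4 cores")
```

Under this model, a fourfold gain requires code that parallelizes completely, while a program that is two-thirds parallel gets only a twofold gain on four cores.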
The only question is: where is all this optimized software? Unfortunately, the average user rarely comes across it, and programmers themselves have little desire to spend resources on tasks like these.
But computers are not only for games - sometimes they are used for work. There, the situation with optimization is better. There are matching extensions in various graphics packages (e.g., 3ds Max, POV-Ray, and Photoshop CS) and in video processing software (DivX, Windows Media Encoder). For instance, DivX 6.7 already supports SSE4.1. That means that an ordinary student re-encoding a film in the morning to watch later on a mobile device will save time and make it to the first lecture. The advantage will be more substantial in archiving software (e.g., WinRAR). Besides these programs, much other software uses data compression - e.g., all the latest games built on id Software engines store many of their files in compressed container files. In other words, loading certain games and switching between levels will run much faster.
There is one more point to note. The new Penryn processors are better than all their predecessors not only because of the improvements but also because of the 45-nm process technology itself. The core voltage has gone down, heat emission is lower, and the clock speed potential is higher. However, Intel is in no hurry to raise clock speeds: so far we know only of the planned rise to 3.0 - 3.33 GHz. Perhaps with the transition to the 400 MHz bus we'll see a 3.6 GHz processor, but that will not happen earlier than late 2008. That is the planned life span of the Penryn family, after which we'll see the entirely new Nehalem architecture with an integrated memory controller (top-end processors will offer 8 cores and execute 16 threads at a time!). By that time, Intel will definitely have migrated to the 32-nm process technology, and AMD may come out with the good news of a successful introduction of the 65-nm process.
Of course, the 45-nm process will please overclockers once they get their hands on such processors. The potential of the core is much higher, which means higher overclocking; the lower heat emission means you can apply a higher voltage (with the same cooler) and overclock even further. I even dare to assume that a 4 GHz clock speed will no longer be regarded as an "achievement" the way it is now with 65-nm Core processors.
However, the advantages of the 45-nm process technology are not only about overclockers' delights. The process allows engineers at Intel to cut down the physical dimensions of the core, which means a reduction in prime cost (i.e., more dies can be "grown" on a single wafer). The reduction in prime cost will not affect users at all - they will buy processors at the standard, fixed prices. But when buying a Penryn CPU (Yorkfield or Wolfdale), the user gets not 4 MB of L2 cache (as with Conroe) but 6 MB on each chip. That is, the overall L2 cache size in the tested QX9650 specimen, which includes two Wolfdale dies, amounts to 12 MB, and it is this figure that will appear in all the specifications and price lists. The QX9650 will sell at about $1000.
Most interesting of all, even with the larger cache the physical dimensions of Wolfdale are much smaller than those of Conroe: 107 sq mm versus 143 sq mm! There are 410 million transistors packed into this area in Wolfdale versus "merely" 291 million in the Conroe core.
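The effect of the finer process is easiest to see as transistor density. The short calculation below uses only the die areas and transistor counts quoted above. The dictionary layout and function name are ours.

```python
# (transistors, die area in sq mm), figures as quoted in the article
DIES = {
    "Conroe": (291e6, 143.0),
    "Wolfdale": (410e6, 107.0),
}


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


if __name__ == "__main__":
    for name, (count, area) in DIES.items():
        print(f"{name}: {density_mtr_per_mm2(count, area):.2f} M transistors/sq mm")
    ratio = density_mtr_per_mm2(*DIES["Wolfdale"]) / density_mtr_per_mm2(*DIES["Conroe"])
    print(f"Wolfdale packs {ratio:.2f}x more transistors per unit area")
```

Part of the density gain comes from the larger share of cache on the die, since cache cells pack more tightly than logic, so the ratio should not be read as a pure measure of the process shrink.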
Wolfdale core
It turns out that the Yorkfield contains almost a billion transistors (820 million, or 2 x 410 million), or nearly one million transistors per $1 (for the QX9650)! Those who are more patient will buy their transistors cheaper: in 6-8 weeks, the 2.66 GHz Q9450 (Yorkfield) CPU will be released at about $316.
The larger L2 cache will benefit software whose performance depends on cache size. However, the L2 cache in Penryn has turned out a bit slower than in Conroe. Engineers at Intel have partly made up for that shortcoming with the Split Load Cache Enhancement feature.
As regards the typical heat emission (TDP), for the tested QX9650 specimen it is exactly 130W. Only the QX9770 exceeds that, with a TDP of 136W, which is quite acceptable for a 3.2 GHz clock speed. That model will be available in Q1'08 at about $1400.
There is still a month and a half left until 2008, and currently the first and only representative of the new Penryn family is the Core 2 Extreme QX9650 with a 3 GHz clock speed, which offers four cores and runs at FSB = 333 MHz (1333 MHz QPB).
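The "transistors per dollar" arithmetic is easy to reproduce. The sketch below uses the approximate prices quoted in this article (about $1000 for the QX9650, about $316 for the Q9450) and assumes that both chips carry the full 820 million transistors. The helper name is ours.

```python
YORKFIELD_TRANSISTORS = 2 * 410e6   # two Wolfdale dies, figures from the article

# Approximate prices in USD as quoted in the article
PRICES_USD = {"QX9650": 1000.0, "Q9450": 316.0}


def transistors_per_dollar(transistors: float, price_usd: float) -> float:
    """How many transistors one dollar buys at the given price."""
    return transistors / price_usd


if __name__ == "__main__":
    for model, price in PRICES_USD.items():
        value = transistors_per_dollar(YORKFIELD_TRANSISTORS, price)
        print(f"{model}: {value / 1e6:.2f} million transistors per dollar")
```

At the quoted prices, the patient buyer of a Q9450 gets roughly three times as many transistors for each dollar as the buyer of a QX9650.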
Externally, the newcomer looks absolutely ordinary, just another LGA775 processor. Even the marking does not reveal what is under the heat spreader (as with all other Intel engineering samples):
If we remove the lid, we'll see two dual-core Wolfdale chips, with 6 MB of L2 cache memory on each (the overall L2 cache size = 12 MB).
On the reverse side of the CPU, we can see a somewhat different configuration of capacitors.
Conroe - to the left, Yorkfield - to the right
The CPU-Z utility displays the following information:
Now, based on the preliminary data, we have compiled a table of specifications for the Penryn family processors.
Name | Core | Number of cores | Clock speed | FSB | Multiplier | L2 cache
Core 2 Extreme QX9770 | Yorkfield | 4 | 3.2 GHz | 400 MHz | 8 | 12 MB
Core 2 Extreme QX9650 | Yorkfield | 4 | 3.0 GHz | 333 MHz | 9 | 12 MB
Core 2 Quad Q9550 | Yorkfield | 4 | 2.83 GHz | 333 MHz | 8.5 | 12 MB
Core 2 Quad Q9450 | Yorkfield | 4 | 2.66 GHz | 333 MHz | 8 | 12 MB
Core 2 Quad Q9300 | Yorkfield | 4 | 2.5 GHz | 333 MHz | 7.5 | 6 MB
Core 2 Duo E8500 | Wolfdale | 2 | 3.16 GHz | 333 MHz | 9.5 | 6 MB
Core 2 Duo E8400 | Wolfdale | 2 | 3.0 GHz | 333 MHz | 9 | 6 MB
Core 2 Duo E8300 | Wolfdale | 2 | 2.83 GHz | 333 MHz | 8.5 | 6 MB
Core 2 Duo E8200 | Wolfdale | 2 | 2.66 GHz | 333 MHz | 8 | 6 MB
Core 2 Duo E8190 | Wolfdale | 2 | 2.66 GHz | 333 MHz | 8 | 6 MB
We can draw a number of inferences from this table. First, Intel has kept the Core 2 (Duo/Quad/Extreme) brand in the name, which seems reasonable because the Penryn family is a derivative of the Core family. Secondly, Intel has substantially reduced the clock speed increments between models by using fractional multipliers. It is hard to recall when Intel last used such multipliers. Thirdly, almost all the Penryn family processors run at FSB = 333 MHz (i.e., 1333 MHz QPB) and therefore call for a modern motherboard. Only the Core 2 Extreme QX9770 will run at FSB = 400 MHz (1600 MHz QPB), which will require a high-quality X38-based motherboard or at least a top-quality P35-based one (although Intel will state that the only possible option is a motherboard based on the X48, to be released simultaneously with the QX9770).
Of special mention is the E8190 model, which matches the E8200 in its specifications but does not support virtualization technology. In any case, the user won't be able to save money: the recommended prices for the E8190 and E8200 are the same, at about $163. By the way, the situation is similar with the E6540, which matches the E6550 but does not support Intel TXT (Intel Trusted Execution Technology).
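The relation behind the table is simple: core clock = FSB x multiplier, and the quad-pumped bus (QPB) rate is four times the FSB. The sketch below checks the listed clock speeds against that relation. It assumes the nominal "333 MHz" bus is really 1000/3 MHz, and the model names and figures are copied from the table above. The function names are ours.

```python
FSB_333 = 1000.0 / 3.0   # assumption: the nominal "333 MHz" bus is 333.33 MHz

# model: (FSB in MHz, multiplier, clock in GHz as listed in the table)
MODELS = {
    "QX9770": (400.0, 8.0, 3.2),
    "QX9650": (FSB_333, 9.0, 3.0),
    "Q9550": (FSB_333, 8.5, 2.83),
    "Q9450": (FSB_333, 8.0, 2.66),
    "Q9300": (FSB_333, 7.5, 2.5),
    "E8500": (FSB_333, 9.5, 3.16),
    "E8400": (FSB_333, 9.0, 3.0),
    "E8300": (FSB_333, 8.5, 2.83),
    "E8200": (FSB_333, 8.0, 2.66),
}


def core_clock_ghz(fsb_mhz: float, multiplier: float) -> float:
    """Core clock in GHz from the bus clock and the multiplier."""
    return fsb_mhz * multiplier / 1000.0


def quad_pumped_rate(fsb_mhz: float) -> int:
    """Effective QPB rate: four transfers per bus clock."""
    return round(fsb_mhz * 4)


if __name__ == "__main__":
    for name, (fsb, mult, listed) in MODELS.items():
        print(f"{name}: {fsb:.0f} x {mult} = {core_clock_ghz(fsb, mult):.3f} GHz "
              f"(listed {listed} GHz, QPB {quad_pumped_rate(fsb)})")
```

With half-step multipliers, adjacent models on the 333 MHz bus differ by about 167 MHz, which is the finer clock speed increment noted above.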
Overclocking
A few words on overclocking. First of all, note that we are testing the extreme version of the CPU (Extreme Edition), whose multiplier is unlocked. That allows us to approach the clock speed limit of a particular specimen by adjusting the multiplier. We did just that and reached a clock speed as high as 4 GHz.
But to ensure stable operation at this frequency, we had to raise the supply voltage to 1.45 V. At first, this frequency somewhat disappointed us - frankly, we expected more from the 45-nm process technology. On the other hand, this is a 4-core processor, and its overclocking capability will be lower than that of a 2-core one (as the example of Kentsfield clearly shows). Therefore, mass-produced 2-core Wolfdale processors should easily clear the 4 GHz bar. Considering that Intel will most certainly update the stepping, we can expect much higher frequencies still.
As regards heat emission, thanks to the "finer" process technology Intel has managed to reduce the real heat emission by one quarter under load and by almost half at rest. At the same time, note that at rest all the known power-saving technologies, such as Enhanced Halt State (C1E) and Enhanced Intel SpeedStep, are at work, so Intel has not presented anything new in that respect. In fact, nothing new is really needed, since Intel has not changed the requirements for the cooling system, which should cope with a typical heat emission level (TDP) of 130-136W. In other words, Intel is building in a fair amount of margin in terms of CPU heat emission and, if necessary, can release models with higher clock speeds (e.g., 3.33 GHz and 3.6 GHz).
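As a rough illustration of multiplier-only overclocking, the sketch below works out which multiplier gives a target clock on a given bus. It assumes the bus was left at its stock 333 MHz (taken as 1000/3 MHz) and that multipliers move in half steps. Neither detail is stated for our test run, so take the resulting multiplier as an inference rather than a reported setting.

```python
def multiplier_for(target_ghz: float, fsb_mhz: float, step: float = 0.5) -> float:
    """Nearest multiplier (in `step` increments) that reaches the target clock."""
    raw = target_ghz * 1000.0 / fsb_mhz
    return round(raw / step) * step


def overclock_percent(stock_ghz: float, reached_ghz: float) -> float:
    """Clock speed gain over stock, in percent."""
    return (reached_ghz - stock_ghz) / stock_ghz * 100.0


if __name__ == "__main__":
    fsb = 1000.0 / 3.0   # assumed stock bus clock of the QX9650
    print(f"multiplier for 4 GHz: {multiplier_for(4.0, fsb)}")
    print(f"gain over 3 GHz stock: {overclock_percent(3.0, 4.0):.1f}%")
```

Under these assumptions, going from the stock multiplier of 9 to 12 yields the 4 GHz figure, a gain of about one third over the rated 3 GHz.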