Digital-Daily.com
45-nm Intel Penryn and Nehalem: architectural details

Author:
Date: 09.05.2007

Introduction

One of the most intriguing news topics this season is Intel's release of processors built on the 45-nm process technology. As more details about the new chips arrive, we have published a series of articles on the subject on our web site. Today we'll try to sort them out and review the innovations and technologies implemented in the new processors. We'll recap the theory now, to be followed by practical tests of engineering samples and retail processors.

Microarchitectures of the next few years

First, a look at Intel's latest plans for introducing CPU architectures over the next two years. The new generation of desktop CPUs, codenamed Penryn, will be built on an enhanced Intel Core microarchitecture. Their main distinctions are the migration to the 45-nm process technology and a number of architectural improvements, which should bring better power efficiency, higher clock speeds, and more instructions executed per cycle.

Once mass production of Penryn chips is established, Intel plans to introduce Nehalem processors, based on a microarchitecture of the same name that will replace Intel Core. About two to three years after the announcement of 45-nm processors, around 2009-2010, Intel hopes to introduce a finer, 32-nm process technology. For now these plans are still rather vague: even the transition to the 45-nm process ran into serious problems and required entirely new materials (high-k dielectrics and metal gates). The first 32-nm processors, codenamed Westmere (formerly Nehalem-C), will use the same Nehalem microarchitecture.

Two years after the release of Nehalem, the Gesher microarchitecture will replace it. Very little is known about it yet; we only know that the first Gesher processors will be manufactured on the 32-nm process technology. That is as far as the current forecasts go.

Judging by these plans, Intel is sticking to its established strategy of replacing the microarchitecture and moving to a new process technology once every two years. It is hard to tell whether the CPU industry leader will be able to keep up such a fast pace. Intel calls this product release strategy "tick-tock". Every "tick" is a new stage in semiconductor manufacturing technology together with improvements to the existing microarchitecture (e.g., Penryn). Every "tock" is a new microarchitecture (e.g., Nehalem).

Penryn processors in detail

Penryn family processors will appear before Nehalem, so we start with them. Currently, more than 15 Penryn family products are under development. Among the first, we'll see chips aimed at various market sectors.

Until recently, Intel was known to be preparing a dual-core processor for notebooks, dual- and quad-core models for desktop PCs, and dual- and quad-core processors for the server segment. During the Intel Developer Forum in Beijing, we also learned of the company's plans to release 45-nm chips for UMPC (Ultra Mobile PC) devices. The new processors are a serious bid and may shake the positions of manufacturers such as AMD, VIA Technologies and others.

The improvements that the new process technology brings are best seen in a side-by-side comparison. For instance, quad-core Penryn processors will contain about 820 million transistors placed on two dies of 107 mm² each. For comparison, today's quad-core Intel Kentsfield processors contain 582 million transistors, and the dies of quad-core processors manufactured on the 65-nm process technology measure 143 mm².
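To put the figures above side by side, here is a small back-of-the-envelope sketch. It assumes that the 143 mm² quoted for the 65-nm part is a per-die area and that both quad-core products consist of two dies; the function name `density_mln_per_mm2` is ours, for illustration only. Note that a large share of the extra transistors goes into cache, so the ratio is only a rough indicator of what the new process gives.

```python
def density_mln_per_mm2(transistors_mln, die_area_mm2, dies=2):
    """Return millions of transistors per mm^2 for a multi-die package.

    Assumption: the quoted area is per die, and the package holds `dies` dies.
    """
    return transistors_mln / (die_area_mm2 * dies)


# Figures quoted in the article (45-nm Penryn vs. 65-nm Kentsfield).
penryn = density_mln_per_mm2(820, 107)
kentsfield = density_mln_per_mm2(582, 143)

print(f"Penryn quad-core:     {penryn:.2f} mln transistors per mm^2")
print(f"Kentsfield quad-core: {kentsfield:.2f} mln transistors per mm^2")
print(f"Density ratio:        {penryn / kentsfield:.2f}x")
```

Under these assumptions the 45-nm part packs a little under twice as many transistors into each square millimetre.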

The innovations of the next processor generation can be grouped under five current Intel technologies: Wide Dynamic Execution, Advanced Smart Cache, Smart Memory Access, Advanced Digital Media Boost, and Intelligent Power Capability.

Wide Dynamic Execution lets the processor execute more instructions per cycle, which boosts performance and improves power efficiency. Under this heading, Intel will introduce an improved, faster divider based on the radix-16 method, as well as Enhanced Intel Virtualization Technology. The radix-16 design will substantially reduce the latency of both integer and floating-point division. The diagram below shows the results, which speak for themselves.
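Why a higher radix shortens division can be shown with a simplified model. The sketch below is not Intel's divider: it is plain digit-by-digit long division in which each iteration produces one quotient digit of `bits_per_step` bits (4 bits for radix 16, 2 bits for radix 4). The function name and parameters are hypothetical, and the digit is picked with Python's own `//`, whereas real hardware uses a selection table and redundant arithmetic.

```python
def divide_digit_recurrence(dividend, divisor, width=32, bits_per_step=4):
    """Long division that retires `bits_per_step` quotient bits per iteration.

    Returns (quotient, remainder, iterations). Illustrative model only.
    """
    if divisor <= 0 or dividend < 0 or dividend >= (1 << width):
        raise ValueError("expects 0 <= dividend < 2**width and divisor > 0")

    radix = 1 << bits_per_step
    steps = -(-width // bits_per_step)      # ceil(width / bits_per_step)
    total_bits = steps * bits_per_step

    quotient = 0
    remainder = 0
    for step in range(steps):
        # Bring down the next radix-sized chunk of the dividend.
        shift = total_bits - (step + 1) * bits_per_step
        chunk = (dividend >> shift) & (radix - 1)
        remainder = remainder * radix + chunk
        # Select one quotient digit in the range 0..radix-1.
        digit = remainder // divisor
        remainder -= digit * divisor
        quotient = quotient * radix + digit
    return quotient, remainder, steps


q16, r16, n16 = divide_digit_recurrence(1000, 7, bits_per_step=4)  # radix 16
q4, r4, n4 = divide_digit_recurrence(1000, 7, bits_per_step=2)     # radix 4
print(q16, r16, n16)
print(q4, r4, n4)
```

Both calls give the same quotient and remainder, but the radix-16 variant needs half as many iterations for a 32-bit operand, which is the effect the lower latency relies on.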

The Advanced Smart Cache technology aims at higher cache performance and efficiency. Intel decided to increase the cache size in Penryn family processors: dual-core processors will carry up to 6 MB of L2 cache, whereas some quad-core models will get 12 MB. As for clock speeds, Intel mentions breaking the 3 GHz barrier.

Under Smart Memory Access, Intel mentions increased bus bandwidth, confirming FSB speeds as high as 1600 MHz. Reportedly, the 1600 MHz FSB will appear in some processor models aimed at servers and workstations; it has not yet been specified when desktop models with the faster bus will be released.
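What the faster bus means in terms of peak bandwidth can be estimated with simple arithmetic. The sketch below assumes that the quoted FSB figures are effective transfer rates in millions of transfers per second and that the data bus is 64 bits (8 bytes) wide; the function name `fsb_peak_gb_s` is ours, and real sustained throughput is lower than this theoretical peak.

```python
FSB_WIDTH_BYTES = 8  # assumption: 64-bit front-side data bus


def fsb_peak_gb_s(transfers_mln_per_s):
    """Theoretical peak FSB bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfers_mln_per_s * 1e6 * FSB_WIDTH_BYTES / 1e9


for speed in (1333, 1600):
    print(f"FSB {speed}: {fsb_peak_gb_s(speed):.2f} GB/s peak")
```

Under these assumptions, moving from FSB 1333 to FSB 1600 raises the theoretical peak by about 20%.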

The Advanced Digital Media Boost technology speeds up the processing of video, images and speech. To increase performance on media data, Intel decided to add SSE4 (Streaming SIMD Extensions 4) to the instruction set architecture (ISA); with the advent of 45-nm processors it will be available in most mainstream desktop PC segments. The new instruction set includes about 50 new instructions, which can be divided into two groups:

  • Vectorization primitives for compilers and multimedia application accelerators;
  • String and text processing accelerators.

We dwell on SSE4 in more detail because it is one of the key innovations. To start with, here are the applications that will benefit: graphics, video encoding and processing, 3D imaging, games, Web servers, and application servers. According to Intel, performance will rise in compute-intensive applications, namely data storage analysis, database management systems, complex search and mapping algorithms, compression algorithms for audio, video, images and data, parsing and logical state analysis algorithms, and many others.

According to Intel, SSE4 is the most substantial extension to the Intel ISA since SSE2. The SSE4 instruction set includes a number of vectorization primitives for compilers, which further improve the performance and efficiency of multimedia applications. There are also new instructions for string processing.

Another enhancement is the Super Shuffle Engine. The new engine can shuffle values across the full 128-bit register in a single cycle. This substantially speeds up shuffle-related operations (packing, unpacking, shifting packed values, inserting). The diagram compares the number of cycles required to execute SSE operations; on average we see an almost twofold performance boost.
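For readers unfamiliar with shuffle-type operations, the sketch below models what two of them do to a 128-bit register, represented here as a list of 16 byte values. It follows the semantics of two existing x86 instructions, PSHUFB (byte shuffle) and PUNPCKLBW (unpack and interleave low bytes), which are not themselves part of SSE4. The Python function names are ours, and the model says nothing about cycle counts; it only shows the data movement that the new engine accelerates.

```python
def shuffle_bytes(src, control):
    """PSHUFB-style shuffle: each control byte selects a source byte.

    If bit 7 of a control byte is set, the result byte is zero; otherwise
    the low 4 bits of the control byte index into `src`.
    """
    if len(src) != 16 or len(control) != 16:
        raise ValueError("both operands must hold 16 bytes")
    return [0 if c & 0x80 else src[c & 0x0F] for c in control]


def unpack_low_bytes(a, b):
    """PUNPCKLBW-style unpack: interleave the low 8 bytes of a and b."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("both operands must hold 16 bytes")
    out = []
    for i in range(8):
        out += [a[i], b[i]]
    return out


reg = list(range(16))

# Reverse the byte order of the register with a single shuffle.
reversed_reg = shuffle_bytes(reg, list(range(15, -1, -1)))

# Interleaving with zeros widens the low 8 bytes to 16-bit values.
widened = unpack_low_bytes(reg, [0] * 16)

print(reversed_reg)
print(widened)
```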

There are also interesting innovations aimed at reducing power consumption and raising performance per watt. Here Intel presented two new technologies: Deep Power Down Technology and Enhanced Dynamic Acceleration Technology.

Deep Power Down Technology will be introduced primarily in processors for mobile platforms (Mobile Penryn). To reduce power consumption when idle, one more CPU state has been added, called the Deep Power Down Technology state, or C6. In this state the cores are switched off and the cache memory is disabled completely. This allows a substantial decrease in core voltage and power consumption, which in turn prolongs battery life.

Another interesting innovation is Enhanced Dynamic Acceleration Technology (EDAT). The idea is as follows; for simplicity, take a dual-core CPU. Since a single-threaded application makes little use of multiple cores, what matters most is the performance of the one core running it. Intel has therefore provided for raising the clock speed of the active core while the idle core sits in one of the idle states C3-C6, where its heat emission drops sharply. The active core uses this headroom to raise its clock speed until the TDP limit is reached. The following illustration visualizes this.
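The principle can be expressed as a toy power-budget model. Everything numeric in the sketch below is invented for illustration: the 35 W package budget, the 10 W drawn by shared logic, the per-core power figures, the 200 MHz step, and the assumption that core power grows with the cube of the clock speed. None of these values come from Intel, and the function `edat_boost_mhz` is hypothetical. The model shows only the mechanism: the less the idle core consumes, the higher the active core can clock before the package reaches its TDP.

```python
def edat_boost_mhz(tdp_w, uncore_w, idle_core_w, core_w_at_base,
                   base_mhz=2400, step_mhz=200, exponent=3.0):
    """Highest clock (MHz) the active core can reach within the TDP budget.

    All parameters are hypothetical. Core power is modelled as
    core_w_at_base * (freq / base_mhz) ** exponent.
    """
    if core_w_at_base <= 0 or step_mhz <= 0 or exponent <= 0:
        raise ValueError("power, step and exponent must be positive")

    # Power left for the active core once everything else is accounted for.
    budget_w = tdp_w - uncore_w - idle_core_w

    freq = base_mhz
    # Step the clock up while the next step still fits the budget.
    while core_w_at_base * ((freq + step_mhz) / base_mhz) ** exponent <= budget_w:
        freq += step_mhz
    return freq


# Second core fully loaded: no headroom, the clock stays at base.
both_busy = edat_boost_mhz(35, 10, idle_core_w=12.5, core_w_at_base=12.5)
# Second core in a shallow idle state (made-up 6 W).
shallow_idle = edat_boost_mhz(35, 10, idle_core_w=6.0, core_w_at_base=12.5)
# Second core in a deep idle state (made-up 0.5 W).
deep_idle = edat_boost_mhz(35, 10, idle_core_w=0.5, core_w_at_base=12.5)

print(both_busy, shallow_idle, deep_idle)
```

With these made-up inputs, the deeper the idle state of the second core, the more clock steps the active core gains.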

Now for the TDP levels of 45-nm processors. Unfortunately, there is still no data on the heat emission of mobile chips. Dual-core desktop Penryn processors will fall within the 65 W class, whereas their quad-core relatives will have a TDP of 95 or 130 W. In the server segment, the TDP of dual-core Intel Xeon processors will be 40, 65, or 80 W, and that of quad-core models 50, 80, or 120 W.

According to Intel's in-house tests, the new chips deliver a 20% performance boost in gaming applications and over 40% in video decoding (provided SSE4 is enabled). Compared with the most powerful current quad-core Xeon (Xeon X5355, 2.66 GHz, FSB 1333 MHz), a server Penryn clocked above 3 GHz will be about 45% faster in applications that make intensive use of floating-point operations and are sensitive to bandwidth.


Copyright © 2002-2011 3DNews.Ru All Rights Reserved.
Digital-Daily - English-language version of the popular Russian web-project 3DNews