New Paths in Processor Architecture

Overview

A survey of current processor architectures turns up an example of almost any style or theme you might be interested in. The line between RISC (Reduced Instruction Set Computer) and CISC (Complex Instruction Set Computer) has blurred over the last ten years. While Instruction Set Architectures (ISAs) were stable until recent developments, the battle now is between very fast single processors and slower, wide parallel processors.

This battle has now been joined by the technology battle that intensifies as circuit dimensions shrink, and the power/heat battle as clock speeds race for the 10 GHz crown. That particular winner may provide both home entertainment and heating in the near future.

The power/heat conundrum is proving to be a very hard barrier to bypass, as Intel has found with its Itanium (130W) and P(100W) processors. Those power levels are at the limit of practical air cooling with current systems. IBM is no stranger to these challenges, but its solution is based on Multi Chip Modules (MCM) with liquid cooling attachments, which are not typically suitable for desktop use. Any current or future Processor Architecture (PA) will have to deal with all of these barriers to win wide acceptance.

How are these barriers being breached? What techniques will give the next generation of processors the edge needed to jump past current limitations?

Basic Processor Architecture Classes

For the purpose of this article, I will separate processors into four broad classes:

  1. Single or dual core processor, high power
  2. Single or dual core processor, low power
  3. Multiple processor, high power
  4. Multiple processor, low power
Low power is less than 10 watts, high power more than 80 watts. Some chips may not fit exactly within those boundaries, but it will be clear which class they belong to. Some examples of these classes are:
  1. Intel Itanium, IBM Power4 and Power5, Fujitsu Sparc64 VI
  2. Via C5, Transmeta TM8000, Intel Pentium M, AMD Geode
  3. Intel Tanglewood
  4. Clearspeed CS301
There are many ARM derivatives and embedded processors that perform well at very low power, but they are typically tied to special-purpose systems and software. This group is a diverse set of solutions for the embedded arena and will not be covered here.
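The four-way split above can be sketched as a small classifier. This is only an illustration of the stated boundaries (under 10 watts is low power, over 80 watts is high power, more than two cores counts as "multiple"). The function name `classify` and the rule that a chip between the two boundaries goes to the nearer one are my assumptions, not part of any vendor's definition.

```python
LOW_POWER_MAX_W = 10.0   # "low power is less than 10 watts"
HIGH_POWER_MIN_W = 80.0  # "high power more than 80 watts"


def classify(cores: int, watts: float) -> int:
    """Return the class number (1 to 4) for a chip with the given core count and power."""
    if cores < 1 or watts <= 0:
        raise ValueError("cores must be >= 1 and watts must be positive")

    if watts < LOW_POWER_MAX_W:
        low_power = True
    elif watts > HIGH_POWER_MIN_W:
        low_power = False
    else:
        # Assumption: a chip between the boundaries joins the nearer band.
        midpoint = (LOW_POWER_MAX_W + HIGH_POWER_MIN_W) / 2.0
        low_power = watts < midpoint

    multiple = cores > 2  # single or dual core versus multiple processors
    if not multiple:
        return 2 if low_power else 1
    return 4 if low_power else 3


if __name__ == "__main__":
    # The 130 W single-core case matches the Itanium figure quoted earlier;
    # the other rows are hypothetical chips chosen to hit each class.
    for cores, watts in [(1, 130.0), (1, 7.0), (8, 100.0), (64, 3.0), (1, 24.5)]:
        print(cores, watts, "-> class", classify(cores, watts))
```

The nearest-boundary rule is one simple way to handle chips that fall between 10 and 80 watts; any other tie-breaking rule would serve as long as it is applied consistently.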

Class I: Single or dual core processor, high power

This class is characterized by high clock speeds and a very complex implementation that optimizes the rate at which instructions are executed. Processors have many pipeline stages, and several instructions are 'in flight' at any time. Large caches and high bandwidth links are standard. The IBM Power5 is an extreme example of this class.

There are big hurdles ahead of this class. The first one is power, and it is a critical limit for both heat dissipation and electromigration. We can get more heat off a chip by using liquid cooling, channels on the back of the silicon, and silver as a heat conductor, which is somewhat better than copper. The limiting issue will be electromigration effects as dimensions get smaller, aggravated by elevated temperatures. Extreme cooling, with Freon or alcohol as the fluid or even liquid nitrogen, is possible but limited in application.
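To make the temperature and dimension effects concrete, here is a minimal sketch based on Black's equation, the standard empirical model for electromigration lifetime: MTTF = A * J^(-n) * exp(Ea / (k * T)), where J is current density and T is absolute temperature. The activation energy (0.7 eV), the exponent (n = 2), and the 85 C reference point are illustrative assumptions, not measurements for any particular process, and the function name `relative_lifetime` is my own.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin


def relative_lifetime(
    temp_c: float,
    ref_temp_c: float = 85.0,    # assumed reference temperature
    current_ratio: float = 1.0,  # current density relative to the reference
    activation_ev: float = 0.7,  # assumed activation energy
    exponent: float = 2.0,       # assumed current density exponent n
) -> float:
    """Expected lifetime relative to the reference condition, per Black's equation."""
    temp_k = temp_c + 273.15
    ref_k = ref_temp_c + 273.15
    # Ratio of the exp(Ea / kT) terms at the two temperatures.
    thermal = math.exp(activation_ev / BOLTZMANN_EV_PER_K * (1.0 / temp_k - 1.0 / ref_k))
    # Ratio of the J^(-n) terms.
    return thermal * current_ratio ** (-exponent)


if __name__ == "__main__":
    for temp in (65.0, 85.0, 105.0):
        print(f"{temp:5.1f} C: {relative_lifetime(temp):.2f}x reference lifetime")
    # Shrinking a wire's cross-section at constant current raises current density.
    print(f"2x current density: {relative_lifetime(85.0, current_ratio=2.0):.2f}x")
```

Under these assumptions, hotter operation and higher current density both cut the expected lifetime, and the two effects multiply, which is why smaller dimensions and elevated temperatures reinforce each other.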

A second major limitation is the diminishing returns of complexity. Fast processors already use all the obvious techniques for speed, and those left are not likely to yield big gains. The Itanium's attempt to move that complexity into software will not yield further significant gains because the underlying compiler problems, such as optimal instruction scheduling, are NP-hard. In practice that means the cost of finding the best solution is believed to grow exponentially with the size of the problem, so no matter how much resource you use, the gains diminish rapidly.
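The diminishing-returns argument can be illustrated with a toy calculation. Assume, purely for illustration, that a compiler tried to find the best schedule by checking every ordering of n independent instructions, which means n! candidates. Real compilers use heuristics instead of exhaustive search, and factorial growth is even faster than exponential, so this sketch shows only the shape of the problem. The function name `largest_block` is hypothetical.

```python
import math


def largest_block(budget: int) -> int:
    """Largest n such that all n! orderings fit within `budget` evaluations."""
    n = 0
    while math.factorial(n + 1) <= budget:
        n += 1
    return n


if __name__ == "__main__":
    for budget in (10**9, 2 * 10**9, 10**12):
        print(f"budget {budget:>16,d} -> exhaustive search up to {largest_block(budget)} instructions")
```

Doubling the budget from one billion to two billion evaluations does not extend the reachable block size at all, and a thousandfold increase adds only two instructions.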

Overall, this class of processors is near several limits that will close the door to future extensions. The roadmaps of all the major vendors reflect this, as they are moving to multiple cores (more than two). Faster systems must come from this and other approaches.

Class II: Single or dual core processor, low power

This class is characterized by moderate clock speeds, a relatively simple implementation with modest execution resources, modest cache and bandwidth requirements, and automatic control of which parts of the chip are powered up in order to minimize power use. The Via C5P is a good example of this class.

The primary objective of this class is best performance while maintaining low power. In addition, small die size means cheap production and minimum board space. Each of the current vendors uses somewhat different techniques to keep power low, but Via's and Transmeta's techniques represent the extremes.

Via builds the smallest, simplest die consistent with rapid execution and minimal complexity, and takes great care to use low power in each section as well as power control. Via's current C5P, with a 47 sq. mm die and an installable chip smaller than a penny, is the smallest processor capable of running as a dual processor system. I expect to see dual core C5 chips in the next generation or two from Via.

Transmeta steps away from direct execution and builds a powerful general execution engine capable of dynamically transforming the instruction stream into a VLIW (Very Long Instruction Word) stream that keeps multiple execution units busy at low power. Transmeta's approach is a powerful general technique that has potential well beyond its current implementation in the TM8000.
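The general idea of turning a sequential instruction stream into wide bundles can be sketched with a simple greedy packer. This is not Transmeta's actual translation algorithm, whose details are not described here. It is a toy model under stated assumptions: every instruction takes one cycle, each register is written only once (so only read-after-write dependencies matter), and the tuple format and the name `pack_bundles` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (name, destination register, source registers): a hypothetical toy format.
Instr = Tuple[str, str, Tuple[str, ...]]


def pack_bundles(instrs: Sequence[Instr], width: int = 4) -> List[List[str]]:
    """Greedily pack instructions into bundles of at most `width` operations."""
    if width < 1:
        raise ValueError("width must be at least 1")

    bundles: List[List[str]] = []
    ready_at: Dict[str, int] = {}  # register -> first bundle that may read it

    for name, dest, sources in instrs:
        # An instruction cannot issue before all of its inputs are available.
        earliest = max((ready_at.get(src, 0) for src in sources), default=0)

        # Take the first bundle at or after `earliest` that has a free slot.
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])

        bundles[slot].append(name)
        ready_at[dest] = slot + 1  # result is usable in the following bundle

    return bundles


if __name__ == "__main__":
    program: List[Instr] = [
        ("load_a", "r1", ()),
        ("load_b", "r2", ()),
        ("add", "r3", ("r1", "r2")),
        ("load_c", "r4", ()),
        ("mul", "r5", ("r3", "r4")),
    ]
    for cycle, bundle in enumerate(pack_bundles(program, width=2)):
        print(cycle, bundle)
```

With a width of two, the five sequential instructions fit into three bundles, because the independent loads can share a cycle with other work. A real translator would also have to handle register reuse, memory ordering, and differing instruction latencies.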

Multiple Processor Cores

There is little data available about this class of processors at present. Only the Power4 and Power5 from IBM fall in this class, and they seem to be two separate cores on a ceramic substrate, not yet multiple processors on one die. When multiple processors on one die are announced, I'll be taking a close look at them.

All content on this site is Copyright 2001 - 2004 by Bill Nicholls
All Rights Reserved