Horus is a chip designed by Newisys that sits above a quad Opteron board and synchronizes the caches of its four processors with those on other boards through their Horus chips. Initial Horus chips will synchronize the caches on four quad processor boards, for 16 processors in all, and dual core Opterons will raise that to 32 processor cores in a tightly coupled system.
The importance of this is that it makes the Opteron capable of competing with other large Symmetric MultiProcessor (SMP) systems from Sun, HP and IBM at a new level of economy. This will chip away at the profitable middle of the large SMP systems market where a fully shared memory is required. For certain kinds of problems where a clustered system introduces too much delay in accessing memory in other systems, a fully shared memory SMP system is the only answer.
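The processor counts in the Horus description can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are illustrative, and it assumes one Opteron socket per processor position on each quad board.

```python
# Rough check of the Horus system sizes quoted above.
# Names are illustrative; figures come from the text (four quad boards,
# dual core Opterons).
SOCKETS_PER_BOARD = 4   # a "quad Opteron board"
BOARDS_PER_SYSTEM = 4   # initial Horus chips link four boards


def total_cores(boards: int, sockets_per_board: int, cores_per_socket: int) -> int:
    """Number of processor cores sharing one coherent memory image."""
    return boards * sockets_per_board * cores_per_socket


single_core_total = total_cores(BOARDS_PER_SYSTEM, SOCKETS_PER_BOARD, 1)
dual_core_total = total_cores(BOARDS_PER_SYSTEM, SOCKETS_PER_BOARD, 2)

print(f"single core Opterons: {single_core_total} processors")
print(f"dual core Opterons:   {dual_core_total} processors")
```

With single core parts the four boards give 16 processors; dual core parts double that to the 32 mentioned above.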
More info on Horus, and some specs for the Dual Core Opteron. See also the SGI and NASA Altix system using twenty 512 processor SMP systems for the largest ever SMP system in 19Aug2004 Supercomputer News.
NEW YORK -- March 31, 2004 -- IBM today outlined plans to openly collaborate and build a community of innovation around its Power microprocessor architecture used in a vast range of products from the world's most powerful enterprise systems and supercomputers to games and embedded devices. The move could have major implications for computers and the electronics industry at large. For more information, follow these links to the full Press Release, and another link to the Power Web Site.
Leading microprocessor makers Intel and AMD have recently added a multitude of technologies to their x86 microprocessors, including major ones such as x86-64 and Hyper-Threading and a plethora of micro-architectural improvements. However, this is only the beginning of a massive increase in CPU computing power! Next year Intel is rumored to start making dual-core chips, not only for servers but for desktops and even mobile computers, according to reports from the PC Watch web site! Read more at Xbit Labs.
IBM is adding Intel SpeedStep-style clock frequency and core voltage scaling technology to the 90nm version of its PowerPC 970 processor, aka the G5. Read more at The Register.
Ars Technica has an overview of three new processors: the POWER5, the UltraSPARC IV and the Transmeta Efficeon.
This article originally started life as an MPF CPU roundup, but it has evolved into more of an overview of three specific upcoming processors: IBM's POWER5, Sun's UltraSPARC IV, and Transmeta's Efficeon. Actually, the article focuses mostly on IBM's POWER5 and Transmeta's Efficeon, but I also cover Sun's UltraSPARC IV because it's relevant to the "big picture" that I want to paint with this report.
Ace's Hardware has a good overview on the server processors announced at MPF. Look for UltraSparc IV and Fujitsu Sparc64 VI here, and Power5 here.
The annual Microprocessor Forum is the place for big announcements and this year was no exception. The Efficeon and C5P are in the low power class.
ClearSpeed takes a new design approach that combines parallel processing with low power. The performance is impressive, the low power doubly so. Multiple CS301 chips can be tied together, putting a large array into a small box. This may be the multiprocessor for the rest of us.
The IBM Power5 and Fujitsu Sparc64 VI are in a class by themselves - the behemoth class. It's clear that IBM will own the very high end (if you have to ask, you can't afford it) of the SMP and cluster systems for the foreseeable future.
ClearSpeed CS301
This is a substantial design breakthrough in parallel processing combined with low power. I expect this design, in its delivered chip variations, to make as much difference in parallel processing as the cluster and blade concepts have in the past. According to President Mike Calise, first silicon is working and, no surprise to me, interest is very high. I will have more on this design soon. Here is a brief overview from the announcement:
The ClearSpeed CS301 is a multi-threaded array processor that enables dramatic improvements in performance and power consumption for intensive floating point applications. At over 25 GFLOPS peak performance, the new chip provides more than twice the processing speed of competitive products. At 10 GFLOPS per Watt, power consumption is also twenty times more efficient. As a result, the CS301 delivers up to a ninety percent reduction in purchase price and running costs, making high performance computing affordable and available to companies of all sizes. Read the full announcement at ClearSpeed.
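The figures in the announcement imply a power budget that is easy to work out. The sketch below takes the quoted numbers at face value (they are vendor peak claims, and "over 25" and "more than twice" are treated as exact for simplicity) and reads "twenty times more efficient" as meaning a competitor delivers one twentieth of the GFLOPS per Watt. The variable names are illustrative only.

```python
# Back-of-envelope arithmetic from the CS301 announcement figures.
# Assumptions: peak numbers treated as exact; "twenty times more efficient"
# read as 20x the GFLOPS per Watt of a competing product.
PEAK_GFLOPS = 25.0           # "over 25 GFLOPS peak performance"
GFLOPS_PER_WATT = 10.0       # "10 GFLOPS per Watt"
SPEED_ADVANTAGE = 2.0        # "more than twice the processing speed"
EFFICIENCY_ADVANTAGE = 20.0  # "twenty times more efficient"

# Power drawn by one CS301 at peak.
cs301_watts = PEAK_GFLOPS / GFLOPS_PER_WATT

# What the same claims imply about the unnamed competitor.
competitor_gflops = PEAK_GFLOPS / SPEED_ADVANTAGE
competitor_gflops_per_watt = GFLOPS_PER_WATT / EFFICIENCY_ADVANTAGE
competitor_watts = competitor_gflops / competitor_gflops_per_watt

print(f"CS301 at peak:      {cs301_watts:.1f} W")
print(f"implied competitor: {competitor_gflops:.1f} GFLOPS at {competitor_watts:.1f} W")
```

Under these assumptions a CS301 draws about 2.5 W at peak, while the implied competitor needs roughly 25 W for half the throughput. That is why several chips could plausibly be packed into a small box.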
Finally supporting the full Pentium 4 instruction set including SSE2, Efficeon has an integrated single-channel DDR400 controller, integrated AGP 4X and integrated HyperTransport for a choice of south bridges (the same ones as on the AMD Athlon64-M platform - Nvidia Nforce 3 Go comes to mind). While the claimed 7 W Thermal Design Power limit on Efficeon will accommodate a 1100 MHz CPU, compared to 900 MHz on Pentium M, Transmeta also claims overall better performance per cycle for Efficeon vs Pentium M (not to mention Pentium 4-M). Read the full announcement for the Transmeta 8000 Efficeon.
VIA C5P Nehemiah
VIA is firmly on its 'low-cost, low-power' fanless desktop CPU path and it continues with the new C5P Nehemiah CPU. Smaller than a US 1-cent coin, the C5P package enables dual-CPU Mini-ITX integrated PC boards for the first time. The claimed power figures show this thingie consumes roughly 30% less power than Pentium-M at the same clock. Check out the review at ExtremeTech.
Check out the picture of an 8 processor Power5! The MCMs are now more integrated - a single POWER5 MCM has four chips (8 CPUs) plus four 36 MB L3 cache chips, and allows for easier back-to-back pairing with another MCM to make a very fast and compact 16-way system at full bandwidth systemwide. In fact, so compact that you could fit 16 of those in a single rack, and connect them with something like, say, Quadrics, for a nice little 2 TFLOPS supercomputer with 2 TB RAM in that same rack. Finally, one more MP link on chip now allows for a 64-way single SMP system in the same footprint as the current 32-way POWER4+. For more details on the Power5, check out this article on The Inquirer.
Fujitsu SPARC64 VI
It is a powerful Sparc class CPU, with dual 2.4 GHz cores, 6 MB of ultra high bandwidth on-chip cache, fast buses, proper out-of-order execution with fast FP as well, and scalability that far exceeds the UltraSPARC IV. The Fujitsu chip has 690 million transistors and is built in a 0.09 um copper process; a similar process geometry is claimed for the Sun UltraSPARC IV as well. In reality, the dual-core, multithreaded (each core is 2-way SMT too) SPARC64 VI should be at least twice as fast as the dual-core UltraSPARC IV. Check out the Fujitsu Sparc Roadmap at Computer Business Review.
It's not at all clear what will happen to the upcoming Alpha processor upgrades. There is both positive news and some questions not answered yet. Here are some links to the various news items:
It's clear that HPC is less than happy about the slow uptake of their Itanium systems and obviously believes that they can increase Itanium sales by downplaying OpenVMS and the Alpha processor. This is a terrible mistake by HPC as it creates FUD for their own products while not really strengthening their position in high end systems. You can check out the updated OpenVMS information in Large System Notes.
AMD has put a date on dual core Opterons, which have been rumored for some time. The original Opteron design includes an internal interface to the north bridge for two CPUs. Opterons will be moved to 90 nm lithography in early 2004, so the dual core chips will start on that line size and probably move to 65 nm in 2006. Read more on this at The Register and Xbit Labs.
Hans de Vries, the host of Chip-Architect, writes in detail about current and upcoming new processors using the x86 instruction set. Normally I don't cover this class of processors because they are well covered in the general press, but Hans' analysis merits a closer look.
Specifically, Hans has discovered several innovations in the Opteron (also Athlon FX and Athlon 64) design that make the chip faster and more efficient. Here is a short selection from the extensive chapter indexes:
This list selects points of special interest from the detailed four chapter analysis of the Opteron design. To read the whole document in detail will take at least an hour, but you can get a good feel for the system by just selecting the items listed above from the detailed chapter indexes. Recommended for technical people.
Even just skimming the information will give you a new appreciation for why the Opteron is winning all those supercomputer contracts. These same reasons will make Opteron and its sibling Athlon 64 and Athlon FX killer desktop systems. Read it all at Chip-Architect's Opteron Analysis.
IBM has named this TRIPS, the Tera-op Reliable Intelligently Adaptive Processing System. The prototypes will include four TRIPS processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second.
TRIPS addresses the increasingly difficult problem of accessing data quickly on a chip operating at picosecond cycle times. At those rates, signals take several cycles just to travel across the chip. In order to make full use of the chip's capabilities, processing logic and data must be physically close together, yet predicting where that needs to happen is difficult. That's where the Adaptive nature of the chip comes in.
I think this new concept represents a fundamental change in the way future high end processors will be designed and built. Once again, IBM's investment in Research and Development, more than $6 billion per year, will pay off in future generations of products. Read more at EE Times and at IBM.
Ace's Hardware has a nice article with some details of the upcoming multicore processors Sun has designed. There is also a short piece on Sun's research contract with the government to design a single system image with 100,000 thread capability. Check out the story at Ace's Hardware.
At the high end of the performance curve, IBM is adding SMT to its dual core Power5 design. The main reason IBM gives for this is that there is a 40% performance gain possible with very little more energy required, thus reducing the heat generated for a given performance level. They also plan future chips to have 4 cores with 8 threads each on a single (very large) high performance chip. More info at C|Net News.
The Power5 processor will be used in a nuclear weapons simulation supercomputer at Lawrence Livermore National Laboratory. That machine, called ASCI Purple, is due to be running by the end of 2004 and is expected to have 196 interconnected 64-processor servers, making a total of 12,544 Power5 processors. It will come with 50 terabytes of memory and will also have IBM disk storage arrays holding 2 petabytes, or 2 quadrillion bytes, of data.
As for physical size, ASCI Purple will weigh about 197 tons, be linked to 119 miles of optical cable and 28 miles of copper cable, and occupy 8,900 square feet of floor space--or about two basketball courts. It will consume 4.7 megawatts of power, enough current for 4,000 homes, according to IBM. More on ASCI Purple at C|Net.
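The ASCI Purple numbers can be cross-checked against each other. This is a rough sketch using only the figures quoted above. It assumes decimal units (1 terabyte = 10^12 bytes, 1 megawatt = 10^6 watts) and uses illustrative variable names.

```python
# Cross-check of the ASCI Purple figures quoted above.
# Assumptions: decimal units; all inputs are the article's numbers.
SERVERS = 196
PROCESSORS_PER_SERVER = 64
MEMORY_BYTES = 50e12   # 50 terabytes
POWER_WATTS = 4.7e6    # 4.7 megawatts
HOMES = 4000           # "enough current for 4,000 homes"

total_processors = SERVERS * PROCESSORS_PER_SERVER
memory_gb_per_processor = MEMORY_BYTES / total_processors / 1e9
watts_per_processor = POWER_WATTS / total_processors
watts_per_home = POWER_WATTS / HOMES

print(f"processors:           {total_processors}")
print(f"memory per processor: {memory_gb_per_processor:.2f} GB")
print(f"power per processor:  {watts_per_processor:.0f} W (whole system, averaged)")
print(f"implied home load:    {watts_per_home:.0f} W")
```

The server count and processor total agree (196 x 64 = 12,544), memory works out to roughly 4 GB per processor, and the comparison with 4,000 homes implies an average household load of a little over 1 kW.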
Gemini, a dual processor on a chip, will combine two UltraSPARC II cores. It will arrive in 2004 and run at 1.2 GHz, yet consume only 32 watts at maximum. This should be an ideal chip for small but powerful blade servers. More on Gemini at The Register.