Because this is a major release, I am including most of the release text in Large System Notes - links are at the end.
December 16, 2003 -- Today's fourth release by the National Science Foundation Middleware Initiative (NMI) includes a wide range of software, services, documents and recommendations for the effective use of information technology in research and education. NMI-R4 emphasizes open-source solutions to issues critical to collaboration across multiple organizations that may be separated by geography and by divergent local computing architectures.
SGI has extended its line of Altix 3000 Itanium clusters and NUMA systems to larger and smaller configurations. Recently they delivered a 512-processor single-image system to NASA Ames for ocean simulation (see 03Dec2003 item). Now they will deliver systems with 1 to 16 processors, and clusters or NUMA systems with up to 1,024 processors. Here is a quote from their announcement:
"The new SGI Altix server configuration, supporting 1 to 16 Intel Itanium 2 processors, will be available in the first quarter of 2004 at a price point comparable to mid-range UNIX® servers. Scalable SGI Altix 3000 systems are available today in server configurations of 4 to 64 processors, and supercluster configurations of 4 to 128 processors. For customers demanding even larger Altix superclusters, SGI plans to support configurations of 512 processors in October 2003 and 1,024 processors in May 2004. SGI also recently announced plans to extend the industry-leading scalability of its SGI® Altix™ 3000 servers to encompass a record 128 processors within a single instance of the Linux operating environment.
Since its introduction in January, SGI Altix 3000 has been recognized as the first Linux cluster that scales up to 64 processors within each node and the first cluster to allow global shared-memory access across nodes. Inspired by the success of the SGI Altix family and the powerful combination of standard Linux running on 64-bit Intel processors, more than 60 high-performance manufacturing, science, energy and environmental applications have been ported by their commercial developers to the 64-bit Linux environment. Over two thirds of these applications have certified and optimized their code for differentiated performance on the Altix platform."
Read the full SGI Announcement and check out the Altix Tech Specs and Availability.
"Just weeks after attaining record levels of sustained performance and scalability on a 256-processor global shared-memory SGI® Altix™ 3000 system, the team at NASA Ames doubled the size of its Altix™ system, achieving 512 processors in a single image, by far the largest supercomputer ever to run on the Linux® operating system. (NASA announced its technical feat at the SC2003 supercomputing conference.) NASA's effort is part of an intra-agency collaborative research program between NASA Ames, JPL and NASA's Goddard Space Flight Center to accelerate the science return for large-scale earth modeling problems."
Read the Ocean Simulation release at SGI. See also Global Climate Modelling at SGI.
"The Onyx 3000 series graphics supercomputers performing image generation provide highly realistic and precise simulation of the multi-role functions that F-16 fighter aircraft perform in combat missions. In addition to the four SGI Onyx 3000 supercomputers -- each with seven graphics pipes -- the company will deliver an SGI(R) InfiniteStorage TP9100 disk array, two SGI(R) Origin(R) 3000 family supercomputers, as well as Silicon Graphics(R) Octane2(TM) and Silicon Graphics(R) O2(R) visual workstations. These extremely reliable and high-performance systems form the majority of the computational components of the F-16 training simulators."
Read the full release at SGI.
AMD's Opteron design is not only beating Intel's Xeon in many benchmarks, it is performing well in its first supercomputer systems. As AMD ramps the clock speed, memory bandwidth and Hypertransport performance, we can expect future Opteron supers to set new performance records. Here is an excerpt:
The Number Six supercomputer, built by Linux Networx and in service at Los Alamos National Laboratory, comes in as the highest AMD Opteron processor-based system on the Top500 and operates at a maximal LINPACK performance rate of 8,051 Gflop/s with a theoretical peak performance of 11,264 Gflop/s.
Additional AMD Opteron processor-based supercomputers making the TOP500 list include: an installation at Doshisha University's Intelligent Systems Design Laboratory in Kyoto, Japan and built by Visual Technology at number 93, a supercomputer at Lawrence Livermore National Laboratory designed by Linux Networx at number 116, and at number 247, a system built by RackSaver with Arima and Myricom, in service at AMD's Developer Center in Sunnyvale, California.
Read about the details on Xbit Labs.
In remarkable contrast to the 3,000 square foot floor space of ASCI Q and even larger supers, this supercomputer fits in a closet - six square feet of floor space! Here is a brief description of this powerful midget:
Green Destiny, as shown in figure 3, is the name of our 240-processor supercomputer that fits in a telephone booth and sips less than 5.2 kW of power at full load (and only 3.2 kW when running diskless and computationally idle). It provides affordable, general-purpose computing to our application scientists while sitting in an 85- to 90-degree F dusty warehouse at 7,400 feet above sea level. More importantly, it provides reliable computing cycles without any special facilities (that is, no air conditioning, no humidification control, no air filtration, and no ventilation) and without any downtime.
I found this article to be well worth reading because it takes a much broader look at supercomputer costs. One thing you will learn is that the operational costs per year often outweigh the original purchase price. Read the whole article at ACM Queue. Recommended.
Intel commits $36 million to supercomputer R&D.
The Advanced Computing Program will fund work on clusters and large multiprocessing systems using Intel's off-the-shelf processors. The initiative comes as the U.S. government is backing development work by Cray, IBM and Sun Microsystems on new supercomputer architectures.
Read about Sun's plans for zettabyte-scale, self-managing storage in the First Workshop on Algorithms and Architectures for Self-Managing Systems.
Cray said it will define a new high-performance programming language and Sun will try to get the industry to standardize on a set of low-level software primitives as part of their proposals to the High Productivity Computing Systems (HPCS) project. IBM will pursue an aggressive extension of today's message-passing interface as part of its software strategy for a petaflops system. Read more on these at EET News.
The three companies last week nabbed about $50 million each from a Defense Advanced Research Projects Agency program that aims to deliver working systems with breakthrough performance by 2011. The money will fund a three-year R&D effort to turn the trio's very different and aggressive paper concepts into realistic implementation plans by 2006. Darpa will then decide on two to fund for building prototypes. The resulting systems are aiming at 10 to 40 times the average performance of today's high-performance computing machines, but must also be easier to program. More on the Darpa story at EET.
Check out the Top 500 fastest supercomputers at Top500. Find out where all of the fastest systems are, who builds them and how they are configured.
NCSA's cluster, called Tungsten, achieved Linpack benchmark performance (the figure used to compile the Top500 list) of 9.8 teraflops (9.8 trillion calculations per second); this is 64 percent of the 15.3-teraflop peak of the dedicated computational component. The full cluster, including the compute and I/O components, is 17.6 teraflops.
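The 64 percent figure is simply sustained Linpack performance divided by theoretical peak. Here is a minimal Python sketch of that arithmetic, using only the numbers quoted in this journal; the function name `linpack_efficiency` is my own and purely illustrative.

```python
def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Return sustained Linpack performance as a fraction of theoretical peak.

    Both arguments must use the same unit (Gflop/s or teraflops).
    """
    if rpeak <= 0:
        raise ValueError("peak performance must be positive")
    return rmax / rpeak


# Tungsten: 9.8 teraflops sustained against a 15.3-teraflop peak.
tungsten = linpack_efficiency(9.8, 15.3)

# The Los Alamos Opteron system quoted earlier: 8,051 vs 11,264 Gflop/s.
lanl_opteron = linpack_efficiency(8051, 11264)

print(f"Tungsten:     {tungsten:.0%} of peak")
print(f"LANL Opteron: {lanl_opteron:.0%} of peak")
```

The same ratio can be applied to any Top500 entry that lists both a sustained and a peak number.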
From NCSA Director Dan Reed:
"The cluster is a key component of the 31 total teraflops of computing power NCSA provides to the country's scientists. Greater computational performance is the means to gaining critical knowledge about our world, from the accurate prediction of dangerous weather to the understanding of the molecular roots of disease."
The Tungsten cluster will provide long-term, dedicated access for applications teams and has been designed for large-scale computational and I/O capabilities that will enable breakthrough computational results. It employs more than 1,450 dual-processor Dell PowerEdge 1750 servers running Red Hat Linux, a Myrinet 2000 high-speed interconnect fabric, an I/O subcluster with more than 120 terabytes of DataDirect storage, and a dedicated applications development environment.
See the full NCSA Press Release, and the home of the NCSA Web Site.
Also check out the Teragrid Update - the second cluster to be deployed as part of the Teragrid Project. It is currently being installed at NCSA as an Itanium 2 Linux cluster that will offer a peak performance of 8 teraflops. You should also visit the Teragrid Web Site.
For a long time, supercomputers have been measured by one standard benchmark: the Linpack computation, whose results are reported today in gigaflops, that is, billions of floating-point operations per second. The current list of the top 500 systems in the world is determined exclusively by that measure.
Supercomputers come in different designs for different purposes, multiple choices for multiple needs, yet they are all measured by a one dimensional benchmark. This makes specific architectures and hardware difficult to evaluate for the many possible uses of supercomputers.
DARPA, the Defense Advanced Research Projects Agency, has initiated a program to develop a broader set of benchmarks to enhance evaluation as well as encourage new architectures to be developed. From the EE Times:
The High Productivity Computing Systems program under the Defense Advanced Research Projects Agency quietly launched in August a three-year effort to deliver by 2006 benchmarks that measure multiple hardware and software aspects of a computer's overall capabilities.
The HPCS' first step will be this coming week, when it launches the so-called HPCchallenge benchmark of five hardware performance metrics. The benchmark, designed to broaden the Linpack benchmark of raw floating-point operations/second (flops) widely used today to rank the world's top supercomputers, will roll out at the SC2003 supercomputing conference in Phoenix.
Read more at EE Times. An overview of DARPA High Productivity Computing Systems (HPCS) is available at their Information Processing Technology Office (DARPA-IPTO).
Next week the Top500 Supercomputer project will announce its latest ranking of the 500 most powerful supercomputers, as measured by an industry-standard benchmark. With a peak speed of 2 teraflops (2 trillion mathematical operations per second), an initial small-scale prototype of IBM's Blue Gene/L supercomputer has been rated as a world-leader, even though it occupies a mere half-rack of space, about one cubic meter.
The full Blue Gene/L machine, which is being built for the Lawrence Livermore National Laboratory in California, will be 128 times larger, occupying 64 full racks. IBM expects Blue Gene/L to lead the Top500 supercomputer list when it is completed in 2005. Compared with today's fastest supercomputers, it will be six times faster, consume 1/15th the power per computation and be 10 times more compact.
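To see how the half-rack prototype relates to the full machine, here is a small Python sketch that scales the quoted figures linearly. This is my own back-of-the-envelope assumption: it takes the prototype's 2-teraflop peak and the "128 times larger" factor at face value, and the delivered system may well differ (for example, if the production chips run at a different clock speed).

```python
# Figures quoted in the text for the Blue Gene/L prototype.
prototype_racks = 0.5        # "a mere half-rack"
prototype_peak_tflops = 2.0  # peak speed of the prototype
scale_factor = 128           # full machine is "128 times larger"

# Assumption: rack count and peak performance both scale linearly.
full_racks = prototype_racks * scale_factor
full_peak_tflops = prototype_peak_tflops * scale_factor

print(f"Full machine: {full_racks:.0f} racks, "
      f"about {full_peak_tflops:.0f} teraflops peak if scaling is linear")
```

The rack count matches the 64 full racks quoted above; the peak figure is only an extrapolation.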
Read more at The Inquirer and IBM's Press Release.
GPFS is IBM's parallel file system, which is being implemented for computing systems at NCSA and SDSC that are part of the TeraGrid system. The TeraGrid (www.teragrid.org) is a National Science Foundation project to build and deploy the world's largest, most comprehensive, distributed infrastructure for open scientific research.
During the SC2003 demonstration, GPFS will be extended beyond the individual machine rooms at the two centers and used with IBM Itanium 2 TeraGrid systems distributed among the SDSC and NCSA booths on the SC2003 show floor, SDSC at the University of California, San Diego, and NCSA at the University of Illinois in Urbana-Champaign. The demonstration will use TeraGrid disk servers at both centers to move data across the TeraGrid network to compute nodes in the booths, where the data can then be used by scientific applications.
Using GPFS, each machine in the distributed system will have the same view of the file systems and will be able to access the same files simultaneously across the TeraGrid Wide Area Network.
Phil Andrews, director of SDSC's High Performance Computing program, said, "We see wide-area, distributed global file systems as a cornerstone of supercomputing in a grid environment. As more high-performance grids arise, we expect to see data-intensive infrastructures that are integral parts of the original systems rather than grafted onto existing environments."
The NCSA/ASCI/SDSC entry is in the "most innovative" category.
During the demonstration, the LLNL I/O application IOR will be run across four locations: the NCSA, ASCI and SDSC booths on the SC03 conference floor and a Linux cluster in the NCSA machine room at the University of Illinois at Urbana-Champaign.
The Lustre File System, developed by Cluster File Systems, Inc., is key; it allows the widely distributed computing nodes of the system to access the Data Direct data storage capabilities distributed between the ASCI booth and NCSA in Urbana-Champaign. In addition, the demonstration will highlight file system interoperability between IA-32 and 64-bit Itanium 2 systems. This will be one of the first trials of a wide area cluster file system with a real I/O application--a fundamental element for grid computing.
"The technology has evolved to the point that every compute node in your local center can now read and write to the same file system. Moreover, this can also be done over the wide area network," said Michelle Butler, head of NCSA's Storage Enabling Technologies group. "These are groundbreaking technologies bringing data centers from across the country together with a common file system."
"The Bandwidth Challenge will showcase how NCSA and our partners have been able to prove the scalability and performance of our architecture in production environments," said Peter Braam, president and chief technology officer of Cluster File Systems. "This demonstration also illustrates another key benefit of Lustre: its support for interoperability between a wide variety of industry-standard protocols and commodity hardware."
NCSA (National Center for Supercomputing Applications) is a national high-performance computing center that develops and deploys cutting-edge computing, networking and information technologies. Located at the University of Illinois at Urbana-Champaign, NCSA is funded by the National Science Foundation. Additional support comes from the state of Illinois, the University of Illinois, private sector partners and other federal agencies. For more information on NCSA, see http://www.ncsa.uiuc.edu/.
For more on the Supercomputing 2003 Conference, check out the SC03 site.
Sony has released the first specifications for its PlayStation 3 system, and it's a super number cruncher. From the article:
A four-core chip home server system will be able to deliver one billion floating-point operations per second, apparently. Move up to a 32-core chip - in, say, a blade server module - and you'd get 32 gigaflops of processing power, while a 64-core slab of silicon inside a rack-mount unit doing graphics work would churn out two teraflops, according to Kutaragi's presentation foils.
IBM thinks that future generations of high performance supercomputers will replace copper cables and electronic switches with scalable optical networks. This is a development project designed to deliver petaflop capability in the next 2 1/2 years. Read the Petaflop Networks article at The Inquirer.
Building right on that story, OctigaBay Systems Corp has developed a 58 Gigaflop system with 12 Opterons in a 3U rack mount.
The OctigaBay 12K system packs up to twelve 2.4-GHz Opteron CPUs from Advanced Micro Devices Inc. into a 3U chassis along with the company's own terabit/second class switch, an 8-Gbyte/s maximum fabric with latency as low as 1.2 microseconds between CPU operations. As many as 1,000 of the 58-Gflops units can be interconnected to create a 60-Tflops system with latencies of 150 to 200 nanoseconds per box-to-box hop.
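The 58-Gflops and 60-Tflops figures can be roughly reproduced from the CPU count and clock speed. The Python sketch below assumes each Opteron retires two double-precision floating-point operations per clock cycle; that per-cycle figure is my assumption and is not stated in the excerpt.

```python
# Figures quoted in the text.
cpus_per_box = 12
clock_ghz = 2.4
max_boxes = 1000

# Assumption (not in the excerpt): 2 floating-point operations per cycle per CPU.
flops_per_cycle = 2

box_gflops = cpus_per_box * clock_ghz * flops_per_cycle
system_tflops = box_gflops * max_boxes / 1000

print(f"Per box:     {box_gflops:.1f} Gflops (quoted as 58)")
print(f"1,000 boxes: {system_tflops:.1f} Tflops (quoted as 60)")
```

Under that assumption the per-box peak comes to about 57.6 Gflops, so the article's numbers appear to be rounded peak values rather than measured performance.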
Read more about this new design that reduces the I/O delays in a supercluster at EE Times. Check out more from the PR at OctigaBay Systems.
Global supercomputer leader Cray Inc. today announced plans to create a product line based on the "Red Storm" 40-TeraOp (40 trillion calculations per second) supercomputer it is developing for Sandia National Laboratories.
The product, due out in 2004, targets the need for highly scalable microprocessor-based Linux supercomputers with high bandwidth. The Cray product is designed to be more efficient and cost-effective for challenging problems and workloads than clustered SMP systems ("clusters") available in the marketplace, according to company officials.
"Superior efficiency and cost-effectiveness are major benefits of an advanced MPP (massively parallel processing) computer architecture like Red Storm, or the successful Cray T3E and ASCI Red systems on which Red Storm is modeled," said Peter Ungaro, Cray vice president, worldwide sales and marketing. "Even in very large systems with thousands of processors, this new MPP product is designed to function as a single high-efficiency computer, balanced with massive bandwidth to exploit its high-speed processors."
Read the full announcement at Cray. See also Red Storm Details at The Inquirer and a link to more information about Red Storm in RedStormcamp.pdf
Los Alamos National Labs needs lots of storage. They are now able to order up to 600 TB (0.6 petabytes or 600,000 gigabytes) from Panasas. This storage system is based on an object design which enables higher throughput, up to 30 times faster than non-object systems. The folks at CERN and the Large Hadron Collider (LHC) project will be glad to hear about this. Check out the Press Release at Panasas for details.
The first super to be built on the new Mac server with 2,200 IBM 970 chips is showing surprising results at over 7 teraflops. The official place in the Top 500 list will be formally announced next month at Supercomputing 2003, held in Phoenix, AZ this year. From the Mercury News:
"We are demonstrating that you can build a very high performance machine for a fifth to a tenth of the cost of what supercomputers now cost," said Hassan Aref, the dean of the School of Engineering at Virginia Tech in Blacksburg, Va. The computer was put together in a virtual flash. Scientists from the school met with Apple executives two days after the company introduced its new 64-bit desktop computer in June.
Egenera has announced some low-cost blades with virtualization software and remote management. What is unusual about this is the size: the whole unit mounts up to 24 Xeon processors in a rack-mount unit twenty-three inches high. Check out the Cluster and Blades section of the Supercomputer Index.
I don't normally report on Yet Another Cluster Announcement (YACA), but this one seems to be the right idea at the right time. Appro, a cluster and super supplier, has reduced the cabinet size for small cluster needs so it can fit in without being obtrusive. They offer up to 16 blades plus a master system, and all the blades can be moved to a full size cabinet if you outgrow the small one. Read more about Appro's Mini Cluster.
DARPA has awarded a four-year, $30 million research effort to IBM and Agilent Technologies to push chip-to-chip optical interconnects to the terabit-per-second level.
"The main goal of this program is to reach terabit-per-second speeds in a form factor small enough to enable chip-to-chip interconnects," said Waguih Ishak, director of the Communications and Optics Research Laboratory at Agilent. "This will only be achieved by developing miniature optical components, pushing their operating speeds to 40 gigabits per second (Gbps) and higher, and by clever integration and packaging techniques."
The effect this level of interconnect speed will have on supercomputers can only be estimated at this time, but instead of a 4- to 16-processor SMP being a cluster component, a whole rack, perhaps 256 processors, might operate in an SMP or NUMA (Non-Uniform Memory Access) arrangement with delays similar to today's shared-memory SMP systems. "Supercomputer in a Box" anyone? Read more at EETimes.
Virginia Polytechnic Institute and State University is building a Power Mac G5 cluster. The G5s will be connected with Infiniband to form an 1,100-node supercomputer (est. 10 teraflops). The cluster is expected to rank as one of the top five supercomputers in the world. Read more at Think Secret.
DOE's Pacific Northwest National Labs just had its Itanium supercomputer upgraded with 1,400 Itanium Madison 1.5 GHz processors, making it currently the world's fastest Linux super. Performance jumped from 6.2 trillion floating point operations per second (TFLOPS) to 11.8 TFLOPS. Read more at Infoworld.
The Power5 processor will be used in a nuclear weapons simulation supercomputer at Lawrence Livermore National Laboratory. That machine, called ASCI Purple and due to be running by the end of 2004, is expected to have 196 interconnected 64-processor servers, making a total of 12,544 Power5 chips. It will come with 50 terabytes of memory and will also have IBM disk storage arrays holding 2 petabytes, or 2 quadrillion bytes, of data.
As for physical size, ASCI Purple will weigh about 197 tons, be linked to 119 miles of optical cable and 28 miles of copper cable, and occupy 8,900 square feet of floor space--or about two basketball courts. It will consume 4.7 megawatts of power, enough current for 4,000 homes, according to IBM. More on ASCI Purple at C|Net.
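The ASCI Purple numbers hang together nicely if you work them through. Here is a short Python sketch of the arithmetic, using only the figures quoted above; the per-processor and per-home values are my own derived estimates (using decimal units), not numbers from IBM.

```python
# Figures quoted in the text.
servers = 196
cpus_per_server = 64
memory_tb = 50
power_mw = 4.7
homes = 4000

total_cpus = servers * cpus_per_server

# Derived estimates, assuming decimal units (1 TB = 1,000 GB, 1 MW = 1,000 kW).
memory_gb_per_cpu = memory_tb * 1000 / total_cpus
kw_per_home = power_mw * 1000 / homes

print(f"Total Power5 chips:   {total_cpus:,}")
print(f"Memory per processor: about {memory_gb_per_cpu:.1f} GB")
print(f"Implied draw per home: about {kw_per_home:.2f} kW")
```

So the machine works out to roughly 4 GB of memory per processor, and IBM's comparison implies a little over 1 kW per home.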
At the Hot Chips conference that started August 18, experts talked about ways to build new and faster supers.
The new AMD Opteron chip is a clear hit with makers of large clusters and supercomputers. The combination of x86 code compatibility, high speed links on the chip, 64 bit native mode with a large address space, and very good floating point performance have created real competition for Xeon and Itanium systems.
Two large Opteron-Linux clusters, which together will use over 3,300 of AMD's 64-bit processors, will be built for Los Alamos National Labs (LANL). The bigger system, named Lightning, will include over 2,800 Opterons and run at a peak of 11.2 teraFLOPS, supporting nuclear simulations. The smaller 512-processor system will be used for medical, scientific and environmental research. See The Inquirer for more, and visit LANL's Science Site. Check out the Lightning News Release and the Lightning Fact Sheet for all the details.
The National Academy of Sciences (NAS) is expected to publish an interim report tomorrow on supercomputing, although details of the study are being kept under wraps and a full report is not expected until later next year, according to industry sources. Check out this EETimes article.
The National Academy of Sciences (NAS) interim report on the future of supercomputing is one of three government reports expected to influence federally-funded research in high-end computing. Taken together, the three reports are expected to reinforce current work in clusters of systems based on off-the-shelf processors while bolstering R&D in custom technologies that could power petaflops-class systems. In another outcome of the studies, government researchers are expanding their plans to acquire high-end systems capable of performing 100 TeraFlops. More on this in EETimes.
Terra Soft, the company behind the Yellow Dog and Black Lab Linux PowerPC distros, has released a supercomputer cluster based on Apple's iMac. Each unit is comprised of eight of Marathon's "iRacks": an iMac but without the monitor or casing. More from The Register. Links to Marathon Computer and Terra Soft.
This will become the third fastest supercomputer in the world when it goes live in the fall of 2003. The new cluster will employ more than 1,450 dual-processor Dell PowerEdge 1750 servers running Red Hat Linux, a Myrinet 2000 high-speed interconnect fabric, an I/O subcluster with more than 120 terabytes of DataDirect storage, and a dedicated 64-node applications development environment. More details on the new super are linked in the title above.
The Japanese National Institute of Advanced Industrial Science and Technology's new Linux-based supercomputing cluster is set to be the world's third most powerful supercomputer. The system will comprise 1058 IBM dual-processor 1U rack-mounted eServer 325s. Each server is based on a pair of AMD Opteron 200-series CPUs.
All together you have a system capable of performing more than 11 trillion calculations per second, which would place it third on the chart of the world's most powerful supercomputers. More at TheRegister.
Visual Technology has a Japanese order for a parallel computer which includes 512 Opteron chips. The machine will run the Linux OS, and is expected to exceed one trillion floating point operations per second (1 teraflop). More at TheInquirer.
When Sony released the Linux Kit for the PS2, interest in the machines spread beyond the gaming community to a seemingly unlikely place: the National Center for Supercomputing Applications (NCSA).
Researchers at NCSA theorized that because of the unique processor of the PS2, with its two powerful vector units designed to manipulate polygons for game displays, a cluster of the consoles could potentially be used for scientific computation.
The Sony Linux Kit enables such a dramatic repurposing. The kit gives programmers direct access to the processor's vector units and provides a working and development environment that contains the tools found on more traditional Linux systems. This allows the units to be used for nongraphic, computationally intensive tasks.
NCSA's researchers now have 70 PlayStation2 consoles running as a cluster using an Ethernet network. Some of the tools commonly found on more traditional high-performance computing clusters have been integrated into the system, including the Message Passing Interface (MPI), which allows the individual consoles to communicate with each other and execute applications across all of the machines simultaneously, and the Portable Batch System (PBS) and Maui Scheduler, which manage jobs on cluster systems.
"This is literally a hand-made supercomputer assembled by a creative group of researchers who are constantly pushing the limits of our field," said Dan Reed, director of NCSA. "Many people have talked about the possibilities of the PlayStation's graphic processors, but to our knowledge, no one else has attempted to make these machines perform as a large, integrated Linux cluster. We have shown it is possible, and the long-term result could be another low-cost computing alternative for the scientific community."
For more info about the PS2 Supercomputer
You may have noticed that most of the recent Journal updates are about supercomputers of one kind or another. There are a number of reasons for this.
All in all, the range of supercomputer capabilities has reached a critical mass and I expect it to accelerate substantially in 2004. By then I expect such announcements to be so common they just get listed as bullet points.
From the INQUIRER:
"DAWNING (A Chinese company) SAID today that it will fashion the fastest supercomputer in China with a machine capable of delivering 10,000 GFLOPS and using 2000 AMD Opteron microprocessors.
Reports said the Dawning 4000A will use clustering and hypertransport linking together over 2000 Opterons, in a machine that's called the Red Grid."
From The INQUIRER:
Last week AMD and Cray visited Japan and told the press about the Red Storm supercomputer which uses a total of 10,368 Opterons.
The machine, it transpires, will use the 148 Opteron with the essential glue being provided by Cray to make the multiprocessing super computer sing to the tune of 40 TFLOPS.
The article on PC Watch said the 10,368 Opterons are linked in a three dimensional mesh using Hypertransport.
Cray will deliver a system with theoretical peak performance of 40 trillion calculations per second (teraOPS) using two calculations/clock cycle, or 20 teraOPS using one calculation/clock cycle. Red Storm is expected to become operational in fiscal year 2004, and will use the upcoming Advanced Micro Devices Inc. (NYSE: AMD) Opteron(TM) processors connected via a low-latency, high-bandwidth, three-dimensional mesh interconnect network based on HyperTransport technology. This system is expected to be at least seven times more powerful than Sandia's current ASCI Red supercomputer on actual weapons problems. ASCI Red was the first supercomputer delivered under the ASCI program.
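Combining the 40-teraOPS peak with the 10,368 Opterons reported by The Inquirer gives a rough idea of the clock speed those figures imply. The Python sketch below is only a consistency check on the quoted numbers; neither excerpt states the actual clock rate, and the variable names are my own.

```python
# Figures quoted in the text.
peak_ops_two_per_clock = 40e12  # 40 teraOPS at two calculations per clock cycle
peak_ops_one_per_clock = 20e12  # 20 teraOPS at one calculation per clock cycle
processors = 10_368

# Implied clock rate: peak / (processors * calculations per clock).
clock_ghz = peak_ops_two_per_clock / (processors * 2) / 1e9
clock_ghz_check = peak_ops_one_per_clock / (processors * 1) / 1e9

print(f"Implied clock rate: about {clock_ghz:.2f} GHz per processor")
```

Both ways of counting give the same implied clock, a bit under 2 GHz, which suggests the peak figure is a rounded number.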
"This computer will allow modeling and simulation of complex problems that were only recently thought impractical, if not impossible," said Tom Hunter, Sandia Senior Vice President for Nuclear Weapons Programs. "Calculations that would have taken months only a dozen years ago will now be done in a matter of minutes. This investment by Sandia and the NNSA represents a clear commitment to provide the essential capabilities to support the nation's nuclear weapons program. It is a major step toward establishing computing as the key enabler of science and engineering in the 21st century and reemphasizes our role as one of the world's leaders in that transformation."
While HPC clusters have been the major source of new supercomputer systems, I predict that Blade and Infiniband technology will replace traditional clusters in many areas. The advantages, listed below, make installation, management and expansion a routine process, much like traditional systems that come complete from one supplier. The final challenge is to reduce the power and A/C requirements. Some Transmeta blade systems have pioneered in that area.
For the supercomputer cluster, Appro has reduced the space required, eliminated much of the interconnect cabling, and added management tools. All this and easy expansion plus top performance make this approach very advantageous for everything from "Supers in a Closet" to massive high end clusters with thousands of processors. Read more here.
SGI has launched new visualization servers named Onyx4. They come in desktop, deskside and large towers for the high end.
SGI claims its new high-end visualization server has a price/performance ratio 40 times better than the next best technology. This may open up new markets for visualization systems where, previously, all but the largest organizations have found the price prohibitive.
The Onyx4 UltimateVision system is an offshoot of the same NUMAflex shared memory clustering technology that is at the heart of the company's Origin 3000 MIPS-Irix HPC servers.
Story in The Register, also in The Inquirer, and the full scoop from SGI.
A cluster supercomputer that Linux Networx built for Lawrence Livermore National Laboratory (LLNL) in 2002 advanced two rankings to become the third fastest supercomputer in the world on the 21st TOP500 supercomputing list (www.top500.org). The Linux Networx Evolocity system, called MCR by LLNL, can process 7.6 trillion calculations per second (teraflops) running the Linpack benchmark, and is also the fastest Linux cluster in the world. Read more here.
Researchers at Los Alamos National Labs have struck computing gold once again with an open source project that could benefit genetic research.
Three scientists have tried their hand at improving the popular BLAST (Basic Local Alignment Search Tool) search algorithms. The group decided to chop up a BLAST database and spread it across a number of servers instead of throwing lots of horsepower at a single data set. In so doing, the need to run I/O requests to disk was eliminated and the researchers saw huge, super-linear performance gains.
The experiment to put little bits of a database in memory instead of on disk proved a success and has since drawn considerable attention to mpiBLAST from pharmaceutical companies, researchers and even Microsoft. Read more here.
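The database segmentation idea described above can be illustrated with a toy sketch. This is not the mpiBLAST code or its API: the function names are hypothetical, a plain substring match stands in for a real BLAST search, and the fragments are searched one after another rather than on separate servers. It only shows that splitting the database and merging the per-fragment hits gives the same answer as one search over the whole set, while each piece is small enough to stay in memory.

```python
# Toy sketch of database segmentation (NOT the mpiBLAST code or API).
# All names and the substring "search" are illustrative assumptions.

def split_database(sequences, n_fragments):
    """Deal records round-robin into n_fragments roughly equal pieces."""
    fragments = [[] for _ in range(n_fragments)]
    for i, record in enumerate(sequences):
        fragments[i % n_fragments].append(record)
    return fragments


def search_fragment(fragment, query):
    """Stand-in for a BLAST search: report records that contain the query."""
    return [(name, seq.find(query)) for name, seq in fragment if query in seq]


def segmented_search(sequences, query, n_fragments):
    """Search each fragment independently, then merge the hits."""
    hits = []
    for fragment in split_database(sequences, n_fragments):
        # On a cluster, each fragment would sit in one node's memory and
        # these searches would run in parallel instead of in a loop.
        hits.extend(search_fragment(fragment, query))
    return sorted(hits)


database = [
    ("seq1", "ACGTACGTGA"),
    ("seq2", "TTGACCA"),
    ("seq3", "GGGACGTT"),
    ("seq4", "CCCCCC"),
]

print(segmented_search(database, "ACGT", n_fragments=2))
```

The merged result does not depend on how many fragments are used, which is what lets the work be spread over as many servers as are needed to keep every fragment in memory.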
The way semiconductor design is going, CPUs will generate more heat than a nuclear reactor by 2015. This calculation, a side-effect of the world-famous Moore's Law, is known, to us anyway, as the Gelsinger co-efficient. Chipmakers and their suppliers are developing new materials, shrinking the die size and investing in sundry manufacturing techniques to address this hot issue.
But this is not much help to some of the big tin in operation today. Where better to go than Los Alamos, home of the Nuclear Bomb and some absolutely enormous supercomputers. One, a beast called Q, will consume enough energy to power 5,000 homes when it's fully up and running later this year, drawing 3 megawatts for the machine, and 2 megawatts for the cooling system. It lives in the Nicholas C. Metropolis Center for Modeling and Simulation, a three-storey 333,000 sqft structure which incorporates several cooling towers, and cost $93m to build.
Compare and contrast with a new Los Alamos supercomputer, Green Destiny, which is based on low-power Transmeta CPUs. "Though Q will be almost 200 times as fast, it will cost 640 times as much - $215 million, compared with $335,000 for Green Destiny." And Green Destiny doesn't need to live in a $93m temperature-controlled, dust-controlled building. It measures two by three feet and stands six and a half feet high.
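The figures quoted for Q and Green Destiny can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above (costs, the "almost 200 times" speed ratio, the 3 MW and 2 MW power draws, and the 5,000 homes); the variable names are mine, and the 200x speed ratio is taken as a round figure.

```python
# Back-of-the-envelope check of the Q vs. Green Destiny figures quoted above.
Q_COST_USD = 215_000_000
GREEN_DESTINY_COST_USD = 335_000
SPEED_RATIO = 200          # Q is "almost 200 times as fast" (round figure)

Q_COMPUTE_MW = 3           # power drawn by the machine itself
Q_COOLING_MW = 2           # power drawn by the cooling system
HOMES_POWERED = 5_000

cost_ratio = Q_COST_USD / GREEN_DESTINY_COST_USD
# How much more performance per dollar Green Destiny delivers than Q.
price_performance_edge = cost_ratio / SPEED_RATIO

# Implied average draw per home, and cooling watts per watt of compute.
kw_per_home = (Q_COMPUTE_MW + Q_COOLING_MW) * 1000 / HOMES_POWERED
cooling_overhead = Q_COOLING_MW / Q_COMPUTE_MW

print(f"cost ratio: about {cost_ratio:.0f}x")
print(f"Green Destiny price/performance edge: about {price_performance_edge:.1f}x")
print(f"implied draw per home: {kw_per_home:.1f} kW")
print(f"cooling overhead: {cooling_overhead:.2f} W per W of compute")
```

On these numbers the cost ratio comes out close to the quoted 640, and Green Destiny delivers roughly three times the performance per dollar, before counting the building and cooling that Q needs.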
Read the whole article here in the Register.
Does the phrase 'supercomputer in a closet' strike a strange note for you? How about the idea of IBM's Blue Horizon being the last dinosaur of the supercomputer era? Blue Horizon has 42 towers and takes 1500 square feet of floor space.
If this dichotomy seems strange, then the following quote from Dr. Wu-chun (Wu) Feng at Los Alamos National Laboratory (LANL) may clarify the situation.
"While this project believes that performance and price/performance are important metrics in supercomputing and cluster computing, Dr. Feng believes that the key metrics of this decade will be efficiency, reliability, and availability and that these metrics will have even broader applicability to the general Internet and ubiquitous computing."
It may be that the future of supercomputers is (physically) smaller than you thought. Once again, all of those SF stories about huge computers taking over missed an important limit - power density. Read more about Supercomputing in Small Spaces (SSS) at LANL.
The University of Liverpool today unveiled a new supercomputer cluster which is expected to be one of the World's 100 most powerful systems when it goes live next month.
The cluster comprises 940 Intel Pentium 4-powered, Dell PowerEdge 650 HPCCs, and will be used for mapping global virus outbreaks, such as SARS, as well as research into physics and nuclear sciences. The entrepreneurial business centre at the University which assists start-up companies in Merseyside will also get some compute time.
The University's Department of Physics will use the powerful system (dubbed ULGRID) to simulate the collision of particles to help determine the origins of the universe. In addition, ULGRID and the Advanced Institute for Methods and Emergent Systems (AiMeS) will harness the power of the cluster to assist in research with the World Health Organisation in simulating the spread of disease epidemics, such as SARS.
Read more about this here.
U.S. and European scientists have set a new data transfer speed record, shattering the previous mark using nothing but good old fashioned Ethernet.
The researchers sent one terabyte of data from Sunnyvale, California to Geneva in less than an hour. Their 2.38Gb/s sustained rate for a single TCP/IP data stream beat the old top mark by a factor of 2.5. At this rate, users could send a full CD in 2.3 seconds or 200 full length DVD movies in an hour.
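The transfer times quoted above follow directly from the 2.38 Gb/s rate. The short sketch below redoes the arithmetic; the CD and DVD capacities (700 MB and 4.7 GB) and the decimal terabyte are my assumptions, since the announcement does not state them, and protocol overhead is ignored.

```python
# Rough check of the transfer-time figures at a sustained 2.38 Gb/s.
RATE_BITS_PER_SECOND = 2.38e9

TERABYTE = 1e12        # decimal terabyte (assumption)
CD_BYTES = 700e6       # typical CD capacity (assumption)
DVD_BYTES = 4.7e9      # single-layer DVD capacity (assumption)


def transfer_seconds(n_bytes, rate_bps=RATE_BITS_PER_SECOND):
    """Time to move n_bytes at rate_bps, ignoring protocol overhead."""
    return n_bytes * 8 / rate_bps


terabyte_minutes = transfer_seconds(TERABYTE) / 60
cd_seconds = transfer_seconds(CD_BYTES)
dvds_per_hour = 3600 / transfer_seconds(DVD_BYTES)

print(f"1 TB: about {terabyte_minutes:.0f} minutes")
print(f"one CD: about {cd_seconds:.2f} seconds")
print(f"DVDs per hour: about {dvds_per_hour:.0f}")
```

Under these assumptions one terabyte takes about 56 minutes and a CD a little over 2.3 seconds, in line with the figures in the announcement, and the hourly DVD count comes out somewhat above the quoted 200.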
"To put the numbers into perspective, at a transfer rate of 2.38 Gb/s, we could easily transfer the printed text in the entire Library of Congress in less than a day between Sunnyvale, California and Geneva, Switzerland," said Dr. Wu-chun Feng, team leader of network research RADIANT at Los Alamos National Labs.
More details on this here.
Researchers with the National Partnership for Advanced Computational Infrastructure (NPACI) at the San Diego Supercomputer Center (SDSC) have released version 2.3.2 of the NPACI Rocks cluster toolkit for both 64-bit and 32-bit processors. NPACI Rocks is developed by the Grid and Cluster Computing Group at SDSC and by partners at the University of California, Berkeley, Singapore Computing Systems in Singapore, and individual open-source software developers.
This is the first co-release for both Itanium2 (IA64) and x86 (Pentium, Athlon, and others) based clusters. The release is available to download and burn onto a bootable CD for x86 or DVD for Itanium2. Versions for both processor families are available at http://www.rocksclusters.org/.
The Portland Group(TM) Compiler Technology team of STMicroelectronics (NYSE: STM) today announced the availability of a Beta Release of the PGI(R) Workstation 5.0 Fortran and C compilers for AMD Opteron(TM) processors. This is the first publicly available release of STMicroelectronics' upcoming suite of optimizing software development tools for AMD 64-bit technology processors. http://www.pgroup.com/AMD64
"This week, the AHPCRC demonstrated the use of the NCAR/Penn State Mesoscale Weather Forecast Model, MM5, on the Cray X1 system to produce a forecast for the entire United States with a resolution of 5 kilometers," Muzio said. "MM5 is used worldwide and also within the Department of Defense (DoD) for operational weather forecasts. Current operational weather models that cover all of the United States are typically run at a resolution of about 10 kilometers. The 5-kilometer model requires approximately eight times as much computation as the 10-kilometer model and four times as much memory (20 billion bytes).
"In executing this model, the Cray X1 system sustained 36.7 billion floating point operations per second in the forecast steps on just 16 multi-streaming processors and simulated one hour of atmospheric physics and dynamics in 8.4 minutes on average, or 24 simulation hours in under 3.5 wall clock hours," he added.
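The "eight times the computation, four times the memory" figure and the wall-clock numbers can be reproduced with a simple scaling argument. The sketch below is my own reading, not something stated in the announcement: it assumes that halving the grid spacing doubles the point count in each horizontal direction, leaves the number of vertical levels unchanged, and halves the time step. The function name is hypothetical.

```python
# Scaling sketch for a finer forecast grid, under the assumptions stated above.

def refinement_factors(old_km, new_km):
    """Return (compute_factor, memory_factor) for a finer horizontal grid."""
    r = old_km / new_km
    memory_factor = r ** 2    # more points in both horizontal directions
    compute_factor = r ** 3   # same points, plus r times as many time steps
    return compute_factor, memory_factor


compute_factor, memory_factor = refinement_factors(10, 5)

# Wall-clock time for a 24-hour forecast at 8.4 minutes per simulated hour.
wall_clock_hours = 24 * 8.4 / 60

# Sustained rate per multi-streaming processor (36.7 gigaflops on 16 MSPs).
gflops_per_msp = 36.7 / 16

print(f"compute x{compute_factor:.0f}, memory x{memory_factor:.0f}")
print(f"24 simulated hours: about {wall_clock_hours:.2f} wall-clock hours")
print(f"about {gflops_per_msp:.1f} gigaflops per MSP")
```

With these assumptions the 5-kilometer run needs 8 times the computation and 4 times the memory of the 10-kilometer run, and a full day of forecast takes about 3.4 hours, consistent with "under 3.5 wall clock hours".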
According to Muzio, "The AHPCRC has seen similar performance on other important DoD applications. For example, on a widely used AHPCRC unstructured mesh fluid dynamics application, we have seen linear scaling across all the application processors with a sustained performance of 4 gigaflops per processor, or 117 gigaflops on 28 multi-streaming processors."
The full announcement can be found at Cray News.
The NSF will build on past efforts with NPACI in this area by upgrading the systems and network plus extending it to new participants. Over the next few years, this program will change the way science and engineering are done.
NSF Announces Initial Steps Toward a New Cyberinfrastructure
The National Science Foundation (NSF) announces the first steps it is taking to develop a state-of-the-art cyberinfrastructure likely to revolutionize the conduct of science and engineering research and education. These steps leverage the agency's recent investments in the Extensible Terascale Facility and its six-year investments in the Partnerships for Advanced Computational Infrastructure.
The Extensible Terascale Facility (ETF) will integrate terascale computing-communication-information resources at five partner sites: the Argonne National Lab, the Center for Advanced Computing Research at the California Institute of Technology, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, the Pittsburgh Supercomputer Center, and the San Diego Supercomputer Center at the University of California at San Diego. This year, NSF will add new partners to the ETF to enhance the scientific utility of this distributed, heterogeneous grid facility.
CHAMPAIGN, IL, March 31, 2003 -- NCSA's new IBM POWER4 p690 supercomputer, capable of performing two trillion operations per second, becomes available to the general scientific research community on Tuesday, April 1, the center announced today.
The system will be used by researchers in a wide range of science and engineering disciplines, including chemistry, biology, astrophysics, atmospheric sciences, materials sciences, high-energy physics, and structural mechanics. Some of the questions these researchers will investigate include how biological systems work at the molecular and atomic level, how to better understand and predict severe storms, and how to build stronger, more stress-resistant aircraft and spacecraft.
BERKELEY, CA -- The NERSC Center put its 10 teraflop/s (10 trillion calculations per second) IBM supercomputer into service last week, providing researchers across the country with the most powerful computer for unclassified research in the United States. NERSC is sponsored by the Department of Energy, Office of Science, and is part of the Computing Sciences Directorate at Lawrence Berkeley National Laboratory. High-speed remote access to NERSC is provided by ESnet.
The IBM supercomputer, which comprises 6,656 processors, entered production a month ahead of schedule, meaning that the system will provide up to 4 million more processor hours of computing time in the current fiscal year. The NERSC Center serves more than 2,000 researchers at national laboratories and universities across the country.
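The "up to 4 million more processor hours" figure is easy to sanity-check. The sketch below assumes a 30-day month and lets utilization vary; both are my assumptions, since the release gives only the processor count and the one-month head start.

```python
# Sanity check of the extra processor hours gained by starting a month early.
PROCESSORS = 6656
HOURS_PER_DAY = 24


def processor_hours(days, utilization=1.0):
    """Processor hours delivered over `days` at the given utilization."""
    return PROCESSORS * HOURS_PER_DAY * days * utilization


full_month = processor_hours(30)                 # 30-day month (assumption)
days_for_4_million = 4_000_000 / (PROCESSORS * HOURS_PER_DAY)

print(f"30 days at full utilization: {full_month:,.0f} processor hours")
print(f"days needed for 4 million hours: about {days_for_4_million:.0f}")
```

A full 30 days on all 6,656 processors would give about 4.8 million processor hours, so 4 million corresponds to roughly 25 days of full use, or a month at a little over 80 percent utilization.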
The San Diego Supercomputer Center (SDSC) at the University of California, San Diego, the leading-edge site of the National Science Foundation (NSF) National Partnership for Advanced Computational Infrastructure (NPACI), has announced that it will deploy a major new data-oriented computer resource. The new machine, SDSC DataStar, will be a 7 teraflop/s IBM Regatta system (seven trillion floating-point operations per second) and will leverage SDSC's international leadership in data and knowledge systems to address the growing importance of large-scale data in scientific computing. The new system will be designed to flexibly handle both data-intensive and traditional compute-intensive applications, and will be linked to the national information cyberinfrastructure. DataStar is scheduled to be installed in the summer of 2003.
The new system will offer many innovations for users. Today, data collections for astronomy, physics, and other disciplines have reached terabyte size (trillions of bytes), and will grow to petabytes (1,000 times larger) in just the next few years. High-end computers are typically not configured so that users can easily compute with and move large data sets into and out of the machine, and this forms a significant impediment for scientists in extending their simulations and analysis to the largest scales. To help data-intensive users, DataStar will be specifically designed to host high-end, data-oriented computations, and will be integrated with SDSC's Storage Area Network, or SAN, which will provide 500 terabytes of online disk, and the six petabyte capacity High Performance Storage System (HPSS) for archival storage. DataStar will also be linked through the national information infrastructure grid to a wide spectrum of other resources.
Now you can get Super in a single rack. Price - if you have to ask, you can't afford it.
The San Diego Supercomputer Center (SDSC) at UCSD has released version 2.0.0 of the popular SDSC Storage Resource Broker (SRB) middleware package, which enables scientists to create, manage, and collaborate with unified "virtual data collections" that are located on heterogeneous data resources distributed across a network. While existing capabilities are preserved for current users, major enhancements "under the hood" give version 2.0 a large number of faster and more powerful services. SDSC SRB version 2.0.0 along with the user manual and release notes are available online - See also SRB Downloads.
BIRN, the Biomedical Informatics Research Network, is an NIH-funded project with considerable SDSC participation. It is especially interesting as a suprainstitutional project with large-scale computational and data requirements, because it is a testbed for a new kind of scientific endeavor requiring not only new technology, but also an unusual degree of cooperation among participants. We reprint here a shortened version of a recent article about BIRN by Stephanie Sides of the California Institute for Telecommunications and Information Technology [Cal-(IT)2] that impressed us because it highlights the sociological aspects of such a collaboration.
See Also: "Computational Science" at Byte.com (Annual fee $20)
Global Shared Memory with the SGI Altix 3000 Family of Servers and Superclusters
That's the concept behind the new SGI Altix 3000 family of servers and superclusters. SGI Altix 3000 employs scalable 64-bit Linux clustering to create what is simply the world's most powerful open-source computing environment. Thanks to its implementation of global shared memory, its optimized operating environment, and its ability to scale to 64 processors in a single Linux OS image, SGI Altix 3000 enables extraordinary capability breakthroughs compared with traditional Linux clusters. The SGI Altix 3000 family allows users to solve highly complex computational problems in record time.
See also: SGI Super Cluster at The Register.
The TeraGrid at five initial sites (NCSA, SDSC, PSC, Caltech, and Argonne National Laboratory) will be tightly integrated over the next few months into a national information infrastructure. The first delivery of 128 compute nodes and ancillary equipment came to SDSC. Further deliveries of TeraGrid equipment are being made to all sites.
"This is indeed an historic moment," said SDSC and NPACI Director Fran Berman. "The TeraGrid is at the cutting edge of hardware, software, and human infrastructure, and it is providing critical experience for the next generation of cyberinfrastructure."
See also: Teragrid Schematic
The Center for Computational Sciences (CCS) at ORNL will deploy the Cray X1 system to test its effectiveness in solving important scientific problems in climate, biology, nanoscale materials, fusion and astrophysics.
Raymond L. Orbach, director of DOE's Office of Science, said the ORNL-Cray partnership is one of the first steps in the initiative to explore computational architectures essential to 21st century scientific leadership. "Modern computational methods are developing at such a rapid rate that computational simulation is possible on a scale that is comparable in importance with experiment and theory," Orbach said.
See also: Cray X1 Information
Cray Inc may have only recently released its flagship X1 supercomputer, but it is already making plans to follow it up with a machine that currently goes under the codename "Black Widow".
According to Seattle, Washington-based Cray, Black Widow will be instruction set compatible with X1 and is expected to have a peak performance of "several hundred teraflops" on release. With two stages of enhancements, its performance will grow to in excess of one petaflop.
Cray will deliver a system with theoretical peak performance of 40 trillion calculations per second (teraOPS) using two calculations/clock cycle, or 20 teraOPS using one calculation/clock cycle. Red Storm is expected to become operational in fiscal year 2004, and will use the upcoming Advanced Micro Devices Inc. (NYSE: AMD) Opteron processors connected via a low-latency, high-bandwidth, three-dimensional mesh interconnect network based on HyperTransport technology. This system is expected to be at least seven times more powerful than Sandia's current ASCI Red supercomputer on actual weapons problems. ASCI Red was the first supercomputer delivered under the ASCI program.
See Also: "16,000 Hammers in Sandia supercomputer"