
Meta Clusters; OS Updates

by Bill Nicholls 14Aug2001

Computing as a Utility

Today's computers sit on your desk, or in a room in the office building, connected directly to the systems and people they serve. It's a lot like the early days of the industrial revolution, before electric motors. Every shop that needed mechanical power had to have a prime mover such as a water wheel or steam engine at or in the shop, with axles, wheels and pulleys to transfer the power from the prime mover to where it was used.

Now change all those power words to computer and see the mainframe era (1960s to 1980s) architecture again. Next, add electric motors and you have distributed power, connected by wires. It looks just like today's distributed computer architecture. The next step is more obvious now - a utility with lots of computer power connected by a distribution system to each home and business, very similar to current electric utilities, but hopefully, less prone to blackouts.

The computer utility already exists for scientists and engineers, who built it first to solve hard problems. It is called Grid Computing, a specific meta cluster approach. Soon, businesses that want computing power but not the headaches of owning it will be able to rent time from IBM and other utility suppliers. Outsourcing was the first step, moving information processing details from business employees to specialists who could share people and resources efficiently to make a profit. Now the next step is almost upon us.

A computing utility will be a collection of computers, probably in several locations, connected internally by high speed lines and to its customers by the Internet. Such a system would be a meta cluster of computers, running under a common control system. But just as electric power can come from coal, nuclear, hydro or gas, computing power can come from Cray, IBM, Intel, Sun, HP or many others. The differences between computers make managing the meta cluster a challenging problem, but one with great potential.

What is a Meta Cluster?

A Cluster is a group of computers in physical proximity connected by a high speed network. Clusters of computing systems come in two basic kinds. A cluster is Uniform if all of the processors are the same. A cluster is Heterogeneous if the computers are connected by a high speed network, but the processors are different or use different operating systems.

Beowulf clusters are an example of the first kind - clusters with a uniform set of systems running the same operating system. An example of the second kind is the heterogeneous set of supercomputers at SDSC, running HP, Intel, IBM and Cray systems all on the same site. Controlling this environment is much more difficult than controlling a uniform cluster. For instance, job submission software must handle the different OS interfaces, character sets, word lengths and the different ordering of bits and bytes (counted right to left on some machines, left to right on others).
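The ordering problem can be illustrated with a minimal Python sketch. It covers only byte order within a 32-bit word, which is one of the differences mentioned above. The function names and the job id value are hypothetical and not taken from any real job submission system.

```python
import struct

# struct format codes: ">I" is a big-endian 32-bit unsigned integer,
# "<I" is the little-endian equivalent.
FORMATS = {"big": ">I", "little": "<I"}

def encode_job_id(job_id: int, byte_order: str) -> bytes:
    """Pack a 32-bit job id using the sender's byte order (hypothetical helper)."""
    return struct.pack(FORMATS[byte_order], job_id)

def decode_job_id(payload: bytes, byte_order: str) -> int:
    """Unpack a 32-bit job id assuming the given byte order (hypothetical helper)."""
    return struct.unpack(FORMATS[byte_order], payload)[0]

job_id = 0x01020304                       # illustrative value only

big = encode_job_id(job_id, "big")        # bytes 01 02 03 04
little = encode_job_id(job_id, "little")  # bytes 04 03 02 01

# A receiver that assumes the wrong order silently reads a different number.
misread = decode_job_id(big, "little")

print(big.hex(), little.hex(), hex(misread))
```

The same four bytes decode to 0x01020304 or 0x04030201 depending on the assumed order, so software that moves jobs between unlike machines has to agree on one wire format or tag each message with the sender's order.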

Meta Clusters are clusters of clusters. The meta cluster is usually a group of clusters which are geographically distributed, nationally or around the world, but can be treated as a single resource by some very advanced software. Meta clusters also come in uniform and heterogeneous types.

Meta cluster software faces a much bigger hurdle than just heterogeneous hardware and software. Once the clusters are distributed, they are under different owners, have various management and security setups, no two alike, use different accounting techniques and often have unique job submission systems. In short, the meta cluster problem is mostly one of organization and management at the human level, not the computer level.
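To make the technical side of this concrete, here is a minimal Python sketch of one common way to hide unlike job submission systems behind a single interface. Every class, site name and message format in it is hypothetical and invented for illustration. The ownership, security and accounting issues described above are not modeled at all.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cpus: int

class ClusterAdapter(ABC):
    """Common interface the meta cluster layer talks to (hypothetical)."""

    def __init__(self, site: str, free_cpus: int):
        self.site = site
        self.free_cpus = free_cpus

    @abstractmethod
    def submit(self, job: Job) -> str:
        """Hand the job to the site's own submission system."""

class BatchQueueAdapter(ClusterAdapter):
    """Stands in for a site that uses a simple batch queue."""

    def submit(self, job: Job) -> str:
        self.free_cpus -= job.cpus
        return f"{self.site}: queued '{job.name}' on {job.cpus} CPUs"

class TicketAdapter(ClusterAdapter):
    """Stands in for a site that issues numbered tickets instead."""

    def __init__(self, site: str, free_cpus: int):
        super().__init__(site, free_cpus)
        self._next_ticket = 1

    def submit(self, job: Job) -> str:
        ticket = self._next_ticket
        self._next_ticket += 1
        self.free_cpus -= job.cpus
        return f"{self.site}: ticket {ticket} for '{job.name}'"

def submit_anywhere(job: Job, clusters: list) -> str:
    """Send the job to the first cluster with enough free CPUs."""
    for cluster in clusters:
        if cluster.free_cpus >= job.cpus:
            return cluster.submit(job)
    raise RuntimeError(f"no cluster can run '{job.name}'")

clusters = [BatchQueueAdapter("site-a", 4), TicketAdapter("site-b", 64)]
small_result = submit_anywhere(Job("small", 2), clusters)
large_result = submit_anywhere(Job("large", 32), clusters)
print(small_result)
print(large_result)
```

The caller never sees which submission system a site runs. A real meta cluster layer would also have to carry credentials, charge the right account and respect each owner's policies, which is where most of the difficulty lies.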

Meta Clusters in the News

Meta cluster technology has also been called Grid Computing or Super Clusters. In the last month, two major announcements have made this technology very visible to both commercial and scientific communities.

First, IBM announced its plans to build fifty clusters for commercial use, as reported by the BBC and in IBM's own announcement:

"IBM will invest US$4 billion to build 50 computer server farms worldwide, a computing power grid that will allow customers to buy computing power and storage capacity over the Internet on demand."

Second, the National Science Foundation has just announced a $53 million award to four major supercomputer sites for the Distributed Terascale Facility (DTF). It is a new form of meta cluster, uniform but distributed, called the TeraGrid:

"The four research institutions in the DTF project are the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, the San Diego Supercomputer Center (SDSC) at the University of California at San Diego, Argonne National Laboratory, and the California Institute of Technology."

One of the big differences from earlier distributed clusters is the planned 40 gigabit per second dedicated optical network that will link the sites. The meta cluster, when completed in 2003, will total 13.6 Teraflops (TF), where 1 TF = 10**12 floating point operations per second. The uniform nature of the Linux clusters and the extreme network will make operation of the distributed cluster both easier to manage and faster on shared tasks.
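A rough sense of these numbers comes from a little arithmetic, sketched here in Python. The 13.6 TF and 40 gigabit figures are the ones quoted above. The job size (10**15 operations) and the data set size (10**12 bytes) are hypothetical values chosen only for illustration, and both results assume ideal peak rates that real workloads will not reach.

```python
TERA = 10**12   # operations per second in one teraflop
GIGA = 10**9    # bits per second in one gigabit per second

def compute_seconds(total_operations: float, teraflops: float) -> float:
    """Ideal run time at peak speed; real codes sustain less than peak."""
    return total_operations / (teraflops * TERA)

def transfer_seconds(num_bytes: float, gigabits_per_second: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and contention."""
    return num_bytes * 8 / (gigabits_per_second * GIGA)

PEAK_TF = 13.6     # planned total from the announcement
LINK_GBPS = 40     # planned link speed from the announcement

job_time = compute_seconds(10**15, PEAK_TF)     # hypothetical job size
move_time = transfer_seconds(10**12, LINK_GBPS) # hypothetical data set size

print(f"compute: {job_time:.1f} s, transfer: {move_time:.1f} s")
```

Under these assumptions, moving the data takes longer than the computation itself. That is one reason the network speed matters so much for a cluster spread over several sites.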

Meta Cluster Tools

Building Meta Cluster Software (MCS) from scratch could sound attractive until you take a good look at the requirements. Overall, heterogeneous meta cluster software is a good deal more complex than even a cluster operating system. This is because several MCS tasks represent very hard organizational and technical problems.

Fortunately there are alternatives available in toolkits. The broadest approach is the Globus Project.

"The Globus project is developing fundamental technologies needed to build computational grids. Grids are persistent environments that enable software applications to integrate instruments, displays, computational and information resources that are managed by diverse organizations in widespread locations."

Linked to the Globus Project is the Cactus Code Server (CCS). CCS is an application toolset that enables reuse and collaboration:

"Cactus is an open source problem solving environment designed for scientists and engineers. Its modular structure easily enables parallel computation across different architectures and collaborative code development between different groups."

Despite the orientation to science and engineering, the toolkit could enable all sorts of collaboration between disparate groups. One possible use for this kind of collaboration is the creation of "Virtual Companies." In an emergency, experts, software models and computing power could be assembled into a collaborative group that quickly provides precise responses to a complex situation, in time to reduce the overall impact of the emergency.

For large companies that already have distributed systems but don't need the full capability of other toolsets, there is now a proposed standard base system named Jiro that can be customized. Jiro is supported by Sun Microsystems as an implementation of the standard Federated Management Architecture (FMA). It is based on Jini, Java and FMA standards, and is designed as a cross platform toolset for building custom policy and management control systems.

Although the examples in the book are aimed at Storage Area Networks (SANs), the Jiro technology can be used to manage any programmable computer components. Documentation is available online, and the first two chapters of the Jiro Programmers Guide are available for download.

Available Meta Cluster Software

One of the earliest meta cluster software developments began at the University of Virginia in 1993 with the development of Legion. This software was first publicly released in 1997 and is now a supported product for research and commercial use. Corporate distribution is handled by Avaki. A good overview of Legion from a user's view is presented here.

A different approach to meta cluster software is available under a public license from Globus. The toolkit is modular, which enables custom solutions that don't require the Legion license or the full set of meta cluster tools. The toolkit provides software for:

The current release is 1.1.3 (Now 2.0 as of June 2002) and it is available with the required supporting software for download.

Meta Cluster Communities

To help tie all this MCS together, a community named the Global Grid Forum is available with working groups, meeting reports, newsletters and other contact information. This is the place to go for current information on meta cluster software and to join people working on the same problems.

Click on Grid Initiatives & Projects and scroll down the right frame. In addition to a long list of existing systems, at the bottom are several great resource lists:

This is the most comprehensive list of resources I have discovered. It's a great place to start discovering the current status and availability of meta cluster software.

Meta clusters are not an overnight phenomenon. The Economist has an excellent overview of how this technology developed, how it can be used and some future possibilities, titled "Computing Power on Tap."

An example of using the Grid is "Harnessing the Power of Grid Computing" by Mike Gannis, SDSC, and Karen Green, NCSA. It appeared in the May 16 issue of SDSC Online.

The Data Intensive Computing Environment (DICE) is a project of the San Diego Supercomputer Center (SDSC) to make scientific data and computing available without regard to location, format or computer.

"The goal is to provide integrated access to data sets stored on NPACI resources and to support remote execution of digital library and presentation services."

NASA is in the meta cluster business too. Their project, named the Information Power Grid (IPG), is a collection of several NASA sites across the country.

Next time I'll dig deeper into some of the meta cluster software.

Operating System Updates

The BSD groups are working hard to move BSD ahead. FreeBSD release planning [http://www.freebsd.org/releases/index.html] currently shows:

"The next scheduled release on the -stable branch will be FreeBSD 4.4 on August 31, 2001. The first release on what is now the -current branch will be FreeBSD 5.0, scheduled for the fourth quarter of 2001."

Update July 2002: FreeBSD 4.6 and 5.0 experimental are shipping.

Another BSD based system which I overlooked earlier is making a big impact. Mac OS X, which has a lot of core BSD code, has pulled off quite a technical accomplishment by enabling both current and future Mac code to run under a GUI with protected virtual memory.

Apple is about to take OS X to the next level. The planned release of version 10.1 in September (10.2 soon) brings several speedups and enhanced features, plus a refinement to Aqua, the OS X GUI, and a new DVD player. Get the full specs here.

There is activity in OS/2 land as well. Software subscriptions and Convenience packages will continue to be available from IBM, but there's a new player. In what appears to be the bargain of the year, eCS (eComStation) is now shipping version 1.0. eCS is supported and sold by Serenity Systems.

The eCS package contains not only the updated OS/2 software, but a lot of applications as well. Included in OS/2 are SMP support and JFS, the Journaling File System. Applications include Applause from Solution Technologies, Lotus SmartSuite and others.

IBM's Netscape 4.61 has been updated as of July 9, 2001. It is still a free feature. Download here.

Mozilla has released version 0.9.3 (now 1.0) and there is an OS/2 version about halfway down the page. I installed this one, but had problems which may have been local, so I've pulled back to version 0.9.2. At the moment, I have 32 open windows under Warpzilla 0.9.2 with better performance than Netscape 4.61 had prior to the current release.

A final OS/2 note. With current memory prices at $43 for 256 MB of 133 MHz ECC from Crucial, there is no longer a reason to run your system with less if your use is more than casual. The elimination of most swapping not only speeds up performance, but also seems to help stability. I've been running ECC for a long time, so it's not that. The occasional hangs of individual programs have stopped, activation is snappy and this is an easy upgrade. I recommend ECC since the extra cost is about $1 per chip over the non-parity version.
