
Computational Science

by Bill Nicholls
Vers 1c 12Feb2003
"Computers make scientific discoveries happen faster"

Introduction

The cliche scientist is often pictured as a white-coated man with bad hair who wanders absently around a university campus, making obscure chalk marks on any convenient blackboard. Until recently, TV and movies hammered this cliche home. Films like "A Beautiful Mind" have broken that mold, to the benefit of us all.

What scientists do is conceptually simple - it is the details of how it is done that make the process appear complex. For a simple explanation of what the scientific method is, see "The Scientific Method" at the end of this column.

In addition to a specific method, scientists are usually masters of mathematics for theory and of numbers for experiments. This is best explained by a quote from Lord Kelvin (Sir William Thomson), 1824-1907, British Physicist:

"When you can measure what you are speaking about and express it in numbers, you know something about it."

Science has benefited greatly from the computer's growing ability to manipulate numbers rapidly. At first, computers were used simply to take measurements from experiments and determine their statistical significance. Over time, this number manipulation became intensive number crunching, from generating graphs to directly comparing measurements to expected answers.

Later, computers were connected directly to measuring devices to speed up the whole process of capturing measurements accurately and to automate the reduction of measurements to numerical equations or graphs.

From Calculations to Simulations

The next step in the evolution of how computers are used in science was to create numerical experiments. A hypothesis is used as the basis for a program that simulates some physical process, and that program produces numbers which represent what happens.
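To make the idea of a numerical experiment concrete, here is a minimal sketch in Python. It is not taken from any project mentioned in this column; the function names, the time step and the choice of a dropped object are all assumptions made for illustration. The hypothesis is "gravity gives a constant downward acceleration," the program steps that hypothesis forward in time, and the number it produces can be checked against the textbook formula.

```python
def simulate_fall(height_m, dt=0.001, g=9.81):
    """Step a dropped object forward in time until it reaches the ground.

    Returns the simulated fall time in seconds. The names and the
    default time step are illustrative assumptions.
    """
    t, y, v = 0.0, height_m, 0.0
    while y > 0.0:
        v += g * dt   # hypothesis: constant downward acceleration
        y -= v * dt   # move the object using the updated velocity
        t += dt
    return t


def analytic_fall_time(height_m, g=9.81):
    """Fall time predicted by the closed-form formula t = sqrt(2h / g)."""
    return (2.0 * height_m / g) ** 0.5


if __name__ == "__main__":
    simulated = simulate_fall(20.0)
    predicted = analytic_fall_time(20.0)
    print(f"simulated {simulated:.3f} s, predicted {predicted:.3f} s")
```

Even this toy shows the pattern: the simulation generates numbers, and the science lies in comparing those numbers with theory or with measurement.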

Firsthand experience with this process in the 1960s made it clear why not much science was done that way until recently. Computer time on large scale machines like the Univac 1107 was scarce and expensive, programmers who knew physics were rare, and few physicists would then have put much faith in a computer's output.

My senior paper on an "Alpha Plasma Pinch," simulating a contained fusion reaction, was not difficult to write. But the data from a 15 minute run filled three tape reels, which then had to be analyzed.

Reducing data is the biggest problem. Analyzing a huge pile of numbers to deduce what happened to the plasma pinch was at least an order of magnitude more difficult than creating the data. A major effort to do science on this one question would have required the entire computer facility for data storage and analysis. It simply wasn't economical then.

Thirty Years Later

Only in the 1990s did computers become plentiful and fast enough to make the next step, direct computation of scientific results, an economically feasible process. Three technologies have evolved to support different approaches to these intense computational environments.

The best known is "Distributed Computing," which uses thousands to millions of personal computers spread around the Internet. The early leader in this was the SETI (Search for Extraterrestrial Intelligence) project. Even as I write this, my system is computing SETI in the background.

Distributed projects number more than a hundred, including Protein Folding, RSA encryption challenges, and Smallpox analysis.
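The common pattern behind these projects can be sketched in a few lines of Python. This is a toy under stated assumptions, not the code of SETI or any other project: the function names are hypothetical, threads on one machine stand in for volunteer PCs, and "find the strongest sample" stands in for the real analysis. The point is that the data is cut into independent work units, each unit is processed without talking to the others, and only small results are sent back and combined.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(samples, unit_size):
    """Cut one large data set into independent chunks (work units)."""
    return [samples[i:i + unit_size] for i in range(0, len(samples), unit_size)]


def analyze_unit(unit):
    """Stand-in for the real analysis: report the strongest sample in a unit."""
    return max(unit)


def run_project(samples, unit_size=4, volunteers=3):
    """Hand work units to a pool of workers and combine their results.

    Threads here are only an illustrative substitute for volunteer
    computers connected over the Internet.
    """
    units = split_into_work_units(samples, unit_size)
    with ThreadPoolExecutor(max_workers=volunteers) as pool:
        results = list(pool.map(analyze_unit, units))
    return max(results)


if __name__ == "__main__":
    signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
    print("strongest sample:", run_project(signal))
```

Because the units never need to communicate with each other, a slow dial-up link is good enough. Problems whose pieces must exchange data constantly are the ones that need clusters instead.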

Not as well known by the public, but with greater potential, is the use of Cluster environments to run projects which require much faster communications channels than the typical dial-up or DSL Internet connection. Clusters were originally created for scientific research programs, where large numbers of very fast processors could be allocated to work on specific problems and the huge result files could be easily managed.

The success of the Cluster concept, which enabled widely separated researchers to collaborate on complex simulations over the Internet, created a demand for more and larger clusters. This resulted in Clusters of clusters, or Grids.

Grids and Supercomputers In the 21st Century

Grids also had challenges to overcome, starting with allocation of time and billing at separate sites, hardware differences, and, most difficult, software control systems that were different in philosophy, structure and implementation.

The resourceful scientists and engineers who earlier gave us the Internet solved these problems too. In an Open Source process, these people created Globus, a nonprofit organization to coordinate and build a common set of code that runs on all the major systems yet enables wide latitude in local standards.

The Globus Toolkit is now in alpha release for version 3, based on an architecture that enables further growth without requiring a redesign or uniform compliance. The Toolkit has created such a useful capability that it is being commercially developed by IBM and others. As version 3 gets wider use, Grids will become the computer power utility for many businesses. MIT Technology Review has named Grid computing one of "Ten Technologies That Will Change the World."

Supercomputers haven't disappeared. The cost performance of supercomputers has risen dramatically, and their space and power requirements have shrunk. Supercomputers have been used primarily by government and independent labs, for purposes ranging from analyzing data from space probes to ensuring our weapons will still work after being stored for decades.

More than a decade ago, the government decided it wanted to accelerate the development of faster supercomputers beyond what the commercial market would do alone. This resulted in a Department of Energy program named the "Accelerated Strategic Computing Initiative," or ASCI. Because of ASCI and a series of ASCI machines named for colors (ASCI Red, ASCI White, etc.), we now have a much broader and faster range of supercomputers.

Technologies to Match the Problem

One of the prime sites for these new supers has been at UCSD - the University of California at San Diego. UCSD is a core partner in the NPACI, the National Partnership for Advanced Computational Infrastructure. Just as Interstate highways are a core part of the US transportation infrastructure, so is NPACI in computing.

UCSD is one of several major sites around the nation that are part of NPACI's infrastructure. UCSD is hooked up to a new 40 gigabit network called the TeraGrid. The purpose of this grid is to explore very large problems that demand many supercomputers working closely coupled on the same problem, even though they may be thousands of miles apart.

Each technology (Distributed, Clusters and Grids) supports a different class of scientific applications because of its differences in communication structure and speed. What is important is that the software and systems that support these technologies are reaching maturity and expanding very rapidly because they are no longer experimental. The simpler distributed systems are mature and are growing rapidly in response to public support.

We are just at the start of major deployment of clusters and grids which will change the pace of science. Each technology will contribute to new discoveries that will extend our knowledge and enable us to participate in the discovery process.

Tomorrow's Leading Edge

Computers are beginning to move into new areas again in support of science. In a sense, it is 'back to the future' as once more computers are being directly connected to scientific instruments. This time, the computers will control the instruments to enhance their accuracy, not just record the measurements.

The most notable example of this is at the University of Arizona's Astronomy program, with the 6.5 meter MMT Telescope. Arizona's Multi Mirror Telescope is designed to solve a problem that has hampered every telescope except Hubble - atmospheric turbulence, or twinkling. Big telescopes are all limited in resolving the smallest and dimmest objects by what we see as 'twinkling' of the stars. It is the variations in the atmosphere due to turbulence and density differences that create the twinkle. This was a prime reason the Hubble telescope was built: to get sharp pictures by being above the atmosphere.

Twenty years later, computers are making it possible to get better pictures from the ground than Hubble can capture. This is currently being tested at Arizona's 6.5 meter Adaptive Optics telescope. This telescope has a secondary mirror that is made of a flexible material. This mirror is shaped by electromagnets controlled by 168 DSPs (Digital Signal Processors).

As the atmosphere 'twinkles', measurements are made of the distortion on a millisecond basis, and calculations are made to adjust the shape of the secondary mirror to cancel those distortions as well as to adjust the shape for ground winds. The position of the mirror is read 40,000 times per second, with calculations to correct it in the same timeframe.
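The measure-and-correct cycle described above is a feedback loop, and a toy version fits in a few lines of Python. This is only a sketch under loud assumptions: it is not the telescope's control software, the function names and the gain value are hypothetical, the distortion is held constant instead of changing every millisecond, and three numbers stand in for the whole mirror surface. Each pass measures what distortion is left after the mirror's current shape and nudges the mirror a fraction of the way toward cancelling it.

```python
def correct_wavefront(distortion, mirror, gain=0.5):
    """One control step: move each actuator part way toward cancelling the error.

    distortion and mirror are lists of equal length; the residual seen by
    the camera is their sum. The gain of 0.5 is an illustrative assumption.
    """
    residual = [d + m for d, m in zip(distortion, mirror)]
    return [m - gain * r for m, r in zip(mirror, residual)]


def run_loop(distortion, steps=20, gain=0.5):
    """Repeat the control step and return the residual error that remains."""
    mirror = [0.0] * len(distortion)
    for _ in range(steps):
        mirror = correct_wavefront(distortion, mirror, gain)
    return [d + m for d, m in zip(distortion, mirror)]


if __name__ == "__main__":
    leftover = run_loop([0.3, -0.2, 0.1])
    print("largest remaining error:", max(abs(r) for r in leftover))
```

With a gain of 0.5 the leftover error halves on every pass, which is why the loop must run much faster than the atmosphere changes: the correction has to converge before the distortion moves on.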

The net result at the telescope is equivalent to perfectly clear, still air. Since the ground telescope is three times the size of Hubble's, it has the potential to resolve objects three times more clearly.
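The "three times" figure follows from the standard diffraction limit, under which the smallest angle a telescope can resolve is roughly 1.22 times the wavelength divided by the mirror diameter. The short Python sketch below works through the arithmetic. Its assumptions are mine, not the article's: Hubble's mirror is taken as 2.4 meters, the wavelength as 550 nanometers (green light), and the atmosphere as perfectly corrected. The function name is hypothetical.

```python
import math


def diffraction_limit_arcsec(wavelength_m, aperture_m):
    """Smallest resolvable angle (Rayleigh criterion), in arcseconds."""
    theta_rad = 1.22 * wavelength_m / aperture_m
    return math.degrees(theta_rad) * 3600.0


if __name__ == "__main__":
    green_light = 550e-9  # assumed wavelength in meters
    hubble = diffraction_limit_arcsec(green_light, 2.4)  # assumed 2.4 m mirror
    mmt = diffraction_limit_arcsec(green_light, 6.5)     # 6.5 m mirror from the text
    print(f"Hubble: {hubble:.4f} arcsec, 6.5 m telescope: {mmt:.4f} arcsec")
    print(f"improvement factor: {hubble / mmt:.2f}")
```

The improvement factor is simply the ratio of the two diameters, 6.5 / 2.4, or about 2.7, so "three times" is a fair round number for the ideal case.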

This is just the beginning. The University of Arizona is planning to apply this same technique to mirrors which are 20 and 30 meters in size, with the expectation that these larger mirrors can image earth sized planets around other stars up to 150 light years away.

A very readable article on the telescope's capabilities is available from the EE Times.

Expanding The Envelope

An even more ambitious computational science project is underway at the Biomedical Informatics Research Network (BIRN). This project will have the broadest scientific reach of any I am aware of, and it breaks new ground in data sharing and cross domain science.

In summary, BIRN will collect cross species biomedical data, link together major data centers across the US, provide federated access to a distributed database of various biological and medical studies, and enable standardization of instrument readings to make sure the data can be compiled without instrument bias.

The San Diego Supercomputer Center (SDSC) will be the central coordinator of the computer effort and is currently linked to three university teams for this project.

The project will initially cover:

The diverse nature of the data is illustrated in this quote from the NPACI newsletter: "Multiple species are being studied, notably, the mouse and human, with the possibility of adding more species that have the potential for yielding information relevant to the human brain. Data are being collected on different types of brain activity, over a range of scales (molecular to the whole brain), over a range of time periods depending on relevance to the image acquisition technology and the topic of study, and using different laboratory methodologies (such as positron emission tomography, high-voltage electron microscopy, and magnetic resonance imaging)."

BIRN breaks ground in other areas as well. Starting with cross species work, BIRN will include cross discipline work by serving as a model for geoscience through the GEON project. Francis Berman, director of SDSC, emphasizes "this model really applies to large-scale projects and cyberinfrastructure generally."

She is saying that BIRN is a prototype for future cross discipline computational science projects. It's more than just computers; it is scientists from normally separate areas sharing data and using that capability to make discoveries in areas where two or more disciplines have a common interest. BIRN's design enables this approach to scale from just the neuroscience area to many other disciplines.

There is much more potential here than this brief outline of capabilities can list. For more detail, see the NPACI newsletter and the original article on BIRN, plus a California Institute BIRN website.

A Look Into The Future

This column is only a taste of what science and computers can do together. As the leading edge of this team is now attacking problems previously intractable because of complexity, the past expectation of multiple decade research before breakthroughs is obsolete.

Today, millions of distributed computers computationally fold proteins, search for Smallpox weaknesses and analyze medical brain scans in search of the causes of aging disorders. The results of these and other computational tools for research will alter the future of medicine. Most of us living today will live better because of these programs.

Past simulations of nuclear war made it clear that a Nuclear Winter would destroy most living things from change in climate rather than radioactivity. This information was a prime factor in the reduction of nuclear arms between America and the USSR.

For the future, Japan's "Earth Simulator" and other supercomputers will expand our knowledge of how to lighten the load so many people place on our planet. Computers now predict the paths of hurricanes, forecast weather days in advance and show weather pictures from space. Not too far in the future, computers will show better ways to handle extreme events like floods, tornados and hurricanes.

Increasing computer and Internet speeds will bring a change in literacy and education. With new computers below $200 (w/o display) and older working computers being thrown out, even the poorest can have access through a neighbor, a library or a club. Each person can be a student at any age, whose education can be driven by curiosity or need. Self motivated learning works better than any other and can be available to all, even in tough neighborhoods.

What will surprise most people is how soon I expect life changing results to show up. In less than five years, we will see many computational science discoveries per year that qualify as significant or as breakthroughs in science. By 2010, this expectation of success will be built into our culture, and our kids will wonder why it took so long in the ancient past before 2001. It will be difficult to explain.

The Scientific Method

The basic steps of science are:

  1. Observe a series of events you wish to study
  2. Propose a reason (hypothesis) for why these events happen
  3. Predict from the hypothesis what would happen with a changed element
  4. Test your hypothesis by running the experiment

If the experiment validates your hypothesis, you are now one step closer to a theory. Continue testing to strengthen your hypothesis. If the experiment disproves your hypothesis, return to your observations and propose a new hypothesis. The famous quote by scientists for the latter situation is:

"Another beautiful theory destroyed by an ugly fact."

The word 'theory' is not used by scientists until a new hypothesis has been thoroughly tested and agreement is reached that the hypothesis works. Note that a theory is said to 'work' rather than being considered true because scientists always expect that a new hypothesis can come along that is a better predictor.
