Supercomputers

by

Dan Calle

First of all, just what is a supercomputer? We all know what microcomputers are, and a growing number of people know what workstations are (really just very high-end microcomputers running a multi-user operating system), but when asked about minicomputers, mainframes, and supercomputers, many people, even those who have used such systems, will give less certain, often conflicting, answers. The Free On-line Dictionary of Computing offers this definition of "supercomputer":

A broad term for one of the fastest computers currently available. Such computers are typically used for number crunching including scientific simulations, (animated) graphics, analysis of geological data (e.g. in petrochemical prospecting), structural analysis, computational fluid dynamics, physics, chemistry, electronic design, nuclear energy research and meteorology. Perhaps the best known supercomputer manufacturer is Cray Research.

For many years, the speed of most computers was measured by how many millions of instructions per second, or MIPS, they could execute. Variability in instruction sets has made this benchmark a poor indicator of performance, so it is rarely used anymore. Since supercomputers have always been number-crunchers, their speed is measured in floating point operations per second, or FLOPS, in units of megaflops (MFLOPS), gigaflops (GFLOPS), and teraflops (TFLOPS), which refer to millions, billions, and trillions of FLOPS, respectively.
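To make the unit concrete, here is a minimal sketch (not from the original article) of how one might estimate a machine's floating point rate: time a long loop of multiply-add operations and divide the operation count by the elapsed time. The iteration count and the two-operations-per-pass accounting are illustrative assumptions.

    /* Illustrative MFLOPS estimate: time a loop of multiply-adds and
       divide the floating point operation count by the elapsed time. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        const long n = 100000000L;          /* 100 million passes (assumed workload) */
        double a = 1.0, b = 1.000000001;
        clock_t start = clock();
        for (long i = 0; i < n; i++)
            a = a * b + 1e-9;               /* one multiply + one add = 2 FLOPs */
        clock_t end = clock();

        double seconds = (double)(end - start) / CLOCKS_PER_SEC;
        double mflops  = (2.0 * n) / seconds / 1e6;
        /* Printing 'a' keeps the compiler from optimizing the loop away. */
        printf("a = %f, roughly %.1f MFLOPS\n", a, mflops);
        return 0;
    }

On a typical mid-1990s workstation a loop like this would report somewhere in the tens of megaflops, which is what makes the teraflops figures discussed below so remarkable.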

History

[Image: Seymour Cray]

Much of the early history of the supercomputer is the history of the father of the supercomputer, Seymour Cray (1925-96), and the various companies he founded; in particular, Cray Research, which was the U.S. leader in building the fastest supercomputers for many years. Cray's mission throughout his life was to build the fastest computer in the world, a goal he first realized in developing the first fully transistorized supercomputer, the CDC 1604, in 1958 at Control Data Corporation, a company he had founded with William Norris in 1957. He went on to design the CDC 6600, which used 60-bit words and parallel processing, demonstrated RISC design, and was forty times faster than its predecessor, and followed it with the CDC 7600 system. These machines would give Control Data the clout to push the mighty IBM out of the scientific computing field for a time.

Cray left Control Data in 1972 to found Cray Research following a disagreement with Norris, then CEO, who had put a new computer project on hold. Always a private man, Cray was never very interested in company management, so, as he had with Control Data, he relinquished control of the company after five years and worked out a deal that allowed him to do research and development at a lab away from company headquarters. After designing the 100 megaflops CRAY-1 computer in 1976 and the 1-2 gigaflops CRAY-2 computer system in 1985, both of which were the fastest supercomputers in the world when they were introduced, he again parted ways with his company after top management elected not to go ahead with his new project, the Cray 3. Founding Cray Computer Corporation in 1989, he again built what would be (briefly) the fastest supercomputer in the world at around 4-5 gigaflops, the Cray 3, which is based on superfast 1 GHz gallium arsenide (GaAs) processors rather than conventional silicon processors, which were, and still are, topping out at 400-500 MHz. He followed it with the Cray 4, also based on gallium arsenide, which is twice as fast in per-node performance as the Cray 3 and is smaller than the human brain.

Unfortunately, various events, including the end of the Cold War, which shrank the size of the defense industry, one of the supercomputing industry's biggest markets; the advent of competition from Japanese companies such as Fujitsu Ltd., Hitachi Ltd., and NEC Corp.; and the rise in popularity of distributed computing based on large numbers of smaller microcomputers working together in a limited way all served to shrink the U.S. supercomputer industry, causing Cray Computer to file for bankruptcy in 1995. Only a few Cray 3, and even fewer Cray 4, systems had been sold. Undaunted, Cray began work on a new computer and started a new company, SRC Computer Labs, to build it. Tragically, he died on October 5th, 1996, at the age of 71, as a result of injuries sustained in an automobile accident.

Seymour Cray invented or contributed to several technologies used by the supercomputer industry, among them the CRAY-1 vector register technology, various cooling systems, gallium arsenide semiconductor technology, and the RISC (Reduced Instruction Set Computing) architecture.


[Image: TMC CM-5 at NCSA]

For many years, Seymour Cray and his companies dominated supercomputing, but eventually other companies began to compete directly. Thinking Machines Corporation, for example, became famous in the field of supercomputing. Its Connection Machines were among the first massively parallel machines; early models contained up to 65,536 simple one-bit processors, while the later CM-5 used SPARC and SuperSPARC processors.



The State of the Art (ca. 1997)

On July 28, 1995, two University of Tokyo researchers broke the 1 teraflops barrier with their 1,692-processor GRAPE-4 (GRAvity PipE number 4) special-purpose supercomputer costing less than two million U.S. dollars. The GRAPE-4 and its processors are specialized for performing astrophysical simulations, more specifically, the gravitational N-body problem, but it was still the fastest computer in the world at that time, reaching a peak speed of 1.08 teraflops.

[Image: CRAY T3E-900]

According to a November 11, 1996 announcement by Cray Research, a 2,048-processor CRAY T3E-900 (TM) broke the world record for a general-purpose supercomputer with an incredible 1.8 teraflops peak performance. This system, according to Cray Research, is the first supercomputer able to sustain greater than one teraflops performance over long periods of time. Curiously, a December 16, 1996 announcement made by Intel Corporation stated that its "ultra" computer, developed in a partnership with the U.S. Department of Energy, is the world's first supercomputer to break the 1 teraflops barrier. Regardless of who was first, it is clear that the current state of the supercomputing art is teraflops-level performance. A number of other companies have supercomputers operating at or near the teraflops level: NEC Corporation's SX-4 has a peak performance of 1 teraflops, the Fujitsu (Siemens-Nixdorf) VPP700 peaks at 0.5 teraflops, and the Hitachi SR2201 high-end model peaks at 0.6 teraflops.

[Images: NEC SX-4, Fujitsu VPP700, Hitachi SR2201]
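As a rough sanity check on peak figures like these, a machine's peak speed is simply its processor count multiplied by each processor's peak floating point rate. The sketch below is an illustration, not something from the article; the 900 MFLOPS-per-processor figure is an assumption chosen to be consistent with the 2,048-processor, 1.8 teraflops numbers quoted above.

    /* Illustrative arithmetic: peak speed = processors x per-processor peak.
       The 900 MFLOPS-per-processor figure is an assumption consistent with
       the 2,048-processor, 1.8 teraflops T3E-900 numbers quoted above. */
    #include <stdio.h>

    int main(void)
    {
        int    processors      = 2048;
        double mflops_per_proc = 900.0;
        double peak_tflops     = processors * mflops_per_proc / 1.0e6;
        printf("Peak: %.2f teraflops\n", peak_tflops);   /* prints about 1.84 */
        return 0;
    }

The same arithmetic applies to the other machines listed above, given their own processor counts and per-processor rates.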

There are three primary limits to performance at the supercomputer level: individual processor speed, the overhead involved in making large numbers of processors work together on a single task, and the input/output speed between processors and between processors and memory. Input/output speed between the data-storage medium and memory is also a problem, but no more so than in any other kind of computer, and, since supercomputers all have amazingly high RAM capacities, this problem can be largely solved with the liberal application of large amounts of money.
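To see why input/output between processors is such a limit, consider a very simple cost model (an assumption used purely for illustration, not something from the article): every message pays a fixed startup latency plus a per-byte transfer cost, so the same amount of data costs far more when it is broken into many small messages.

    /* Toy communication-cost model: time = messages * latency + bytes / bandwidth.
       The latency and bandwidth figures below are assumed, purely for illustration. */
    #include <stdio.h>

    int main(void)
    {
        const double latency_s   = 10.0e-6;   /* assumed 10 microsecond startup per message */
        const double bandwidth   = 300.0e6;   /* assumed 300 MB/s link */
        const double total_bytes = 8.0e6;     /* 8 MB exchanged in total */

        const int message_counts[] = { 1, 1000, 100000 };
        for (int i = 0; i < 3; i++) {
            int n = message_counts[i];
            double t = n * latency_s + total_bytes / bandwidth;
            printf("%6d messages: %.4f seconds\n", n, t);
        }
        return 0;
    }

With those assumed numbers, sending the data as 100,000 small messages takes dozens of times longer than sending it as a single large one, even though the amount of data is identical.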

The speed of individual processors is increasing all the time, but at a great cost in research and development, and the reality is that we are beginning to reach the limits of silicon-based processors. Seymour Cray showed that gallium arsenide technology could be made to work, but it is very difficult to work with, and very few companies know enough about it to make usable processors. It was such a problem that Cray Computer was forced to acquire its own GaAs foundry so that it could do the work itself.

The solution the industry has been turning to, of course, is to add ever-larger numbers of processors to its systems, gaining speed through parallel processing. This approach allows manufacturers to use relatively inexpensive third-party processors, or processors that were developed for other, higher-volume applications such as personal- or workstation-level computing. Thus the development costs for the processor are spread out over a far larger number of units than the supercomputing industry could account for on its own.

However, parallelism brings with it the problems of high overhead and the difficulty of writing programs that can use multiple processors at once in an efficient manner. Both problems had existed before, as most supercomputers had from two to sixteen processors, but they were much easier to deal with at that scale than at the level of complexity arising from the use of hundreds or even thousands of processors. If these machines were to be used the way mainframes had been used in the past, relatively little work would be needed, since a machine with hundreds of processors can handle hundreds of independent jobs at a time fairly efficiently. Distributed computing systems, however, are (or are becoming, depending on whom you ask) more efficient solutions to the problem of many users with many small tasks. Supercomputers, on the other hand, were designed, built, and bought to work on extremely large jobs that could be handled by no other type of computing system, so ways had to be found to make many processors work together as efficiently as possible.

Part of the job is handled by the manufacturer: extremely high-end I/O subsystems arranged in topologies that minimize the effective distances between processors while also minimizing the amount of intercommunication required for the processors to get their jobs done. The other part of the job is borne by the users of the system. If they expect to get their money's worth from these highly expensive machines, they must make every effort to optimize, or "parallelize," their programs so that they can make use of the many processors. If this is not done properly, processors sit idle while the information they need to continue executing is held up in an undetected bottleneck caused by poor parallelization. Worse yet, parallelization adds a certain, not insignificant, amount of complexity to a program, increasing the number of bugs and the number of side effects caused by changes to the code.
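One way to make the overhead problem concrete is Amdahl's law, which is not named in the article and is used here only as an illustration: if some fraction of a program must run serially, that fraction caps the speedup no matter how many processors are added.

    /* Illustrative Amdahl's law calculation: speedup = 1 / (s + (1 - s) / p),
       where s is an assumed serial fraction and p is the processor count. */
    #include <stdio.h>

    int main(void)
    {
        const double serial_fraction = 0.05;   /* assume 5% of the work cannot be parallelized */
        const int processor_counts[] = { 16, 256, 2048 };

        for (int i = 0; i < 3; i++) {
            int p = processor_counts[i];
            double speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / p);
            printf("%5d processors -> speedup of about %.1fx\n", p, speedup);
        }
        return 0;
    }

With a 5 percent serial fraction, even 2,048 processors yield less than a 20-fold speedup, which is exactly why so much effort must go into parallelizing programs well.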


The Near Future (1997-2005)

What innovations and advances can we expect over the next eight or so years? A press release by Intel indicates that the completed "ultra" computer, also known as ASCI Option Red, will incorporate over 9,000 Pentium Pro® processors, reach peak speeds of 1.8 teraflops, and cost $55 million. Part of the Accelerated Strategic Computing Initiative (ASCI), Option Red at Sandia National Laboratories will be followed at the Lawrence Livermore National Laboratory by ASCI Option Blue-Pacific, a $93 million, 4,096-processor supercomputer designed and built by IBM with an estimated peak performance of 3.2 teraflops. To put that into perspective, the delivery of the first components of the new supercomputer, containing only 512 processors, doubled in one day the total amount of computing power delivered to Lawrence Livermore since the laboratory's opening in 1951. Over the next ten years, the ASCI program will sponsor the development and delivery of three more supercomputers to the Lawrence Livermore, Los Alamos, and Sandia national laboratories that will reach speeds of 10, 30, and finally 100 teraflops. Though they will be made available for other applications, the primary use of this tremendous amount of computing power will be to maintain the safety and reliability of the U.S.'s remaining stockpile of nuclear weapons. Without the above- or below-ground nuclear testing that was used for research in the past, extremely fine-grained numerical simulation is required to analyze and predict potential problems arising from long-term storage of nuclear devices.

If 100-teraflops computing seems a lofty goal, it should be noted that there is at least one petaflops (quadrillions of floating point operations per second) project in progress. The University of Tokyo's GRAPE:TNG project aims to have a petaflops-class computer by the year 2000. Also known as the GRAPE-5, it would have 10,000-20,000 higher-powered processors and cost around $10 million. More interestingly, the new GRAPE system, though still special-purpose hardware, will be less specialized than before and will be able to perform a variety of astrophysical and cosmological simulations.

The Farther Future (2006-????)

One of the "Seymour Stories" related in the wake of Seymour Cray's death was told by Larry Smarr, Director of the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign:

When Seymour left Cray Research to form Cray Computer in the early 1990's, I was the fifth visitor he invited to Colorado Springs to review his plans for the Cray 3. He had a private lunch with me, one I will remember all my life. I asked him at one point what the next qualitative step for supercomputing would be. He paused and thought for a moment, then said, "I think it will be biological computing--using DNA and proteins as the computing elements just as Nature does." This was before the first tentative steps in that direction were announced by frontier researchers in this yet-to-be-born field. I asked him if he thought that he would be involved in building such machines and he said matter-of-factly, "no, I don't think I will live to see that day..."

Goodbye Seymour. Thanks for everything.

© Dan Calle, 1997.
Virginia Tech
CS 3604
