Clusters Deliver Low-Cost Supercomputing

Before you take the plunge into clustered computing, there are a few things to consider.

By Peter Varhol

 
Clusters can prove their worth in enabling highly compute-intensive applications such as computational fluid dynamics, where the collective power of multiple systems can enable engineers to build and execute dynamic models of system behavior.

There was a time when the demarcation between day-to-day engineering design and complex, high-end simulation and prototyping was quite substantial. If you had government funding, worked at a large engineering enterprise, or otherwise had lots of money to burn, you could have ready access to a supercomputer that did everything you could imagine. Think of this capability as a Corvette for CAE.

  If you didn’t have deep funding or influence, there were entire classes of problems that you simply couldn’t tackle. Anything involving real-time continuous simulation, fluid dynamics, and the like was out of bounds. You ran greatly simplified analyses, or you guessed, and you made do.

  Today the demarcation between these classes of engineering problems is drastically shrinking, driven by off-the-shelf hardware that can be configured to work together to address a growing class of large-scale and time-critical projects. These computing clusters are sold by a number of hardware manufacturers, and supported by many CAE software vendors. And the good news is that with many of these architectures, CAE software vendors don’t have to do anything special with their code; it just works.

Clusters aren’t a panacea, and they can’t always take over everything traditionally done with supercomputers. They can deliver a very high rate of floating-point operations per second (FLOPS), but often they cannot access all data on the cluster at once. This puts technologies such as global memory and high-performance storage at a premium for more typical engineering applications.
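To put FLOPS in perspective, a cluster’s theoretical peak is just the product of node count, cores per node, clock rate, and floating-point operations per cycle. A minimal sketch, with purely illustrative figures (none drawn from any vendor’s specifications):

```python
# Rough peak-FLOPS estimate for a small cluster; all figures below are
# illustrative assumptions, not measurements of any real system.
def peak_gflops(nodes, cores_per_node, ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: nodes * cores * clock (GHz) * FLOPs/cycle."""
    return nodes * cores_per_node * ghz * flops_per_cycle

# e.g. 16 quad-core nodes at 2.5 GHz, 4 floating-point ops per cycle
total = peak_gflops(16, 4, 2.5, 4)  # 640 GFLOPS
```

Real applications rarely approach this peak, which is exactly the article’s point: raw FLOPS mean little if the processors cannot reach the data fast enough.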

  And because cluster architecture matters to performance, you need to understand the types of problems you typically work on, and how a particular architecture provides the performance, availability, or other characteristics that lend themselves to solving those problems.

  What characteristics should you be looking for in computing clusters? Standard hardware, fast dedicated intersystem communications, and global shared memory are the ones that start to transform desktop computing into supercomputing. Beyond that, fast storage and cluster-ready applications can also be significant factors.

 
The SGI Altix server clusters provide rack-mounted Intel-based systems that enable high performance and global shared non-uniform memory access for high-performance computing needs such as memory-intensive engineering applications.

Standard Hardware Drives Down Cost
Now that the PCs we used for writing reports and emails a decade or so ago have evolved into essential engineering tools, an investment of only a thousand dollars or so can deliver a computer that addresses a good proportion of engineering needs. Over the years, many computer engineers have designed architectures that attempt to take the fundamental characteristics of commodity PCs and juice them up for high-performance applications.

  Of course, clusters require more than simply Intel processors mounted on a no-name motherboard with a standard ATA hard drive. You can get pretty far with off-the-shelf systems and components, but not far enough to solve the most demanding problems.

  Standard hardware enables low-cost solutions, but it doesn’t automatically result in increased performance. Rather, it’s a starting point: an inexpensive initial configuration that is already relatively high in performance, onto which vendors add technologies for better memory access, storage, and compute power.

  Today, clusters from the likes of Sun Microsystems and SGI provide Intel-based hardware that keeps hardware costs down while providing a platform for additional power. Sun specializes in high-performance networked storage. Because operating systems page virtual memory out to mass storage, fast, accessible storage can make engineering applications run more quickly. And fast, directly accessible storage in general enables engineering applications to move massive amounts of data into and out of memory at the speeds necessary to support real-time responses to simulations and models.

Fast Communications
For a cluster to be effective, every processor, on every system, has to be able to address every memory location. And the processors have to be able to get data from across the cluster, perform computations on it, and send it back into the appropriate memory.

  Such computation within an individual computer usually involves high-speed busses and very short distances. Exchanging data between a processor on one computer and memory on another is an entirely different beast. If the transfer rate is not close to what it would be within an individual system, then there is little reason to share memory and processors across the cluster.

  So the interconnect between individual systems in a cluster has to be just as fast, or almost as fast, as the bus between the processor and memory. Ethernet can’t do the trick, at least not yet. Most clusters use InfiniBand or similar high-speed interconnects, whose performance begins to approximate that of the proprietary busses on the motherboard.

  The result is that clusters are able to have intersystem communication at a rate fast enough to deliver the needed performance between processor and memory. Once this has been accomplished, there is one more thing required before a cluster can deliver fully additive performance over individual PC systems.
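As a rough sanity check on why interconnect speed matters, one can compare idealized transfer times for a large block of data over different links. The bandwidth figures below are illustrative assumptions, not vendor specifications, and the calculation ignores latency and protocol overhead:

```python
GIB = 2**30  # 1 GiB in bytes

def transfer_seconds(nbytes, gbits_per_sec):
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return nbytes * 8 / (gbits_per_sec * 1e9)

# Assumed link speeds in gigabits per second (illustrative only)
links = {"gigabit Ethernet": 1, "InfiniBand": 16, "local memory bus": 50}

for name, gbps in links.items():
    print(f"1 GiB over {name}: {transfer_seconds(GIB, gbps):.2f} s")
```

Even with these generous assumptions, the roughly eight-second transfer over gigabit Ethernet shows why commodity networking alone cannot make remote memory feel local.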

 
Sun Microsystems incorporates high-end Intel servers in cluster architectures along with fast, logically unified storage for fast loading of engineering application components and large data sets.

Global Shared Memory
Clustered systems have their own individual memory, as you might expect. If you’ve got 16 clustered Intel 64-bit systems, each with quad-core processors, for example, each of those systems has its own memory, perhaps 2GB apiece. In typical PC architectures, each processor can access only the memory that is local to it—on the same high-speed bus. While powerful in its own right, this limits the ability of a CAE application to seamlessly access power and memory beyond that individual system.

  The key to computing clusters is that any processor must be able to access the memory on any other system in the cluster. In that way, it is possible to treat the cluster of computers as a single computer. But there’s still more to it. If the memory can be treated as one large memory space, rather than multiple smaller memory spaces, then finding data in memory becomes straightforward and fast.

  If these systems are homogeneous, that is, built on the exact same architecture and memory bus, then the job is technically simpler. But a growing number of clusters use different systems with different processors or memory capacities and speeds. For cost reasons, many clusters exhibit what is known as non-uniform memory access, or NUMA.

  However, in a NUMA-type memory architecture, all of the memory in all of the systems can be made to look like one very large memory space. In this example, the 16 systems, each with 2GB of memory, can appear to be a full 32GB. Of course, this requires substantial hardware and software support to achieve, but the end result is a set of lower-cost systems with the memory and processing power of a supercomputer.
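The mechanics can be sketched simply: from software’s point of view, a flat global address is just a node index plus a local offset. This toy translation uses the article’s 16-node, 2GB-per-node example; the mapping scheme itself is an assumption for illustration, and real NUMA hardware is far more sophisticated:

```python
NODE_COUNT = 16
NODE_MEM = 2 * 2**30  # 2 GB per node, in bytes (article's example)

def locate(global_addr):
    """Translate a flat global address into (node index, local offset).

    A toy interleaving-free mapping: node 0 owns the first 2GB of the
    global space, node 1 the next 2GB, and so on, for 32GB in total.
    """
    if not 0 <= global_addr < NODE_COUNT * NODE_MEM:
        raise ValueError("address outside the 32GB global space")
    return divmod(global_addr, NODE_MEM)

# The first byte of node 5's memory sits 10GB into the global space.
node, offset = locate(5 * NODE_MEM)  # (5, 0)
```

The hardware and software support the article mentions exists precisely to make this translation, plus the remote fetch behind it, invisible and fast for every processor in the cluster.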

  What does this mean for processing? With all memory addressable by each processor, it enables applications to keep much more of their working set, and much more of their data, in memory. Processing will be faster, and larger applications and data sets can be managed.

  Many multi-processor systems are great for problems that employ parallel processing—computations that involve matrix math or similar independent calculations. But those types of problems don’t require frequent and high-speed communication among processors, or shared memory spaces.
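A minimal sketch of that kind of workload: each matrix row below is processed independently, so the workers never need to share memory or communicate mid-computation. Python’s multiprocessing pool stands in for cluster nodes here purely for illustration:

```python
from multiprocessing import Pool

def row_norm_squared(row):
    """Independent per-row work: sum of squares of one matrix row."""
    return sum(x * x for x in row)

if __name__ == "__main__":
    matrix = [[1, 2], [3, 4], [5, 6]]
    # Rows are distributed to workers; no worker needs another's data.
    with Pool(processes=2) as pool:
        results = pool.map(row_norm_squared, matrix)
    print(results)  # [5, 25, 61]
```

Because each row’s result depends only on that row, this style of problem scales across ordinary networked machines; it is the tightly coupled problems that demand the shared memory and fast interconnects discussed above.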

  If the goal is to harness the computational power, as well as the performance of other computational subsystems, then memory access has to be both fast and seamless for all processors. That’s where the high-speed interconnects come in. A clustered system requires both global memory and extremely fast networking in order to deliver high-performance benefits.

  SGI focuses on memory access and communication. Data crosses over an SGI NUMAlink switch, round-trip, in as little as 50 nanoseconds. That type of speed is essential to get data and code from memory to processor, irrespective of the engineering use.

  SGI Altix also offers a highly scalable memory architecture, offering up to 24TB of globally addressable memory, rather than the 32GB generally accessible from traditional global memory clustered systems. This amount of memory effectively increases the ability of an engineering application to bring code and data into memory, and take advantage of the fast busses between memory and processor cores.

  Unified storage can be another key to clustered computing. In many cases, engineering applications have components, options, and data loaded from a variety of different storage locations, a process that is the slowest part of computer processing. Being able to unify that storage logically makes it possible to achieve higher performance for this critical part of running high-performance engineering applications.

  For the right applications, computing clusters can open up entirely new classes of applications for a relatively small investment. Many engineering applications, such as computational fluid dynamics, modeling and simulation, and even detailed prototyping software, will benefit from the performance of clusters. In general, almost anything that required a high-end supercomputer in the past can be done with a significantly less expensive cluster.

  This means that a wide variety of engineering software, from ANSYS and Fluent computational fluid dynamics solutions to Dassault Systèmes simulation tools to PTC Pro/ENGINEER mechanical simulation solutions, can benefit from at least entry-level cluster solutions. And engineers who were looking for supercomputer time to run their simulations can set their sights lower and still get the job done.

More Info
SGI
Sunnyvale, CA

Sun Microsystems
Santa Clara, CA


Peter Varhol has been involved with software development and systems management for many years. Send comments about this column to [email protected].

About the Author

Peter Varhol

Contributing Editor Peter Varhol covers the HPC and IT beat for Digital Engineering. His expertise is software development, math systems, and systems management. You can reach him at [email protected].
