Make the Right Choice

An HPC-optimized computing system supercharges CAE performance.

By Himanshu Misra

Computer-aided engineering (CAE) has rapidly become the dominant prototyping strategy in virtually every industry. Given the time and money consumed by physical prototyping, it’s easy to understand why. A recent report from Daratech, a leading IT market research group, estimated that the automobile industry alone could save about $2 billion by 2010 through increased adoption of digital prototyping. Computerized modeling also frees design engineers to simulate more complex situations and pose more what-if scenarios. Ultimately, it allows them to produce higher-quality, safer, better-performing products.

Model sizes have grown dramatically, driven by the pursuit of greater design accuracy, more powerful computers with advanced automatic mesh-generation tools, and an emerging trend toward sharing a single computer model that is updated and maintained across multiple engineering functions during a design project. On top of their size, models have also become more complex as engineers run multiphysics simulations and study fluid-structure interactions.

As the benefits of CAE have become more compelling, more and more organizations are investing in high-performance computing (HPC) systems, which can tackle ever more complex CAE problems by running applications on more processors. But there’s a catch: different HPC systems can deliver very different application performance, even when using the same number of processors. A system that runs well on highly parallel tasks, such as transaction processing or pattern matching in gene research, may deliver only a fraction of that performance on the unique types of calculations involved in CAE. To unlock the full potential of CAE, organizations should look for HPC systems built specifically for CAE codes, allowing these applications to scale efficiently.
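
To make “scale efficiently” concrete, consider a simple fixed-overhead scaling model. The Python sketch below is purely illustrative; the job time and per-processor overhead are hypothetical numbers, not measurements from any system discussed here. It shows how even modest per-processor overhead, such as communication, erodes efficiency as processor counts grow.

```python
# Illustrative fixed-overhead scaling model (hypothetical numbers):
# actual time = ideal time (serial_time / P) + overhead that grows with P.

def parallel_efficiency(serial_time, overhead_per_proc, procs):
    """Efficiency = ideal parallel time / actual parallel time."""
    ideal = serial_time / procs
    actual = ideal + overhead_per_proc * procs
    return ideal / actual

for p in (4, 16, 64):
    # 1,000-second serial job; 0.5 s of added overhead per processor
    print(f"P={p:3d}  efficiency={parallel_efficiency(1000.0, 0.5, p):.2f}")
```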

To illustrate, we’ll look at how LSTC’s (Livermore, CA) LS-DYNA finite element software and CD-adapco’s (Melville, NY) STAR-CD computational fluid dynamics (CFD) software each perform on different types of processing systems. Together, these two applications highlight the differences among HPC systems.

Crashing Cars for Safety

Manufacturers in the auto industry commonly use LS-DYNA for crash simulations and safety analysis. A full vehicle crash problem typically involves two to four million elements. In occupant safety simulations, complicated multiphysics comes into play when incorporating the effects of fluid-structure interaction and gas flow. LS-DYNA occupant safety models involving airbags can easily include hundreds of thousands of elements. In addition to requiring robust computational power, these calculations demand an enormous amount of communication between processors. The more processors applied to the problem, the greater the communications demands, and the greater the impact that an inefficient communications system can have on the analysis.
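
A back-of-envelope domain-decomposition model shows why. The sketch below assumes a cubic partition of a 2-million-element mesh purely for illustration; it is not a description of LS-DYNA’s actual partitioner. Each processor computes on the interior of its partition but must exchange boundary (halo) data with its neighbors, and the halo shrinks more slowly than the interior as the mesh is split across more processors.

```python
# Hypothetical surface-to-volume model of a domain-decomposed crash mesh:
# work scales with a partition's volume, communication with its surface.

def compute_vs_comm(total_elements, procs):
    per_proc = total_elements / procs      # interior elements: computation
    side = per_proc ** (1.0 / 3.0)         # edge length of a cubic partition
    halo = 6 * side ** 2                   # boundary faces: communication
    return per_proc, halo

for p in (8, 64, 512):
    work, halo = compute_vs_comm(2_000_000, p)   # ~2M-element crash model
    print(f"P={p:3d}  elements/proc={work:9.0f}  comm/compute={halo / work:.2f}")
```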

To deliver ideal performance for LS-DYNA codes, an HPC system should have a high-speed, high-bandwidth interconnect linking its processors and memory, as well as sufficient raw processing speed. Such a system can apply more processors to the problem, while maintaining high efficiency even at large processor counts. That scalability allows system users to tackle more complex problems and get faster results.
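
Interconnect quality can be reasoned about with the standard latency-plus-bandwidth message-cost model. The link figures below are made-up values for illustration, not the specifications of any product mentioned in this article; the point is that for the small, frequent messages typical of crash codes, latency, not bandwidth, dominates.

```python
# Alpha-beta message-cost model: time = latency + message_size / bandwidth.
# Both interconnects below are hypothetical.

def message_time(nbytes, latency_s, bandwidth_bps):
    return latency_s + nbytes / bandwidth_bps

fast = dict(latency_s=2e-6, bandwidth_bps=2e9)    # assumed optimized link
slow = dict(latency_s=50e-6, bandwidth_bps=1e9)   # assumed commodity link

for size in (1_000, 1_000_000):
    ratio = message_time(size, **slow) / message_time(size, **fast)
    print(f"{size:>9} B message: {ratio:.1f}x slower on the commodity link")
```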

 

Figure 1: Benchmark comparison of computation times for the LS-DYNA 3-car collision simulation.


Figure 1 compares computation times for the LS-DYNA 3-car collision benchmark on a system with a CAE-optimized interconnect versus two alternative cluster systems.

To investigate where this performance advantage comes from, Figure 2 compares the time a CAE-optimized system and a non-optimized commodity cluster system (built from off-the-shelf servers) spend on communication when running the same 3-car collision benchmark. The difference is apparent at just four processors and becomes magnified as more processors are added. At 64 processors, the cluster system takes three times longer to complete the necessary communication.

 

Figure 2: Comparison of communication time between CAE-optimized system and commodity cluster system using the LS-DYNA 3-car collision code.

 

Optimize for CAE

Continuing with the LS-DYNA 3-car collision benchmark, Figure 3 compares the percentage of overall solution time an optimized and a non-optimized system spend on each of three primary functions: synchronization (processors waiting for other processors to reach application barriers), interprocessor communication, and computation. Purpose-built HPC systems also improve synchronization times via an HPC-optimized Linux operating system.

At small processor counts, the differences between the two types of systems are relatively minor. However, as the code scales to 64 processors—and the communication and synchronization functions become more demanding—the cluster system spends half of the overall solution time performing noncomputational tasks, while the CAE-optimized system achieves 70 percent efficiency. And these results are from a relatively small problem used to benchmark this code.
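
In these terms, parallel efficiency is simply the computation share of total solution time. A minimal sketch of that bookkeeping follows; the split between communication and synchronization is illustrative, and only the 70 percent versus roughly 50 percent computation shares mirror the comparison above.

```python
# Parallel efficiency as the computation share of total solution time.

def efficiency(compute, comm, sync):
    return compute / (compute + comm + sync)

# Illustrative 64-processor breakdowns (percent of solution time):
print(efficiency(70, 20, 10))   # CAE-optimized system -> 0.7
print(efficiency(50, 35, 15))   # commodity cluster    -> 0.5
```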

As the complexity of the problem increases, the differences, and thus the advantage of using an HPC-optimized system for CAE, also increase.

CFD Needs Include Communications

CD-adapco’s STAR-CD software models complicated CFD problems for a variety of industrial and academic applications. Like crash FEA codes, CFD codes often require large amounts of interprocessor communication. Once again, systems with a low-latency interconnect deliver better performance.
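
One reason CFD solvers are especially sensitive to interconnect latency, a general property of iterative solvers rather than a detail taken from CD-adapco, is that each solver iteration ends in global operations such as dot products, which require an all-reduce across every processor. A rough tree-reduction cost model, with hypothetical latencies, is sketched below.

```python
import math

# Tree model of one all-reduce across P processors: roughly 2 * log2(P)
# message latencies (reduce up the tree, then broadcast back down).
# Both latency figures are hypothetical.

def allreduce_time(procs, latency_s):
    return 2 * math.ceil(math.log2(procs)) * latency_s

for p in (16, 64, 256):
    fast = allreduce_time(p, 2e-6)     # assumed low-latency interconnect
    slow = allreduce_time(p, 50e-6)    # assumed commodity interconnect
    print(f"P={p:3d}  optimized={fast * 1e6:4.0f} us  commodity={slow * 1e6:4.0f} us")
```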

 

Figure 3: Comparison of time spent on computation, communication, and synchronization in the LS-DYNA 3-car collision benchmark.


CD-adapco uses a CAE-optimized system to run CFD jobs for its consulting business. Across a wide range of problems run for industrial customers, CD-adapco has reduced turnaround times at the same processor counts and improved throughput in its multiuser environment, because fewer processors are needed to deliver the same turnaround time, freeing the system for other users. In that client work, the company’s CFD analyses at 32 processors show a 20 percent performance gain over 32-processor systems that are not optimized for CAE.
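
The throughput effect is simple arithmetic. In the sketch below, only the 20 percent gain comes from the text; the machine size, job size, and run time are hypothetical. A faster per-job solution means more jobs per day from the same pool of processors.

```python
# Back-of-envelope throughput arithmetic. Only the 20 percent performance
# gain is from the article; all other numbers are hypothetical.

system_procs = 128
job_procs = 32
hours_commodity = 10.0
hours_optimized = hours_commodity / 1.2      # 20 percent faster at 32 procs

concurrent = system_procs // job_procs       # jobs that fit at once
print(concurrent * 24 / hours_commodity)     # commodity: 9.6 jobs/day
print(concurrent * 24 / hours_optimized)     # optimized: 11.52 jobs/day
```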

Extrapolating across a design cycle, the performance advantages of CAE-optimized systems can make a big difference—in turnaround times for individual jobs and ultimately in times to market for new products. The ability to efficiently apply higher processor counts to CAE problems also allows for much more complex, accurate modeling—ideally resulting in higher-quality products and lower warranty costs.

But CAE-optimized systems offer another key advantage: Unlike a cluster of standard servers, HPC systems are designed to provide the high availability demanded by CAE codes and other HPC applications.

For example, a supplier to automotive manufacturers worldwide uses various CFD codes to help improve vehicle dynamics, durability, noise, vibration, thermal effects, ergonomics, and crashworthiness for a variety of interior and under-hood automotive systems. Using a CAE-optimized HPC system, the organization can apply large numbers of processors to CFD simulations of large, unsteady flows, calculations that would take far too long on a traditional cluster system to incorporate into the design cycle.

With some analyses stretching over several days, even an occasional breakdown that requires users to start again from scratch can quickly lead to major bottlenecks in the design cycle. As some design and manufacturing organizations have discovered, the faster development times that should be realized by using an HPC system can be limited—if not actually eclipsed—by reliability and management issues that plague systems that are not specially optimized. Too often, organizations spend just as much time configuring, programming, and troubleshooting commodity clusters as they do running CAE analyses.

Organizations with demanding design cycles should look for systems that were built specifically to distribute problems across multiple processors, and include intuitive interfaces for submitting and managing jobs. Many HPC systems also include management software that not only monitors the health of the entire system, but can take action to preserve an analysis in progress, even if a problem arises.
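
The mechanism behind preserving an analysis in progress is checkpoint/restart: solver state is saved periodically so a failure costs only the work done since the last checkpoint, not a multi-day run. The sketch below is a generic illustration; the file name, interval, and stand-in “solver” are hypothetical, and production CAE codes provide their own restart-file machinery.

```python
import os
import pickle

CHECKPOINT = "analysis.ckpt"   # hypothetical checkpoint file

def run_analysis(total_steps, checkpoint_every=100):
    # Resume from the last checkpoint if one exists.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            step, state = pickle.load(f)
    else:
        step, state = 0, 0.0

    while step < total_steps:
        state += 1.0               # stand-in for one expensive solver step
        step += 1
        if step % checkpoint_every == 0:
            with open(CHECKPOINT, "wb") as f:
                pickle.dump((step, state), f)
    return state
```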

Get the Facts on CAE Performance

What’s the best HPC system for running different CAE applications? Fortunately, organizations don’t have to guess: many resources provide detailed, predictive performance measurements. LS-DYNA benchmark performance comparisons are available at topcrunch.org, and comparisons of STAR-CD CFD benchmarks can be found at cd-adapco.com/support/bench/320/. For even more granular analysis, the High Performance Computing Challenge (HPCC) website offers in-depth comparisons of HPC performance on a variety of common benchmarks.

Dr. Himanshu Misra is the CAE business manager at Cray Inc. He has a Ph.D. in civil engineering from the University of Illinois at Urbana-Champaign and a B.A.S. in civil engineering from the Indian Institute of Technology in Bombay, India.


 



 

Cray XD1 Speeds Turnaround Times for CD-adapco

CD-adapco constantly pushes the envelope of its STAR-CD software using a Cray XD1 supercomputer. STAR-CD computational fluid dynamics (CFD) analyses increasingly demand models that are not only very large but also incorporate complex physics. Multiphase and free-surface flows, as well as many traditional moving-grid analyses (such as those used for engines and pumps), require hefty CPU resources.

Recently, CD-adapco used the Cray XD1 to model a solid oxide fuel cell stack. Each plate in the stack was modeled in detail, creating a grid with more than 10 million cells that incorporated multiphysics: flow, heat transfer, and complex electrochemistry. Using 30 processors, the Cray XD1 calculated the entire stack in less than a day. The XD1 incorporates a low-latency, high-bandwidth RapidArray interconnect and Cray’s HPC-optimized Linux operating system for these cutting-edge simulations.

The system also supports CD-adapco’s tractor-trailer aerodynamics work. “We calculate flow from the individual passages of the front grill all the way back to the wake, which can extend 50 meters behind the trailer,” says Tom Marinaccio, director of worldwide CFD engineering services for CD-adapco. “We need quick turnaround times to support this design process, and the Cray XD1 has the muscle. Despite the size of the model, the analysis fits on as few as 14 processors. If 20 processors are used, the solutions are available in one day. We wouldn’t have attempted these analyses a few years ago due to the excessively long turnaround times they would have required.”

CD-adapco is also running several total-vehicle models that simulate aerodynamics and engine-compartment thermal management and require about 20 million cells. —HM


 



 

Company & Product Information

XD1 Supercomputer
Cray Inc.
Seattle, WA

LS-DYNA
LSTC
Livermore, CA

STAR-CD
CD-adapco
Melville, NY
