August 1, 2018
With demand for more frequent simulations conducted earlier in the design process, the pressure has been on software vendors to deliver faster, easier-to-use simulation tools. The non-specialist engineers using these tools need a way to quickly visualize simulation results—right now. That means they need more powerful hardware and software that can take advantage of parallel processing.
Earlier this year, ANSYS took up that challenge with its Discovery Live launch. The software provides real-time simulation for rapid design space exploration. Engineers can see simulations of how their design changes will affect performance almost instantaneously.
ANSYS Discovery Live is built on NVIDIA graphics processing unit (GPU) and compute unified device architecture (CUDA) parallel computing. In the early design stages, engineers using the tools can try out new ideas without meshing or post-processing.
PTC then announced a partnership with ANSYS to integrate Discovery Live into Creo, enabling new real-time capabilities in the 3D CAD software. Using the solution, designers will be able to see real-time simulation results during modeling in PTC Creo.
GPU acceleration is already becoming more common for rendering and other tasks, but so far ANSYS is the first major vendor to create simulation software to specifically make use of GPU computing. According to the major chipmakers, other companies plan to follow suit.
“You can get almost instantaneous updates on the analysis results [with GPUs],” says Olimpio DeMarco, NVIDIA’s director of strategic alliances. “This is really about design exploration. We work with all of the CAD and CAM developers, and what ANSYS has done was a wake-up call for all of them.”
GPU-Based Simulation Opportunities
Although NVIDIA currently leads the GPU space, AMD is hoping to take some of their market share with its Vega line of GPUs. Intel offers CPU-based parallel processing as well, and will likely release its own GPU for the graphics/gaming space in the next few years. There are price/performance trade-offs between the GPU and CPU approach, depending on the tasks being performed.
“Using a serial computer, it would take hours or days to tackle an intrinsically parallel processing problem,” says Jon Peddie of Jon Peddie Research. “If it can be written in a multi-threaded way, so that it is presented to a parallel processor like a GPU, then you get the answer really fast. So the opportunity is to find problem workloads that have an intrinsically parallel processing opportunity, and then someone has to code that. The coding part has been insidiously difficult.”
NVIDIA tried to help relieve that burden with its CUDA programming language, and Intel has similar tools for its own parallel processing products. Up to now, many software vendors have been hesitant to make the leap because of the cost and risk associated with starting over to optimize for GPU computing.
GPU computing enables a faster time to market for designers, because many simulation times can be cut drastically, while at the same time improving product quality through an increase in the number of iterations that can be explored. GPU computing can also lower the total cost of ownership by shortening the design process and reducing licensing costs.
These benefits are also true of CPU-based parallelism, says Joseph Curley, Intel’s senior director for software ecosystem development. “Every CPU we ship is a parallel processor,” he says. “The majority of the giant performance increases have come out of going back and looking at the algorithms, and how to use parallel hardware adequately with the software.”
Intel’s Xeon Phi, which grew out of an earlier GPU design, can run software originally targeted at standard X86 CPUs.
“CPUs have disadvantages in price and performance, but Intel has a good argument about software situations where it can be difficult to move from legacy serial software to parallel. The thing Intel brings to the party is you don’t have to change anything,” Peddie says.
“We’ve seen an explosion in parallelism, but one thing the independent software vendors (ISVs) are looking at is, when do you touch your code?” Curley says. “One thing customers value is consistency. There’s a reasonably high cost to taking the first step and adding in a new piece of hardware or new algorithm, and then revalidating that with customers. Customers are looking at how much they want you to change your code vs. how much do I want you to keep it stable?”
In addition, Intel is apparently developing its own GPU to compete directly with NVIDIA and AMD, but Peddie says that it will be several years before those products are likely to hit the market.
Last year, AMD released its family of Vega GPUs, which the company described as a “sweeping change” to its core graphics technology. The architecture is built on “flexible compute units that can natively process 8-bit, 16-bit, or 32-bit or 64-bit operations,” according to Marty Johnson, director of alliances, AMD Pro-Graphics. “These compute units are optimized to attain significantly higher frequencies than previous generations, and their support of variable datatypes makes the architecture highly versatile across workloads.”
AMD declined to discuss any specific plans with its ISV partners to offer real-time simulation solutions, but Johnson did note that the GPUs would “enable strong performance in real-time simulation workflows.”
Better Answers, Sooner
GPU architecture features several distinct benefits when it comes to simulation: high-bandwidth, high-latency memory, a large number of computing units and (theoretically) better flops-per-watt ratio.
“These features make GPUs very attractive for matrix-free, embarrassingly parallel procedures, such as explicit, gradient-free updating algorithms, iterative solvers, particle-based methods like discrete particle hydrodynamics and Lattice Boltzmann methods, and mildly-coupled systems of equations like those arising in modal dynamics,” says Luis Crivelli, SIMULIA R&D technology director at Dassault Systèmes. “With sizable more effort, GPUs can also accelerate methods that rely heavily on dense-matrix operations like direct equation solvers and eigenvalue extraction algorithms.”
AMD’s Johnson says the processors excel with highly parallel computational workloads, such as physics computations needed for CAE simulations (computational fluid dynamics and finite element analysis), particle-based simulations (discrete element modeling), and rendering (i.e., ray tracing for physically accurate renders as well as real-time graphics for virtual reality visualization).
“Certainly, there always are algorithms that adapt better than others to new paradigms,” Crivelli says. “Explicit codes, particle-based simulations, decoupled systems are natural candidates. Simulations that rely on gradient-based optimizations—also known as implicit methods—are generally more robust, accurate and less noisy but significantly harder to parallelize.”
Intel’s Curley says that heavily structured problems can be parallelized more easily than unstructured problems. “If you can find a way to make memory access more efficient or to get the various parts of the problem domains decomposed directly, you can see a huge increase in performance,” he says.
For problems that lend themselves to this approach, parallel processing gets users their answer sooner, which can allow them to examine more design iterations and ultimately arrive at a better design.
“You can look at more design options in the same amount of time,” NVIDIA’s DeMarco says. “Reducing that piece of the workflow is important, but not as important as having higher quality designs that won’t fail in the field.”
Vendors Slow to Recode
Despite the advantages of parallel processing in general, and of GPUs in particular, the market has been slow to optimize software for this approach.
Customers are often hesitant to use new technologies. Adoption of new methods can be expensive, because of the need for validation and certification, and adherence to regulations or procedures may impede acceptance.
Furthermore, GPU optimization requires a significant investment from the software vendor. According to Johnson at AMD, rewriting the application to fully exploit the parallelism is required in order to get the most GPU processing performance. Retooling for GPU-based workflows requires looking at parallelism in a new way for the GPU.
“The biggest challenge has been inertia,” DeMarco says. “A lot of software is built on this old code, and you can’t just take out a few pieces and substitute GPU code and expect big performance gains. You really have to start from a blank page and write from scratch to leverage the GPU. In the case of ANSYS, they put in the work and it took them two-and-a-half years to get to the beta state.”
Dassault Systèmes’ Crivelli agrees. “Significant investments are required to update and port mature software to a platform that has a radically different programming model compounded with the risk of bug injection and destabilizing the code,” he says. “There is a cost/risk/reward trade-off. Automatic conversion tools frequently don’t do a good job and the complexity of supporting multiple code branches often is prohibitive and defect prone. Questions about the long-term feasibility of the technology also apply here. The process is significantly simpler and the risk is mitigated if the vendor develops a completely new application.”
Simulation providers must also make sure that applications are certified and provide accurate results, which meet the requirements for double-precision compute. An additional challenge is making GPU acceleration applicable to the simulation provider’s overall product line.
On the customer side, there are a lot of files in existence from previous simulations that won’t map nicely to a hardware accelerator. “That means a company like Boeing that has millions of files that they ran on serial processors can do those same types of tests with parallel processing, but that puts them in a position of having two versions of their data,” Peddie says.
Intel’s Curley also notes that optimizing software for particular hardware implementations or accelerators comes with a cost in flexibility. “If you write code for that one piece of hardware, you aren’t going to be able to reuse it on many other things,” he says. “As you get more optimized for a single problem, you get less portable to other platforms.”
Real-Time Interactive Simulations
With ANSYS and PTC staking their claim in the GPU simulation space, there will be more activity in the space over the next few years. Much of that work will center on combining simulation with real-time renders.
“We’re doing a lot of work now to enable real-time realistic rendering,” NVIDIA’s DeMarco says. “You can benefit the simulation, but when the management team comes in, you can also give them a better understanding of the design.”
“Although primarily intended for AI applications, the outstanding theoretical computational power of integrated GPU systems, like NVIDIA’s DGX-2 and HGX-2 platforms, could help real-time interactive simulations become a reality,” Crivelli says. “There is a concerted development effort toward reaching this goal. Currently available interactive simulations are limited in size by the relatively small amount of GPU memory and trade off accuracy for speed.”
Peddie says that partitioning will contribute to this type of visualization. “You can assign certain workloads to one part of the GPU, and other workloads to a different part,” he says. “Now you have thousands of processors, and that is extremely useful. The notion there is you can have almost real-time representations of the simulation you are running, and that is extremely valuable.”
For end users wondering what to do to prepare, Intel’s Curley says that they should keep upgrading their hardware and wait to see what the software vendors come up with.
“If you keep buying the latest hardware and the latest CPUs, you get more parallel features automatically,” he says. “The software just has to expose that functionality.”
About the Author
Brian Albright is the editorial director of Digital Engineering. Contact him at [email protected].Follow DE