Cluster Computing News
Cluster Computing Resources
April 1, 2015
When it comes to selecting a high-performance computing (HPC) cluster, how can you ensure you’re picking the right system for your needs? Desktop Engineering chatted with experts to find out the most important aspects engineers should consider.
Q: What are the most common HPC cluster misconceptions you see?
A. Dominic Daninger, vice president of Engineering, Nor-Tech: Many potential HPC users view clusters as difficult to set up and maintain. We also commonly see users moving from a workstation CAE environment to an HPC cluster environment who view, and attempt to use, the cluster as just a bunch of workstations. They don’t understand the intelligent job schedulers and resource managers that a cluster can add to the engineer’s toolkit, and they miss the fact that a cluster can be used as a powerful shared resource, running multiple CAE applications at the same time for different users.
A. Rod Mach, principal at TotalCAE: A common misconception is that cluster systems are too expensive. The increase in the amount of engineering work that can be done per month, compared to the monthly cost of obtaining that capability, makes the ROI a no-brainer. Another misconception is focusing on hardware as the only enabler, rather than on the entire engineering workflow required to make engineers more productive.
Q: What are the top cluster considerations for design engineering teams?
A. Daninger: Some users don’t understand which applications can run on a distributed parallel computing machine, which is what a cluster is, or which CAE applications they need for their work. Many of our customers come to us after they have lost projects to the competition on a time-to-market basis. They may not be using design simulation at all, or they may be running it on workstations and waiting 72 hours for each solver run.
A. Mach: The first consideration is the impact on their product if they were able to do more analysis per day. If each engineer can go from one simulation per day to three or four, the justification becomes an easier sell to management. Another consideration is to validate that your models and solver will actually obtain a speedup: have your cluster vendor benchmark your model and see the real-world impact. Finally, teams should consider working with a vendor who understands their solver applications and can deliver a complete turnkey solution.
Q: How should engineers determine cluster size?
A. Daninger: The available budget for CAE simulation is an important consideration when determining cluster size. Software licensing can often account for 60% to 70% or more of the cost of purchasing a CAE cluster. It is also important to understand how the customer’s models scale.
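How a model scales is often summarized with Amdahl’s law: if some fraction of a solver’s work is inherently serial, adding cores yields diminishing returns, which matters when each added core may also carry license cost. The sketch below is illustrative only; the 5% serial fraction is an assumed figure, not a measurement from any particular CAE code.

```python
def amdahl_speedup(n_cores: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup on n_cores when serial_fraction of the work
    cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# Hypothetical solver with 5% serial work: speedup flattens as cores grow.
for n in (8, 16, 32, 64):
    print(n, round(amdahl_speedup(n, 0.05), 1))
# 8 cores -> 5.9x, 16 -> 9.1x, 32 -> 12.5x, 64 -> 15.4x
```

The flattening curve is why benchmarking the customer’s own models, rather than assuming linear scaling, drives the sizing decision.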
A. Mach: Typically, the models engineers commonly run are benchmarked on the proposed system to determine the optimal number of CPUs per job based on a price/performance curve. We then determine the mix of jobs they need to run, and the target amount of work per day, to arrive at the optimal size.
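The price/performance comparison described above can be sketched as follows. The benchmark times and the per-core cost model here are entirely hypothetical stand-ins for real vendor benchmark data, used only to show the shape of the calculation: throughput (jobs per day) divided by cost, evaluated at each candidate core count.

```python
# Hypothetical benchmark results: wall-clock hours for one representative
# model at each core count (illustrative numbers, not real measurements).
benchmark_hours = {16: 12.0, 32: 6.5, 64: 3.8, 128: 2.9}
COST_PER_CORE = 50.0  # assumed monthly cost per core, arbitrary units

def jobs_per_day(hours: float) -> float:
    """Throughput if the cluster runs this model back to back."""
    return 24.0 / hours

def price_performance(cores: int, hours: float) -> float:
    """Jobs per day per unit of monthly cost; higher is better."""
    return jobs_per_day(hours) / (cores * COST_PER_CORE)

best = max(benchmark_hours, key=lambda c: price_performance(c, benchmark_hours[c]))
print(best)  # with these made-up numbers, 16 cores wins on price/performance
```

In practice the raw price/performance winner is then weighed against the target work per day: if 16 cores cannot clear the required job mix in a day, the sizing moves up the curve.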
Q: Why would engineers choose an in-house private cluster vs. a cloud computing solution?
A. Daninger: Intellectual property protection is one of the key reasons customers choose an in-house private cluster over a cloud solution. Another reason is that very few cloud HPC offerings have an InfiniBand fabric. Cloud solutions typically offer 10G Ethernet, which has much higher latency than current InfiniBand fabric.
A. Mach: Private clusters have the following advantages over public cloud computing solutions:
- There is no change to the engineering workflow.
- There are no extra CAE licensing complexities; all work can be done on the same central system.
- An on-premise, professionally managed cluster is much cheaper for your baseline, constant engineering computation needs.
- A private system sits completely inside the corporate firewall, which for some organizations addresses data governance and security concerns.
- A private cluster is highly reliable. CAE software is implicitly designed with the assumption that the underlying hardware, software and interconnects are highly reliable.