The HPDA Buzz

What becomes of your HPC clusters in the move to high-performance data analysis?

Across the design engineering software industry, it is becoming obvious that Big Data is no longer a vague term. In the past 45 days, in different cities and from different people, I have heard the following (paraphrased):

  • “We are creating an incredible number of FEA (finite element analysis) visualizations. The data is important but we aren’t really sure what to do with it.” (From an automotive engineering analyst.)
  • “The data we generate when we do 3D prints is valuable. But there is no specific way to work with it after we press ‘send.’” (From an engineering services executive.)
  • “What are we going to do with all of the information this Internet of Things (IoT) will give us?” (From multiple people with various job titles.)
  • “We have thousands of parts and assemblies in our inventory, but we don’t know how many are duplicates, how many are outdated and how many were never actually used in creating a product.” (From an engineer at a Fortune 100 manufacturer.)

All of these views describe the state of engineering today: terabytes of data are being generated, and everyone wants that data to become useful information. The answer is high-performance data analytics (HPDA). Successful HPDA has two aspects: new software that can intelligently mine Big Data, and new approaches to hardware that can support the task. This article examines the hardware side, specifically the use of high-performance computing (HPC) clusters.

High-performance computing evolved as a way to gain scale for challenging computing projects at a much lower price than other options. Engineering adopted HPC for the benefits it brought to increasingly complex simulation and analysis work. With the rise of cloud computing, some engineers are wondering if it is better to leave their existing HPC resources behind and let a vendor like Amazon or Microsoft own the hardware. Researchers at Rutgers University see three emerging HPDA trends regarding data analysis jobs that were once reserved for HPC:

1. HPC in the cloud—completely outsourcing large computing problems to cloud-based solutions or private (in-house) cloud technology;

2. HPC plus cloud—the use of cloud resources to complement existing HPC and grid resources, as in responding to unexpected spikes in demand; and

3. HPC as a service—repurposing HPC/grid resources using cloud technology, to gain the flexibility of cloud with the local advantages of HPC systems.

The Rutgers team sees all three as viable options going forward; choosing among them depends on factors beyond strictly technical analysis.

Time as a Shrinking Commodity

As sensors proliferate and the IoT becomes commonplace, Big Data will also have to become fast data. It won’t do anyone any good to have data in-house proving that new braking-system software increases brake pad wear by 5% if it takes weeks to analyze that data. Catching such a trend in time calls for real-time streaming analytics, an ideal workload for hybrid HPC/cloud processing.
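To make the braking example concrete, here is a minimal sketch of a streaming check, assuming wear readings arrive one at a time and that a baseline wear rate from the old software is already known. The function name `wear_alerts`, the window size and the units are hypothetical; only the 5% threshold comes from the example above. The sketch keeps a rolling mean over the most recent readings and reports each point at which that mean exceeds the baseline by more than the threshold.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def wear_alerts(readings: Iterable[float], baseline: float,
                window: int = 100, threshold: float = 0.05) -> Iterator[int]:
    """Hypothetical streaming detector: yield the index of every reading at
    which the rolling mean of the last `window` readings exceeds `baseline`
    by more than `threshold` (0.05 means 5%)."""
    recent: Deque[float] = deque(maxlen=window)  # oldest value drops off automatically
    total = 0.0  # running sum of the values currently in the window
    for i, x in enumerate(readings):
        if len(recent) == window:
            total -= recent[0]  # this value is evicted by the append below
        recent.append(x)
        total += x
        # Only judge the trend once the window is full.
        if len(recent) == window and total / window > baseline * (1 + threshold):
            yield i
```

For instance, `list(wear_alerts([1.0] * 10 + [1.2] * 10, baseline=1.0, window=5))` first flags index 11, the point at which two of the five readings in the window are elevated. Because it is a generator, the check can run continuously on incoming sensor data rather than waiting for a batch job.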

HPDA provides great value in such situations, allowing engineers to identify and solve problems quickly. In addition, demand for near-real-time and real-time streaming analytics is driving a new field called activity-based intelligence (ABI), in which software routines are guided by data found during HPDA.

HPC clusters can become the hub of these “we need it now” information services by dividing the problem into two parts: collection and computation. The goal is to manufacture, so to speak, HPC when and where it is needed to solve HPDA problems. In the braking example, cloud services can run machine learning algorithms over the incoming brake data and spot the trend. The existing in-house HPC cluster can then host the simulations needed to analyze the problem, off-loading work to cloud instances of your preferred simulation software when local capacity runs short.
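The local-first, cloud-overflow pattern described above can be sketched as a simple greedy placement policy. This is an illustration under stated assumptions, not any vendor’s scheduler: the `Job` type, the `place_jobs` function and the core counts are hypothetical, and a real deployment would also weigh queue times, license availability and data-transfer costs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    cores: int  # cores the simulation job requests (hypothetical unit of capacity)


def place_jobs(jobs: List[Job], local_free_cores: int) -> Dict[str, List[str]]:
    """Hypothetical cloud-bursting policy: in submission order, keep each job
    on the in-house cluster while it has enough free cores; otherwise
    off-load the job to cloud instances."""
    placement: Dict[str, List[str]] = {"local": [], "cloud": []}
    free = local_free_cores
    for job in jobs:
        if job.cores <= free:
            placement["local"].append(job.name)
            free -= job.cores  # reserve local capacity for this job
        else:
            placement["cloud"].append(job.name)  # local cluster is full for this job
    return placement
```

With 128 free local cores and jobs requesting 64, 128 and 32 cores, the 64- and 32-core jobs stay local and the 128-core job is sent to the cloud. Filling local capacity first reflects the idea of getting full use out of hardware you already own before paying for cloud instances.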

Such hybrid use of local and cloud resources sounds well and good, but what if your company is among the vast majority of manufacturers that either have not fully deployed an internal HPC cluster or have no HPC resources at all? The price/performance curve continues to trend in your favor. Leading vendors now offer pre-tested HPC systems designed specifically for smaller manufacturing firms. The Dell HPC System for Manufacturing, for example, is pre-tested with leading applications including Dassault Systèmes SIMULIA Abaqus, ANSYS Mechanical and Fluent, Siemens STAR-CCM+, and LSTC’s LS-DYNA. It is a simple system that bundles computing, storage and networking, and installation requires a table, not a raised-floor data center.

Near-term Future of High-Performance Data Analytics

Pacific Northwest National Laboratory is a leader in research on HPDA and the adaptability of HPC clusters. The lab sees the proliferation of open source frameworks, notably Hadoop, as the key to successfully managing HPC clusters for HPDA. Because existing HPC systems are often not large enough to handle the new requirements of high-performance analytics on their own, the lab expects HPDA tools to be used at all scales, from local HPC clusters to the cloud.

Getting the benefits of high-performance data analytics doesn’t have to mean breaking the bank on new hardware or going exclusively “to the cloud.” New software vendors such as UberCloud are coming to market with solutions that bridge the gap between existing HPC systems and cloud engineering, using “containers” that make installing and using the leading simulation products a simple process.

HPDA is not so much a specific technology as an approach to putting all of the pieces together. In that regard, HPDA is a strategic deployment, not a tactical one. Business and IT analysis firm Transparency Market Research credits the recent development of open source frameworks with paving the way for affordable HPDA deployment. “HPDA is seeing remarkable growth due to its highly potential drivers such as its wide adaptability and increasing application areas,” the firm notes in a recent report. “The development of open source analytic frameworks has aided organizations to tap the vast amounts of data and manipulate the unstructured data in a way that is understandable to the user by enabling quick application on the data set. HPDA not only provides great value to access the large data sets, but also enables the analyst to work on it with great speed.”

HPDA applications can be divided by the type of data they handle: unstructured, semistructured and structured. Deployment options are the three HPC/cloud options described earlier. For companies ready to invest in high-performance data analysis, costs will be split roughly equally among hardware, software and services. The hardware spend will go to extending, replacing or buying HPC clusters. The software spend will cover HPDA tools and connectors to existing applications, which will come either from existing application vendors or from new players in the market; it will also cover additional licenses of key products as their value to your work increases. The services spend will go to additional customization and managed (in-house) services.

All in all, the rise of high-performance data analysis does not require gutting your existing engineering IT infrastructure. It is an upgrade that brings high value to the rapidly accumulating data trove and can be configured to extend the life of your existing HPC resources.

For more info:

Dell

Pacific Northwest National Laboratory

Rutgers University

Transparency Market Research

UberCloud


About the Author

Randall Newton

Randall S. Newton is principal analyst at Consilia Vektor, covering engineering technology. He has been part of the computer graphics industry in a variety of roles since 1985.
