The New Multicore

Today's multicore processors open doors to an entirely new level of high performance.

By Peter Varhol

Although almost unheard of just a decade ago, multicore processors are ubiquitous today, with just about every high-performance computing (HPC) server and workstation relying on multiple cores to deliver the power needed for better, faster engineering.

The beginning of 2012 introduces a new generation of multicore processors, which will continue to have an impact on engineering design, analysis and prototyping for the next decade. In the race to deliver the fastest processor, chip makers are extending the multicore theme in some new and inventive ways.

Intel Sandy Bridge
The Intel Sandy Bridge E processor will be making its way into servers and high-end workstations later this year.

For those requiring energy-efficient servers, the EnergyCore ARM processor may be able to pick up some of the engineering workload.

A core incorporates an entire processing unit within a die, so multiple cores in effect mean several processors working closely in concert, often sharing caches. Some designs, such as Intel's hyper-threaded processors, can also run multiple threads within each individual core. This is valuable for computations that can be broken into small, independent parts that execute in parallel.

As it became increasingly difficult to design processors with smaller components and higher clock speeds, processor manufacturers like Intel and AMD began putting multiple cores on the same die to achieve greater absolute performance. Intel brought multicore to servers with its Xeon line, and offered multicore processors for workstations as well. Now, Intel has its own new multicore solution in the next generation of its Sandy Bridge processor.

Flurry of New Processors
Many of these new processor announcements came out of last November's Supercomputing conference, which is increasingly becoming the preferred venue for high-performance computing news. With Intel, NVIDIA, AMD and ARM licensee Calxeda all making announcements at or near Supercomputing 2011, expect more processor announcements at the end of 2012 as well.

Perhaps the best place to start with this explosion of new multicore designs is Intel, whose Sandy Bridge E processor is the next generation of its workstation/server flagship Sandy Bridge line. The most obvious addition over the older Sandy Bridge is two extra physical cores, bringing the total to six. Because hyper-threading gives each physical core a second logical core, a Sandy Bridge E processor can execute up to 12 threads simultaneously.

Sandy Bridge E was never intended solely as a workstation processor. Rather, it's likely to emerge sometime in the first part of this year as a Xeon-branded processor for single- and dual-socket servers and high-end workstations. While today's Sandy Bridge E processors run at a base clock of 3.3 GHz, like the older Nehalem processors, they dynamically overclock to 3.9 GHz when greater computational performance is needed. That ranks them among the fastest processors commercially available.

By itself, Sandy Bridge E is more of an evolutionary than revolutionary processor, offering relatively modest improvements over the older Sandy Bridge model. But it demonstrates that multicore processors are nowhere near running out of computational steam.

In fact, the AMD Opteron 6200, announced the same week as Sandy Bridge E, has more cores: a 16-core architecture aimed specifically at server applications, including HPC servers. But it may be misleading to call it a true 16-core processor, as its architecture comprises eight two-core modules rather than 16 fully independent cores. The distinction means its cores aren't as tightly coupled as those in a true 16-way design.

AMD Server Product Marketing Manager Michael Detwiler says that customers could see up to a 35% performance increase over AMD's Opteron 6100 series, the company's previous top-performance server processor. He also reports that the Opteron 6200 tested with better performance in two-socket configurations than Intel's Xeon 5600 series on tasks such as floating-point computation.

Graphics and Power Management Drive Multiple Cores
NVIDIA's Maximus pairs the 3D graphics capability of NVIDIA Quadro professional graphics processing units (GPUs) with the parallel-computing power of the NVIDIA Tesla C2075 companion processor. Jeff Brown, general manager of the Professional Solutions Group at NVIDIA, describes the vision behind Maximus as "unifying graphics and parallel computing."

With NVIDIA Maximus-enabled applications, GPU compute work is assigned to the NVIDIA Tesla companion processor, freeing the NVIDIA Quadro GPU to handle graphics functions. Further, the system can determine which code will run best on which processor, without any underlying software changes. While this works only for GPU-compiled code, it represents a significant advance over current processors, which cannot decide where code should run.

Despite the drive to improve absolute performance, there is also significant interest in power management, driven by both the cost of electricity and conservation concerns. Data centers have become among industry's largest consumers of power. ARM processors, by contrast, were designed for low-power applications. While ARM chips are common in phones and other low-power devices, it is unusual to find them in either HPC or general-purpose computers. That seems about to change.

Calxeda has been talking about producing ARM servers for some time, and made it official in November when it announced EnergyCore, a new ARM processor designed for servers. It also announced that Hewlett-Packard plans to build a low-energy server around the chip. The processor is essentially a complete server on a chip, minus mass storage and memory. A quad-core EnergyCore uses just 5 watts per server and 1.25 watts per core, almost five times better than high-performance Intel servers.

With low power consumption and the use of multiple cores, ARM processors are an intriguing prospect for HPC. Of course, you still need an operating system and application software that have been ported to the ARM architecture, but if there is truly value in energy-efficient computing, that could happen.

Not a Panacea
The explosion in computing power driven by multicore processors benefits many, but not all, engineering applications. Those applications whose computations can be broken down into small parts, each executing independently, can receive a significant boost in performance. That includes most types of analysis, such as finite element analysis (FEA), computational fluid dynamics (CFD) and generic simulations, but doesn’t do a lot for bread-and-butter applications like design and rendering. In cases like these, though, extra GPU horsepower will likely pay off.

In fact, rendering and other graphical operations are getting a twist from next-generation GPUs. Innovations like NVIDIA’s Maximus enable workstations to automatically parse out work to the best processor available for the job. However, in this case that means one of two GPU architectures, not a general-purpose processor and a graphics processor.

As always, the availability of software will determine the ultimate value of these multicore innovations. Most engineering software vendors, including ANSYS, Autodesk, Dassault Systèmes and MathWorks, have developed versions of their applications that take advantage of multiple cores if they are available. Some, such as MathWorks, enable engineers to specify how many cores they want to use.

Configurability is also available through virtualization technologies. Parallels Workstation Extreme lets engineers create a virtual machine and assign it a set number of cores and amount of memory. Jobs that can execute independently can have a virtual machine configured for optimal performance, while also retaining enough resources for interactive tasks.

But given enough time, the underlying system itself may take care of application software. NVIDIA’s Maximus starts to take the technology in that direction, as long as the code is compiled for that processor. In the future, there may be ways of just-in-time compiling code for a specific processor and its cores, depending on the nature of the code.

It's clear that multicore processors have become a fixture in mainstream computing, and are essential in HPC. This new generation hints at future systems with multiple processors and cores, guided by system software that automatically determines where best to execute any given task. Over the next several years, as this vision is realized, engineers will see a boost in processing power with the potential to change engineering design more profoundly than the advances of the last decade.

Contributing Editor Peter Varhol covers the HPC and IT beat for DE. His expertise is in software development, math systems, and systems management. You can reach him at [email protected].

