Heterogeneous Clustering with Homogeneous Code: Accelerate MPI Applications Without Code Surgery Using Intel Xeon Phi Coprocessors

This paper reports on our experience with a heterogeneous cluster execution environment, in which a distributed parallel application utilizes two types of compute devices: those employing general-purpose processors, and those based on computing accelerators known as Intel Xeon Phi coprocessors.

Unlike general-purpose graphics processing units (GPGPUs), Intel Xeon Phi coprocessors can execute native applications. In this mode, the application runs in the coprocessor’s operating system and does not require a host process executing on the CPU and offloading data to the accelerator (coprocessor). Therefore, for an application in the MPI framework, it is possible to run MPI processes directly on coprocessors. In this case, coprocessors behave like independent compute nodes in the cluster, with an MPI rank, peer-to-peer communication capability, and access to a network-shared file system. With such a configuration, there is no need to instrument data offload in the application in order to utilize a heterogeneous system composed of processors and coprocessors. As a result, an MPI application designed for a CPU-only cluster can be used on coprocessor-enabled clusters without code modification.

We discuss the issues of portable code design, load balancing, and system configuration (networking and MPI) that must be addressed for such a setup to be efficient. The example application used for this study carries out a Monte Carlo simulation for Asian option pricing. The paper includes the performance metrics of this application with CPU-only and heterogeneous cluster configurations.
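To make the load-balancing idea concrete, here is a minimal sketch, not the code distributed with the paper. It runs in a single Python process and simulates "ranks" with a loop instead of using MPI. The function names, the integer speed weights, the geometric Brownian motion price model, and the arithmetic-average call payoff are all assumptions chosen for illustration. The point it shows is that the simulation kernel is identical for every rank; only the number of paths assigned to each rank changes with its relative speed.

```python
import math
import random


def split_paths(total_paths, weights):
    """Split total_paths among ranks in proportion to integer speed weights.

    The weights are hypothetical relative throughputs, e.g. 1 for a CPU
    rank and 3 for a coprocessor rank.
    """
    total_w = sum(weights)
    # Integer arithmetic keeps the shares exact and never over-allocates.
    shares = [total_paths * w // total_w for w in weights]
    # Hand out the paths lost to rounding down, one per rank.
    for i in range(total_paths - sum(shares)):
        shares[i % len(shares)] += 1
    return shares


def payoff_sum(n_paths, s0, strike, rate, vol, maturity, n_steps, seed):
    """Sum of arithmetic-average Asian call payoffs over n_paths paths.

    Assumes the asset follows geometric Brownian motion (an illustrative
    choice, not taken from the paper).
    """
    rng = random.Random(seed)
    dt = maturity / n_steps
    drift = (rate - 0.5 * vol * vol) * dt
    diffusion = vol * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        price = s0
        running = 0.0
        for _ in range(n_steps):
            price *= math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
            running += price
        total += max(running / n_steps - strike, 0.0)
    return total


def price_asian_call(total_paths, weights, s0=100.0, strike=100.0,
                     rate=0.05, vol=0.2, maturity=1.0, n_steps=50):
    """Estimate the option price by combining per-rank partial sums."""
    shares = split_paths(total_paths, weights)
    # Each simulated rank runs the same kernel with its own seed and share.
    partial = [
        payoff_sum(n, s0, strike, rate, vol, maturity, n_steps, seed=1000 + rank)
        for rank, n in enumerate(shares)
    ]
    # In an MPI program this sum would be a reduction across ranks.
    mean_payoff = sum(partial) / total_paths
    return math.exp(-rate * maturity) * mean_payoff


if __name__ == "__main__":
    # Two CPU-like ranks and one rank assumed to be three times faster.
    print(split_paths(20000, [1, 1, 3]))
    print(price_asian_call(20000, [1, 1, 3]))
```

A static split like this only balances the load if the weights match the real throughput of each device. When they are not known in advance, a dynamic scheme in which ranks request small batches of work as they finish is a common alternative.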

Complete paper: Colfax_Heterogeneous_Clustering_Xeon_Phi.pdf (443 KB)

Source code for Linux: Heterogeneous_Asian_Options.tgz (7 KB)

A visualization based on this paper was exhibited by Colfax at SC13 at the Intel corporate booth.

5 Comments on Heterogeneous Clustering with Homogeneous Code: Accelerate MPI Applications Without Code Surgery Using Intel Xeon Phi Coprocessors

  1. In this paper, we promised to benchmark the Asian Options application with InfiniBand.
    We did that, and the results are in http://colfaxresearch.com/post/2014/03/11/InfiniBand-for-MIC.aspx

  2. Hi Andrey,
    I tried to download the full paper, but something is wrong with the server.

  3. Hi
    I bought a Xeon Phi by mistake. I thought it was just a normal CPU or GPU that I could put in the PC and it would work. Is it possible to run it with other software, or does it only work with your own code?
    thank you

    • Xeon Phi is a specialized high-performance parallel processor for computing applications. In its first generation, Xeon Phi has the form factor of a PCIe device, similar to a GPU. However, unlike a GPU, Xeon Phi is not for graphics processing, but for computing workloads (although specialized graphics applications for Xeon Phi do exist).

      Whether or not you can use Xeon Phi with a certain application depends on whether this application has specific support for Xeon Phi. However, some computational workloads (e.g., LAPACK and BLAS functions in Matlab or R) can take advantage of Xeon Phi automatically (see, e.g., the 3rd webinar here).