HPC System Administration

Installing Intel MPSS 3.3 in Arch Linux

August 20, 2014

This technical publication provides instructions for installing the Intel Manycore Platform Software Stack (MPSS) version 3.3 on the Arch Linux operating system. Intel MPSS is a suite of tools necessary for the operation of Intel Xeon Phi coprocessors. The instructions provided here enable offload and networking functionality for coprocessors in Arch Linux. The procedure described in this paper is completely reversible via an uninstallation script.

Downloads (product: direct link):
Intel MPSS 3.3 (page, archive): mpss-3.3-linux.tar (~400 MB)
Linux Kernel 3.10 LTS (AUR): linux-lts310.tar.gz (78 KB)
TRee Installation Generator (TRIG): trig.sh (3 KB)
RHEL networking utilities: rhnet.tgz (34 KB)
Offload functionality test: Offload-Hello.cc (347 B)
GNU General Public License v2 (applies to TRIG and RHEL utilities): page

Paper: Colfax_MPSS_in_Arch_Linux.pdf (97 KB)

Make sure to read the important additional information in the “Comments” below [...]

File I/O on Intel Xeon Phi Coprocessors: RAM disks, VirtIO, NFS and Lustre

July 28, 2014

The key innovation brought about by Intel Xeon Phi coprocessors is the possibility of porting most HPC applications to manycore computing accelerators without code modification. One of the reasons this is possible is support for file input/output (I/O) directly from applications running on coprocessors. These facilities allow seamless use of manycore accelerators in common HPC tasks such as application initialization from file data, saving running output, checkpointing and restarting, data post-processing and visualization, and others. This paper provides the information and benchmarks necessary to choose the best file system for a given application from the available options: RAM disks, virtualized local hard drives, and distributed storage shared with NFS or Lustre. We report benchmarks of I/O performance and parallel scalability on Intel Xeon Phi coprocessors, along with the strengths and limitations of each option. In addition, the paper presents the system administration procedures necessary for using each file system on coprocessors, including bridged networking and [...]

Configuration and Benchmarks of Peer-to-Peer Communication over Gigabit Ethernet and InfiniBand in a Cluster with Intel Xeon Phi Coprocessors

March 11, 2014

Intel Xeon Phi coprocessors allow symmetric heterogeneous clustering models, in which MPI processes run entirely on coprocessors, as opposed to offload-based clustering. These symmetric models are attractive because they allow effortless porting of CPU-based applications to clusters with manycore computing accelerators. However, with the default software configuration and without specialized networking hardware, peer-to-peer communication between coprocessors in a cluster is orders of magnitude slower than Gigabit Ethernet networking hardware is capable of. This situation is remedied by InfiniBand interconnects and the software supporting them. In this paper, we demonstrate the procedures for configuring a cluster with Intel Xeon Phi coprocessors connected with Gigabit Ethernet as well as InfiniBand interconnects. We measure and discuss the latencies and bandwidths of MPI messages with and without the advanced configuration with InfiniBand support. The paper contains a discussion of MPI application tuning in an InfiniBand-enabled cluster with Intel Xeon Phi [...]

Heterogeneous Clustering with Homogeneous Code: Accelerate MPI Applications Without Code Surgery Using Intel Xeon Phi Coprocessors

October 17, 2013

This paper reports on our experience with a heterogeneous cluster execution environment, in which a distributed parallel application utilizes two types of compute devices: those employing general-purpose processors, and those based on computing accelerators known as Intel Xeon Phi coprocessors. Unlike general-purpose graphics processing units (GPGPUs), Intel Xeon Phi coprocessors are able to execute native applications. In this mode, the application runs in the coprocessor’s operating system and does not require a host process executing on the CPU and offloading data to the accelerator (coprocessor). Therefore, for an application in the MPI framework, it is possible to run MPI processes directly on coprocessors. In this case, coprocessors behave like independent compute nodes in the cluster, each with an MPI rank, peer-to-peer communication capability, and access to a network-shared file system. With such a configuration, there is no need to instrument data offload in the application in order to utilize a heterogeneous system composed of processors and coprocessors. That said, an [...]

Scientific Computing in a Web Browser: GALPROP WebRun

June 30, 2012

As scientific software tools become increasingly complex and computationally demanding, sharing the source code of a scientific project with the community may be insufficient to support peer interest and ensure the appropriate use of the tools. In order to facilitate the use of the astrophysical code GALPROP, our group has launched a public online service named GALPROP WebRun. This service, live since August 2010, includes: the ability to configure GALPROP computing tasks in a Web browser; access to a dedicated computing cluster and precompiled binaries for code execution; and user support in the form of online documentation, automated validation tools, and online forum and bug-reporting software. This paper reports the details and status of the GALPROP WebRun project as well as our experience with it. Complete paper: Colfax_Galprop_WebRun.pdf (2 [...]