Development Tools

A Performance-Based Comparison of C/C++ Compilers

November 11, 2017

This paper reports a performance-based comparison of six state-of-the-art C/C++ compilers: AOCC, Clang, G++, Intel C++ compiler, PGC++, and Zapcc. We measure two aspects of the compilers’ performance: The speed of compiled C/C++ code parallelized with OpenMP 4.x directives for multi-threading and vectorization. The compilation time for large projects with heavy C++ templating. In addition to measuring the performance, we interpret the results by examining the assembly instructions produced by each compiler. The tests are performed on an Intel Xeon Platinum processor featuring the Skylake architecture with AVX-512 vector instructions. Colfax_Compiler_Comparison.pdf (562 KB) Table of Contents 1. The Importance of a Good Compiler 2. Testing Methodology 2.1. Meet the Compilers 2.2. Target Architecture 2.3. Computational Kernels 2.4. Compilation Time 2.5. Test Details 2.6. Test Platform 2.7. Code Analysis 3. Results 3.1. Performance of Compiled Code 3.2. Compilation Speed 4. Summary Appendix A. LU Decomposition Appendix B. Jacobi Solver Appendix C. Structure Function Appendix [...]

Intel® Python* on 2nd Generation Intel® Xeon Phi™ Processors: Out-of-the-Box Performance

June 20, 2016

This paper reports on the value and performance for computational applications of the Intel® distribution for Python* 2017 Beta on 2nd generation Intel® Xeon Phi™ processors (formerly codenamed Knights Landing). Benchmarks of LU decomposition, Cholesky decomposition, singular value decomposition and double precision general matrix-matrix multiplication routines in the SciPy and NumPy libraries are presented, and tuning methodology for use with high-bandwidth memory (HBM) is laid out. Download as PDF: Colfax-Intel-Python.pdf (1 MB) or read online below. Code: coming soon, check back later. See also: colfaxresearch.com/get-ready-for-intel-knights-landing-3-papers/ 1. A Case for Python in Computing Python is a popular scripting language in computational applications. Empowered with the fundamental tools for scientific computing, NumPy and SciPy libraries, Python applications can express in brief and convenient form basic linear algebra subroutines (BLAS) and linear algebra package (LAPACK) functions for operations on matrices and systems of linear algebraic [...]

Guided Code Vectorization with Intel® Advisor XE

April 12, 2016

Early Stage Application Optimization made Easy with Step-by-Step Guide In this publication we discuss the usage of an optimization tool called Intel® Advisor. The discussion is illustrated with an example workload that computes the electric potential in a set of points in 3-D space produced by a group of charged particles. The example workload runs on a multi-core Intel Xeon processor with Intel AVX2 instructions. The application was originally parallelized across cores, but otherwise neither optimized nor vectorized. In the publication, we discuss three performance issues that the Intel Advisor detected: vector dependence, type conversion and inefficient memory access pattern. For each issue, we discuss how to interpret the data presented by the Intel Advisor, and also how to to optimize the application to resolve these issues. After the optimization, we observed a 16x performance boost compared to the original, non-optimized implementation. Complete paper: Colfax_Advisor_Vectorization.pdf (1 MB) Sample code for Linux: Colfax_Advisor_Vectorization_Code.zip (50 [...]

Introduction to Intel DAAL, Part 2: Distributed Variance-Covariance Matrix Computation

March 28, 2016

This is the part 2 of 3 of an introductory series of publications on the Intel® Data Analytics Acceleration Library (DAAL). DAAL is a data analytics library optimized for modern highly parallel computer architectures such as Intel Xeon and Intel Xeon Phi processors. The goal of this series is to provide developers a technical overview for developing applications using DAAL. In part 1 of the series we discussed how to implement batch mode computation on a single node. In the present publication, we discuss the distributed mode computation. Our discussion will focus both on how and when to implement distributed mode computation with Intel DAAL. As an example workload, we implement an application that uses DAAL to compute a covariance matrix of a set of vectors. We first demonstrate how to use distributed mode with this example. Then, using this example application, we scan the parameter space to determine what parameter ranges benefit from distributed computation. We also demonstrate how the output of this computation may be used in image processing to compute the eigenvectors of [...]

Introduction to Intel DAAL, Part 1: Polynomial Regression with Batch Mode Computation

October 28, 2015

This is the part 1 of 3 of an introductory series of publications on the Intel Data Analytics Acceleration Library (DAAL). DAAL is a data analytics library optimized for modern highly parallel computer architectures such as Intel Xeon and Intel Xeon Phi processors. The goal of this series is to provide developers a technical overview for developing applications using DAAL. In this paper we focus on two aspects of developing an application with Intel DAAL: data management and computation. As a practical example, we implement a simple machine learning application with polynomial regression using the library in the batch computation mode. We demonstrate using this application for data-based prediction of hydrodynamics properties of yachts. The source code and data for the sample application are available for free download. The second and third part of the series will discuss other aspects of data analysis with DAAL. In part 2, we discuss distributed data and computation in conjunction with MPI. In the third part, we discuss the case with multiple data sets and interfacing with a [...]