avx-512

A Performance-Based Comparison of C/C++ Compilers

November 11, 2017

This paper reports a performance-based comparison of six state-of-the-art C/C++ compilers: AOCC, Clang, G++, Intel C++ compiler, PGC++, and Zapcc. We measure two aspects of the compilers’ performance: (1) the speed of compiled C/C++ code parallelized with OpenMP 4.x directives for multi-threading and vectorization, and (2) the compilation time for large projects with heavy C++ templating. In addition to measuring the performance, we interpret the results by examining the assembly instructions produced by each compiler. The tests are performed on an Intel Xeon Platinum processor featuring the Skylake architecture with AVX-512 vector instructions.

Colfax_Compiler_Comparison.pdf (562 KB)

Table of Contents
1. The Importance of a Good Compiler
2. Testing Methodology
2.1. Meet the Compilers
2.2. Target Architecture
2.3. Computational Kernels
2.4. Compilation Time
2.5. Test Details
2.6. Test Platform
2.7. Code Analysis
3. Results
3.1. Performance of Compiled Code
3.2. Compilation Speed
4. Summary
Appendix A. LU Decomposition
Appendix B. Jacobi Solver
Appendix C. Structure Function
Appendix [...]

Capabilities of Intel® AVX-512 in Intel® Xeon® Scalable Processors (Skylake)

September 19, 2017

This paper reviews the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set and answers two critical questions: (1) How does AVX-512 set Intel® Xeon® Scalable processors based on the Skylake architecture (2017) apart from their predecessors based on Broadwell? (2) How do Intel Xeon processors based on Skylake differ from the alternative, Intel® Xeon Phi™ processors with the Knights Landing architecture, which also feature AVX-512? We address these questions from the programmer’s perspective by demonstrating C language code of microkernels that benefit from AVX-512. For each example, we analyze the compilation practices, the resulting assembly, and the optimization reports. In addition to code studies, the paper contains performance measurements for a synthetic benchmark with guidelines on estimating peak performance. In conclusion, we outline the workloads and application domains that can benefit from the new features of AVX-512 instructions.

Colfax-SKL-AVX512-Guide.pdf (524 KB)

Table of Contents
1. Intel Advanced Vector Extensions 512 [...]

HOW Series “Deep Dive”: Webinars on Performance Optimization – 2017 Edition

June 30, 2017

In a Nutshell

HOW Series “Deep Dive” is a free Web-based training on parallel programming and performance optimization on Intel architecture. The workshop includes 20 hours of instruction and code for hands-on exercises. This training is free to everyone thanks to Intel’s sponsorship. You can access the video recordings of lectures, slides of presentations and code of practical exercises on this page using a free Colfax Research account. To run the hands-on exercises, you will need a multi-core Intel architecture processor and the Intel C++ Compiler. You can get this compiler for 30 days at no cost using an evaluation license for Intel Parallel Studio [...]

Webinar: Demystifying Vectorization

May 18, 2017

Free Webinar

Abstract

Have you heard of code vectorization, but are not sure how it applies to your work? Rest assured, you are in good company. Furthermore, even seasoned computing professionals have a good excuse for not being familiar with this concept! That said, now is a great time to learn about writing vectorized code. That is because in modern Intel processors, vector instructions may speed up arithmetic by up to a factor of 16. However, you must design computational code in a way that makes vector processing possible. In this 1-hour webinar I will explain what to expect from vectorization, and how to make sure that your code has it:

Manual and compiler-assisted vectorization
Assessing your success with vectorization
Loop was vectorized – what’s next?

Speaker

Andrey Vladimirov, Head of HPC Research, Colfax International

Dr. Andrey Vladimirov’s primary research interest is the application of modern computing technologies to computationally demanding scientific problems. Prior to joining Colfax, Andrey was involved in theoretical astrophysics [...]

Performance Optimization for Intel® Xeon Phi™ x200 Product Family: Video

September 29, 2016

Colfax now offers a 2-hour Hands-On Workshop (HOW) video on the best practices for performance optimization for Intel® Xeon Phi™ processors (formerly Knights Landing). Use the links below the video to navigate the 10 episodes.

Slides: HOW-Knights-Landing.pdf (4 MB)

Part 1. Meet Intel Xeon Phi processors
Purpose of Intel Xeon Phi processors and their organization from the programmer’s point of view.
Episode 01. Intel architecture: today and tomorrow (14 min)
Episode 02. Cores in Intel Xeon Phi processors (7 min)
Episode 03. Vector Instruction Support (14 min)
Episode 04. High-bandwidth memory (8 min)
Episode 05. Clustering modes (9 min)

Part 2. Hands-on Demonstrations
Exercises in performance optimization for Intel Xeon Phi processors.
Episode 06. Memory bandwidth optimization (19 min) (bonus: with memkind, 9 min)
Episode 07. Vectorization with AVX-512 (13 min) (bonus: threading, 9 min)
Episode 08. Tuning with Intel Math Kernel Library (MKL) (20 min)
Episode 09. [...]

Knights Landing Webinar Slides Translated to Japanese

May 13, 2016

With the help of our partners at XLsoft, the slide deck for the webinar “Introduction to Next-Generation Intel® Xeon Phi™ Processor: Developer’s Guide to Knights Landing” has been translated into Japanese.

XLsoft website

Download here: JP-Colfax-Programmers-Guide-to-KNL.pdf (5 MB)

For more information, and to register for the webinar, please visit: Webinar [...]

Guide to Automatic Vectorization with Intel AVX-512 Instructions in Knights Landing Processors

May 11, 2016

This publication is part of a developer guide focusing on the new features in 2nd generation Intel® Xeon Phi™ processors code-named Knights Landing (KNL). In this document, we focus on the new vector instruction set introduced in Knights Landing processors, Intel® Advanced Vector Extensions 512 (Intel® AVX-512). The discussion includes an introduction to vector instructions in general; the structure and specifics of AVX-512; and practical usage tips: checking whether a processor supports various features, the compilation process and compiler arguments, and the pros and cons of explicit and automatic vectorization using the Intel® C++ Compiler and the GNU Compiler Collection.

Colfax_KNL_AVX512_Guide.pdf

See also: colfaxresearch.com/get-ready-for-intel-knights-landing-3-papers/

Table of Contents
1. Vector Instructions
2. Structure and Functionality of AVX-512
2.1. Subsets
2.2. AVX512-F
2.3. AVX512-CD
2.4. AVX512-ER
2.5. AVX512-PF
3. Feature Check
3.1. Command Line
3.2. Source Code
4. Compiling
4.1. Usage Models
4.2. Intel C++ Compiler
4.3. The GNU [...]