Vectorization

Demystifying Vectorization

May 18, 2017

Free Webinar. Abstract: Have you heard of code vectorization but are not sure how it applies to your work? Rest assured, you are in good company. Even seasoned computing professionals have a good excuse for not being familiar with this concept! That said, now is a great time to learn about writing vectorized code, because in modern Intel processors, vector instructions may speed up arithmetic computations by up to a factor of 16. However, you must design computational code in a way that makes vector processing possible. In this 1-hour webinar I will explain what to expect from vectorization and how to make sure that your code has it: manual and compiler-assisted vectorization; assessing your success with vectorization; and "the loop was vectorized" – what’s next? Speaker: Andrey Vladimirov, Head of HPC Research, Colfax International. Dr. Andrey Vladimirov’s primary research interest is the application of modern computing technologies to computationally demanding scientific problems. Prior to joining Colfax, Andrey was involved in theoretical astrophysics [...]

“HOW Series”: Webinars on Performance Optimization, June 2017

April 28, 2017

In a Nutshell: HOW Series “Deep Dive” is a free 20-hour hands-on in-depth training on parallel programming and performance optimization in computational applications on Intel architecture. The 6th run in 2017 begins June 19, 2017. Broadcasts start at 16:00 GMT (9:00 am in San Francisco, 12:00 noon in New York, 5:00 pm in London, 7:00 pm in Moscow, 9:30 pm in New Delhi, 1:00 am in Tokyo). Cannot attend? Register anyway for cluster access, progress updates and recorded video. Learn More: Why Attend the HOW Series; Course Roadmap; Instructor Bio; Prerequisites; Remote Access for Hands-On Exercises; Slides, Code and Video; System Requirements (IMPORTANT!); Supplementary Materials; Chat. Why Attend the [...]

HOW Series “Deep Dive”: Webinars on Performance Optimization, May 2017

April 13, 2017

In a Nutshell: HOW Series “Deep Dive” is a free 20-hour hands-on in-depth training on parallel programming and performance optimization in computational applications on Intel architecture. The 5th run in 2017 begins May 15, 2017. Broadcasts start at 16:00 GMT (9:00 am in San Francisco, 12:00 noon in New York, 5:00 pm in London, 7:00 pm in Moscow, 9:30 pm in New Delhi, 1:00 am in Tokyo). Registration for this training in June 2017 is also open. Cannot attend? Register anyway for cluster access, progress updates and recorded video. Learn More: Why Attend the HOW Series; Course Roadmap; Instructor Bio; Prerequisites; Remote Access for Hands-On Exercises; Slides, Code and Video; System Requirements [...]

HOW Series “Deep Dive”: Webinars on Performance Optimization, April 2017

March 16, 2017

In a Nutshell: HOW Series “Deep Dive” is a free 20-hour hands-on in-depth training on parallel programming and performance optimization in computational applications on Intel architecture. The 4th run in 2017 begins April 17, 2017. Broadcasts start at 16:00 UTC (9:00 am in San Francisco, 12:00 noon in New York, 5:00 pm in London, 7:00 pm in Moscow, 9:30 pm in New Delhi, 1:00 am in Tokyo). Registration for this workshop is closed, but you can register for the upcoming HOW Series in May or you can watch the recordings of all presentations below. Learn More: Why Attend the HOW Series; Course Roadmap; Instructor Bio; Prerequisites; Remote Access for Hands-On Exercises; Slides, Code and Video; System Requirements (IMPORTANT!); Supplementary Materials; Chat. Why [...]

HOW Series “Deep Dive”: Webinars on Performance Optimization, March 2017

February 15, 2017

In a Nutshell: HOW Series “Deep Dive” is a free 20-hour hands-on in-depth training on parallel programming and performance optimization in computational applications on Intel architecture. The 3rd run in 2017 begins March 13, 2017. Broadcasts start at 16:00 UTC (9:00 am in San Francisco, 12:00 noon in New York, 4:00 pm in London, 7:00 pm in Moscow, 9:30 pm in New Delhi, 1:00 am in Tokyo). Registration for this workshop is closed, but you can register for the April HOW Series. Learn More: Why Attend the HOW Series; Course Roadmap; Instructor Bio; Prerequisites; Remote Access for Hands-On Exercises; Slides, Code and Video; System Requirements (IMPORTANT!); Supplementary Materials; Chat. Why Attend the HOW [...]

Training Calendar

October 4, 2016

“HOW” Series: Deep Dive — next run in June 2017. Learn Modern Code: Are you realizing the payoff of parallel processing? Are you aware that without code optimization, computational applications may perform orders of magnitude worse than they are supposed to? The Web-based HOW Series training provides the extensive knowledge needed to extract more of the parallel compute performance potential found in both Intel® Xeon® and Intel® Xeon Phi™ processors and coprocessors. Practice New Skills: The HOW Series is an experiential learning program comprising instructional and hands-on self-study components. The instructional part: 10 lecture sessions, each with 1 hour of theory and 1 hour of practical demonstrations. The self-study part: attendees are provided with remote access over SSH to a Linux-based cluster of training servers with Intel Xeon Phi processors (KNL) and Intel [...]

Guide to Automatic Vectorization with Intel AVX-512 Instructions in Knights Landing Processors

May 11, 2016

This publication is part of a developer guide focusing on the new features in 2nd-generation Intel® Xeon Phi™ processors code-named Knights Landing (KNL). In this document, we focus on the new vector instruction set introduced in Knights Landing processors, Intel® Advanced Vector Extensions 512 (Intel® AVX-512). The discussion includes: an introduction to vector instructions in general; the structure and specifics of AVX-512; and practical usage tips: checking whether a processor supports various features, the compilation process and compiler arguments, and the pros and cons of explicit and automatic vectorization using the Intel® C++ Compiler and the GNU Compiler Collection. Colfax_KNL_AVX512_Guide.pdf (195 KB) — this file is available only to registered users. See also: colfaxresearch.com/get-ready-for-intel-knights-landing-3-papers/ 1. Vector Instructions: Intel® Xeon Phi™ products are highly parallel processors with the Intel® Many Integrated Core (MIC) architecture. Parallelism is present in these [...]

Auto-Vectorization with the Intel Compilers: is Your Code Ready for Sandy Bridge and Knights Corner?

March 12, 2012

One of the features of Intel’s Sandy Bridge-E processor released this month is support for the Advanced Vector Extensions (AVX) instruction set. Codes suitable for efficient auto-vectorization by the compiler will be able to take advantage of AVX without any code modification, requiring only re-compilation. This paper explains the guidelines for code design suitable for auto-vectorization by the compiler (elimination of vector dependence, implementation of unit-stride data access, and proper address alignment) and walks the reader through a practical example of code development with auto-vectorization. The resulting code is compiled and executed on two computer systems: a Westmere CPU-based system with SSE 4.2 support, and a Sandy Bridge-based system with AVX support. The benefit of vectorization is more significant in the AVX version when the code is designed efficiently. An ‘elegant’ but inefficient solution is also provided and discussed. In addition, the paper provides a comparative benchmark of the Sandy Bridge and Westmere systems, based on the discussed [...]