One of the features of Intel’s Sandy Bridge-E processor released this month is the support for the Advanced Vector Extensions (AVX) instruction set. Codes suitable for efficient auto-vectorization by the compiler will be able to take advantage of AVX without any code modification, with only re-compilation.
This paper explains the guidelines for code design suitable for auto-vectorization by the compiler (elimination of vector dependence, implementation of unit-stride data access and proper address alignment) and walks the reader through a practical example of code development with auto-vectorization. The resulting code is compiled and executed on two computer systems: a Westmere CPU-based system with SSE 4.2 support, and a Sandy Bridge-based system with AVX support. The benefit of vectorization is more significant in the AVX version, if the code is designed efficiently. An ‘elegant’, but inefficient solution is also provided and discussed.
In addition, the paper provides a comparative benchmark of the Sandy Bridge and Westmere systems, based on the discussed algorithm. Implications of auto-vectorization methods for Intel’s future Many Integrated Core technology based on the Knights Corner chip are discussed at the end.