Optimization Techniques for the Intel MIC Architecture. Part 3 of 3: False Sharing and Padding
This is part 3 of a 3-part educational series of publications introducing select topics on optimization of applications for Intel’s multi-core and manycore architectures (Intel Xeon processors and Intel Xeon Phi coprocessors).
In this paper we discuss false sharing, highlight the situations in which it may occur, and show how to eliminate it with the help of data container padding.
For a practical illustration, we construct and optimize a micro-kernel for binning particles based on their coordinates. Similar workloads occur in Monte Carlo simulations, particle physics software, and statistical analysis.
Results show that false sharing may slow down a parallel application by as much as an order of magnitude. Intel Xeon processors require more padding to eliminate false sharing than Intel Xeon Phi coprocessors do, so real-life applications may use target-specific padding values.
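As a rough illustration of the idea, the sketch below bins particle coordinates into per-thread histograms stored in one flat container, with padding inserted between the histograms of neighboring threads so that counters owned by different threads do not land on the same cache line. This is not the code from the paper: the function name `bin_particles`, the constant `kPadBytes`, the one-dimensional uniform binning, and the use of OpenMP are all assumptions made for the example, and the 64-byte padding is only a starting value that would need tuning per target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Assumed padding between per-thread histograms, in bytes. 64 is one typical
// x86 cache line; the value that actually removes false sharing is
// target-specific and should be found by measurement.
constexpr std::size_t kPadBytes = 64;

// Hypothetical helper: counts how many coordinates fall into each of n_bins
// uniform bins on [x_min, x_max). Coordinates outside the range are skipped.
std::vector<int> bin_particles(const std::vector<float>& x,
                               float x_min, float x_max, std::size_t n_bins) {
    assert(n_bins > 0 && x_max > x_min);
#ifdef _OPENMP
    const int n_threads = omp_get_max_threads();
#else
    const int n_threads = 1;  // serial fallback when built without OpenMP
#endif
    // Each thread owns n_bins counters followed by kPadBytes of unused space,
    // so the last counter of one thread and the first counter of the next
    // are at least kPadBytes apart regardless of the buffer's alignment.
    const std::size_t stride = n_bins + kPadBytes / sizeof(int);
    std::vector<int> priv(static_cast<std::size_t>(n_threads) * stride, 0);

    const float inv_width = static_cast<float>(n_bins) / (x_max - x_min);
    const long n = static_cast<long>(x.size());

    #pragma omp parallel
    {
#ifdef _OPENMP
        const int t = omp_get_thread_num();
#else
        const int t = 0;
#endif
        int* mine = priv.data() + static_cast<std::size_t>(t) * stride;
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            if (x[i] < x_min || x[i] >= x_max) continue;
            std::size_t b = static_cast<std::size_t>((x[i] - x_min) * inv_width);
            if (b >= n_bins) b = n_bins - 1;  // guard against rounding at the edge
            ++mine[b];  // touches only this thread's padded region
        }
    }

    // Serial reduction of the per-thread histograms into the final result.
    std::vector<int> total(n_bins, 0);
    for (int t = 0; t < n_threads; ++t)
        for (std::size_t b = 0; b < n_bins; ++b)
            total[b] += priv[static_cast<std::size_t>(t) * stride + b];
    return total;
}
```

Setting `kPadBytes` to 0 gives the unpadded layout, in which the boundary counters of adjacent threads may share a cache line; timing both variants on the target system is one way to see whether false sharing matters for a given bin count.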
See also:
- Part 1: Multi-Threading and Parallel Reduction
- Part 2: Strip-Mining for Vectorization
- Part 3: False Sharing and Padding
Complete paper: Colfax_Optimization_Techniques_3_of_3.pdf (629 KB)
Source code for Linux: Colfax_Tutorial_Binning.zip (6 KB)