Best Practices for Speed in Deep Learning Applications on Intel Architecture

You have set up a deep learning model that you plan to train on an Intel architecture processor. To stay productive, you need to minimize the training time. You run the application and see that a single training epoch takes N seconds. How do you know whether that is good? If improvement is possible, what can you do to reduce the training time? Are there tools that help identify a tuning strategy?

Intel software development tools can help answer these questions and maximize your productivity in deep learning on Intel architecture. At the Intel AI DevCon 2018 in San Francisco, Alaa Eltablawy (Colfax) presented a workshop that demonstrated how this works.

For the workshop, attendees received access to the Intel® AI DevCloud, where they could experiment with optimizing a TensorFlow-based application for image segmentation. The instructor demonstrated the performance analysis results obtained with Intel® VTune Amplifier and Application Performance Snapshot and explained how this analysis consistently guides you toward known “performance tuning knobs” in TensorFlow. You can apply the techniques presented in the workshop to other deep learning frameworks, as well as to classical machine learning with Python and other AI-related applications. In addition, the workshop presents information on obtaining Intel Distribution for Python, Intel VTune Amplifier, and other tools for enterprise, academic, and non-commercial use.
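
This summary does not list which “performance tuning knobs” the workshop covers. As an illustration only, the sketch below collects the threading settings most often cited for TensorFlow on Intel CPUs: the OpenMP environment variables OMP_NUM_THREADS, KMP_BLOCKTIME, and KMP_AFFINITY, plus TensorFlow's intra-op and inter-op thread counts. The helper names (suggest_threading_knobs, apply_env_knobs) and the starting values are assumptions, not results from the workshop. Treat them as a first guess to refine with a profiler.

```python
import os


def suggest_threading_knobs(physical_cores, sockets=1):
    """Return commonly cited starting values for CPU threading knobs.

    These are illustrative assumptions, not workshop results; the best
    values depend on the model and should be confirmed by profiling.
    """
    if physical_cores < 1 or sockets < 1:
        raise ValueError("physical_cores and sockets must be positive")
    return {
        # Environment variables read by the Intel OpenMP runtime.
        "env": {
            "OMP_NUM_THREADS": str(physical_cores),
            "KMP_BLOCKTIME": "1",
            "KMP_AFFINITY": "granularity=fine,compact,1,0",
        },
        # Thread-pool sizes to pass to TensorFlow, e.g. through
        # tf.ConfigProto(intra_op_parallelism_threads=...,
        #                inter_op_parallelism_threads=...) in TF 1.x.
        "tf": {
            "intra_op_parallelism_threads": physical_cores,
            "inter_op_parallelism_threads": sockets,
        },
    }


def apply_env_knobs(knobs, environ=None):
    """Copy the environment-variable knobs into environ (os.environ by default)."""
    target = os.environ if environ is None else environ
    for name, value in knobs["env"].items():
        target[name] = value
    return target
```

The environment variables must be set before TensorFlow is imported in order to take effect. For example, on a hypothetical two-socket machine with 16 physical cores, calling apply_env_knobs(suggest_threading_knobs(16, sockets=2)) at the top of the training script would set OMP_NUM_THREADS to "16".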