Best Practices for Speed in Deep Learning Applications on Intel Architecture

July 3, 2018

You have set up a deep learning model that you plan to train on an Intel architecture processor. To be productive, you have to minimize the training time. You run the application and see that it takes N seconds for a single training epoch. How do you know whether that is good? If improvement is possible, what can you do to reduce the training time? Are there tools to identify a tuning strategy? Intel software development tools can answer these questions and help you maximize your productivity in deep learning on Intel architecture. At the Intel AI DevCon 2018 in San Francisco, Alaa Eltablawy (Colfax) presented a workshop that demonstrated how this works. For the workshop, attendees received access to the Intel® AI DevCloud, where they could experiment with the optimization of a TensorFlow-based application for image segmentation. The instructor demonstrated the performance analysis results obtained with Intel® VTune Amplifier and Application Performance Snapshot and explained how this analysis consistently guides you to the use of known “performance tuning knobs” in [...]

Webinar: Demystifying Vectorization

May 18, 2017

Free Webinar

Abstract

Have you heard of code vectorization, but are not sure how it applies to your work? Rest assured, you are in good company. Even seasoned computing professionals have a good excuse for not being familiar with this concept! That said, now is a great time to learn about writing vectorized code, because in modern Intel processors vector instructions may speed up arithmetic by up to a factor of 16. However, you must design computational code in a way that makes vector processing possible. In this 1-hour webinar I will explain what to expect from vectorization, and how to make sure that your code has it:

Manual and compiler-assisted vectorization
Assessing your success with vectorization
Loop was vectorized – what’s next?

Speaker

Andrey Vladimirov, Head of HPC Research, Colfax International

Dr. Andrey Vladimirov’s primary research interest is the application of modern computing technologies to computationally demanding scientific problems. Prior to joining Colfax, Andrey was involved in theoretical astrophysics [...]

Optimizing Torch Performance for Intel Xeon Phi Processors

November 18, 2016

In this 1-hour webinar, Ryo Asai (Colfax) discusses how machine learning applications can benefit from code modernization. He begins by exploring the parallelism that gives modern computer architecture its performance, and how it can be leveraged. Then he applies code modernization techniques live on-screen to the Torch machine learning framework. Specifically, he optimizes image recognition through a deep convolutional neural network that uses the VGG-net architecture. For each code modernization technique, he explains why it works, and how to apply it in practice.

What you will learn:

What code modernization is, and its importance for machine learning
Practical knowledge of modern computer architectures
Code modernization techniques for leveraging parallelism

Slides: Colfax-Torch-VGG-Webinar.pdf (2 [...]

Interview with James Reinders: future of Intel MIC architecture, parallel programming, education

March 5, 2015

A few weeks ago we recorded our conversation with James Reinders, the Director and Chief Evangelist at Intel Corporation. We discussed the future of parallel programming and of Intel MIC architecture products: Intel Xeon Phi coprocessors, Knights Landing (KNL), and the future 3rd generation, Knights Hill (KNH). We also talked about how students can learn parallel programming and optimization for high performance applications. Watch the whole interview by clicking the player above, or jump straight to one of the questions in the list below.

James Reinders and his role at Intel. – 00:47
Why are parallel programming and code modernization important? – 01:49
Brief introduction to MIC architecture and Xeon Phi coprocessors. – 04:03
What type of applications benefit from MIC architecture? – 07:16
How to approach porting your code for MIC architecture? – 09:58
What is new in Knights Landing. – 15:24
Details of chip design of Knights Landing. – 19:54
3rd MIC generation – Knights Hill. – 21:16
How to future-proof my code? – 23:15
[...]

Scientific Computing with Intel Xeon Phi Coprocessors

February 4, 2015

I had the privilege of giving a presentation at the HPC Advisory Council Stanford Conference 2015. Thanks to insideHPC, a recording of this presentation is available on YouTube. Slides are available here and here: Colfax-HPCAC.pdf

If you are interested in individual case studies mentioned in the talk, here they are:

Paper: 2013a, 2013b
Papers: 2013, 2014
Paper: 2013
Paper: [...]

Fluid Dynamics with Fortran on Intel Xeon Phi coprocessors

February 4, 2015

In this demonstration, a Colfax ProEdge™ SXP8400 workstation runs a shallow water flow solver to show CFD acceleration with Intel Xeon Phi coprocessors. The key feature of this demonstration is that exactly the same source code is used to compile the MPI executables for the Intel Xeon E5-2697 v3 processor and for Intel Xeon Phi 7120A coprocessors. The code is written in Fortran with OpenMP and MPI. For performance results with this code in a MIC-enabled cluster, see companion [...]