Optimizing Torch Performance for Intel Xeon Phi Processors

November 18, 2016

In this 1-hour webinar, Ryo Asai (Colfax) discusses how machine learning applications can benefit from code modernization. He begins by exploring the parallelism that gives modern computer architectures their performance and how it can be leveraged. Then he applies code modernization techniques live on-screen to the Torch machine learning framework. Specifically, he optimizes image recognition with a deep convolutional neural network that uses the VGG-net architecture. For each code modernization technique, he explains why it works and how to apply it in practice.
What you will learn: what code modernization is and why it matters for machine learning; practical knowledge of modern computer architectures; code modernization techniques for leveraging parallelism.
Slides: Colfax-Torch-VGG-Webinar.pdf (2 MB) [...]

Machine Learning on 2nd Generation Intel® Xeon Phi™ Processors: Image Captioning with NeuralTalk2, Torch

June 20, 2016

In this case study, we describe a proof-of-concept implementation of a highly optimized machine learning application for Intel Architecture. Our results demonstrate the capabilities of Intel Architecture, particularly the 2nd generation Intel Xeon Phi processors (formerly codenamed Knights Landing), in the machine learning domain.
Download as PDF: Colfax-NeuralTalk2-Summary.pdf (814 KB), or read online below.
Code: see our branch of NeuralTalk2 for instructions on reproducing our results. It uses our optimized branch of Torch to run efficiently on Intel architecture.
1. Case Study
It is common in the machine learning (ML) domain to see applications implemented using frameworks and libraries such as Torch, Caffe, and TensorFlow. This approach allows the computer scientist to focus on the learning algorithm, leaving the details of performance optimization to the framework. Similarly, the ML [...]

Interview with James Reinders: future of Intel MIC architecture, parallel programming, education

March 5, 2015

A few weeks ago we recorded our conversation with James Reinders, the Director and Chief Evangelist at Intel Corporation. We discussed the future of parallel programming and of Intel MIC architecture products: Intel Xeon Phi coprocessors, Knights Landing (KNL), and the future 3rd generation, Knights Hill (KNH). We also talked about how students can learn parallel programming and optimization for high performance applications. Watch the whole interview by clicking the player above, or jump straight to one of the questions in the list below.
James Reinders and his role at Intel. – 00:47
Why are parallel programming and code modernization important? – 01:49
Brief introduction to MIC architecture and Xeon Phi coprocessors. – 04:03
What type of applications benefit from MIC architecture? – 07:16
How to approach porting your code for MIC architecture? – 09:58
What is new in Knights Landing. – 15:24
Details of chip design of Knights Landing. – 19:54
3rd MIC generation – Knights Hill. – 21:16
How to future-proof my code? – 23:15
[...]

Scientific Computing with Intel Xeon Phi Coprocessors

February 4, 2015

I had the privilege of giving a presentation at the HPC Advisory Council Stanford Conference 2015. Thanks to insideHPC, a recording of this presentation is available on YouTube. Slides are available here and here: Colfax-HPCAC.pdf. If you are interested in individual case studies mentioned in the talk, here they are: Paper: 2013a, 2013b Papers: 2013, 2014 Paper: 2013 Paper: [...]

Fluid Dynamics with Fortran on Intel Xeon Phi coprocessors

February 4, 2015

In this demonstration, a Colfax ProEdge™ SXP8400 workstation runs a shallow water flow solver to show CFD acceleration with Intel Xeon Phi coprocessors. The key feature of this demonstration is that exactly the same source code is used to compile the MPI executables for the Intel Xeon E5-2697 v3 processor and for Intel Xeon Phi 7120A coprocessors. The code is written in Fortran with OpenMP and MPI. For performance results with this code in a MIC-enabled cluster, see companion [...]