Machine Learning

Optimizing Torch Performance for Intel Xeon Phi Processors

November 18, 2016

    In this 1-hour webinar, Ryo Asai (Colfax) discusses how machine learning applications can benefit from code modernization. He begins by exploring the parallelism that gives modern computer architecture its performance, and how it can be leveraged. Then he applies code modernization techniques live on-screen to the Torch machine learning framework. Specifically, he optimizes image recognition through a deep convolutional neural network that uses the VGG-net architecture. For each code modernization technique, he explains why it works, and how to apply it in practice. What you will learn: What code modernization is, and its importance for machine learning Practical knowledge of modern computer architectures Code modernization techniques for leveraging parallelism Slides:  Colfax-Torch-VGG-Webinar.pdf (2 MB) — this file is available only to registered users. Register or Log [...]

FALCON Library: Fast Image Convolution in Neural Networks on Intel Architecture

November 9, 2016

We describe FALCON, an original open-source implementation of image convolution with a 3×3 filter based on Winograd’s minimal filtering algorithm. Compared to direct convolution, Winograd’s algorithm reduces the number of arithmetic operations at the cost of complicating the memory access pattern. This study is carried out in the context of image analysis in convolutional neural networks. Our implementation combines C language code with BLAS function calls for general matrix-matrix multiplication. The code is optimized for Intel Xeon Phi processors x200 (formerly Knights Landing) with Intel Math Kernel Library (MKL) used for BLAS call to the SGEMM function. To test the performance of FALCON in the context of machine learning, we benchmarked it for a set of image and filter sizes corresponding to the VGG Net architecture. In this test, FALCON achieves 10% greater overall performance than convolution from DNN primitives in Intel MKL. However, for some layers, FALCON is faster than MKL by 1.5x, but for other layers slower by as much as 4x. This indicates a possibility of a [...]

Machine Learning on 2nd Generation Intel® Xeon Phi™ Processors: Image Captioning with NeuralTalk2, Torch

June 20, 2016

  In this case study, we describe a proof-of-concept implementation of a highly optimized machine learning application for Intel Architecture. Our results demonstrate the capabilities of Intel Architecture, particularly the 2nd generation Intel Xeon Phi processors (formerly codenamed Knights Landing), in the machine learning domain. Download as PDF:  Colfax-NeuralTalk2-Summary.pdf (814 KB) — this file is available only to registered users. Register or Log In. or read online below. Code: see our branch of NeuralTalk2 for instructions on reproducing our results (in Readme.md). It uses our optimized branch of Torch to run efficiently on Intel architecture. See also: colfaxresearch.com/get-ready-for-intel-knights-landing-3-papers/ 1. Case Study It is common in the machine learning (ML) domain to see applications implemented with the use of frameworks and libraries such as Torch, Caffe, TensorFlow, and similar. This approach allows the computer scientist to focus on the learning algorithm, leaving the details of performance optimization to the framework. Similarly, the ML [...]

Research Engineer in Machine Learning

February 23, 2016

Colfax International, a Silicon Valley company with 29 years of experience in high-end computing systems, is growing its research team by opening a full-time position of Research Engineer in Machine Learning based in Sunnyvale, California, USA. We are looking for professionals with passion for parallel computing with emphasis on machine learning tasks to help us expand our activity in research, education and consulting on modern computing technologies (see colfaxresearch.com). ABOUT THE JOB: RESEARCH: As a researcher, you will have a chance to experiment with machine learning solutions that utilize novel computing technologies including processors, computing accelerators, interconnects and storage. You will be one of the first people in the world to learn how to use new processors and use emerging software tools and hardware solutions to solve challenging problems in machine learning. You will report the results of your learning in fast-track and peer-reviewed publications, conference presentations, and technology demonstrations. See colfaxresearch.com for examples of our work. [...]