You are viewing archived content (2011-2018). For current research, visit research.colfax-intl.com

hbm

HOW Series “Deep Dive”: Webinars on Performance Optimization – 2017 Edition

June 30, 2017

In a Nutshell

HOW Series “Deep Dive” is a free Web-based training on parallel programming and performance optimization on Intel architecture. The workshop includes 20 hours of instruction and code for hands-on exercises. This training is free to everyone thanks to Intel’s sponsorship. You can access the video recordings of lectures, slides of presentations and code of practical exercises on this page using a free Colfax Research account. To run the hands-on exercises, you will need a multi-core Intel architecture processor and the Intel C++ Compiler. You can get this compiler for 30 days at no cost using an evaluation license for Intel Parallel Studio [...]

FALCON Library: Fast Image Convolution in Neural Networks on Intel Architecture

November 9, 2016

We describe FALCON, an original open-source implementation of image convolution with a 3×3 filter based on Winograd’s minimal filtering algorithm. Compared to direct convolution, Winograd’s algorithm reduces the number of arithmetic operations at the cost of a more complicated memory access pattern. This study is carried out in the context of image analysis in convolutional neural networks. Our implementation combines C language code with BLAS function calls for general matrix-matrix multiplication. The code is optimized for Intel Xeon Phi processors x200 (formerly Knights Landing), with the Intel Math Kernel Library (MKL) providing the BLAS SGEMM function. To test the performance of FALCON in the context of machine learning, we benchmarked it for a set of image and filter sizes corresponding to the VGG Net architecture. In this test, FALCON achieves 10% greater overall performance than convolution from DNN primitives in Intel MKL. However, results vary by layer: for some layers FALCON is 1.5x faster than MKL, while for others it is slower by as much as 4x. This indicates a possibility of a [...]
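To illustrate how Winograd’s minimal filtering trades multiplications for additions, here is a minimal sketch of the one-dimensional case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. This is not FALCON’s code: FALCON is written in C and handles 3×3 filters, while this sketch uses Python for brevity. The function names `direct_f23` and `winograd_f23` are hypothetical and chosen only for this example.

```python
def direct_f23(d, g):
    """Direct method: 2 outputs from 4 inputs and a 3-tap filter (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using 4 data-filter multiplications."""
    # Filter transform; in a neural network it can be computed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Data transform (additions only) followed by 4 element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    taps = [0.5, 1.0, -1.0]
    print("direct:  ", direct_f23(data, taps))
    print("winograd:", winograd_f23(data, taps))
```

The extra additions and the transformed intermediate values are where the more complicated memory access pattern mentioned above comes from: the savings in multiplications are only realized if the transformed data can be laid out and reused efficiently.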

Performance Optimization for Intel® Xeon Phi™ x200 Product Family: Video

September 29, 2016

Colfax now offers a 2-hour Hands-On Workshop (HOW) video on the best practices for performance optimization for Intel® Xeon Phi™ processors (formerly Knights Landing). Use the links below the video to navigate the 10 episodes.

Slides: HOW-Knights-Landing.pdf (4 MB)

Part 1. Meet Intel Xeon Phi processors
Purpose of Intel Xeon Phi processors and their organization from the programmer’s point of view.
Episode 01. Intel architecture: today and tomorrow (14 min)
Episode 02. Cores in Intel Xeon Phi processors (7 min)
Episode 03. Vector Instruction Support (14 min)
Episode 04. High-bandwidth memory (8 min)
Episode 05. Clustering modes (9 min)

Part 2. Hands-on Demonstrations
Exercises in performance optimization for Intel Xeon Phi processors.
Episode 06. Memory bandwidth optimization (19 min) (bonus: with memkind, 9 min)
Episode 07. Vectorization with AVX-512 (13 min) (bonus: threading, 9 min)
Episode 08. Tuning with Intel Math Kernel Library (MKL) (20 min)
Episode 09. [...]

Intel® Python* on 2nd Generation Intel® Xeon Phi™ Processors: Out-of-the-Box Performance

June 20, 2016

This paper reports on the value and performance for computational applications of the Intel® distribution for Python* 2017 Beta on 2nd generation Intel® Xeon Phi™ processors (formerly codenamed Knights Landing). Benchmarks of LU decomposition, Cholesky decomposition, singular value decomposition and double precision general matrix-matrix multiplication routines in the SciPy and NumPy libraries are presented, and tuning methodology for use with high-bandwidth memory (HBM) is laid out.

Download as PDF: Colfax-Intel-Python.pdf (1 MB) or read online below.

Code: coming soon, check back later.

See also: colfaxresearch.com/get-ready-for-intel-knights-landing-3-papers/

1. A Case for Python in Computing

Python is a popular scripting language in computational applications. Empowered with the fundamental tools for scientific computing, NumPy and SciPy libraries, Python applications can express in brief and convenient form basic linear algebra subroutines (BLAS) and linear algebra package (LAPACK) functions for operations on matrices and systems of linear algebraic [...]
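As a rough illustration of the kind of benchmark named in the abstract, the sketch below times a double precision matrix-matrix multiplication, a Cholesky decomposition and a singular value decomposition using NumPy alone. It is not the paper’s benchmark code, which is not reproduced here. The helper `time_call`, the matrix size `n` and the way the test matrices are built are assumptions made for this example. LU decomposition is omitted because it lives in SciPy (for instance `scipy.linalg.lu_factor`) rather than in NumPy.

```python
import time

import numpy as np


def time_call(fn, *args):
    """Run fn(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


n = 256  # illustrative size only; real benchmarks use much larger matrices
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))  # float64 by default
b = rng.standard_normal((n, n))
spd = a @ a.T + n * np.eye(n)    # symmetric positive definite input for Cholesky

c, t_gemm = time_call(np.matmul, a, b)          # dispatches to the BLAS DGEMM routine
l, t_chol = time_call(np.linalg.cholesky, spd)  # lower-triangular factor, spd = l @ l.T
(u, s, vt), t_svd = time_call(np.linalg.svd, a)

# DGEMM on n x n matrices performs about 2 * n**3 floating-point operations.
gflops = 2.0 * n**3 / max(t_gemm, 1e-12) / 1e9
print(f"DGEMM:    {t_gemm:.4f} s ({gflops:.1f} GFLOP/s)")
print(f"Cholesky: {t_chol:.4f} s")
print(f"SVD:      {t_svd:.4f} s")
```

A single timed call like this is noisy. A more careful measurement would repeat each operation several times and discard the first run, which often includes library initialization overhead.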

Knights Landing Webinar Slides Translated to Japanese

May 13, 2016

With the help of our partners at XLsoft, the slide deck for the webinar “Introduction to Next-Generation Intel® Xeon Phi™ Processor: Developer’s Guide to Knights Landing” has been translated into Japanese.

XLsoft website

Download here: JP-Colfax-Programmers-Guide-to-KNL.pdf (5 MB)

For more information, and to register for the webinar, please visit: Webinar [...]

MCDRAM as High-Bandwidth Memory (HBM) in Knights Landing Processors: Developer’s Guide

May 11, 2016

This publication is part of a developer guide focusing on the new features in 2nd generation Intel® Xeon Phi™ processors code-named Knights Landing (KNL). In this document we discuss the on-package high-bandwidth memory (HBM) based on the multi-channel dynamic random access memory (MCDRAM) technology:

Three configuration modes of HBM: Flat mode, Cache mode and Hybrid mode
Utilization of the HBM as addressable memory using two methods: by setting affinity policy with the numactl tool and through the usage of special allocators in the memkind library
Guidelines for determining the optimal usage model for applications running on bootable Knights Landing.

Colfax_KNL_MCDRAM_Guide.pdf (255 KB)

See also: colfaxresearch.com/get-ready-for-intel-knights-landing-3-papers/

Table of Contents
1. MCDRAM in KNL
2.1. Cache Mode
2.2. Flat Mode
2.3. Hybrid Mode
3. Using HBM as addressable memory
3.1. numactl
3.2. Memkind Library
3.3. Fortran
4. Choosing Memory and Programming Model
4.1. Programming with HBM…
4.2. …and Programming without HBM
Appendix A: Application Memory [...]