hbm

“HOW Series”: Webinars on Performance Optimization, June 2017

April 28, 2017

In a Nutshell: HOW Series “Deep Dive” is a free 20-hour, hands-on, in-depth training on parallel programming and performance optimization in computational applications on Intel architecture. The 6th run in 2017 begins June 12, 2017. Broadcasts start at 16:00 GMT (9:00 am in San Francisco, 12:00 noon in New York, 5:00 pm in London, 7:00 pm in Moscow, 9:30 pm in New Delhi, 1:00 am in Tokyo). Cannot attend? Register anyway for cluster access, progress updates and recorded video. [...]

HOW Series “Deep Dive”: Webinars on Performance Optimization, May 2017

April 13, 2017

In a Nutshell: HOW Series “Deep Dive” is a free 20-hour, hands-on, in-depth training on parallel programming and performance optimization in computational applications on Intel architecture. The 5th run in 2017 begins May 15, 2017. Broadcasts start at 16:00 GMT (9:00 am in San Francisco, 12:00 noon in New York, 5:00 pm in London, 7:00 pm in Moscow, 9:30 pm in New Delhi, 1:00 am in Tokyo). Registration for this training in June 2017 is also open. Cannot attend? Register anyway for cluster access, progress updates and recorded video. [...]

HOW Series “Deep Dive”: Webinars on Performance Optimization, April 2017

March 16, 2017

In a Nutshell: HOW Series “Deep Dive” is a free 20-hour, hands-on, in-depth training on parallel programming and performance optimization in computational applications on Intel architecture. The 4th run in 2017 begins April 17, 2017. Broadcasts start at 16:00 UTC (9:00 am in San Francisco, 12:00 noon in New York, 5:00 pm in London, 7:00 pm in Moscow, 9:30 pm in New Delhi, 1:00 am in Tokyo). Registration for this workshop is closed, but you can register for the upcoming HOW series in May or watch the recordings of all presentations below. [...]

FALCON Library: Fast Image Convolution in Neural Networks on Intel Architecture

November 9, 2016

We describe FALCON, an original open-source implementation of image convolution with a 3×3 filter based on Winograd’s minimal filtering algorithm. Compared to direct convolution, Winograd’s algorithm reduces the number of arithmetic operations at the cost of a more complicated memory access pattern. This study is carried out in the context of image analysis in convolutional neural networks. Our implementation combines C language code with BLAS function calls for general matrix-matrix multiplication. The code is optimized for Intel Xeon Phi processors x200 (formerly Knights Landing), with the Intel Math Kernel Library (MKL) providing the BLAS SGEMM function. To test the performance of FALCON in the context of machine learning, we benchmarked it for a set of image and filter sizes corresponding to the VGG Net architecture. In this test, FALCON achieves 10% greater overall performance than convolution from the DNN primitives in Intel MKL. However, the result varies by layer: for some layers FALCON is 1.5x faster than MKL, while for others it is as much as 4x slower. This indicates a possibility of a [...]
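The arithmetic saving behind Winograd’s minimal filtering can be illustrated in one dimension. The sketch below is not FALCON’s code (FALCON is written in C, handles 3×3 filters and calls SGEMM); it is a minimal Python illustration of the 1D case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 used by the direct method. The function names are hypothetical, and the filter is applied as a correlation (no kernel flip), as is customary in convolutional neural networks.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter g over a 4-element tile d: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs with Winograd's F(2,3): 4 multiplications on the data.

    The filter-side factors depend only on g, so in a real network they
    would be computed once per filter and reused for every input tile.
    """
    g_sum = (g[0] + g[1] + g[2]) / 2   # filter transform, reusable
    g_alt = (g[0] - g[1] + g[2]) / 2   # filter transform, reusable

    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_alt
    m4 = (d[1] - d[3]) * g[2]

    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    tile = [1, 2, 3, 4]
    taps = [2, 4, 6]
    print(direct_f23(tile, taps))
    print(winograd_f23(tile, taps))
```

The 2D algorithm for 3×3 filters nests this transform along rows and columns. The extra additions and the tile-wise data layout are the source of the more complicated memory access pattern mentioned above.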

Training Calendar

October 4, 2016

“HOW” Series: Deep Dive. Current run: May 2017. Upcoming run: June 2017. Learn Modern Code: Are you realizing the payoff of parallel processing? Are you aware that without code optimization, computational applications may perform orders of magnitude worse than they are supposed to? The Web-based HOW Series training provides the extensive knowledge needed to extract more of the parallel compute performance potential found in both Intel® Xeon® and Intel® Xeon Phi™ processors and coprocessors. Practice New Skills: The HOW series is an experiential learning program because it comprises instructional and hands-on self-study components. The instructional part: 10 lecture sessions with 1 hour of [...]

Performance Optimization for Intel® Xeon Phi™ x200 Product Family: Video

September 29, 2016

Optimization for Intel Xeon Phi Processors x200
Colfax now offers a 2-hour Hands-On Workshop (HOW) video on the best practices for performance optimization for Intel® Xeon Phi™ processor (formerly Knights Landing). Use links below the video to navigate the 10 episodes.
Slides: HOW-Knights-Landing.pdf (4 MB)
Part 1. Meet Intel Xeon Phi processors
Purpose of Intel Xeon Phi processors and their organization from the programmer’s point of view.
Episode 01. Intel architecture: today and tomorrow (14 min)
Episode 02. Cores in Intel Xeon Phi processors (7 min)
Episode 03. Vector Instruction Support (14 min)
Episode 04. High-bandwidth memory (8 min)
Episode 05. Clustering modes (9 min)
Part 2. Hands-on Demonstrations
Exercises in performance optimization for Intel Xeon Phi processors.
Episode 06. Memory bandwidth optimization (19 min) (bonus: with memkind) (9 min)
Episode 07. Vectorization with AVX-512 (13 min) (bonus: threading) (9 min)
Episode 08. Tuning with Intel Math Kernel Library (MKL) (20 min)
Episode 09. [...]

Intel® Python* on 2nd Generation Intel® Xeon Phi™ Processors: Out-of-the-Box Performance

June 20, 2016

This paper reports on the value and performance for computational applications of the Intel® Distribution for Python* 2017 Beta on 2nd generation Intel® Xeon Phi™ processors (formerly codenamed Knights Landing). Benchmarks of LU decomposition, Cholesky decomposition, singular value decomposition and double precision general matrix-matrix multiplication routines in the SciPy and NumPy libraries are presented, and a tuning methodology for use with high-bandwidth memory (HBM) is laid out. Download as PDF: Colfax-Intel-Python.pdf (1 MB, available only to registered users), or read online below. Code: coming soon, check back later. See also: colfaxresearch.com/get-ready-for-intel-knights-landing-3-papers/ 1. A Case for Python in Computing. Python is a popular scripting language in computational applications. Empowered with the fundamental tools for scientific computing, the NumPy and SciPy libraries, Python applications can express in brief and convenient form basic linear algebra subroutines (BLAS) and linear algebra package (LAPACK) [...]

Knights Landing Webinar Slides Translated to Japanese

May 13, 2016

With the help of our partners at XLsoft, the slide deck for the webinar “Introduction to Next-Generation Intel® Xeon Phi™ Processor: Developer’s Guide to Knights Landing” has been translated into Japanese. XLsoft website. Download here: JP-Colfax-Programmers-Guide-to-KNL.pdf (5 MB, available only to registered users). For more information, and to register for the webinar, please visit: Webinar [...]

MCDRAM as High-Bandwidth Memory (HBM) in Knights Landing Processors: Developer’s Guide

May 11, 2016

This publication is part of a developer guide focusing on the new features in 2nd generation Intel® Xeon Phi™ processors code-named Knights Landing (KNL). In this document we discuss the on-package high-bandwidth memory (HBM) based on the multi-channel dynamic random access memory (MCDRAM) technology: (1) the three configuration modes of HBM: Flat mode, Cache mode and Hybrid mode; (2) utilization of the HBM as addressable memory using two methods: by setting the affinity policy with the numactl tool and through the use of special allocators in the memkind library; (3) guidelines for determining the optimal usage model for applications running on bootable Knights Landing. Colfax_KNL_MCDRAM_Guide.pdf (255 KB, available only to registered users). See also: colfaxresearch.com/get-ready-for-intel-knights-landing-3-papers/ 1. MCDRAM in KNL. Memory bandwidth in computing systems is one of the common bottlenecks for performance in computational applications. Bandwidth-limited applications are characterized by algorithms that have few floating point [...]
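The numactl method mentioned in this summary can be sketched as a small launcher helper. This is a hedged illustration, not code from the guide: the function name `numactl_command` is hypothetical, and the default NUMA node number 1 for the HBM is an assumption that depends on the memory and clustering configuration (running `numactl --hardware` lists the nodes actually present). Only the standard numactl options `--membind` and `--preferred` are used.

```python
import shlex


def numactl_command(app_args, hbm_node=1, strict=True):
    """Build a numactl command line that places an application's memory in HBM.

    hbm_node is an ASSUMPTION: the NUMA node exposing the HBM must be
    looked up on the target system (for example with `numactl --hardware`).

    strict=True  -> --membind:   allocate only on that node; allocations
                                 fail once the node is full.
    strict=False -> --preferred: try that node first, then fall back to
                                 the other nodes.
    """
    policy = "--membind" if strict else "--preferred"
    return ["numactl", f"{policy}={hbm_node}"] + list(app_args)


if __name__ == "__main__":
    # Hypothetical application name and arguments, for illustration only.
    cmd = numactl_command(["./my_app", "--size", "1024"], strict=False)
    print(shlex.join(cmd))
    # To launch for real: subprocess.run(cmd, check=True)
```

The affinity-policy approach requires no code changes but applies to the whole process. The memkind allocators named in the summary serve the complementary case where only selected data structures should reside in HBM.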