
Optimization of Real-Time Object Detection on Intel® Xeon® Scalable Processors

November 11, 2017

This publication demonstrates the process of optimizing an object detection inference workload on an Intel® Xeon® Scalable processor using TensorFlow. The project pursues two objectives: achieving object detection with real-time throughput (frame rate) and low latency, and minimizing the required computational resources. In this case study, a model described in the “You Only Look Once” (YOLO) project is used for object detection. The model consists of two components: a convolutional neural network and a post-processing pipeline. In this work, the original Darknet model is converted to a TensorFlow model. First, the convolutional neural network is optimized for inference. Then array programming with NumPy and TensorFlow is implemented for the post-processing pipeline. Finally, environment variables and other configuration parameters are tuned to maximize computational performance. With these optimizations, real-time object detection is achieved while using only a fraction of the available processing power of an Intel Xeon Scalable processor-based system. [...]
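
To illustrate the kind of configuration tuning the abstract refers to, below is a minimal sketch (not the article's own code) of limiting TensorFlow's thread pools and the Intel OpenMP runtime so that inference uses only part of a Xeon socket. The thread counts and affinity string are illustrative placeholders, and the TensorFlow 1.x Session API is assumed.

    # Illustrative sketch: restrict threading so inference uses a fraction of the socket.
    # Thread counts and affinity settings are placeholders, not the article's values.
    import os

    # Intel OpenMP runtime knobs (set before TensorFlow is imported)
    os.environ["OMP_NUM_THREADS"] = "8"           # cores dedicated to inference
    os.environ["KMP_BLOCKTIME"] = "0"             # return threads to the pool immediately
    os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

    import tensorflow as tf

    config = tf.ConfigProto(
        intra_op_parallelism_threads=8,  # threads used within one operation (e.g., a convolution)
        inter_op_parallelism_threads=1,  # independent operations executed concurrently
    )

    with tf.Session(config=config) as sess:
        # ... load the converted YOLO graph and run inference here ...
        pass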

Capabilities of Intel® AVX-512 in Intel® Xeon® Scalable Processors (Skylake)

September 19, 2017

This paper reviews the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set and answers two critical questions: How do Intel® Xeon® Scalable processors based on the Skylake architecture (2017) compare to their Broadwell-based predecessors with respect to Intel AVX-512? And how do Skylake-based Intel Xeon processors differ from their alternative, Intel® Xeon Phi™ processors with the Knights Landing architecture, which also feature Intel AVX-512? We address these questions from the programmer’s perspective by demonstrating C language code of microkernels that benefit from Intel AVX-512. For each example, we dig deeper and analyze the compilation practices, resultant assembly, and optimization reports. In addition to the code studies, the paper contains performance measurements for a synthetic benchmark, with guidelines on estimating peak performance. In conclusion, we outline the workloads and application domains that can benefit from the new features of the Intel AVX-512 instruction set. Colfax-SKL-AVX512-Guide.pdf (524 KB) [...]
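
As a companion to the paper’s guidelines on estimating peak performance, here is a minimal sketch of the standard back-of-the-envelope formula. The core count, sustained AVX-512 frequency, and per-core FMA unit count below are hypothetical, SKU-dependent values chosen for illustration, not measurements from the paper.

    # Theoretical peak estimate: FLOP/s = cores x frequency x SIMD lanes x FMA units x 2 FLOP/FMA.
    # All hardware parameters below are illustrative and vary by processor SKU.
    cores = 20          # physical cores in one socket (illustrative)
    freq_ghz = 2.0      # sustained all-core AVX-512 frequency in GHz (illustrative)
    simd_lanes = 16     # single-precision lanes in a 512-bit register
    fma_units = 2       # AVX-512 FMA units per core (SKU-dependent)
    flop_per_fma = 2    # one fused multiply-add counts as 2 floating-point operations

    peak_gflops = cores * freq_ghz * simd_lanes * fma_units * flop_per_fma
    print("Theoretical single-precision peak: %.0f GFLOP/s" % peak_gflops)
    # -> 2560 GFLOP/s for these illustrative numbers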

HOW Series “Deep Dive”: Webinars on Performance Optimization – 2017 Edition

June 30, 2017

In a Nutshell: The HOW Series “Deep Dive” is a free Web-based training on parallel programming and performance optimization on Intel architecture. The workshop includes 20 hours of instruction and up to 2 weeks of remote access to dedicated training servers for hands-on exercises. This training is free to everyone thanks to Intel’s sponsorship. You can get trained in one of two ways. Self-paced (start right now): access the video recordings of lectures, slides of presentations, and code of practical exercises on this page using a free Colfax Research account. This option is free and open to everyone; however, self-paced study does not give you the benefits of joining a workshop (which is also free, but tied to specific dates). Upcoming workshops: the 10-day workshops occurring in different months have the same agenda; the next workshop begins in October 2017. [...]