Presentations

Modern Code for Intel Xeon Phi Processors

December 8, 2016

This series of 45-minute webinars was presented by Colfax International in collaboration with Intel in 2016. ► Part 1 | ► Part 2 | ► Part 3

1. Strategies for Multi-Threading on Intel Xeon Phi Processors
Practical recipes for optimizing performance in multi-threaded computational applications on Intel Xeon Phi processors. The presentation covers common issues with thread parallelism (excessive synchronization, false sharing, insufficient iteration space size) and methods for overcoming them: parallel reduction, data padding, strip-mining and loop collapse, and nested parallelism (a minimal sketch of two of these techniques appears after this entry). ► Click to watch recording (45 min) – this webinar aired September 28, 2016. Slides: Colfax_Modern_Code_Webinar_01.pdf (5 MB) — this file is available only to registered users. Register or Log In.

2. Fine-Tuning Vectorization on Intel Xeon Phi Processors
Vectorization of computational applications on Intel Xeon Phi processors. Covers automatic vectorization essentials and the toolkit for advanced tuning of vectorization performance, including compiler directives, data [...]
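The following is a minimal OpenMP sketch, not taken from the webinar slides, of two of the techniques named above: parallel reduction (instead of synchronizing every update to a shared accumulator) and strip-mining of the iteration space. All array and variable names are invented for illustration.

```cpp
#include <cstdio>
#include <vector>

int main() {
  const int n = 1 << 20;
  std::vector<float> a(n, 1.0f);

  // Parallel reduction: each thread accumulates a private partial sum,
  // which OpenMP combines at the end; no critical section per addition.
  float sum = 0.0f;
#pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += a[i];

  // Strip-mining: split the iteration space into tiles so the short inner
  // loop vectorizes while the outer loop supplies enough chunks for all threads.
  const int TILE = 1024;
  float sum2 = 0.0f;
#pragma omp parallel for reduction(+:sum2)
  for (int ii = 0; ii < n; ii += TILE)
    for (int i = ii; i < ii + TILE && i < n; i++)
      sum2 += a[i];

  std::printf("sum = %.0f, sum2 = %.0f\n", sum, sum2);
  return 0;
}
```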

Optimizing Torch Performance for Intel Xeon Phi Processors

November 18, 2016

In this 1-hour webinar, Ryo Asai (Colfax) discusses how machine learning applications can benefit from code modernization. He begins by exploring the parallelism that gives modern computer architectures their performance and how it can be leveraged. He then applies code modernization techniques live on screen to the Torch machine learning framework. Specifically, he optimizes image recognition with a deep convolutional neural network based on the VGG-net architecture (a schematic convolution kernel is sketched after this entry). For each code modernization technique, he explains why it works and how to apply it in practice.

What you will learn:
- What code modernization is, and its importance for machine learning
- Practical knowledge of modern computer architectures
- Code modernization techniques for leveraging parallelism

Slides: Colfax-Torch-VGG-Webinar.pdf (2 MB) — this file is available only to registered users. Register or Log [...]
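As an illustration of the kind of kernel a VGG-style network spends its time in, here is a direct 2D convolution, threaded over output rows with OpenMP and written so the compiler can vectorize across independent output pixels. This is not the Torch code shown in the webinar; the sizes and names are invented.

```cpp
#include <cstdio>
#include <vector>

// Direct 2D convolution of a single channel (illustrative layout).
void conv2d(const std::vector<float>& in, int H, int W,
            const std::vector<float>& k, int K,
            std::vector<float>& out) {
  const int OH = H - K + 1, OW = W - K + 1;
#pragma omp parallel for              // threads over output rows
  for (int y = 0; y < OH; y++)
    for (int x = 0; x < OW; x++) {    // independent outputs: vectorization candidate
      float acc = 0.0f;
      for (int ky = 0; ky < K; ky++)
        for (int kx = 0; kx < K; kx++)
          acc += in[(y + ky) * W + (x + kx)] * k[ky * K + kx];
      out[y * OW + x] = acc;
    }
}

int main() {
  const int H = 64, W = 64, K = 3;
  std::vector<float> in(H * W, 1.0f), kern(K * K, 0.5f);
  std::vector<float> out((H - K + 1) * (W - K + 1));
  conv2d(in, H, W, kern, K, out);
  std::printf("out[0] = %.1f\n", out[0]);  // 9 taps * 1.0 * 0.5 = 4.5
  return 0;
}
```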

Knights Landing Webinar Slides Translated to Japanese

May 13, 2016

With the help of our partners at XLsoft, the slide deck for the webinar “Introduction to Next-Generation Intel® Xeon Phi™ Processor: Developer’s Guide to Knights Landing” has been translated into Japanese. XLsoft website. Download here: JP-Colfax-Programmers-Guide-to-KNL.pdf (5 MB) — this file is available only to registered users. Register or Log In. For more information, and to register for the webinar, please visit: Webinar [...]

HOW Series Archive

March 14, 2016

Welcome to the HOW Series! Since July 2015, Colfax has been conducting Web-based workshops on parallel programming and optimization for Intel® architecture, including Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. Workshops include 20 hours of Web-based instruction and up to 3 weeks of remote access to dedicated training servers for hands-on exercises.

Past Events
Past events include:
- July 2015 Workshop (HOW-15-07): July 1 – July 31, 2015
- August 2015 Workshop (HOW-15-08): August 17 – Sep 4, 2015
- September 2015 Workshop (HOW-15-09): Sep 9 – Sep 30, 2015
- October 2015 Workshop (HOW-15-10): Oct 13 – Oct 30, 2015
- November 2015 Workshop (HOW-15-11): Nov 2 – Nov 20, 2015
- March 2016 Workshop (HOW-16-04): March 7 – March 25, 2016

Upcoming Events
We repeat the HOW series several times per year, and you can register for live webinars and remote access for an upcoming workshop at colfaxresearch.com/how-series.

Video Recordings
Below are video recordings of the most recent HOW series run. They are updated as we go, so you always have a fresh version [...]

Slide Deck for Colfax Developer Training on Parallel Programming

February 26, 2016

We are making publicly available the slide deck of the Colfax developer training titled “Parallel Programming and Optimization with Intel Architecture”. This training is an intensive course for developers wishing to leverage Intel architecture, covering both many-core and multi-core processor programming. The course is based on a book of the same name, which contains targeted exercises (“labs”) for hands-on practice. In 2014-2015, “Parallel Programming and Optimization…” visited over 100 locations across the United States: research institutions, government labs, universities, and regional training events. Over 2000 students attended the course. Many of these events were free to attendees thanks to Intel’s sponsorship. Update: now with new information about the upcoming 2nd generation Intel Xeon Phi processor (Knights Landing, KNL). Slide deck: Colfax-Developer-Training.pdf (12 MB) — this file is available only to registered users. Register or Log In. (last updated October [...]

Scientific Computing with Intel Xeon Phi Coprocessors

February 4, 2015

I had the privilege of giving a presentation at the HPC Advisory Council Stanford Conference 2015. Thanks to insideHPC, a recording of this presentation is available on YouTube. Slides are available here and here: Colfax-HPCAC.pdf — this file is available only to registered users. Register or Log In. If you are interested in the individual case studies mentioned in the talk, here they are:
- Paper: 2013a, 2013b
- Papers: 2013, 2014
- Paper: 2013
- Paper: [...]

Crash Course on Programming and Optimization with Intel Xeon Phi Coprocessors at SC14

November 16, 2014

Programming and optimization of applications for Intel Xeon Phi coprocessors will be discussed in more than ten presentations across four concurrent track sessions at the Intel HPC Developer Conference at SC14 in New Orleans, LA on November 16, 2014. Colfax has contributed two of these presentations: a crash course on the applicability domain and programming models for Intel Xeon Phi coprocessors, and a demonstration of the optimization of an N-body simulation for coprocessors at the node and cluster levels (a minimal version of such a kernel is sketched after this entry). Slides of our presentations can be downloaded from this page. Stay tuned for an upcoming Colfax Research paper with downloadable code for the example demonstrated in our slides. If you are attending SC14 in New Orleans, visit us at Colfax’s booth 1047 and also at the Intel Channel Pavilion.
Part 1. Introduction, Programming Models: Colfax-Intro.pdf (10 MB) — this file is available only to registered users. Register or Log In.
Part 2. Optimization Techniques: Colfax-Optimization.pdf (9 MB) — this file is available only to registered [...]
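A minimal, invented version of the kind of N-body force kernel demonstrated in Part 2 (not the actual code from the slides or the forthcoming paper): threads are spread over particles with OpenMP, data are laid out as structure-of-arrays for unit-stride vector loads, and the inner loop has no cross-iteration dependences, so the compiler can vectorize it.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Particles {  // structure-of-arrays layout helps unit-stride vector loads
  std::vector<float> x, y, z, fx, fy, fz;
};

void compute_forces(Particles& p) {
  const int n = (int)p.x.size();
#pragma omp parallel for            // one particle's force sum per iteration
  for (int i = 0; i < n; i++) {
    float fx = 0.f, fy = 0.f, fz = 0.f;
    for (int j = 0; j < n; j++) {   // vectorizable: no dependences across j
      const float dx = p.x[j] - p.x[i];
      const float dy = p.y[j] - p.y[i];
      const float dz = p.z[j] - p.z[i];
      const float r2 = dx*dx + dy*dy + dz*dz + 1e-6f;  // softening avoids r = 0
      const float inv_r = 1.0f / std::sqrt(r2);
      const float w = inv_r * inv_r * inv_r;
      fx += dx * w;  fy += dy * w;  fz += dz * w;
    }
    p.fx[i] = fx;  p.fy[i] = fy;  p.fz[i] = fz;
  }
}

int main() {
  const int n = 1024;
  Particles p{std::vector<float>(n), std::vector<float>(n, 1.f),
              std::vector<float>(n, 2.f), std::vector<float>(n),
              std::vector<float>(n), std::vector<float>(n)};
  for (int i = 0; i < n; i++) p.x[i] = (float)i;  // distinct positions
  compute_forces(p);
  std::printf("f[0] = (%g, %g, %g)\n", p.fx[0], p.fy[0], p.fz[0]);
  return 0;
}
```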

Primer on Computing with Intel Xeon Phi Coprocessors

March 6, 2014

Geant4 is a high energy physics application package for simulating elementary particle transport through matter. It is used in fundamental physics experiments as well as in industrial and medical applications. For example, the ATLAS detector at the LHC and the Fermi Gamma-Ray Space Telescope rely on Geant4 simulations, DNA damage due to ionizing radiation is studied by the derivative project Geant4-DNA, and radiotherapy planning can benefit from calculations with Geant4. Geant4 has long employed distributed-memory parallelism through the MPI framework. However, due to the increasing ratio of core count to memory size in modern computing systems, and due to the need to process larger geometry models, Geant4 is undergoing modernization through the inclusion of thread parallelism in shared memory. This effort is led by SLAC researchers Dr. Makoto Asai and Dr. Andrea Dotti (see, e.g., slides 1 and slides 2). A beneficial by-product of this modernization is the possibility of using the Intel Many Integrated Core (MIC) architecture of Intel Xeon Phi coprocessors for Geant4 [...]
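A schematic of the shared-memory model described above, not Geant4 code (the class and function names are hypothetical): one large read-only geometry object is shared by all threads, each thread processes its own independent events, and per-thread results are combined in a single reduction, so memory use does not grow with core count the way one MPI rank per core would.

```cpp
#include <cstdio>
#include <vector>
#include <omp.h>

struct Geometry { std::vector<double> volumes; };   // large, read-only, shared

double simulate_event(const Geometry& g, int event_id) {
  // stand-in for transporting one event's particles through the shared geometry
  return g.volumes[event_id % (int)g.volumes.size()] * 1e-3;
}

int main() {
  Geometry geom{std::vector<double>(100000, 1.0)};   // one copy for all threads
  const int n_events = 10000;
  double total_dose = 0.0;

#pragma omp parallel for reduction(+:total_dose)
  for (int e = 0; e < n_events; e++)
    total_dose += simulate_event(geom, e);           // events are independent

  std::printf("threads sharing geometry: %d, total dose: %g\n",
              omp_get_max_threads(), total_dose);
  return 0;
}
```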

Accelerating Public Domain Applications: Lessons from Models of Radiation Transport in the Milky Way Galaxy

November 25, 2013

Last week I had the privilege of giving a talk at the Intel Theater at SC’13. I presented a case study, done with Stanford University, on using Intel Xeon Phi coprocessors to accelerate a new astrophysical library, HEATCODE (HEterogeneous Architecture library for sTochastic COsmic Dust Emissivity). If this talk can be summarized in one sentence, it would be “One high performance code for two platforms is reality”. Indeed, the optimizations performed to target the MIC architecture led to a tremendous performance increase on the CPU platform as well. As a consequence, we have developed a high performance library that can be employed and modified both by users who have access to Xeon Phi coprocessors and by those using only multi-core CPUs. The paper introducing the HEATCODE library, with details of the optimization process, is under review at Computer Physics Communications. The preliminary manuscript can be obtained from arXiv, and the slides of the talk are available on this page (see links above and below). The open source code will be made available [...]
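The "one code for two platforms" idea can be pictured with a small sketch (this is not HEATCODE; the kernel and names are invented): the same threaded, vectorizable loop is compiled unchanged for a multi-core Xeon host, or built natively for a Knights Corner coprocessor (e.g., with icc -mmic), and the optimizations that help one target tend to help the other.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Same source for both targets; only the compilation target changes.
// The per-element math is a placeholder, not HEATCODE's physics.
void transform(const std::vector<float>& in, std::vector<float>& out) {
  const int n = (int)in.size();
#pragma omp parallel for simd        // threaded across cores, vectorized within a core
  for (int i = 0; i < n; i++)
    out[i] = std::exp(-1.0f / (in[i] + 1e-3f));   // independent, vectorizable work
}

int main() {
  std::vector<float> in(1 << 20, 2.0f), out(1 << 20);
  transform(in, out);
  std::printf("out[0] = %f\n", out[0]);
  return 0;
}
```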

Accelerated Simulations of Cosmic Dust Heating Using the Intel Many Integrated Core Architecture

June 7, 2013

Cosmic dust absorbs starlight in the optical and ultraviolet ranges, and re-emits it in the infrared range. This process is crucial for radiative transport in our Galaxy. I am participating in a project to develop a computational tool for Galactic radiative transport simulation with stochastic light absorption and re-emission on small dust grains. This project has resulted in the development of a library called HEATCODE (HEterogeneous Architecture library for sTochastic COsmic Dust Emissivity) for fast calculation of the stochastic dust heating process using Intel Xeon Phi coprocessors. I presented HEATCODE and shared my experiences with the development and optimization of applications for Xeon Phi coprocessors in a talk at the Applied Mathematics and Statistics Department at UCSC. The slides from this talk can be downloaded here (see below). The full source code of the application, along with a detailed description of the optimization process, will soon be submitted for peer-reviewed publication, and will become publicly available. Slides from the talk: [...]