Articles by Andrey

Best Practices for Speed in Deep Learning Applications on Intel Architecture

July 3, 2018

You have set up a deep learning model that you plan to train on an Intel architecture processor. To be productive, you need to minimize the training time. You run the application and see that a single training epoch takes N seconds. How do you know whether that is good? If improvement is possible, what can you do to reduce the training time? Are there tools that help you identify a tuning strategy? Intel software development tools can answer these questions and maximize your productivity in deep learning on Intel architecture. At the Intel AI DevCon 2018 in San Francisco, Alaa Eltablawy (Colfax) presented a workshop that demonstrated how this works. For the workshop, attendees received access to the Intel® AI DevCloud, where they could experiment with optimizing a TensorFlow-based application for image segmentation. The instructor demonstrated the performance analysis results obtained with Intel® VTune Amplifier and Application Performance Snapshot and explained how this analysis consistently guides you to the use of known “performance tuning knobs” in [...]

Access to the Intel® AI DevCloud (Intel® AI DevCon’18)

May 3, 2018

This resource is open only to the participants of Intel® AI DevCon 2018. To request access to the Intel® AI DevCloud, submit the request form. You will receive additional instructions at the email address that you provide.

What’s Inside

When you get access, you will log in to a Linux-based head node of a batch farm. There you can stage your code and data, compile, and submit calculations to a queue. Once a queued job completes, its results will be in your home folder. Your account is active for up to 1 year; the termination date is determined by the passcode provided to you by Intel. Jobs are scheduled on Intel® Xeon® Gold 6128 processors (formerly Skylake). Each processor has 12 cores with 2-way hyper-threading and access to 96 GiB of on-platform RAM (DDR4). Only one job runs on any processor at a time. You will get 200 GB of file storage quota. Your home directory is not visible to [...]

An optimization approach for agent-based computational models of biological development

April 9, 2018

Pablo Gonzalez-de-Aledo (a), Andrey Vladimirov (d), Marco Manca (b), Jerry Baugh (c), Ryo Asai (d), Marcus Kaiser (e, f), Roman Bauer (f, e)

(a) Software Performance Optimization Group, Imperial College London, London, United Kingdom
(b) CERN Openlab, IT Department, CERN, Switzerland
(c) Intel Corporation, USA
(d) Colfax International, USA
(e) Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom
(f) Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, United Kingdom

A paper led by Pablo Gonzalez-de-Aledo (Imperial College London), with contributions from his colleagues at CERN, Intel, Colfax, and Newcastle University, was published in the journal Advances in Engineering Software. It is a case study of performance optimization in a biological simulation code. The code is a highly parallel implementation of a computer simulation in which millions of agents interact in a 3D environment. The paper explains the general approach to transforming a sequential code to run on modern, highly [...]
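The C sketch below shows what parallelizing an agent update can involve. It is an illustration only and is not the method or code from the paper. It assumes a hypothetical `Agent` type that holds only a 3D position, and a hypothetical `step_agents()` function that pushes each agent away from the neighbors inside a cutoff radius. The function reads from one array and writes to another (double buffering). Because of that, the iterations are independent, and an OpenMP `parallel for` can split the agents across threads without data races.

```c
#include <stddef.h>

/* Hypothetical agent type for illustration: only a position in 3D space. */
typedef struct {
    double x, y, z;
} Agent;

/* One time step of a toy agent-based model (hypothetical): each agent is
   pushed away from every neighbor closer than `radius`. The loop reads only
   from `in` and writes only to `out` (double buffering), so iterations over
   i are independent and can be split across threads. Without OpenMP
   enabled, the pragma is ignored and the loop runs sequentially. */
void step_agents(const Agent *in, Agent *out, size_t n,
                 double radius, double strength)
{
    const double r2 = radius * radius;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++) {
        double fx = 0.0, fy = 0.0, fz = 0.0;

        /* Brute-force O(n^2) neighbor search, kept simple for clarity. */
        for (size_t j = 0; j < n; j++) {
            if ((size_t)i == j)
                continue;
            double dx = in[i].x - in[j].x;
            double dy = in[i].y - in[j].y;
            double dz = in[i].z - in[j].z;
            double d2 = dx * dx + dy * dy + dz * dz;
            if (d2 < r2) {
                /* Displacement points away from the neighbor. */
                fx += dx;
                fy += dy;
                fz += dz;
            }
        }

        out[i].x = in[i].x + strength * fx;
        out[i].y = in[i].y + strength * fy;
        out[i].z = in[i].z + strength * fz;
    }
}
```

The neighbor loop costs O(n²) per step. With millions of agents, a spatial data structure such as a uniform grid is the usual replacement, and the double-buffering idea carries over unchanged.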

Academic Signup for Access to the Intel® AI DevCloud

November 27, 2017

This resource is open only to the participants of the Intel program for academic access. To request access to the Intel® AI DevCloud, submit the request form. You will receive additional instructions at the email address that you provide.

What’s Inside

When you get access, you will log in to a Linux-based head node of a batch farm. There you can stage your code and data, compile, and submit calculations to a queue. Once a queued job completes, its results will be in your home folder. Your account is active for up to 1 year; the termination date is determined by the passcode provided to you by Intel. Jobs are scheduled on Intel® Xeon® Gold 6128 processors (formerly Skylake). Each processor has 12 cores with 2-way hyper-threading and access to 96 GiB of on-platform RAM (DDR4). Only one job runs on any processor at a time. You will get 200 GB of file storage quota. Your home directory is not [...]

A Performance-Based Comparison of C/C++ Compilers

November 11, 2017

This paper reports a performance-based comparison of six state-of-the-art C/C++ compilers: AOCC, Clang, G++, Intel C++ compiler, PGC++, and Zapcc. We measure two aspects of the compilers’ performance: the speed of compiled C/C++ code parallelized with OpenMP 4.x directives for multi-threading and vectorization, and the compilation time for large projects with heavy C++ templating. In addition to measuring the performance, we interpret the results by examining the assembly instructions produced by each compiler. The tests are performed on an Intel Xeon Platinum processor featuring the Skylake architecture with AVX-512 vector instructions.

The full paper, Colfax_Compiler_Comparison.pdf (562 KB), is available to registered users.

Table of Contents
1. The Importance of a Good Compiler
2. Testing Methodology
2.1. Meet the Compilers
2.2. Target Architecture
2.3. Computational Kernels
2.4. Compilation Time
2.5. Test Details
2.6. Test Platform
2.7. Code Analysis
3. Results
3.1. Performance of Compiled Code
3.2. Compilation Speed
4. Summary
Appendix A. LU [...]
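The sketch below shows the kind of code this comparison targets. It is not one of the paper’s kernels. The two loops use the OpenMP 4.0 combined construct `parallel for simd`, which requests both multi-threading and vectorization in a single directive. The function names `axpy` and `dot` are illustrative assumptions.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]. The `parallel for simd` construct (OpenMP 4.0)
   distributes iterations across threads and vectorizes each thread's chunk.
   Enable it with -fopenmp (GCC, Clang) or -qopenmp (Intel C++ compiler);
   without such a flag, the pragma is ignored and the loop runs serially.
   `restrict` tells the compiler that x and y do not overlap. */
void axpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Dot product. The reduction clause gives each thread and vector lane its
   own partial sum and combines the partial sums at the end, so the loop
   has no race on `sum`. */
float dot(size_t n, const float *restrict x, const float *restrict y)
{
    float sum = 0.0f;
    #pragma omp parallel for simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

One way to see how each compiler handles these directives is to compile the same file with each of them, emit the assembly (for example, with `-S`), and compare the output.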

A Survey and Benchmarks of Intel® Xeon® Gold and Platinum Processors

November 7, 2017

This paper provides quantitative guidelines and performance estimates for choosing a processor among the Platinum and Gold groups of the Intel Xeon Scalable family (formerly Skylake). The performance estimates are based on detailed technical specifications of the processors, including the efficiency of the Intel Turbo Boost technology. The achievable performance metrics are experimentally validated on several processor models with synthetic workloads. The best choice of processor must take into account the nature of the intended application: its multi-threading or multi-processing efficiency, its support for vectorization, and its dependence on memory bandwidth.

The full paper, Colfax-Xeon-Scalable.pdf (334 KB), is available to registered users.

Table of Contents
1. Which Xeon is Right for You?
2. CPU Comparison for Different Workloads
2.4. Bandwidth-Limited
3. Processor Choice Recommendations
4. Silver and Bronze Models
5. Large Memory, Integrated Fabric, Thermal Optimization
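A first-order estimate of this kind multiplies out the processor specifications. The version below is a sketch under stated assumptions and is not taken from the paper. The symbols are:

- $n_{\text{cores}}$: the core count.
- $f$: the clock frequency the cores sustain under AVX-512 load, which can differ from the nominal base or Turbo frequency.
- $n_{\text{FMA}}$: the number of AVX-512 fused multiply-add units per core (one or two, depending on the model).
- $512/64 = 8$: the number of double-precision lanes per vector.
- The factor 2: counts each fused multiply-add as two floating-point operations.

For bandwidth-limited workloads, a roofline-style bound caps performance at the arithmetic intensity $I$ (FLOP per byte) times the memory bandwidth $B$. The roofline is a standard model, not necessarily the paper’s method.

```latex
\begin{aligned}
P_{\text{peak}} &= n_{\text{cores}} \cdot f \cdot n_{\text{FMA}} \cdot \frac{512}{64} \cdot 2, \\
P_{\text{attainable}} &= \min\left(P_{\text{peak}},\; I \cdot B\right).
\end{aligned}
```

As a hypothetical example, a kernel with $I = 0.25$ FLOP/byte on a system with $B = 100$ GB/s cannot exceed 25 GFLOP/s, however many cores the processor has.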

HOW Series “Deep Dive”: Webinars on Performance Optimization – 2017 Edition

June 30, 2017

In a Nutshell

HOW Series “Deep Dive” is a free Web-based training course on parallel programming and performance optimization on Intel architecture. The workshop includes 20 hours of instruction and code for hands-on exercises. The training is free to everyone thanks to Intel’s sponsorship.

With a free Colfax Research account, you can access the video recordings of the lectures, the presentation slides, and the code of the practical exercises on this page. To run the hands-on exercises, you will need a multi-core Intel architecture processor and the Intel C++ Compiler. You can get this compiler at no cost for 30 days using an evaluation license for Intel Parallel Studio [...]

Webinar: Demystifying Vectorization

May 18, 2017

Free Webinar

Abstract: Have you heard of code vectorization but are not sure how it applies to your work? Rest assured, you are in good company. Even seasoned computing professionals have a good excuse for not being familiar with this concept! That said, now is a great time to learn to write vectorized code. In modern Intel processors, vector instructions may speed up arithmetic by up to a factor of 16, because a 512-bit AVX-512 register holds 16 single-precision numbers. However, you must design computational code in a way that makes vector processing possible. In this 1-hour webinar, I will explain what to expect from vectorization and how to make sure that your code has it. Topics: manual and compiler-assisted vectorization; assessing your success with vectorization; and “Loop was vectorized” – what’s next?

Speaker: Andrey Vladimirov, Head of HPC Research, Colfax International. Dr. Andrey Vladimirov’s primary research interest is the application of modern computing technologies to computationally demanding scientific problems. Prior to joining Colfax, Andrey was involved in theoretical astrophysics [...]
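The sketch below contrasts a loop a compiler can vectorize with one it cannot. It illustrates the general idea of compiler-assisted vectorization and is not material from the webinar. The function names are hypothetical.

- The first loop has independent iterations and unit-stride access. Its pointers are marked `restrict`, and an OpenMP `simd` pragma asks the compiler to vectorize it.
- The second loop has a loop-carried dependence: each iteration needs the result of the previous one.

```c
#include <stddef.h>

/* Vectorizable: iterations are independent, memory access is unit-stride,
   and `restrict` promises the compiler that a, b, and c do not overlap.
   The `omp simd` pragma (OpenMP 4.0) requests vectorization. It takes
   effect with -fopenmp or -fopenmp-simd (GCC, Clang) or -qopenmp-simd
   (Intel C++ compiler) and is ignored otherwise. */
void add_arrays(size_t n, const float *restrict a,
                const float *restrict b, float *restrict c)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Not directly vectorizable: x[i] depends on the freshly updated x[i - 1]
   (a loop-carried dependence). Adding `omp simd` here would produce wrong
   results, so the loop is left as plain scalar code. */
void prefix_sum(size_t n, float *x)
{
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```

To check whether a loop was actually vectorized, ask the compiler for a report. GCC provides one with `-fopt-info-vec`, and the Intel C++ compiler with `-qopt-report`.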

Get the Most out of Your Free Trial of Intel Xeon Phi Processors

April 7, 2017

Free Webinar

Abstract: Intel® Xeon Phi™ processors x200 (formerly Knights Landing) are computational beasts. Their theoretical peak performance is up to 3 TFLOP/s, and their measured memory bandwidth is up to 490 GB/s. This performance is available with the same programming models as general-purpose x86 CPUs. Colfax is offering a free trial program for this technology, made available through Intel’s sponsorship. The Colfax Cluster has 64 compute nodes based on Intel Xeon Phi 7250 processors, interconnected by Intel® Omni-Path fabric. The cluster is at your service for two weeks of testing and evaluation. In this 1-hour webinar, I will describe how you can get the most out of your two weeks on the cluster: what workloads you can run to see the performance, how to prepare your own code to run on the cluster, and where to learn the best optimization practices for this and similar architectures.

Slides: Colfax-Remote-Access-Webinar-2017.pdf (2 MB), available to registered users. [...]
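The 3 TFLOP/s figure can be reconstructed from the specifications of the Xeon Phi 7250 used in the Colfax Cluster. The estimate assumes:

- 68 cores running at the nominal 1.4 GHz base clock;
- two AVX-512 vector units per core;
- each unit completes one 8-lane double-precision fused multiply-add, counted as 2 FLOP per lane, per cycle.

```latex
P_{\text{peak}} = 68 \times 1.4\,\text{GHz} \times 2 \times 8 \times 2 = 3046.4\ \text{GFLOP/s} \approx 3.0\ \text{TFLOP/s}
```

Under sustained AVX-512 load, the clock may drop below the base frequency. This figure is therefore an upper bound rather than a performance level to expect from real code.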

MC² Series: Modern Code Contributed Talks

February 10, 2017

In Modern Code Contributed Talks, or MC² Series, experts in computational disciplines share their experience. Register for these ongoing webinars to learn the performance optimization methods used in real-life applications. Would you like to contribute a talk? Contact us. Scholarship is available in the form of access to a diverse collection of powerful computing [...]