Articles by Andrey

Linux Enterprise Automation Engineer

April 13, 2018

Colfax International, a Silicon Valley company with 30+ years of experience in high-end computing systems, is growing its research and development (R&D) team by opening a full-time position of Linux Enterprise Automation Engineer based in Sunnyvale, California, USA. The Automation Engineer will help us create and maintain unique computing services in support of our research, education and consulting in modern computing technologies.
ABOUT THE JOB
TASKS: As a Linux Enterprise Automation Engineer, you will develop software for process automation in Linux and be responsible for maintaining the computing infrastructure used by customers of Colfax’s business and educational programs. Your level of participation in the creative process will range from making decisions on the architecture of the systems to configuring third-party software and creating custom scripts for process automation. You will document the deployed services, respond to failures, and train others in the support and configuration of your systems.
COMPANY: Colfax [...]

An optimization approach for agent-based computational models of biological development

April 9, 2018

Pablo Gonzalez-de-Aledo (a), Andrey Vladimirov (d), Marco Manca (b), Jerry Baugh (c), Ryo Asai (d), Marcus Kaiser (e, f), Roman Bauer (f, e)
(a) Software Performance Optimization Group, Imperial College London, London, United Kingdom
(b) CERN Openlab, IT Department, CERN, Switzerland
(c) Intel Corporation, USA
(d) Colfax International, USA
(e) Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom
(f) Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, United Kingdom
A paper led by Pablo Gonzalez-de-Aledo (Imperial College London), with contributions from colleagues at CERN, Intel, Colfax and Newcastle University, was published in the journal Advances in Engineering Software. It is a case study on performance optimization in a biological simulation code. The code is a highly parallel implementation of a computer simulation that involves millions of agents interacting in a 3D environment. The paper explains the general approach to transforming a sequential code to run on modern, highly [...]

Academic Signup for Access to the Intel® AI DevCloud

November 27, 2017

This resource is open only to the participants of the Intel program for academic access.
Please fill out and submit the form below to request access to the Intel® AI DevCloud. You will get additional instructions via the email address that you provide. Do you have a free account at Colfax Research? Save time by logging in: we will fill in the fields with data from your profile.
What’s Inside
When you get access, you will log in to a Linux-based head node of a batch farm. There you can stage your code and data, compile, and submit calculations to a queue. Once the queued job completes, your results will be in your home folder.
Your account is active for up to 1 year. The termination date is determined by the passcode provided to you by Intel.
Jobs are scheduled on Intel® Xeon® Gold 6128 processors (formerly Skylake).
Each processor has 12 cores with 2-way hyper-threading.
Each processor has access to 96 GiB of on-platform RAM (DDR4).
Only one job will run on any processor at a time.
You will get 200 GB of file storage quota.
Your home directory is not [...]

A Performance-Based Comparison of C/C++ Compilers

November 11, 2017

This paper reports a performance-based comparison of six state-of-the-art C/C++ compilers: AOCC, Clang, G++, Intel C++ compiler, PGC++, and Zapcc. We measure two aspects of the compilers’ performance: (1) the speed of compiled C/C++ code parallelized with OpenMP 4.x directives for multi-threading and vectorization, and (2) the compilation time for large projects with heavy C++ templating. In addition to measuring the performance, we interpret the results by examining the assembly instructions produced by each compiler. The tests are performed on an Intel Xeon Platinum processor featuring the Skylake architecture with AVX-512 vector instructions.
Download: Colfax_Compiler_Comparison.pdf (562 KB), available only to registered users.
Table of Contents
1. The Importance of a Good Compiler
2. Testing Methodology
2.1. Meet the Compilers
2.2. Target Architecture
2.3. Computational Kernels
2.4. Compilation Time
2.5. Test Details
2.6. Test Platform
2.7. Code Analysis
3. Results
3.1. Performance of Compiled Code
3.2. Compilation Speed
4. Summary
Appendix A. LU [...]
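To illustrate what "parallelized with OpenMP 4.x directives for multi-threading and vectorization" can look like, here is a minimal sketch. The `saxpy` kernel is a hypothetical example and is not one of the kernels benchmarked in the paper. The combined `parallel for simd` construct asks the compiler both to split the loop across threads and to vectorize each thread's share. The caller is assumed to pass arrays of equal length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (not taken from the paper): y[i] = a * x[i] + y[i].
// Assumes y has at least as many elements as x.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();
    // "parallel for" distributes iterations across threads; "simd" requests
    // vectorization of each thread's chunk. If OpenMP is not enabled at
    // compile time, the pragma is ignored and the loop runs serially with
    // the same result.
#pragma omp parallel for simd
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Whether the directive has any effect depends on the compiler and its flags (for example, GCC and Clang need `-fopenmp`), which is one reason the same source can perform differently across compilers.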

A Survey and Benchmarks of Intel® Xeon® Gold and Platinum Processors

November 7, 2017

This paper provides quantitative guidelines and performance estimates for choosing a processor among the Platinum and Gold groups of the Intel Xeon Scalable family (formerly Skylake). The performance estimates are based on detailed technical specifications of the processors, including the efficiency of the Intel Turbo Boost technology. The achievable performance metrics are experimentally validated on several processor models with synthetic workloads. The best choice of the processor must take into account the nature of the application for which the processor is intended: multi-threading or multi-processing efficiency, support for vectorization, and dependence on memory bandwidth.
Download: Colfax-Xeon-Scalable.pdf (334 KB), available only to registered users.
Table of Contents
1. Which Xeon is Right for You?
2. CPU Comparison for Different Workloads
3. Processor Choice Recommendations
4. Silver and Bronze Models
5. Large Memory, Integrated Fabric, Thermal Optimization
[...]

HOW Series “Deep Dive”: Webinars on Performance Optimization – 2017 Edition

June 30, 2017

In a Nutshell
HOW Series “Deep Dive” is a free Web-based training on parallel programming and performance optimization on Intel architecture. The workshop includes 20 hours of instruction and code for hands-on exercises. This training is free to everyone thanks to Intel’s sponsorship.
You can access the video recordings of lectures, slides of presentations and code of practical exercises on this page using a free Colfax Research account. To run the hands-on exercises, you will need a multi-core Intel architecture processor and the Intel C++ Compiler. You can get this compiler for 30 days at no cost using an evaluation license for Intel Parallel Studio [...]

Webinar: Demystifying Vectorization

May 18, 2017

Free Webinar
Abstract
Have you heard of code vectorization, but are not sure how it applies to your work? Rest assured, you are in good company. Even seasoned computing professionals have a good excuse for not being familiar with this concept! That said, now is a great time to learn about writing vectorized code, because in modern Intel processors vector instructions may speed up arithmetic by up to a factor of 16. However, you must design computational code in a way that makes vector processing possible. In this 1-hour webinar I will explain what to expect from vectorization and how to make sure that your code has it:
Manual and compiler-assisted vectorization
Assessing your success with vectorization
Loop was vectorized: what’s next?
Speaker
Andrey Vladimirov, Head of HPC Research, Colfax International
Dr. Andrey Vladimirov’s primary research interest is the application of modern computing technologies to computationally demanding scientific problems. Prior to joining Colfax, Andrey was involved in theoretical astrophysics [...]
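The "factor of 16" quoted in the abstract is consistent with simple lane counting, assuming it refers to 512-bit vector registers holding single-precision values (an assumption on our part; the abstract does not spell this out). A minimal sketch, with the hypothetical helper `vector_lanes`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many elements of a given size fit into one
// vector register of the given width.
constexpr std::size_t vector_lanes(std::size_t register_bits,
                                   std::size_t element_bytes) {
    return register_bits / (8 * element_bytes);
}

// Assumption: 512-bit registers, 4-byte float and 8-byte double elements.
static_assert(vector_lanes(512, 4) == 16, "16 single-precision lanes");
static_assert(vector_lanes(512, 8) == 8, "8 double-precision lanes");

// A loop shape that compilers can typically vectorize: unit-stride access
// and no dependence between iterations.
void scale(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

The lane count is only an upper bound on the speedup; memory bandwidth, unaligned or strided access, and loop-carried dependences can all reduce the benefit.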

Get the Most out of Your Free Trial of Intel Xeon Phi Processors

April 7, 2017

Free Webinar
Abstract
Intel® Xeon Phi™ processors x200 (formerly Knights Landing) are computational beasts. Their theoretical peak performance is up to 3 TFLOP/s and measured memory bandwidth is up to 490 GB/s. This performance is available without any difference in programming models compared to general-purpose x86-like CPUs.
Colfax is offering a free trial program for this technology. This program is available through Intel’s sponsorship. The Colfax Cluster has 64 compute nodes based on Intel Xeon Phi 7250 processors. Intel® Omni-Path fabric interconnects the nodes. This cluster is at your service for two weeks for testing and evaluation.
In this 1-hour webinar I will describe how you can get the most out of your two weeks on the cluster:
What workloads you can run to see the performance
How to prepare your own code to run on the cluster
Where to learn the best optimization practices for this and similar architectures
Slides: Colfax-Remote-Access-Webinar-2017.pdf (2 MB), available only to registered users.
Free trial: here [...]
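As a rough check of the "up to 3 TFLOP/s" figure, theoretical peak performance can be estimated as cores × clock × vector units per core × vector lanes × operations per fused multiply-add. The sketch below plugs in values commonly quoted for the Xeon Phi 7250 (68 cores, 1.4 GHz nominal clock, 2 vector units per core, 8 double-precision lanes). These numbers are assumptions that do not come from the abstract, and `peak_gflops` is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical helper: theoretical peak in GFLOP/s.
constexpr double peak_gflops(int cores, double clock_ghz, int vector_units,
                             int lanes, int flops_per_fma) {
    return cores * clock_ghz * vector_units * lanes * flops_per_fma;
}

// Assumed Xeon Phi 7250 figures (not stated in the abstract):
// 68 cores, 1.4 GHz, 2 vector units per core, 8 double-precision lanes,
// and 2 floating-point operations per fused multiply-add.
constexpr double kPhi7250PeakGflops = peak_gflops(68, 1.4, 2, 8, 2);
```

Under these assumptions the estimate comes to roughly 3046 GFLOP/s in double precision, in line with the quoted order of magnitude. Sustained performance in real applications is lower.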

MC² Series: Modern Code Contributed Talks

February 10, 2017

In Modern Code Contributed Talks, or MC² Series, experts in computational disciplines share their experience. Register for these ongoing webinars to learn the performance optimization methods used in real-life applications. Would you like to contribute a talk? Contact us. Scholarship is available in the form of access to a diverse collection of powerful computing [...]

Modern Code for Intel Xeon Phi Processors

December 8, 2016

This series of 45-minute webinars was presented by Colfax International in collaboration with Intel in 2016.
1. Strategies for Multi-Threading on Intel Xeon Phi Processors
Practical recipes for optimizing performance in multi-threaded computational applications on Intel Xeon Phi processors. The presentation covers common issues with thread parallelism (excessive synchronization, false sharing, and insufficient iteration space size) and methods for overcoming these issues: parallel reduction, data padding, strip-mining and loop collapse, and nested parallelism.
Recording (45 min): this webinar aired September 28, 2016.
Slides: Colfax_Modern_Code_Webinar_01.pdf (5 MB), available only to registered users.
2. Fine-Tuning Vectorization on Intel Xeon Phi Processors
Vectorization of computational applications on Intel Xeon Phi processors. Covers automatic vectorization essentials and the toolkit for advanced tuning of vectorization performance, including compiler directives, data [...]
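Two of the multi-threading remedies named in part 1, parallel reduction and data padding, can be sketched as follows. This is an illustrative example and not code from the webinars; `parallel_sum` and `PaddedCounter` are hypothetical names, and the 64-byte cache line size is an assumption that holds on typical x86 processors.

```cpp
#include <cassert>
#include <vector>

// Parallel reduction: each thread accumulates a private partial sum, and the
// partial sums are combined once at the end. This avoids synchronizing on a
// shared accumulator in every iteration. Without OpenMP enabled, the pragma
// is ignored and the loop runs serially.
double parallel_sum(const std::vector<double>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
    const double* p = v.data();
#pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}

// Data padding: aligning each per-thread slot to its own cache line
// (assumed to be 64 bytes) keeps threads that update neighboring slots
// from invalidating each other's cache lines (false sharing).
struct alignas(64) PaddedCounter {
    long value = 0;
};
static_assert(sizeof(PaddedCounter) == 64, "one counter per cache line");
```

An array such as `std::vector<PaddedCounter> counters(num_threads)` then gives every thread a slot on a separate cache line, at the cost of some unused memory.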