Linux Enterprise Automation Engineer

April 13, 2018

Colfax International, a Silicon Valley company with 30+ years of experience in high-end computing systems, is growing its research and development (R&D) team by opening a full-time position for a Linux Enterprise Automation Engineer based in Sunnyvale, California, USA. The Automation Engineer will help us create and maintain unique computing services in support of our research, education and consulting in modern computing technologies.

ABOUT THE JOB

TASKS: As a Linux Enterprise Automation Engineer, you will develop software for process automation in Linux and be responsible for maintaining the computing infrastructure used by customers of Colfax’s business and educational programs. Your level of participation in the creative process will range from making decisions on the architecture of the systems to configuring third-party software and creating custom scripts for process automation. You will document the deployed services, respond to failures, and train others in the support and configuration of your systems.

COMPANY: Colfax [...]

Second Edition of “Parallel Programming and Optimization with Intel Xeon Phi Coprocessors”

May 19, 2015

We did it! The second edition of our book, “Parallel Programming and Optimization with Intel Xeon Phi Coprocessors”, is now available. Members of Colfax Research can enjoy $10 off with a discount code! Download the table of contents here: table-of-contents-2nd-edition.pdf (1 MB), available only to registered users. [...]

Colfax Research papers translated to Japanese

July 14, 2014

With the help of our partners at Intel, some of our articles on Intel Xeon Phi coprocessor programming were translated into Japanese. インテル社の協力で、弊社のインテル(R) Xeon Phi(TM) コプロセッサーのプログラミングについての白書の一部が日本語に翻訳されました。

Original: Configuration and Benchmarks of Peer-to-Peer Communication over Gigabit Ethernet and InfiniBand in a Cluster with Intel Xeon Phi Coprocessors
Translation: JP-Colfax_InfiniBand_for_MIC.pdf (2 MB), available only to registered users.

Original: Heterogeneous Clustering with Homogeneous Code: Accelerate MPI Applications Without Code Surgery Using Intel Xeon Phi Coprocessors
Translation: JP-Colfax_Heterogeneous_Clustering_Xeon_Phi.pdf (657 KB), available only to registered users.

Original: Multithreaded Transposition of Square Matrices with Common Code for Intel Xeon Processors and Intel Xeon Phi Coprocessors
Translation: JP-Colfax_Transposition-7110P.pdf (987 [...]

Parallel Computing in the Search for New Physics at LHC

December 2, 2013

In the past few months we have had the pleasure of collaborating with Prof. Valerie Halyo of Princeton University on the modernization of a high energy physics application for the needs of the Large Hadron Collider (LHC). The objective of our project is to improve the performance of the trigger at the LHC, so as to enable real-time detection of exotic collision event products, such as black holes or jets. The Hough transform was chosen as the numerical algorithm for the new trigger software. This method allows fast detection of straight or curved tracks in a set of points (detector hits), which could be the traces of new exotic particles. The numerical Hough transform is highly parallelizable; however, existing implementations either did not use hardware parallelism or used it sub-optimally. Colfax’s role in the project was to optimize a thread-parallel implementation of the Hough transform for multi-core processors. The result of our involvement was a code capable of detecting 5000 tracks in a synthetic dataset 250x faster than prior art, on a multi-core desktop CPU. By [...]
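To make the idea of the Hough transform described above concrete, here is a minimal sketch of the textbook variant for straight lines. It is not the trigger code from the project: the function names `hough_votes` and `strongest_line`, the parameters `n_theta` and `rho_step`, and the sample points are all hypothetical. Each hit (x, y) votes for every line rho = x cos(theta) + y sin(theta) that could pass through it, and the accumulator bin with the most votes identifies the most likely track.

```python
import math
from collections import Counter


def hough_votes(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in (theta index, rho bin) space for a set of 2D points."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Signed distance from the origin to the line through (x, y)
            # whose normal makes the angle theta with the x axis.
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes


def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, vote count) of the bin that collected the most votes."""
    votes = hough_votes(points, n_theta, rho_step)
    (t, r), count = votes.most_common(1)[0]
    return math.pi * t / n_theta, r * rho_step, count


if __name__ == "__main__":
    # Ten hits on the line y = x, plus two stray hits (made-up data).
    hits = [(10 * i, 10 * i) for i in range(10)] + [(5, 40), (70, 3)]
    theta, rho, count = strongest_line(hits)
    print(f"theta = {math.degrees(theta):.1f} deg, rho = {rho}, votes = {count}")
```

The votes cast by different hits are independent of each other, which is why the method lends itself to parallelization: in a threaded version, each thread could process a subset of the hits, with the per-thread accumulators merged at the end.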

Avoiding communication saves time and energy (if you are an algorithm)

May 30, 2012

In this post, I would like to reflect on a seminar that I recently attended at Stanford University’s Institute for Computational and Mathematical Engineering. The talk was given by Prof. James Demmel, who leads the research on communication-avoiding algorithms at the UC Berkeley Computer Science department. I took home two lessons from this talk: first, the research in communication-avoiding algorithms has brought about amazing optimization possibilities, which reduce the time and energy usage of a number of computing problems; and second, the trend of hardware upgrades in the academic HPC arena is heading in a direction that is counterproductive for these novel methods.

Why avoiding communication is important

It is common knowledge that the arithmetic capabilities of computing systems progress much faster than the bandwidth and latency of computer networks and random-access memory. An explanation of this trend offered by Mark Hoemmen, a student of Demmel, is that “Flops are cheap, bandwidth is money, latency is physics”. The consequence of the skyrocketing [...]