Introduction to Intel DAAL, Part 2: Distributed Variance-Covariance Matrix Computation
This is part 2 of 3 in an introductory series of publications on the Intel® Data Analytics Acceleration Library (DAAL). DAAL is a data analytics library optimized for modern, highly parallel computer architectures such as Intel Xeon and Intel Xeon Phi processors. The goal of this series is to give developers a technical overview of application development with DAAL.
In part 1 of the series we discussed how to implement batch mode computation on a single node.
In this publication, we discuss distributed mode computation, focusing on both how and when to implement it with Intel DAAL.
As an example workload, we implement an application that uses DAAL to compute a covariance matrix of a set of vectors. We first demonstrate how to use distributed mode with this example. Then, using this example application, we scan the parameter space to determine what parameter ranges benefit from distributed computation.
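The idea behind distributed covariance computation can be illustrated with a minimal sketch. It is written in plain Python and does not use the Intel DAAL API; the function names (partial_result, merge, finalize, distributed_covariance) are hypothetical and chosen only for illustration. It assumes each node holds a block of row vectors, reduces that block to a small summary (count, per-column sums, cross-product matrix), and a master step adds the summaries together before forming the sample covariance matrix.

```python
# Sketch of the partial-result / merge pattern behind distributed covariance.
# Plain Python for illustration only; this is NOT the Intel DAAL API, and all
# function names here are hypothetical.

def partial_result(block):
    """Per-node step: reduce a block of row vectors to (n, sums, cross-products)."""
    p = len(block[0])
    sums = [0.0] * p
    cross = [[0.0] * p for _ in range(p)]
    for x in block:
        for i in range(p):
            sums[i] += x[i]
            for j in range(p):
                cross[i][j] += x[i] * x[j]
    return len(block), sums, cross


def merge(partials):
    """Master step: add the per-node summaries element by element."""
    p = len(partials[0][1])
    n = 0
    sums = [0.0] * p
    cross = [[0.0] * p for _ in range(p)]
    for n_k, sums_k, cross_k in partials:
        n += n_k
        for i in range(p):
            sums[i] += sums_k[i]
            for j in range(p):
                cross[i][j] += cross_k[i][j]
    return n, sums, cross


def finalize(n, sums, cross):
    """Sample covariance: (sum of x x^T - s s^T / n) / (n - 1)."""
    p = len(sums)
    return [[(cross[i][j] - sums[i] * sums[j] / n) / (n - 1)
             for j in range(p)] for i in range(p)]


def distributed_covariance(blocks):
    """Simulate the whole pipeline; each entry of `blocks` stands for one node."""
    return finalize(*merge([partial_result(b) for b in blocks]))
```

Only the summaries travel between nodes, so the communication cost depends on the vector length rather than on the number of vectors. This simple sums-of-products formula can lose precision when the means are large relative to the spread of the data, so it should be read as an illustration of the data flow rather than as a numerically robust method.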
We also demonstrate how the output of this computation may be used in image processing to compute the eigenvectors of a set of images. The source code for this application is available for free download.
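As a rough illustration of how a covariance matrix leads to eigenvectors, the sketch below finds the dominant eigenvector of a small symmetric matrix by power iteration. Power iteration is used here only because it is short; it is an assumption for illustration and not necessarily the method used in the paper or its sample code. The function name dominant_eigenvector is hypothetical, and the sketch assumes the matrix is symmetric positive semi-definite (as a covariance matrix is) and that the starting vector is not orthogonal to the dominant eigenvector.

```python
# Power-iteration sketch for the dominant eigenvector of a covariance matrix.
# Illustration only: not the Intel DAAL API and not necessarily the paper's method.
import math


def dominant_eigenvector(cov, iterations=200):
    """Return (eigenvalue, unit eigenvector) for the largest eigenvalue of `cov`."""
    p = len(cov)
    v = [1.0] * p  # starting guess; assumed not orthogonal to the answer
    for _ in range(iterations):
        # Multiply by the matrix, then rescale to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # v lies in the null space; nothing further to refine
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v (v has unit length if the loop normalized it).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v
```

For image data, each image would be flattened into one vector, so the covariance matrix has one row and column per pixel, and the leading eigenvectors can themselves be viewed as images.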
In the upcoming third part of the series, we will discuss the online computation mode, using an example workload that involves multiple datasets and interfaces with a relational database via SQL.
Complete paper:
Colfax_Introduction_to_Intel_DAAL_2_of_3.pdf (478 KB)
Sample code for Linux:
Colfax_Intro_to_DAAL_2.zip (4 KB)