   LU Decomposition with Optimization for the Intel MIC Architecture
   Copyright 2015, Colfax International

   Author:  andrey@colfax-intl.com  Andrey Vladimirov
            phi@colfax-intl.com     General inquiries

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.



DESCRIPTION: The code in this archive supplements the publication
             "Fine-Tuning Vectorization and Memory Traffic
              on Intel Xeon Phi Coprocessors:
              LU Decomposition of Small Matrices"
             (A. Vladimirov, 2015 -- Colfax Research
              http://research.colfaxinternational.com/post/2015/01/27/LU.aspx )

             Directories step-00/ through step-05/ contain
             the LU decomposition code at successive stages of optimization,
             with step-05/ being the most optimized.
             Directory step-mkl/ contains the code used for Intel MKL benchmarks.
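
             For orientation, the kind of kernel being optimized can be
             sketched as below. This is a minimal, unoptimized illustration
             and not the code from any of the step directories: the function
             name LU_decomp, the row-major in-place storage, and the absence
             of pivoting are all assumptions made for this example.

```cpp
#include <cassert>

// Hypothetical illustration, not taken from the archive.
// In-place LU decomposition (Doolittle scheme, no pivoting) of an
// n x n matrix stored in row-major order. On return, the strict lower
// triangle of A holds L (its unit diagonal is implied) and the upper
// triangle, including the diagonal, holds U.
void LU_decomp(const int n, double* A) {
  for (int k = 0; k < n; k++) {
    // Without pivoting, a zero pivot cannot be handled.
    assert(A[k*n + k] != 0.0);
    const double recip = 1.0 / A[k*n + k];
    for (int i = k + 1; i < n; i++) {
      // Multiplier for row i; stored in place as an element of L.
      A[i*n + k] *= recip;
      // Eliminate column k from the remainder of row i.
      for (int j = k + 1; j < n; j++)
        A[i*n + j] -= A[i*n + k] * A[k*n + j];
    }
  }
}
```

             For instance, the 2x2 matrix {4, 3, 6, 3} becomes
             {4, 3, 1.5, -1.5}: L has 1.5 below its unit diagonal,
             and U has rows (4, 3) and (0, -1.5).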

REQUIREMENTS:
  - Intel C++ compiler version 15.0.1.133 or later;
  - Multi-core processor based on Intel architecture;
  - 8 GB of RAM or more;
  - An Intel Xeon Phi coprocessor with passwordless SSH
    authentication configured;
  - Linux operating system (needed to use the included
    Makefile and benchmark script).

EXAMPLES OF USAGE:
  - To compile the code in one of the steps, run "make"
    in that step's directory;
  - To execute the code on the CPU, run "make run-cpu";
  - To execute the code on the coprocessor, run "make run-mic".

