Accelerated Linear Algebra JIT Compiler Optimization


Difficulty: Advanced


The XLA compiler available for TensorFlow must be custom-built from source, and for GPU use it requires an Nvidia device with compute capability 5.2 or greater (the Tesla P100 used here has compute capability 6.0). Though extremely difficult to get working, it is well worth the trouble, as the traces below demonstrate:

  • No XLA on the system: base trace
  • XLA on system but disabled: mkl trace
  • XLA on system and enabled: mkl trace

Background

Like many compilers, XLA produces an intermediate representation (IR) from the original TensorFlow code, which it analyzes to determine whether certain linear algebra calculations can be fused together to execute more efficiently. When executed well, this methodology can significantly reduce running time on CPU and GPU, reduce memory usage, and enable cross-platform compatibility. TensorFlow has published a series of posts detailing the technical details of XLA here.
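
As a concrete illustration, here is a minimal sketch (using the standard TensorFlow 1.x Python API, separate from the build steps below) of turning on the XLA JIT for a session. With the JIT enabled, the matmul, add, and relu become candidates for fusion into fewer compiled kernels:

    import numpy as np
    import tensorflow as tf

    # Session-level switch for the XLA JIT (TF 1.x API).
    config = tf.ConfigProto()
    config.graph_options.optimizer_options.global_jit_level = (
        tf.OptimizerOptions.ON_1)

    x = tf.placeholder(tf.float32, shape=[None, 784])
    w = tf.Variable(tf.random_normal([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.nn.relu(tf.matmul(x, w) + b)  # fusion candidates for XLA

    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(y, feed_dict={x: np.random.rand(32, 784)}).shape)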

Setup Guide

What follows is a step-by-step guide to build, install, and test the TensorFlow XLA JIT compiler. Unfortunately, since the software is still in an experimental development phase, we were only able to demonstrate its use with some very specific system settings. Briefly, this process involves taking an existing Docker container from an Nvidia-managed image repository, updating its configuration so that we can rebuild TensorFlow from scratch inside it, and pip-installing the generated .whl. Testing, as in previous posts, all takes place within the safety of the Docker container so as not to contaminate the host system's settings.

  • Hardware Configuration
    • Gcloud Compute instance
    • 8x vCPUs, 30G RAM
    • 1x Nvidia Tesla P100 GPU
    • 256G SSD Drive
  • System Requirements
    • Ubuntu 16.04 LTS
    • Python 3.6, with the Anaconda package manager
    • Nvidia Tesla P100 drivers (website)
    • CUDA 9.0 Developer Toolkit (website)
    • cuDNN 7.0 (website)
    • Nvidia-docker2
  1. Install and configure nvidia-docker2 and set up your NGC account as per the steps detailed in this series of posts by Dr. Donald Kinghorn.
  2. Download the nvidia/tensorflow Docker container.
    $ docker pull nvcr.io/nvidia/tensorflow:18.04-py3
    
  3. Make a directory to do your build
    $ mkdir TF-build
    $ cd TF-build
    $ export REPO=`pwd`
    
  4. Download the TensorFlow source code, checking out the r1.8 branch so that it stays at version 1.8. Also, pull down the XLA MNIST test example from the TensorFlow website.
    $ git clone -b r1.8 https://github.com/tensorflow/tensorflow
    $ wget https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_softmax_xla.py
    $ cd tensorflow/
    
  5. Setup docker container build directory
    $ mkdir dockerfile
    $ cd dockerfile
    
  6. Supply the necessary dependency files for Anaconda and Bazel (the Anaconda installer and the Bazel apt source line). Note that if you are using a system other than x86 Linux you will need to acquire the appropriate Anaconda installer.
    $ wget https://repo.anaconda.com/archive/Anaconda3-5.1.0-Linux-x86_64.sh
    $ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" > bazel.list
    
  7. Create the Dockerfile. Save the following as “Dockerfile”, with a capital D!
     # Dockerfile to setup a build environment for TensorFlow
     # using Intel MKL and Anaconda3 Python
    
     # We are modifying the existing Nvidia container
     FROM nvcr.io/nvidia/tensorflow:18.04-py3
    
     # Add a few needed packages to the base Ubuntu 16.04
     # OK, maybe *you* don't need emacs :-)
     RUN \
        apt-get update && apt-get install -y \
        build-essential \
        curl \
        emacs-nox \
        git \
        openjdk-8-jdk \
        && rm -rf /var/lib/apt/lists/*
    
     # Add the repo for bazel and install it.
     # The repo line was saved to the file bazel.list (step 6) and is
     # copied in; it contains the following line:
     # deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8
     COPY bazel.list /etc/apt/sources.list.d/
     RUN \
      curl https://bazel.build/bazel-release.pub.gpg | apt-key add - && \
      apt-get update && apt-get install -y bazel
    
     # Copy in and install Anaconda3 from the shell archive
     # Anaconda3-5.1.0-Linux-x86_64.sh
     COPY Anaconda3* /root/
     RUN \
      cd /root; chmod 755 Anaconda3*.sh && \
      ./Anaconda3*.sh -b && \
      echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> .bashrc && \
      rm -f Anaconda3*.sh
    
     # That's it! That should be enough to do a TensorFlow 1.8 CPU build
     # using Anaconda Python 3.6 and Intel MKL with gcc 5.4
    
  8. Build the modified nvidia/tensorflow container and run it. Note: make sure $REPO is set to your TF-build directory.
    $ docker build -t tf-build-1.8-gpu-mkl-xla .
    $ docker run --runtime=nvidia --rm -it -v $REPO:/root/TF-build tf-build-1.8-gpu-mkl-xla
    
  9. Configure TensorFlow. You should now be greeted with a custom CLI prompt, which indicates that we are running inside the container.
    > cd /root/TF-build/tensorflow
    
    • You will need to manually edit the configure.py script to get the correct settings:
    • Change the default compute capability: _DEFAULT_CUDA_COMPUTE_CAPABILITIES = '6.0'
    • Turn on XLA: set_build_var(environ_cp, 'TF_ENABLE_XLA', 'XLA JIT', 'with_xla_support', True, 'xla')
    • Now you can run the script as before…
      > python configure.py
      
        - Say yes to "jemalloc support", and no to every other prompt (including CUDA support, as we are not yet demonstrating GPU).
      
  10. Build TensorFlow. Warning: this can take quite some time, on the order of 5-6 HOURS in the case of our GCP instance. Compute time with the P100 is expensive, so proceed at your own risk!
    > bazel build //tensorflow/tools/pip_package:build_pip_package
    
  11. Create the pip package
    > bazel-bin/tensorflow/tools/pip_package/build_pip_package ../tensorflow_pkg
    
  12. Create a new conda environment and install the custom TensorFlow build
    > conda create -n tftest
    > source activate tftest
    > cd ../ && pip install tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl
    
  13. Test that XLA is working by running the MNIST example and examining the generated timeline.ctf.json file via chrome://tracing. (From this TensorFlow XLA tutorial.) A sketch of how that trace file is produced follows below.
    > TF_XLA_FLAGS=--xla_generate_hlo_graph=.* python mnist_softmax_xla.py
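
For reference, the timeline file the example writes comes from TensorFlow's Chrome-trace support. The sketch below is our own minimal reconstruction of that pattern, not the tutorial script itself; the XLA_CPU/XLA_GPU device names in the comment assume an XLA-enabled build. It traces a single session.run() call and dumps the stats in the same timeline.ctf.json format:

    import tensorflow as tf
    from tensorflow.python.client import device_lib, timeline

    # An XLA-enabled build should list XLA_CPU (and, with a GPU build,
    # XLA_GPU) devices here alongside the usual CPU/GPU entries.
    print([d.name for d in device_lib.list_local_devices()])

    x = tf.random_normal([1024, 1024])
    y = tf.matmul(x, x)

    with tf.Session() as sess:
        # Collect full execution stats for one run() call.
        run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        run_metadata = tf.RunMetadata()
        sess.run(y, options=run_options, run_metadata=run_metadata)

        # Dump the step stats as a Chrome trace, viewable at chrome://tracing.
        tl = timeline.Timeline(run_metadata.step_stats)
        with open('timeline.ctf.json', 'w') as f:
            f.write(tl.generate_chrome_trace_format())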