Today, I wrote my first “Hello World” script using the freshly open-sourced version of TensorFlow with distributed GPU support. At the time of this writing, the binary releases of TensorFlow don’t come with distributed GPU support, so I had to build TensorFlow from source. All the documentation already exists but is scattered across multiple websites. Here is a condensed version of the install process (on Ubuntu 14.04).
[su_spoiler title="Install the dependencies"]
In order to build TensorFlow, you first need to install a few basic tools. Here is the command:
[su_box title="Code" box_color="#00AECF"]
$ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip swig git
[/su_box]
You also need to install Java 8. Since OpenJDK 8 is not available on Ubuntu 14.04, the easiest way is to use the Oracle installer from the WebUpd8 PPA:
[su_box title="Code" box_color="#00AECF"]
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
[/su_box]
The last tool needed is Bazel. Again, it only takes two commands:
[su_box title="Code" box_color="#00AECF"]
$ wget https://github.com/bazelbuild/bazel/releases/download/0.2.0/bazel_0.2.0-linux-x86_64.deb
$ sudo dpkg -i bazel_0.2.0-linux-x86_64.deb
[/su_box]
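You can check that Bazel was correctly installed by printing its version:
[su_box title="Code" box_color="#00AECF"]
$ bazel version
[/su_box]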
If all the commands were successful then you are ready to build TensorFlow… no wait, I said distributed GPU!
[/su_spoiler]
[su_spoiler title="Install CUDA and cuDNN"]
Please refer to https://developer.nvidia.com/cuda-download for CUDA and https://developer.nvidia.com/cudnn for cuDNN (you will need to register for the Accelerated Computing Developer Program).
Assuming a standard installation in /usr/local/cuda and the cuDNN archive cudnn-7.0-linux-x64-v3.0-prod.tgz, simply run:
[su_box title="Code" box_color="#00AECF"]
$ tar -xf cudnn-7.0-linux-x64-v3.0-prod.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
[/su_box]
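Depending on your setup, you may also need to make the CUDA and cuDNN libraries visible to the dynamic linker. A minimal sketch, assuming the standard /usr/local/cuda location (add these lines to your ~/.bashrc to make them permanent):
[su_box title="Code" box_color="#00AECF"]
# Paths assume a standard CUDA install in /usr/local/cuda
$ export CUDA_HOME=/usr/local/cuda
$ export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
[/su_box]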
Now you are ready to build TensorFlow!
[/su_spoiler]
[su_spoiler title="Build the gRPC server"]
TensorFlow uses gRPC for inter-process communication. To build the server binary, first clone the TensorFlow repository:
[su_box title="Code" box_color="#00AECF"]
$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow
[/su_box]
NOTE: The initial commit of the open-source distributed TensorFlow runtime is 00986d48bb646daab659503ad3a713919865f32d.
Then, cd into the TensorFlow repository and run the ./configure script. Now, you can build the server binary with:
[su_box title="Code" box_color="#00AECF"]
$ bazel build -c opt --config=cuda //tensorflow/core/distributed_runtime/rpc:grpc_tensorflow_server
[/su_box]
[/su_spoiler]
[su_spoiler title="Build and install the pip package"]
To build the pip package with GPU support, just run:
[su_box title="Code" box_color="#00AECF"]
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
[/su_box]
and install it with pip:
[su_box title="Code" box_color="#00AECF"]
# The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.7.1-py2-none-linux_x86_64.whl
[/su_box]
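To quickly check that the package is correctly installed, try importing it and printing the version number (run this outside of the TensorFlow source directory, otherwise the import may pick up the wrong package):
[su_box title="Code" box_color="#00AECF"]
$ python -c "import tensorflow as tf; print(tf.__version__)"
[/su_box]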
[/su_spoiler]
[su_spoiler title="It is time to test!"]
First, start a TensorFlow server as a single-process “cluster”:
[su_box title="Code" box_color="#00AECF"]
$ bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='local|localhost:2222' --job_name=local --task_id=0 &
[/su_box]
Then start a Python interpreter and create a remote session with a simple “Hello World !” command:
[su_box title="Code" box_color="#00AECF"]
$ python
>>> import tensorflow as tf
>>> c = tf.constant("Hello World !")
>>> sess = tf.Session("grpc://localhost:2222")
>>> sess.run(c)
'Hello World !'
[/su_box]
Now repeat the process on the different nodes of your cluster and start playing with TensorFlow!
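For instance, on a hypothetical two-machine cluster (the IP addresses below are placeholders), each machine would run the same server binary with the full cluster specification and its own task id; the tasks of a job are separated by ';' in the cluster spec:
[su_box title="Code" box_color="#00AECF"]
# On the first machine (task 0)
$ bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='worker|192.168.0.1:2222;192.168.0.2:2222' --job_name=worker --task_id=0 &
# On the second machine (task 1)
$ bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='worker|192.168.0.1:2222;192.168.0.2:2222' --job_name=worker --task_id=1 &
[/su_box]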
[/su_spoiler]
Remarks
As you can see, it is really easy to set up a cluster supporting distributed Deep Learning with TensorFlow. If you want to know more about what is possible, please refer to the README. TensorFlow’s approach to distribution is quite low-level, enabling the user to tune every step of the learning process.
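For example, with tf.device you can pin individual operations to specific tasks of the cluster. Here is a minimal sketch, assuming a second server (task 1 of the 'local' job) has been added to the cluster spec of the previous section:
[su_box title="Code" box_color="#00AECF"]
import tensorflow as tf

# Pin each constant to a different task of the "local" job
with tf.device("/job:local/task:0"):
    a = tf.constant(2)
with tf.device("/job:local/task:1"):
    b = tf.constant(3)
c = a + b  # the runtime places the addition and handles the transfers

with tf.Session("grpc://localhost:2222") as sess:
    print(sess.run(c))  # prints 5
[/su_box]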
What about the others?
CNTK claims a huge performance advantage, especially in the distributed GPU setting. After a quick look at the documentation, it is not easy to understand their distribution policy and how to reproduce their tests. Digging into the GitHub repository, I found this configuration file: Multigpu.cntk. Apparently, the only option for parallelism is a DataParallelSGD approach.
MXNet seems to be a serious competitor in the distributed setting. Their approach to distributing the training, based on a distributed key-value store used to exchange gradient parameters, is straightforward yet flexible enough in practice to switch from synchronous to asynchronous learning.
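To give an idea, here is a minimal sketch of MXNet’s key-value store API (a local store is used for illustration; in a real cluster you would create a 'dist_sync' or 'dist_async' store instead):
[su_box title="Code" box_color="#00AECF"]
import mxnet as mx

# Create the store ('dist_sync' or 'dist_async' in the distributed setting)
kv = mx.kvstore.create('local')
shape = (2, 3)
kv.init(3, mx.nd.ones(shape))        # initialize key 3
kv.push(3, mx.nd.ones(shape) * 2)    # push an update, e.g. a gradient
out = mx.nd.zeros(shape)
kv.pull(3, out=out)                  # pull the aggregated value back
print(out.asnumpy())
[/su_box]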
And you? What is your experience with distributed Deep Learning?