Installing TensorFlow with distributed GPU support.

Today, I wrote my first “Hello World” script using the freshly open-sourced version of TensorFlow with distributed GPU support. At the time of this writing, the binary releases of TensorFlow do not include distributed GPU support, so I had to build TensorFlow from source. All the documentation needed to do this already exists, but it is scattered across multiple websites. Here is a condensed version of the install process (on an Ubuntu 14.04 Linux platform).

[su_spoiler title="Install the dependencies"]

In order to build TensorFlow, you first need to install a few basic tools. Here is the command:

[su_box title="Code" box_color="#00AECF"]
$ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip swig git
[/su_box]

You also need to install Java 8. Since OpenJDK 8 is not available for Ubuntu 14.04, it can easily be done with the following commands:

[su_box title="Code" box_color="#00AECF"]
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
[/su_box]

The last tool needed is Bazel. Again, it is simply a matter of two commands:

[su_box title="Code" box_color="#00AECF"]
$ wget https://github.com/bazelbuild/bazel/releases/download/0.2.0/bazel_0.2.0-linux-x86_64.deb
$ sudo dpkg -i bazel_0.2.0-linux-x86_64.deb
[/su_box]
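
To make sure the toolchain is in place before going further, you can quickly check that the tools answer (the exact version numbers on your machine may differ):

[su_box title="Code" box_color="#00AECF"]
$ java -version
$ bazel version
[/su_box]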

If all the commands were successful, then you are ready to build TensorFlow… no wait, I said distributed GPU!

[/su_spoiler]

[su_spoiler title="Install CUDA and cuDNN"]

Please refer to https://developer.nvidia.com/cuda-download for CUDA and https://developer.nvidia.com/cudnn for cuDNN (you will need to register for the Accelerated Computing Developer Program).

Assuming a standard installation in /usr/local/cuda and the cuDNN archive cudnn-7.0-linux-x64-v3.0-prod.tgz, simply run:

[su_box title="Code" box_color="#00AECF"]
$ tar -xf cudnn-7.0-linux-x64-v3.0-prod.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
[/su_box]
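
Depending on your environment, you may also need to tell the linker where to find the CUDA libraries. A minimal sketch, assuming the standard /usr/local/cuda path used above (add these lines to your ~/.bashrc to make them permanent):

[su_box title="Code" box_color="#00AECF"]
$ export CUDA_HOME=/usr/local/cuda
$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
[/su_box]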

Now you are ready to build TensorFlow!

[/su_spoiler]

[su_spoiler title="Build the gRPC server"]

TensorFlow uses gRPC for inter-process communication. To build the server binary, first clone the TensorFlow repository:

[su_box title="Code" box_color="#00AECF"]
$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow
[/su_box]

NOTE: The initial commit of the open-source distributed TensorFlow runtime is 00986d48bb646daab659503ad3a713919865f32d.
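
If you want to reproduce exactly the setup described here, you can pin the repository to that commit (optional; run it from inside the cloned directory):

[su_box title="Code" box_color="#00AECF"]
$ git checkout 00986d48bb646daab659503ad3a713919865f32d
[/su_box]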

Then, cd into the TensorFlow repository and run the ./configure script. Now, you can build the server binary with:

[su_box title="Code" box_color="#00AECF"]
$ bazel build -c opt --config=cuda //tensorflow/core/distributed_runtime/rpc:grpc_tensorflow_server
[/su_box]

[/su_spoiler]

[su_spoiler title="Build and install the pip package"]

To build the pip package with GPU support, just run:

[su_box title="Code" box_color="#00AECF"]
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
[/su_box]

and install it with pip:

[su_box title="Code" box_color="#00AECF"]
# The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.7.1-py2-none-linux_x86_64.whl
[/su_box]
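
A quick sanity check that the package installed correctly is to import it (from outside the TensorFlow source directory):

[su_box title="Code" box_color="#00AECF"]
$ python -c "import tensorflow"
[/su_box]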

[/su_spoiler]

[su_spoiler title="It is time to test!"]

First, start a TensorFlow server as a single-process “cluster”:

[su_box title="Code" box_color="#00AECF"]
$ bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='local|localhost:2222' --job_name=local --task_id=0 &
[/su_box]

Then start a Python interpreter and create a remote session with a simple “Hello World !” command:

[su_box title="Code" box_color="#00AECF"]
$ python
>>> import tensorflow as tf
>>> c = tf.constant("Hello World !")
>>> sess = tf.Session("grpc://localhost:2222")
>>> sess.run(c)
'Hello World !'
[/su_box]

Now repeat the process on the different nodes of your cluster, as sketched below, and start playing with TensorFlow!
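
For example, here is a minimal sketch of a two-node cluster. I assume two machines reachable as host1 and host2 (hypothetical hostnames); each one runs the same server binary with an identical --cluster_spec describing the whole cluster, and only the --task_id changes (the task list syntax follows the single-process example above):

[su_box title="Code" box_color="#00AECF"]
# On host1 (task 0 of the "worker" job):
$ bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='worker|host1:2222;host2:2222' --job_name=worker --task_id=0 &

# On host2 (task 1 of the "worker" job):
$ bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='worker|host1:2222;host2:2222' --job_name=worker --task_id=1 &
[/su_box]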

[/su_spoiler]

Remarks

As you can see, it is really easy to set up a cluster supporting distributed Deep Learning with TensorFlow. If you want to know more about what is possible, please refer to the README. TensorFlow's approach to distribution is quite low-level, enabling the user to tune every step of the learning process.
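
As an illustration of this low-level control, here is a minimal sketch that explicitly pins operations to a given task with tf.device, reusing the single-process “cluster” started in the test section (the device string format is the one used in the README):

[su_box title="Code" box_color="#00AECF"]
import tensorflow as tf

# Pin the construction of these ops to task 0 of the "local" job.
with tf.device("/job:local/task:0"):
    a = tf.constant(2)
    b = tf.constant(3)
    c = a * b

# Execute the graph on the remote server started earlier.
sess = tf.Session("grpc://localhost:2222")
print(sess.run(c))  # prints 6
[/su_box]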

What about the others?

CNTK claims a huge performance gap, especially in the distributed GPU setting. After a quick look at the documentation, it is not easy to understand their distribution policy or how to reproduce their tests. Digging into the GitHub repository, I found this configuration file: Multigpu.cntk. Apparently, the only option for parallelism is a DataParallelSGD approach.

MXNet seems to be a serious competitor in the distributed setting. Their approach distributes training through a distributed key-value store used to exchange the gradient parameters; it is straightforward, yet flexible enough in practice to switch from synchronous to asynchronous learning.
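
To give an idea of what this looks like, here is a minimal sketch of MXNet's key-value store API, based on my reading of their documentation (the 'local' store is used here so the snippet runs standalone; 'dist_sync' and 'dist_async' are the distributed variants launched through their cluster tooling):

[su_box title="Code" box_color="#00AECF"]
import mxnet as mx

# 'local' runs in-process; switch to 'dist_sync' or 'dist_async'
# for synchronous or asynchronous distributed training.
kv = mx.kvstore.create('local')

shape = (2, 3)
kv.init(0, mx.nd.ones(shape))   # register a parameter array under key 0

grad = mx.nd.ones(shape)        # a worker pushes its local gradient...
kv.push(0, grad)

out = mx.nd.zeros(shape)        # ...and pulls back the aggregated value
kv.pull(0, out=out)
print(out.asnumpy())
[/su_box]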

And you? What is your experience with distributed Deep Learning?
