Benchmarks:AI
Training
TensorFlow
TensorFlow provides scripts to run training benchmarks with different models. The scripts are hosted on GitHub at https://github.com/tensorflow/benchmarks.
It is recommended to run the scripts using nvidia-docker2 and the TensorFlow docker image obtained from NGC.
To simplify the setup, I have created a Dockerfile that pulls the image and downloads the scripts. To use it, first create a directory to hold your Dockerfiles:
mkdir ~/Dockerfiles
Then create a file named tf_bench in this directory and add the following:
FROM nvcr.io/nvidia/tensorflow:18.10-py3
RUN apt-get update && apt-get install -y git && git clone -b cnn_tf_v1.10_compatible https://github.com/tensorflow/benchmarks
ENTRYPOINT bash
To build the image, run
docker build -f ~/Dockerfiles/tf_bench -t tf_bench .
The best way to run the container is in interactive mode, as this allows multiple runs to be performed in quick succession. To start the container, run
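docker run --runtime=nvidia -it tf_bench
The benchmark scripts are located in the scripts/tf_cnn_benchmarks directory of the cloned benchmarks repository (/workspace/benchmarks/scripts/tf_cnn_benchmarks when the image's working directory is /workspace).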
To run the benchmark using synthetic data, execute
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50 --variable_update=parameter_server
The benchmark can be run with different models. Supported models include resnet50, resnet152, inception3 and vgg16.
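To compare several models in one session, the command line can be generated per model. The following is a minimal sketch, not part of the benchmark scripts: the helper names build_command and build_sweep are hypothetical, and only the flags already shown above are used. It prints the commands rather than running them.

```python
# Model names taken from the list above.
MODELS = ["resnet50", "resnet152", "inception3", "vgg16"]


def build_command(model, num_gpus=1, batch_size=32):
    """Return the benchmark command for one model as an argument list.

    Hypothetical helper: it only assembles the flags shown in this section.
    """
    return [
        "python",
        "tf_cnn_benchmarks.py",
        "--num_gpus={}".format(num_gpus),
        "--batch_size={}".format(batch_size),
        "--model={}".format(model),
        "--variable_update=parameter_server",
    ]


def build_sweep(models=MODELS, **kwargs):
    """Return one command per model, in the order given."""
    return [build_command(m, **kwargs) for m in models]


if __name__ == "__main__":
    # Print one command per line so they can be reviewed before running.
    for cmd in build_sweep():
        print(" ".join(cmd))
```

To execute the commands instead of printing them, each list could be passed to subprocess.run(cmd, check=True) from inside the tf_cnn_benchmarks directory in the container.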
The trained model can be saved by providing a checkpoint directory with the --train_dir flag. For example, to train ResNet-152 for 10 epochs on 8 GPUs and save the trained model, use
python tf_cnn_benchmarks.py --num_gpus=8 --batch_size=256 --model=resnet152 --variable_update=parameter_server --train_dir=/workspace/ckpt_dir --num_epochs=10
The saved model can then be used to perform other benchmarks for inference.
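It helps to know how much work the 8-GPU command above requests. In tf_cnn_benchmarks the --batch_size value applies to each GPU, so the number of images per training step is the batch size multiplied by the GPU count. The sketch below works through the arithmetic. It assumes one epoch is the ImageNet-1k training set of 1,281,167 images, which is not stated on this page, and the function names are hypothetical. The step count is an estimate, not a value reported by the scripts.

```python
import math

# Values from the 8-GPU command above.
NUM_GPUS = 8
BATCH_SIZE_PER_GPU = 256
NUM_EPOCHS = 10

# Assumption: one epoch is the ImageNet-1k training set.
IMAGES_PER_EPOCH = 1281167


def global_batch_size(num_gpus, batch_size_per_gpu):
    """Images processed per training step across all GPUs."""
    return num_gpus * batch_size_per_gpu


def steps_for_epochs(num_epochs, images_per_epoch, global_batch):
    """Approximate number of training steps, rounded up."""
    return int(math.ceil(num_epochs * images_per_epoch / float(global_batch)))


if __name__ == "__main__":
    batch = global_batch_size(NUM_GPUS, BATCH_SIZE_PER_GPU)
    steps = steps_for_epochs(NUM_EPOCHS, IMAGES_PER_EPOCH, batch)
    print("global batch size:", batch)   # 2048
    print("approximate steps:", steps)   # 6256
```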