Benchmarks:AI
Glossary
Common Terms
- Epochs
The number of times the learning algorithm works through the entire training dataset. Runtime scales roughly linearly with this value, so doubling the number of epochs doubles training time. For benchmarking purposes the final accuracy of the trained model isn't important, so only a small number of epochs is needed; on 16 V100 GPUs, 10-20 epochs provide sufficient work to determine realistic performance expectations.
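A minimal sketch of how the epoch count drives a training loop, assuming PyTorch; the model, optimizer, and data here are toy stand-ins for a real benchmark workload such as ResNet50 on ImageNet:

```python
import torch
from torch import nn

# Toy model and data standing in for a real benchmark workload
# (assumptions for illustration only).
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
inputs = torch.randn(256, 10)
targets = torch.randint(0, 2, (256,))

EPOCHS = 10  # 10-20 is enough for benchmarking; final accuracy is not the goal

for epoch in range(EPOCHS):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch + 1}/{EPOCHS}  loss={loss.item():.4f}")
```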
- Batch Size
The number of samples processed per iteration of the algorithm. This value can be increased in line with the amount of available GPU memory and also depends on the model being used. For example, ResNet50 is a small network, and on a 32GB V100 the batch size can be increased to 512, whereas the same GPU can only use a batch size of 256 when running ResNet152 because that network is roughly three times deeper.
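As a rough sketch of where this knob lives in practice, again assuming PyTorch; the tensors below are toy stand-ins for a real image dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy tensors standing in for an image dataset (assumption; a real
# benchmark would load full-size 3x224x224 ImageNet samples).
dataset = TensorDataset(torch.randn(1024, 3, 32, 32),
                        torch.randint(0, 1000, (1024,)))

# batch_size is the main GPU-memory knob: per the numbers above,
# ResNet50 fits batches of 512 on a 32GB V100, while the deeper
# ResNet152 is limited to around 256 on the same card.
loader = DataLoader(dataset, batch_size=256, shuffle=True)

for images, labels in loader:
    print(images.shape)  # torch.Size([256, 3, 32, 32])
    break
```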