3 minute read
Execution-Time Performance of Deep Learning Networks on CPU, GPU and TPU Runtime Environments
Summary
A performance review of execution times for five deep learning network examples was conducted on Google Colab's CPU, GPU and TPU runtime environments, using the MNIST dataset. The networks were 1) a multi-layer perceptron (MLP), 2) a convolutional neural network (CNN), 3) a recurrent neural network (RNN), 4) a long short-term memory network (LSTM), and 5) an autoencoder.
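The exemplar notebooks themselves are not reproduced here, but the measurement approach can be sketched. Below is a minimal sketch assuming a TensorFlow/Keras MLP on MNIST; the layer sizes, epoch count and other hyperparameters are illustrative placeholders, not necessarily those of the original exemplars:

```python
import time
import tensorflow as tf

# Load and normalize MNIST (illustrative preprocessing).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A simple MLP exemplar; layer sizes here are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Time training and evaluation separately, mirroring the
# training-time and testing-time measurements reported below.
start = time.perf_counter()
model.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)
train_time = time.perf_counter() - start

start = time.perf_counter()
model.evaluate(x_test, y_test, verbose=0)
test_time = time.perf_counter() - start

print(f"training: {train_time:.1f}s, evaluation: {test_time:.1f}s")
```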
General findings
Training times (Table 1) for all five network exemplars were significantly better on Google Colab's GPU runtime environment than on its CPU-only environment. Of the networks, the CNN showed the greatest improvement on GPUs over CPUs, with a speedup of over 33 times (3332%). It was followed by the LSTM, with a speedup of over 22 times (2257%), while speedups for the autoencoder, MLP and RNN were 1464%, 697% and 229% respectively.
Execution times for model testing were also significantly better on GPUs than on CPUs for all exemplars. Speedups for the LSTM, CNN, RNN, autoencoder and MLP were 1113%, 915%, 601%, 326% and 177% respectively.
The TPU runtime environment performed worse than the CPU environment on training times for the autoencoder, RNN and CNN, with the decline most pronounced for the autoencoder (-10%). TPU training times were nevertheless better than CPU times for the LSTM (+9%), and marginally so for the MLP (+1%). All five exemplars had worse model evaluation times on TPUs than on CPUs.
Discussion
To leverage the advantages of TPUs, optimizations could have been applied to the code used in the performance evaluations [^1]. No such customizations were made, however, so that the comparison across environments would be head-to-head: each network example was simply run under the three runtime options by changing the relevant Colab notebook settings.
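For context, the kind of TPU-specific setup that was deliberately left out looks roughly like the sketch below, which assumes TensorFlow 2's `TPUStrategy` as described in the TPUs in Colab notebook [^1]; `build_model()` is a hypothetical helper standing in for any of the five exemplars:

```python
import tensorflow as tf

# Connect to the Colab TPU and initialize it; without steps like
# these, stock Keras code falls back to whatever device is visible.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Building and compiling inside the strategy scope places the
# model's variables on the TPU; build_model() is hypothetical.
with strategy.scope():
    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```

Building the model inside the strategy scope is what lets the TPU cores share the work; a stock Keras script that skips this setup may not exercise the TPU efficiently, which is consistent with the weak TPU results reported above.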
Appendix
Table 1: Summary of CPU, GPU, TPU Performance
Multi-Layer Perceptron (MLP) Example using MNIST Dataset
Table 2: MLP using CPUs only
Table 3: MLP using GPUs
Table 4: MLP using TPUs
Convolutional Neural Networks (CNN) Example using MNIST Dataset
Table 5: CNN using CPUs only
Table 6: CNN using GPUs
Table 7: CNN using TPUs
Recurrent Neural Networks (RNN) Example using MNIST Dataset
Table 8: RNN using CPUs only
Table 9: RNN using GPUs
Table 10: RNN using TPUs
Long Short-Term Memory (LSTM) Example using MNIST Dataset
Table 11: LSTM using CPUs only
Table 12: LSTM using GPUs
Table 13: LSTM using TPUs
Autoencoder Example using MNIST Dataset
Table 14: Autoencoder using CPUs only
Table 15: Autoencoder using GPUs
Table 16: Autoencoder using TPUs
References
[^1]: Google. (2021, March 26). TPUs in Colab. Retrieved from [https://colab.research.google.com/notebooks/tpu.ipynb](https://colab.research.google.com/notebooks/tpu.ipynb#scrollTo=kvPXiovhi3ZZ)