# Execution-Time Performance of Deep Learning Networks on CPU, GPU and TPU Runtime Environments
## Summary
A performance review of execution times for five deep learning network examples was conducted on Google Colab, across its CPU, GPU and TPU runtime environments, using the MNIST dataset. The networks were 1) a multi-layer perceptron (MLP), 2) a convolutional neural network (CNN), 3) a recurrent neural network (RNN), 4) a long short-term memory (LSTM) network, and 5) an autoencoder.
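The reported times were taken around the Keras training and evaluation calls. A minimal sketch of such instrumentation follows; the helper name `timed_run` and the epoch and batch-size settings are illustrative assumptions, not the exact code used for the tables below.

```python
import time

def timed_run(model, x_train, y_train, x_test, y_test,
              epochs=5, batch_size=128):
    """Return (training_seconds, evaluation_seconds) for a compiled Keras model."""
    start = time.time()
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
    training_seconds = time.time() - start

    start = time.time()
    model.evaluate(x_test, y_test, verbose=0)
    evaluation_seconds = time.time() - start

    return training_seconds, evaluation_seconds
```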
## General findings
Training times (Table 1) for all five network exemplars were significantly better on the GPU runtime environment than on Google Colab's CPU environment. Of the networks, the CNN showed the greatest improvement on the GPU relative to the CPU-only environment, with a speedup of over 33 times (3332%). This was followed by the LSTM, with a speedup of over 22 times (2257%), while speedups for the autoencoder, MLP and RNN were 1464%, 697% and 229% respectively.
Execution times for model testing were also significantly better on GPUs than on CPUs across the exemplars. Speedups for the LSTM, CNN, RNN, autoencoder and MLP were 1113%, 915%, 601%, 326% and 177% respectively.
The TPU runtime environment performed worse than the CPU environment on training times for the autoencoder, RNN and CNN, with the largest decline for the autoencoder (-10%). TPU training times were nevertheless better than CPU times for the LSTM (+9%), and marginally better for the MLP (+1%). On model evaluation times, all five exemplars performed worse on TPUs than on CPUs.
## Discussion
To leverage the advantages of TPUs, the code used for the performance evaluations could have been optimized [^1]. However, to keep the comparison head-to-head, no customizations were made: the network code examples were simply run under the three runtime environment options by changing the relevant Colab notebook settings.
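For reference, the kind of TPU-specific setup described in [^1] looks roughly like the sketch below (assuming TensorFlow 2.3+ on a Colab TPU runtime; the two-layer model is a placeholder). Without such initialization, plain Keras code on a TPU instance typically runs on the host CPU, which would be consistent with the TPU timings reported above.

```python
import tensorflow as tf

# Connect to and initialize the Colab TPU (auto-detected with tpu="").
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build and compile the model inside the strategy scope so its variables
# are placed on the TPU cores rather than the host CPU.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```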
## Appendix
Table 1: Summary of CPU, GPU, TPU Performance

### Multi-Layer Perceptron (MLP) Example using MNIST Dataset
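A minimal sketch of a typical Keras MLP for MNIST follows; the exact layer sizes and optimizer benchmarked are assumptions based on the standard Keras example.

```python
from tensorflow import keras

# MNIST as flat 784-dimensional vectors, scaled to [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.Dense(512, activation="relu", input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```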
Table 2: MLP using CPUs only

Table 3: MLP using GPUs

Table 4: MLP using TPUs

### Convolutional Neural Network (CNN) Example using MNIST Dataset
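A minimal sketch of a typical Keras CNN for MNIST follows; the architecture is an assumption based on the standard Keras example. Convolutions map especially well onto a GPU's parallel hardware, which fits the CNN showing the largest GPU training speedup in Table 1.

```python
from tensorflow import keras

# MNIST as 28x28x1 images, scaled to [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    keras.layers.Conv2D(64, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```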
Table 5: CNN using CPUs only

Table 6: CNN using GPUs

Table 7: CNN using TPUs

### Recurrent Neural Network (RNN) Example using MNIST Dataset
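A minimal sketch of a simple RNN on MNIST follows, reading each 28x28 image as a sequence of 28 rows (an assumed framing). The sequential dependence between time steps limits parallelism, which fits the RNN showing the smallest GPU training speedup (229%).

```python
from tensorflow import keras

# Each 28x28 image is treated as a sequence of 28 rows of 28 pixels.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.SimpleRNN(128, input_shape=(28, 28)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```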
Table 8: RNN using CPUs only

Table 9: RNN using GPUs

Table 10: RNN using TPUs

### Long Short-Term Memory (LSTM) Example using MNIST Dataset
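A minimal sketch of an LSTM on MNIST follows, using the same row-by-row sequence framing assumed for the RNN above. With default arguments, `keras.layers.LSTM` selects a fused cuDNN kernel on GPUs in TensorFlow 2.x, which may explain the LSTM's outsized GPU speedup relative to the simple RNN.

```python
from tensorflow import keras

# Same row-by-row sequence framing as the RNN example above.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.LSTM(128, input_shape=(28, 28)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```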
Table 11: LSTM using CPUs only

Table 12: LSTM using GPUs

Table 13: LSTM using TPUs

### Autoencoder Example using MNIST Dataset
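A minimal sketch of a dense autoencoder on MNIST follows; the encoder width and loss are assumptions, not the exact configuration benchmarked.

```python
from tensorflow import keras

# MNIST as flat 784-dimensional vectors; labels are unused because an
# autoencoder learns to reconstruct its own input.
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(784,)),  # encoder
    keras.layers.Dense(784, activation="sigmoid"),                  # decoder
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Note that the inputs double as the targets:
# model.fit(x_train, x_train, epochs=5, batch_size=128)
```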
Table 14: Autoencoder using CPUs only

Table 15: Autoencoder using GPUs

Table 16: Autoencoder using TPUs

## References
[^1]: Google. (2021, March 26). TPUs in Colab. Retrieved from [https://colab.research.google.com/notebooks/tpu.ipynb](https://colab.research.google.com/notebooks/tpu.ipynb#scrollTo=kvPXiovhi3ZZ)