Benchmarks of MXNet and Gluon

This is a performance comparison of native MXNet and Gluon.

Introduction

MXNet is a deep learning framework supported by Amazon. I first used MXNet in 2014. At that time its documentation was far from satisfactory, and the number of supported operators was small. However, its training speed and memory usage beat almost every other framework of the time. (I am talking about you, TensorFlow 0.1 :-P)

Recently, MXNet introduced Gluon, which offers high-level abstractions for predefined layers, loss functions, and optimizers.

I am trying to adopt MXNet / Gluon for my next project. Since I want to pursue the best speed and memory efficiency, I need to decide whether to stick with native MXNet or try the new Gluon.

Gluon

There are three base block classes in Gluon: Block, HybridBlock, and SymbolBlock. SymbolBlock seems to be a wrapper around the original MXNet symbol API. Block is the new imperative programming API. HybridBlock provides more flexibility: it is similar to Block, but calling hybridize() on it builds a symbolic computational graph, which gives better performance.

Comparison

A natural question is: what is the overhead of the wrapper around MXNet? I am going to compare the performance of native MXNet, Gluon SymbolBlock, HybridBlock (hybridized and not), and Block on different network architectures.

The code was here. The forward and backward passes were repeated 100 times and the time was averaged.
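The timing loop can be sketched roughly as follows. This is not the original benchmark code: the helper name `average_ms`, the warm-up phase, and the `sync` hook are my assumptions. The `sync` hook matters because MXNet executes asynchronously, so one would typically pass `mx.nd.waitall` there to make sure all queued work has finished before the clock stops.

```python
import time


def average_ms(step, repeats=100, warmup=10, sync=None):
    """Return the mean wall-clock time of one step() call, in milliseconds.

    step   -- callable running one forward + backward pass
    warmup -- untimed calls made first (an assumption, not from the post)
    sync   -- optional callable that blocks until pending work is done
    """
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()  # do not let warm-up work leak into the timed region

    start = time.perf_counter()
    for _ in range(repeats):
        step()
    if sync is not None:
        sync()  # wait for queued asynchronous work before stopping the clock
    elapsed = time.perf_counter() - start

    return elapsed * 1000.0 / repeats


if __name__ == "__main__":
    # Stand-in workload; replace with a real forward/backward step.
    print("%.3f ms" % average_ms(lambda: sum(range(10000))))
```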

The hardware and software platforms are:

macOS 10.13, CUDA 9.0, cuDNN 7.0

Titan Xp, i7-4790K, 32 GB RAM

AlexNet

Framework	Time (ms)
native mxnet	41.2
gluon SymbolBlock	40.8
gluon HybridBlock	36.8
gluon HybridBlock (hybridized)	36.8
gluon Block	36.9

It seems the network structure is too simple (even naive) to show the difference, so I am going to test a more complex structure.

GoogLeNet

Framework	Time (ms)
native mxnet	236.6
gluon SymbolBlock	255.1
gluon HybridBlock	270.4
gluon HybridBlock (hybridized)	222.7
gluon Block	272.4

Interestingly, the hybridized HybridBlock gives the best performance in both cases (on AlexNet it ties with the non-hybridized HybridBlock).
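To put the GoogLeNet numbers in perspective, the overhead relative to native MXNet can be computed directly from the table above. The snippet is just arithmetic on the measured times; the variable names are mine.

```python
# GoogLeNet times in ms, copied from the table above.
googlenet_ms = {
    "native mxnet": 236.6,
    "gluon SymbolBlock": 255.1,
    "gluon HybridBlock": 270.4,
    "gluon HybridBlock (hybridized)": 222.7,
    "gluon Block": 272.4,
}

baseline = googlenet_ms["native mxnet"]

# Relative difference to native MXNet in percent (negative means faster).
overhead_pct = {
    name: (ms - baseline) / baseline * 100.0
    for name, ms in googlenet_ms.items()
}

for name, pct in overhead_pct.items():
    print("%-32s %+6.1f%%" % (name, pct))
```

This gives about +7.8% for SymbolBlock, +14.3% for the non-hybridized HybridBlock, +15.1% for Block, and -5.9% for the hybridized HybridBlock.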

Conclusion

The performance lost to Gluon's wrapping is minimal: in both tests the hybridized HybridBlock was at least as fast as native MXNet.