Optimization of the Convolution Layer in an MXNet Neural Network for GPU

Posted by Susie on December 15, 2017
GitHub Source
  • Converted convolution into matrix multiplication by unrolling the input features and filters
  • Implemented tiling for memory reuse and double buffering to reduce synchronization overhead, both in CUDA
  • Classified 10,000 images in 60 ms, an 80x speedup over the baseline
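
To make the first two points concrete, here is a minimal CPU-side sketch in plain Python. It is not the CUDA kernel from this project. It assumes stride 1, no padding, and square K x K filters, and every function name in it (`conv2d_direct`, `unroll_input`, `unroll_filters`, `matmul_tiled`, `conv2d_unrolled`) is made up for illustration. The tiled multiply only imitates the access pattern of a tiled GPU kernel; double buffering is not shown.

```python
def conv2d_direct(x, w):
    """Reference convolution (cross-correlation form, stride 1, no padding).
    x: [C][H][W] input features, w: [M][C][K][K] filters -> [M][H_out][W_out]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, K = len(w), len(w[0][0])
    H_out, W_out = H - K + 1, W - K + 1
    y = [[[0.0] * W_out for _ in range(H_out)] for _ in range(M)]
    for m in range(M):
        for h in range(H_out):
            for v in range(W_out):
                acc = 0.0
                for c in range(C):
                    for p in range(K):
                        for q in range(K):
                            acc += x[c][h + p][v + q] * w[m][c][p][q]
                y[m][h][v] = acc
    return y


def unroll_input(x, K):
    """Unroll x into a (C*K*K) x (H_out*W_out) matrix.
    Each column holds the receptive field of one output pixel."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    H_out, W_out = H - K + 1, W - K + 1
    rows = []
    for c in range(C):
        for p in range(K):
            for q in range(K):
                rows.append([x[c][h + p][v + q]
                             for h in range(H_out) for v in range(W_out)])
    return rows


def unroll_filters(w):
    """Flatten each filter into one row: M x (C*K*K)."""
    return [[w_m[c][p][q]
             for c in range(len(w_m))
             for p in range(len(w_m[0]))
             for q in range(len(w_m[0][0]))]
            for w_m in w]


def matmul_tiled(a, b, tile=2):
    """Blocked matrix multiply. Each (i0, j0, k0) block touches only a
    tile x tile piece of a and b, which is the reuse a GPU kernel would
    get by staging those pieces in fast memory."""
    n, k_dim, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k_dim, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, k_dim)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


def conv2d_unrolled(x, w, tile=2):
    """Convolution as (unrolled filters) x (unrolled input), reshaped back."""
    K = len(w[0][0])
    H_out = len(x[0]) - K + 1
    W_out = len(x[0][0]) - K + 1
    y_flat = matmul_tiled(unroll_filters(w), unroll_input(x, K), tile)
    return [[row[h * W_out:(h + 1) * W_out] for h in range(H_out)]
            for row in y_flat]


# Small check: 2 input channels of 3x3, 2 filters of 2x2 (integer values,
# so both code paths give exactly equal results).
x_demo = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]
w_demo = [[[[1, 0], [0, 1]], [[0, 1], [1, 0]]],
          [[[1, 1], [1, 1]], [[-1, 0], [0, -1]]]]
assert conv2d_unrolled(x_demo, w_demo) == conv2d_direct(x_demo, w_demo)
```

The unrolled input repeats each pixel up to K*K times, so the trade is extra memory for one large, regular matrix multiply, which is the shape of work that tiling helps with.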