Reduced-memory Training and Deployment of Deep Residual Networks by Stochastic Binary Quantization
Abstract: Motivated by the goal of enabling energy-efficient and/or lower-cost hardware implementations of deep neural networks, we describe a modification of the standard backpropagation algorithm that reduces memory usage during training by up to a factor of 32 relative to standard single-precision floating-point implementations. The method is inspired by recent work on feedback alignment in the context of seeking neurobiological correlates of backpropagation-based learning; like feedback alignment, it computes gradients imprecisely. Specifically, our method stochastically binarizes hidden-unit activations for use in the backward pass, once they are no longer needed in the forward pass. We show that the method is far less effective without stochastic binarization. To verify the effectiveness of the method, we trained wide residual networks with 20 weight layers on the CIFAR-10 and CIFAR-100 image classification benchmarks, achieving error rates of 5.43% and 23.01%, respectively, compared with 4.53% and 20.51% for the same networks trained without stochastic binarization. Moreover, we also investigated learning binary weights in deep residual networks and demonstrate, for the first time, that networks using binary weights at test time can match full-precision networks on CIFAR-10, with both achieving a 4.5% error rate using a wide residual network with 20 weight layers. On CIFAR-100, binary weights at test time yielded an error rate of 22.28%, within 2% of the full-precision result.
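To make the core operation concrete, the following is a minimal NumPy sketch of stochastic binarization of activations. It uses one common scheme, an assumption for illustration rather than the paper's exact formulation: activations are clipped to [-1, 1] and sampled to {-1, +1} with probabilities chosen so the binarized value is an unbiased estimate of the clipped activation. Storing one bit per activation instead of a 32-bit float is the source of the up-to-32x memory reduction.

```python
import numpy as np

def stochastic_binarize(h, rng=None):
    """Stochastically binarize activations to {-1, +1}.

    Illustrative scheme (an assumption, not necessarily the paper's exact
    formulation): clip h to [-1, 1], then sample +1 with probability
    p = (h_clipped + 1) / 2, so that E[binarized] = h_clipped.
    """
    rng = np.random.default_rng() if rng is None else rng
    h_clipped = np.clip(h, -1.0, 1.0)
    p = (h_clipped + 1.0) / 2.0  # probability of sampling +1
    return np.where(rng.random(h.shape) < p, 1.0, -1.0)
```

In a reduced-memory training loop, the full-precision activations would be discarded after the forward pass and only these one-bit samples retained for computing gradients in the backward pass.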