Ultra-Efficient Neural Algorithm Accelerator Using Processing With Memory
Abstract: Neuromorphic computing algorithms such as deep neural networks (DNNs) offer significant advantages for pattern recognition applications. Training of a DNN is currently limited to datasets that are small compared to internet-scale sets. Improving efficiency from 10-100 pJ per neural operation (NOP) to a goal of 1 fJ per NOP will enable the training and processing of true internet-scale datasets. This efficiency is not possible with current digital electronics but can be enabled by implementing key mathematical functions with analog electronics. It is possible to use a resistive memory (RRAM) crossbar to perform a vector-matrix multiply (VMM) electronically, as illustrated in Fig. 1. A resistive switching element changes resistance across a wide range of states in response to electrical biasing. Performing this operation in analog space with RRAM is very efficient because no data movement is required, and the entire VMM can be done in parallel, rather than through the serial process of moving each element from a register to an execution unit. Millions of these full crossbars can be integrated on a single silicon integrated circuit and operated in parallel.
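As a minimal sketch of the idea, the crossbar's analog computation can be modeled numerically: each cell stores a conductance, a read voltage is applied to each row, and by Ohm's law and Kirchhoff's current law each column current is the dot product of the voltage vector with that column of conductances. The conductance range and voltages below are hypothetical values chosen for illustration, not device parameters from the source.

```python
import numpy as np

def crossbar_vmm(voltages, conductances):
    """Ideal analog crossbar model: applying row voltages V to a
    conductance matrix G yields column currents I[j] = sum_i V[i] * G[i, j],
    i.e., the vector-matrix product, computed in a single parallel step."""
    return voltages @ conductances

rng = np.random.default_rng(0)
# Hypothetical conductance states (siemens) for a 4x3 RRAM array
G = rng.uniform(1e-6, 1e-4, size=(4, 3))
# Hypothetical read voltages (volts) encoding the input vector
V = np.array([0.1, 0.2, 0.0, 0.3])

I = crossbar_vmm(V, G)  # column currents = VMM result
```

In a physical crossbar all column currents settle simultaneously, which is the source of the parallelism described above; this model only reproduces the mathematics, not the device non-idealities.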