A 55nm, 1.0V-0.4V, 1.25pJ/MAC Time-Domain Mixed-Signal Neuromorphic Accelerator with Stochastic Synapses for Reinforcement Learning in Autonomous Micro-Robots
Reinforcement learning (RL) is a bio-mimetic learning approach in which agents learn about an environment by performing tasks in it, without human supervision. RL is inspired by behavioral psychology, where agents take actions to maximize a cumulative reward. In this paper, we present an RL neuromorphic accelerator that performs obstacle avoidance in a micro-robot operating at the edge of the cloud. We propose an energy-efficient time-domain mixed-signal (TD-MS) computational framework and demonstrate that, in TD-MS computation, the energy to compute is proportional to the importance of the computation. The proposed RL implementation leverages the unique properties of stochastic networks and recent advances in Q-learning. The 55nm test chip implements RL using a three-layer fully-connected neural network and consumes a peak power of 690µW.
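The Q-learning referenced above is built on the standard tabular update rule Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]. The following is a minimal software sketch of that rule applied to a toy obstacle-avoidance task; the corridor environment, reward values, and hyperparameters are illustrative assumptions, not details of the chip's implementation (which uses a stochastic neural network rather than a lookup table):

```python
import random

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update:
    Q[s][a] += alpha * (r + gamma * max_a' Q[s'][a'] - Q[s][a])."""
    best_next = max(Q[s_next])
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Toy 1-D corridor: states 0..4, with an obstacle at state 4.
# Actions: 0 = stay, 1 = move forward. (Illustrative, not from the paper.)
N_STATES, N_ACTIONS = 5, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

random.seed(0)
for episode in range(300):
    s = 0
    for _ in range(20):                       # cap episode length
        # epsilon-greedy action selection (epsilon = 0.1)
        if random.random() < 0.1:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s_next = min(s + a, N_STATES - 1)
        # large penalty for colliding with the obstacle, small reward otherwise
        r = -10.0 if s_next == N_STATES - 1 else 1.0
        q_update(Q, s, a, r, s_next)
        s = s_next
        if s == N_STATES - 1:                 # collision ends the episode
            break
```

Over repeated episodes, the negative reward propagates backward through the table, so states adjacent to the obstacle learn to prefer the "stay" action.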