A 127mW 1.63TOPS Sparse Spatio-Temporal Cognitive SoC for Action Classification and Motion Tracking in Videos

  • Authors:
    Zhengya Zhang (Univ. of Michigan), Ching-En Lee (Univ. of Michigan), Thomas Chen (Univ. of Michigan)
    Publication ID:
    Publication Type:
    Received Date:
    Last Edit Date:
    2385.003 (University of California/Berkeley)


A sparse spatio-temporal (ST) cognitive SoC is designed to extract ST features from videos for action classification and motion tracking. The SoC core is a sparse ST convolutional auto-encoder that implements recurrence using a 3-layer network. High sparsity is enforced in each layer of processing, reducing the complexity of ST convolution by two orders of magnitude and allowing all multiply-accumulates (MAC) to be replaced by select-adds (SA). The design is demonstrated in a 3.98mm² 40nm CMOS SoC with an OpenRISC processor providing software-defined control and classification. ST kernel compression is applied to reduce memory by 43%. At 0.9V and 240MHz, the SoC achieves 1.63TOPS to meet the 60fps 1920×1080 HD video data rate, dissipating 127mW.
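
The replacement of multiply-accumulates by select-adds can be illustrated with a minimal sketch. It assumes the sparse activations are binary (0 or 1), so that multiplying by an activation reduces to selecting the corresponding kernel weight; the abstract does not state the activation format, and the function names, weights, and activation pattern below are hypothetical and chosen only for illustration.

```python
def mac_dot(weights, activations):
    """Dense dot product: one multiply-accumulate per element."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += w * a
    return acc


def select_add_dot(weights, active_indices):
    """Sparse dot product under the binary-activation assumption:
    select the weight at each active position and add it."""
    acc = 0
    for i in active_indices:
        acc += weights[i]
    return acc


# Hypothetical kernel weights and a sparse binary activation vector.
weights = [3, -1, 4, 1, -5, 9, 2, -6]
activations = [0, 0, 1, 0, 0, 1, 0, 0]

# Only the positions of nonzero activations need to be visited.
active_indices = [i for i, a in enumerate(activations) if a]

dense = mac_dot(weights, activations)             # 8 multiply-accumulates
sparse = select_add_dot(weights, active_indices)  # 2 select-adds
print(dense, sparse)
```

Both functions return the same sum, but the select-add version performs work only at the active positions, so its cost scales with the number of nonzero activations rather than with the kernel size.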
