An Optimal Microarchitecture for Stencil Computation with Data Reuse and Fine-Grained Parallelism

  • Authors:
    Yuze Chi (UCLA), Peipei Zhou (UCLA), Jason Cong (UCLA)
    Publication ID:
    Publication Type:
    Received Date:
    Last Edit Date:
    2384.001 (Harvard University)
    2384.007 (University of Michigan)


Stencil computation is one of the most important kernels for many applications such as image processing, solving partial differential equations, and cellular automata. Nevertheless, implementing a high throughput stencil kernel is not trivial due to its nature of high memory access load and low operational intensity. In this work we adopt data reuse and fine-grained parallelism and present an optimal microarchitecture for stencil computation. The data reuse line buffers not only fully utilize the external memory bandwidth and fully reuse the input data, they also minimize the size of data reuse buffer given the number of fine-grained parallelized and fully pipelined PEs. With the proposed microarchitecture, the number of PEs can be increased to saturate all available off-chip memory bandwidth. We implement this microarchitecture with a high-level synthesis (HLS) based template instead of register transfer level (RTL) specifications, which provides great programmability. To guide the system design, we propose a performance model in addition to detailed model evaluation and optimization analysis. Experimental results from on-board execution show that our design can provide an average of 6.5x speedup over line buffer-only design with only 2.4x resource overhead. Compared with loop transformation-only design, our design can implement a fully pipelined accelerator for applications that cannot be implemented with loop transformation-only due to its high memory conflict and low design flexibility. Furthermore, our FPGA implementation provides 83% throughput of a 14-core CPU with 4x energy-efficiency.

4819 Emperor Blvd, Suite 300 Durham, NC 27703 Voice: (919) 941-9400 Fax: (919) 941-9450

Important Information for the SRC website. This site uses cookies to store information on your computer. By continuing to use our site, you consent to our cookies. If you are not happy with the use of these cookies, please review our Cookie Policy to learn how they can be disabled. By disabling cookies, some features of the site will not work.