Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation
Pointer chasing is a fundamental operation, used by many important data-intensive applications (e.g., databases, key-value stores, graph processing workloads) to traverse linked data structures. This operation is both memory bound and latency sensitive, as it (1) exhibits irregular access patterns that cause frequent cache and TLB misses, and (2) requires the data from every memory access to be sent back to the CPU to determine the next pointer to access. Our goal is to accelerate pointer chasing by performing it inside main memory, thereby avoiding inefficient and high-latency data transfers between main memory and the CPU. To this end, we propose the In-Memory PoInter Chasing Accelerator (IMPICA), which leverages the logic layer within 3D-stacked memory for linked data structure traversal.
This paper identifies the key design challenges of designing a pointer chasing accelerator in memory, describes new mechanisms employed within IMPICA to solve these challenges, and evaluates the performance and energy benefits of our accelerator. IMPICA addresses the key challenges of (1) how to achieve high parallelism in the presence of serial accesses in pointer chasing, and (2) how to effectively perform virtual-to-physical address translation on the memory side without requiring expensive accesses to the CPU’s memory management unit. We show that the solutions to these challenges, address-access decoupling and a region-based page table, respectively, are simple and low-cost. We believe these solutions are also applicable to many other in-memory accelerators, which are likely to also face the two challenges.
Our evaluations on a quad-core system show that IMPICA improves the performance of pointer chasing operations in three commonly-used linked data structures (linked lists, hash tables, and B-trees) by 92%, 29%, and 18%, respectively. This leads to a significant performance improvement in applications that utilize linked data structures - on a real database application, DBx1000, IMPICA improves transaction throughput and response time by 16% and 13%, respectively. IMPICA also significantly reduces overall system energy consumption (by 41%, 23%, and 10% for the three commonly-used data structures, and by 6% for DBx1000).