Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology

  • Authors:
    Vivek Seshadri (Microsoft), Donghyuk Lee (Nvidia Corp.), Thomas Mullins (Intel), Hasan Hassan (TOBB ETÜ), Amirali Boroumand (Carnegie Mellon Univ.), Jeremie Kim (Carnegie Mellon Univ.), Michael A. Kozuch (Intel), Onur Mutlu (ETHZ), Phillip Gibbons (Carnegie Mellon Univ.), Todd Mowry (Carnegie Mellon Univ.)
    Publication ID:
    P091010
    Publication Type:
    Paper
    Received Date:
    30-May-2017
    Last Edit Date:
    14-Jul-2017
    Research:
    2719.001 (Swiss Federal Institute of Technology in Zurich)

Abstract

Many important applications trigger bitwise operations on large bit vectors (bulk bitwise operations). In fact, recent works have developed techniques that exploit fast bitwise operations on large bit vectors to accelerate databases (bitmap indices, BitWeaving) and web search (BitFunnel). Unfortunately, in existing systems, regardless of the underlying architecture (e.g., CPU, GPU, FPGA, processing near memory), the throughput of such bulk bitwise operations is limited by the memory bandwidth available to the processor.

To overcome this bandwidth bottleneck, we propose Ambit, an Accelerator in Memory for bulk Bitwise operations. Unlike prior works, Ambit exploits the analog operation of DRAM technology to perform bitwise operations completely inside DRAM, thereby exploiting the full internal DRAM bandwidth. Ambit consists of two components. First, simultaneous activation of three DRAM rows that share the same set of sense amplifiers enables us to perform bitwise AND and OR operations. Second, with modest changes to the sense amplifier, the inverters present inside the sense amplifier can be used to perform bitwise NOT operations. With these two components, Ambit can perform any bitwise operation efficiently inside DRAM-based memory. Ambit largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area). Importantly, Ambit uses the modern DRAM interface without any changes, and therefore Ambit can be directly plugged into the memory bus.
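The mechanism described above rests on a simple logical identity: simultaneously activating three rows that share sense amplifiers causes each sense amplifier to resolve to the bitwise majority of the three cells. A minimal functional sketch (this models only the logic, not the analog charge-sharing; the function names are illustrative, not from the paper):

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority: each result bit is 1 iff at least two of the
    corresponding input bits are 1. This is what triple-row activation
    computes across all columns of the three DRAM rows at once."""
    return (a & b) | (b & c) | (a & c)

def triple_row_and(a: int, b: int) -> int:
    # With the third (control) row initialized to all zeros,
    # majority(a, b, 0) reduces to bitwise AND.
    return majority(a, b, 0)

def triple_row_or(a: int, b: int, width: int) -> int:
    # With the control row initialized to all ones,
    # majority(a, b, 1...1) reduces to bitwise OR.
    ones = (1 << width) - 1
    return majority(a, b, ones)
```

Combined with the NOT capability from the sense-amplifier inverters, AND/OR/NOT form a functionally complete set, which is why Ambit can implement any bitwise operation.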

Our extensive SPICE simulations show that Ambit works as expected even with significant process variation. Across seven bitwise operations, Ambit improves performance by 32X and reduces energy consumption by 35X compared to state-of-the-art systems. When integrated with the Hybrid Memory Cube (HMC), Ambit can improve performance of bulk bitwise operations by 9.7X compared to processing in the logic layer of the HMC. We evaluate three real-world data-intensive applications: 1) a database bitmap index, 2) BitWeaving, a technique to accelerate database scans, and 3) a bit-vector-based implementation of sets. Ambit improves performance of these applications by 3X-7X compared to the state-of-the-art baseline using SIMD optimizations. We also describe several other applications that can benefit from Ambit, including a recent technique proposed to speed up web search. We believe that the large performance and energy improvements provided by Ambit can enable other applications to use bulk bitwise operations.
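To see why these applications reduce to bulk bitwise operations, consider a bitmap index: each predicate is stored as one bit vector with one bit per record, and a conjunctive query becomes a single bulk AND across the vectors. A small illustrative sketch (the predicates and data here are hypothetical, chosen only to show the pattern):

```python
# Hypothetical bitmap index over 8 records: bit i of each bitmap is 1
# if record i satisfies that predicate.
NUM_RECORDS = 8
age_over_30 = 0b10110101
is_premium  = 0b11010100

# A conjunctive query ("age > 30 AND premium") is one bulk bitwise AND
# over the entire bitmaps -- the operation Ambit accelerates in-DRAM.
matches = age_over_30 & is_premium

# Recover the matching record IDs from the result bitmap.
matching_ids = [i for i in range(NUM_RECORDS) if (matches >> i) & 1]
```

The same pattern covers bit-vector sets (intersection is AND, union is OR), which is why the throughput of the bitwise operation itself dominates these workloads.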

4819 Emperor Blvd, Suite 300 Durham, NC 27703 Voice: (919) 941-9400 Fax: (919) 941-9450