Real-Time and Low-Power Streaming Source Separation using Markov Random Field
There have been many successful deployments of machine learning algorithms for enterprise applications on clouds and clusters. However, for machine learning in mobile perceptual applications, such solutions are infeasible, as most approaches require access to large datasets and long offline training. As a step towards implementing a usable machine learning application on a mobile form factor, we explore sound source separation to isolate human voice from background noise on a mobile phone. The challenges involved are real-time streaming execution and power constraints. As a solution, we present a novel hardware-based sound source separation architecture capable of real-time streaming performance while consuming low power. The implementation uses a Markov Random Field (MRF) formulation of Blind Source Separation (BSS) with two microphones. It uses Expectation-Maximization (EM) to learn hidden MRF parameters on the fly and performs Maximum A Posteriori (MAP) inference using Gibbs sampling to find the best separation of sources. We demonstrate a real-time streaming FPGA implementation running at 150 MHz with 207 KB of RAM. It achieves a speed-up of 22X over a conventional software reference, attains an SDR of up to 7.021 dB with 1.601 ms latency, and exhibits excellent perceived audio quality. A virtual ASIC design study shows that the architecture is small, with fewer than 10M gates, and consumes only 40.034 mW (roughly 10% of the power of an ARM Cortex-A9) when running at 150 MHz.
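To make the inference step concrete, the sketch below illustrates MAP inference by Gibbs sampling on a small binary grid MRF, in the spirit of assigning each time-frequency bin to one of two sources. This is an illustrative toy, not the paper's implementation: the grid size, unary costs, smoothness weight `beta`, and the annealing schedule are all assumptions made for the example.

```python
import numpy as np

def gibbs_map(unary, beta=1.0, sweeps=60, seed=0):
    """Approximate MAP labeling of a binary grid MRF via annealed Gibbs sampling.

    unary: (H, W, 2) array of per-bin costs (e.g. negative log-likelihoods)
           for assigning the bin to source 0 or source 1.
    beta:  weight of the pairwise smoothness term (illustrative choice).
    """
    rng = np.random.default_rng(seed)
    H, W, _ = unary.shape
    labels = rng.integers(0, 2, size=(H, W))
    for s in range(sweeps):
        # Simple linear annealing: high temperature explores, low temperature
        # concentrates the sampler near the MAP configuration.
        T = max(0.05, 1.0 - s / sweeps)
        for i in range(H):
            for j in range(W):
                # Local energy of each candidate label:
                # unary cost plus a penalty for each disagreeing neighbor.
                e = unary[i, j].astype(float).copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        for k in (0, 1):
                            if labels[ni, nj] != k:
                                e[k] += beta
                # Sample the bin's label from the local Gibbs conditional.
                p = np.exp(-(e - e.min()) / T)
                p /= p.sum()
                labels[i, j] = rng.choice(2, p=p)
    return labels

# Demo: recover a smooth left/right source mask from noisy unary costs.
H, W = 8, 8
truth = np.zeros((H, W), dtype=int)
truth[:, W // 2:] = 1
rng = np.random.default_rng(1)
unary = np.zeros((H, W, 2))
for k in (0, 1):
    # Correct label is cheap, wrong label is expensive, plus observation noise.
    unary[..., k] = np.where(truth == k, 0.0, 2.0) + 0.3 * rng.standard_normal((H, W))
labels = gibbs_map(unary, beta=1.0, sweeps=60, seed=2)
agreement = (labels == truth).mean()
```

Because the pairwise term rewards neighboring bins that agree, the sampler smooths isolated noisy bins and settles close to the true mask; this smoothing prior is what the MRF adds over per-bin classification.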