Audio Keyword Reconstruction from On-Device Motion Sensor Signals via Neural Frequency Unfolding

Tianshi Wang, Shuochao Yao, Shengzhong Liu, Jinyang Li, Dongxin Liu, Huajie Shao, Ruijie Wang, Tarek Abdelzaher

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we present a novel deep neural network architecture that reconstructs the high-frequency audio of selected spoken human words from low-sampling-rate signals of (ego-)motion sensors, such as accelerometer and gyroscope data, recorded on everyday mobile devices. Because the sampling rate of such motion sensors is far below the Nyquist rate of ordinary human voice (around 6 kHz or higher), these motion sensor recordings suffer from significant frequency aliasing. To recover the original high-frequency audio signal, our neural network introduces a novel layer, called the alias unfolding layer, specialized in expanding the bandwidth of an aliased signal by reversing the frequency folding process in the time-frequency domain. While perfect unfolding is known to be unrealizable, we leverage the sparsity of the original signal to arrive at a sufficiently accurate statistical approximation. Comprehensive experiments show that our neural network significantly outperforms the state of the art in audio reconstruction from motion sensor data, effectively reconstructing a pre-trained set of spoken keywords from low-frequency motion sensor signals (sampling rates of 100–400 Hz). The approach demonstrates the potential risk of information leakage from motion sensors in smart mobile devices.
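
To make the aliasing argument concrete, the sketch below (not from the paper; the 200 Hz sensor rate, 730 Hz tone, and both helper functions are illustrative assumptions) shows how a voice-band frequency folds into the narrow band observable at motion-sensor rates, and why reversing that fold is inherently ambiguous: many voice-band frequencies collapse onto the same aliased bin, which is the ambiguity an unfolding step must resolve, e.g., by exploiting the sparsity of speech.

# Minimal aliasing sketch (illustrative only; not the paper's model or code).
def aliased_frequency(f_true, fs):
    """Frequency (Hz) at which a real tone at f_true appears when sampled at rate fs."""
    f_mod = f_true % fs
    return min(f_mod, fs - f_mod)  # real-valued signals fold into [0, fs/2]

def unfolded_candidates(f_alias, fs, f_max):
    """All frequencies up to f_max that would fold onto f_alias at sampling rate fs."""
    candidates = set()
    k = 0
    while k * fs - f_alias <= f_max:
        for f in (k * fs + f_alias, k * fs - f_alias):
            if 0 < f <= f_max:
                candidates.add(f)
        k += 1
    return sorted(candidates)

if __name__ == "__main__":
    fs_sensor = 200.0   # assumed motion-sensor sampling rate (within the 100-400 Hz range)
    f_voice = 730.0     # assumed voice-band component, far above fs_sensor / 2

    # The 730 Hz component is observed at a folded (aliased) frequency of 70 Hz:
    print(aliased_frequency(f_voice, fs_sensor))

    # Reversing the fold is ambiguous: every frequency printed below aliases to 70 Hz
    # at a 200 Hz rate, so a reconstruction model must choose among them,
    # e.g., by leveraging the sparsity of the original speech signal.
    print(unfolded_candidates(70.0, fs_sensor, f_max=3000.0))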

Original language: English (US)
Article number: 3478102
Journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume: 5
Issue number: 3
State: Published - Sep 2021

Keywords

  • Deep learning
  • Motion sensors
  • Time-frequency analysis

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Hardware and Architecture
  • Computer Networks and Communications
