Using approximated auditory roughness as a pre-filtering feature for human screaming and affective speech AED

Research output: Contribution to journal › Conference article

Abstract

Detecting human screaming, shouting, and other verbal manifestations of fear and anger is of great interest to security Audio Event Detection (AED) systems. The Internet of Things (IoT) approach allows wide-covering, powerful AED systems to be distributed across the Internet. However, a good feature for pre-filtering the audio is critical to these systems. This work evaluates the potential of detecting screaming and affective speech using Auditory Roughness and proposes a very lightweight approximation method. Our approximation uses a similar number of Multiple Add Accumulate (MAA) operations to short-term energy (STE), and at least 10× fewer than MFCC. We evaluated the performance of our approximated roughness against other low-complexity features on the Mandarin Affective Speech corpus and on a screaming subset of the YouTube AudioSet. We show that our approximated roughness achieves higher accuracy than these features.

Original language: English (US)
Pages (from-to): 1914-1918
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
DOIs: 10.21437/Interspeech.2017-593
State: Published - Jan 1 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: Aug 20 2017 → Aug 24 2017

Keywords

  • Audio event detection
  • Auditory roughness
  • Computational complexity
  • Pre-filtering

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

@article{e16c5d731c0e4685b9b2b7bc4da6c844,
title = "Using approximated auditory roughness as a pre-filtering feature for human screaming and affective speech AED",
abstract = "Detecting human screaming, shouting, and other verbal manifestations of fear and anger are of great interest to security Audio Event Detection (AED) systems. The Internet of Things (IoT) approach allows wide-covering, powerful AED systems to be distributed across the Internet. But a good feature to prefilter the audio is critical to these systems. This work evaluates the potential of detecting screaming and affective speech using Auditory Roughness and proposes a very light-weight approximation method. Our approximation uses a similar amount of Multiple Add Accumulate (MAA) compared to short-term energy (STE), and at least 10× less MAA than MFCC. We evaluated the performance of our approximated roughness on the Mandarin Affective Speech corpus and a subset of the Youtube AudioSet for screaming against other low-complexity features. We show that our approximated roughness returns higher accuracy.",
keywords = "Audio event detection, Auditory roughness, Computational complexity, Pre-filtering",
author = "He, Di and Cheng, Zuofu and Hasegawa-Johnson, {Mark Allan} and Chen, Deming",
year = "2017",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2017-593",
language = "English (US)",
volume = "2017-August",
pages = "1914--1918",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY - JOUR
T1 - Using approximated auditory roughness as a pre-filtering feature for human screaming and affective speech AED
AU - He, Di
AU - Cheng, Zuofu
AU - Hasegawa-Johnson, Mark Allan
AU - Chen, Deming
PY - 2017/1/1
Y1 - 2017/1/1
AB - Detecting human screaming, shouting, and other verbal manifestations of fear and anger are of great interest to security Audio Event Detection (AED) systems. The Internet of Things (IoT) approach allows wide-covering, powerful AED systems to be distributed across the Internet. But a good feature to prefilter the audio is critical to these systems. This work evaluates the potential of detecting screaming and affective speech using Auditory Roughness and proposes a very light-weight approximation method. Our approximation uses a similar amount of Multiple Add Accumulate (MAA) compared to short-term energy (STE), and at least 10× less MAA than MFCC. We evaluated the performance of our approximated roughness on the Mandarin Affective Speech corpus and a subset of the Youtube AudioSet for screaming against other low-complexity features. We show that our approximated roughness returns higher accuracy.
KW - Audio event detection
KW - Auditory roughness
KW - Computational complexity
KW - Pre-filtering
UR - http://www.scopus.com/inward/record.url?scp=85039162896&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85039162896&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2017-593
DO - 10.21437/Interspeech.2017-593
M3 - Conference article
AN - SCOPUS:85039162896
VL - 2017-August
SP - 1914
EP - 1918
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
ER -