TY - JOUR
T1 - An Audio-Visual System for Object-Based Audio
T2 - From Recording to Listening
AU - Coleman, Philip
AU - Franck, Andreas
AU - Francombe, Jon
AU - Liu, Qingju
AU - De Campos, Teofilo
AU - Hughes, Richard J.
AU - Menzies, Dylan
AU - Simon Galvez, Marcos F.
AU - Tang, Yan
AU - Woodcock, James
AU - Jackson, Philip J.B.
AU - Melchior, Frank
AU - Pike, Chris
AU - Fazi, Filippo Maria
AU - Cox, Trevor J.
AU - Hilton, Adrian
PY - 2018/8
Y1 - 2018/8
AB - Object-based audio is an emerging representation for audio content, where content is represented in a reproduction-format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audio-visual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system's capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluated with perceptually motivated objective and subjective experiments. These experiments demonstrate that the novel components of the system add capabilities beyond the state of the art. Finally, we discuss challenges and future perspectives for object-based audio workflows.
KW - Audio systems
KW - audio-visual systems
UR - http://www.scopus.com/inward/record.url?scp=85041684051&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041684051&partnerID=8YFLogxK
DO - 10.1109/TMM.2018.2794780
M3 - Article
AN - SCOPUS:85041684051
SN - 1520-9210
VL - 20
SP - 1919
EP - 1931
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
IS - 8
ER -