TY - GEN
T1 - Audio-visual speaker localization using graphical models
AU - Kushal, Akash
AU - Rahurkar, Mandar
AU - Li, Fei Fei
AU - Ponce, Jean
AU - Huang, Thomas
PY - 2006
Y1 - 2006
N2 - In this work we propose an approach to combine audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and change in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to account for the cases where having both modalities might still not be enough, e.g., when the person being tracked is occluded and not speaking. Improvement in tracking results is shown at each step and compared with manually annotated ground truth.
AB - In this work we propose an approach to combine audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and change in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to account for the cases where having both modalities might still not be enough, e.g., when the person being tracked is occluded and not speaking. Improvement in tracking results is shown at each step and compared with manually annotated ground truth.
UR - https://www.scopus.com/pages/publications/34047229439
U2 - 10.1109/ICPR.2006.284
DO - 10.1109/ICPR.2006.284
M3 - Conference contribution
AN - SCOPUS:34047229439
SN - 0769525210
SN - 9780769525211
T3 - Proceedings - International Conference on Pattern Recognition
SP - 291
EP - 294
BT - Proceedings - 18th International Conference on Pattern Recognition, ICPR 2006
T2 - 18th International Conference on Pattern Recognition, ICPR 2006
Y2 - 20 August 2006 through 24 August 2006
ER -