Abstract
A person's identity is important high-level information for video analysis and retrieval. With the growth of multimedia data, recordings are not only multimodal but also multichannel (microphone arrays, camera arrays). In this paper, we describe the multimodal person identification system of the UIUC team for the CLEAR 2007 evaluation. The audio-only system is based on a newly proposed model, the Chain of Gaussian Mixtures. The visual-only system is a face recognition module based on a nearest-neighbor classifier in appearance space. The final system fuses 7-channel microphone recordings and 4 camera recordings at the decision level. The experimental results indicate the effectiveness of the speaker modeling methods and the fusion scheme.
Original language | English (US) |
---|---|
Pages (from-to) | 248-255 |
Number of pages | 8 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 4625 LNCS |
DOIs | |
State | Published - 2008 |
Event | 2nd Annual Classification of Events, Activities and Relationships, CLEAR 2007 and Rich Transcription, RT 2007 - Baltimore, MD, United States. Duration: May 8 2007 → May 11 2007 |
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science