Multimodal pattern matching for audio-visual query and retrieval

M. R. Naphade, R. Wang, T. S. Huang

Research output: Contribution to journal › Conference article › peer-review


A necessary capability for content-based retrieval is support for the query-by-example paradigm. In the past, there have been several attempts to use low-level features for video retrieval. None of these approaches, however, uses the multimedia information content of the video. We present an algorithm for matching multimodal (audio-visual) patterns for the purpose of content-based video retrieval. The novel ability of our approach to use the information content in multiple media, coupled with a strong emphasis on temporal similarity, differentiates it from the state of the art in content-based retrieval. At the core of the pattern matching scheme is a dynamic programming algorithm, which leads to a significant improvement in performance. By coupling the use of audio with video, this algorithm can be applied to the grouping of shots based on audio-visual similarity. This is much more effective in constructing scenes from shots than using only visual content.
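The abstract does not spell out the dynamic programming recurrence, so the following is only an illustrative sketch of the general idea: classic dynamic time warping (DTW) over per-frame audio and visual feature vectors. It is not the authors' algorithm. The names `Frame`, `frame_distance`, `dtw_cost`, and the `audio_weight` parameter are hypothetical, and the weighted Euclidean frame distance is an assumption made for the example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: each frame is (audio_features, visual_features).
Frame = Tuple[Sequence[float], Sequence[float]]


def frame_distance(a: Frame, b: Frame, audio_weight: float = 0.5) -> float:
    """Weighted sum of Euclidean distances over audio and visual features.

    The weighting scheme is an assumption for illustration only.
    """
    audio = math.dist(a[0], b[0])
    visual = math.dist(a[1], b[1])
    return audio_weight * audio + (1.0 - audio_weight) * visual


def dtw_cost(query: List[Frame], target: List[Frame],
             audio_weight: float = 0.5) -> float:
    """Minimum cumulative cost of aligning two frame sequences (classic DTW)."""
    n, m = len(query), len(target)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = best cost of aligning query[:i] with target[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], target[j - 1], audio_weight)
            # Extend the cheapest of: skip in query, skip in target, or match.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

Under these assumptions, a lower `dtw_cost` means two clips are more similar in both modalities while tolerating differences in timing, so shots whose pairwise cost falls below some chosen threshold could be grouped into a candidate scene.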

Original language: English (US)
Pages (from-to): 188-195
Number of pages: 8
Journal: Proceedings of SPIE - The International Society for Optical Engineering
State: Published - Jan 1 2001
Event: Storage and Retrieval for Media Databases 2001 - San Jose, CA, United States
Duration: Jan 24 2001 to Jan 26 2001


  • Dynamic Programming
  • Local optimality
  • Pattern Matching
  • Video query by example

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering

