Text from corners: A novel approach to detect text and caption in videos

Xu Zhao, Kai Hsiang Lin, Yun Fu, Yuxiao Hu, Yuncai Liu, Thomas S. Huang

Research output: Contribution to journal > Article > peer-review

Abstract

Detecting text and captions in videos is important and in great demand for video retrieval, annotation, indexing, and content analysis. In this paper, we present a corner-based approach to detecting text and captions in videos. The approach is inspired by the observation that corner points occur densely and in an orderly pattern in characters, especially in text and captions. We use several discriminative features to describe the text regions formed by the corner points. These features can be used flexibly and thus adapted to different applications. Language independence is an important advantage of the proposed method. Moreover, building on the text features, we develop a novel algorithm to detect moving captions in videos. In this algorithm, motion features extracted by optical flow are combined with text features to detect moving caption patterns, and a decision tree is adopted to learn the classification criteria. Experiments conducted on a large volume of real video shots demonstrate the efficiency and robustness of the proposed approaches and of the real-world system. Our text and caption detection system was recently highlighted in a worldwide multimedia retrieval competition, Star Challenge, where it achieved superior performance and the top ranking.
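The core observation, that corner points cluster densely in text regions, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the corner points have already been extracted (for example by a Harris corner detector), and the function name `dense_corner_regions`, the grid cell size, the density threshold, and the choice of aspect ratio as a region descriptor are all hypothetical choices made for illustration.

```python
from collections import deque


def dense_corner_regions(corners, width, height, cell=16, min_corners=4):
    """Group corner points into candidate text regions by grid density.

    corners: iterable of (x, y) pixel coordinates (assumed precomputed).
    cell and min_corners are illustrative values, not taken from the paper.
    """
    cols = (width + cell - 1) // cell
    rows = (height + cell - 1) // cell

    # Count how many corner points fall into each grid cell.
    counts = [[0] * cols for _ in range(rows)]
    for x, y in corners:
        if 0 <= x < width and 0 <= y < height:
            counts[int(y) // cell][int(x) // cell] += 1

    # A cell is "dense" when it holds at least min_corners points.
    dense = [[counts[r][c] >= min_corners for c in range(cols)]
             for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    regions = []

    for r in range(rows):
        for c in range(cols):
            if not dense[r][c] or seen[r][c]:
                continue
            # Flood-fill over 4-connected dense cells to form one region.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0 = r1 = r
            c0 = c1 = c
            total = 0
            while queue:
                cr, cc = queue.popleft()
                total += counts[cr][cc]
                r0, r1 = min(r0, cr), max(r1, cr)
                c0, c1 = min(c0, cc), max(c1, cc)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc),
                               (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and dense[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            box_w = (c1 - c0 + 1) * cell
            box_h = (r1 - r0 + 1) * cell
            regions.append({
                "box": (c0 * cell, r0 * cell, box_w, box_h),  # x, y, w, h
                "corners": total,
                "aspect_ratio": box_w / box_h,
            })
    return regions


# Synthetic example: a horizontal band of corners plus one isolated point.
caption = [(x, y) for x in range(32, 96, 4) for y in (100, 104)]
regions = dense_corner_regions(caption + [(200, 20)], 320, 240)
```

In this synthetic example the band of corners forms a single wide region, while the isolated point falls below the density threshold and is discarded. A real system would feed such region descriptors, together with motion features, into a classifier such as the decision tree mentioned in the abstract.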

Original language: English (US)
Article number: 5551198
Pages (from-to): 790-799
Number of pages: 10
Journal: IEEE Transactions on Image Processing
Volume: 20
Issue number: 3
State: Published - Mar 1 2011

Keywords

  • Caption detection
  • Harris corner detector
  • Moving caption
  • Optical flow
  • Text detection
  • Video retrieval

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design
