TY - GEN
T1 - Searching off-line Arabic documents
AU - Chan, Jim
AU - Ziftci, Celal
AU - Forsyth, David
PY - 2006
Y1 - 2006
N2 - Currently, an abundance of historical manuscripts, journals, and scientific notes remain largely inaccessible in library archives. Manual transcription and publication of such documents is unlikely, and automatic transcription with high enough accuracy to support a traditional text search is difficult. In this work we describe a lexicon-free system for performing text queries on off-line printed and handwritten Arabic documents. Our segmentation-based approach utilizes gHMMs with a bigram letter transition model, and KPCA/LDA for letter discrimination. The segmentation stage is integrated with inference. We show that our method is robust to varying letter forms, ligatures, and overlaps. Additionally, we find that ignoring letters beyond the adjoining neighbors has little effect on inference and localization, which leads to a significant performance increase over standard dynamic programming. Finally, we discuss an extension to perform batch searches of large word lists for indexing purposes.
AB - Currently, an abundance of historical manuscripts, journals, and scientific notes remain largely inaccessible in library archives. Manual transcription and publication of such documents is unlikely, and automatic transcription with high enough accuracy to support a traditional text search is difficult. In this work we describe a lexicon-free system for performing text queries on off-line printed and handwritten Arabic documents. Our segmentation-based approach utilizes gHMMs with a bigram letter transition model, and KPCA/LDA for letter discrimination. The segmentation stage is integrated with inference. We show that our method is robust to varying letter forms, ligatures, and overlaps. Additionally, we find that ignoring letters beyond the adjoining neighbors has little effect on inference and localization, which leads to a significant performance increase over standard dynamic programming. Finally, we discuss an extension to perform batch searches of large word lists for indexing purposes.
UR - http://www.scopus.com/inward/record.url?scp=33845581507&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33845581507&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2006.269
DO - 10.1109/CVPR.2006.269
M3 - Conference contribution
AN - SCOPUS:33845581507
SN - 0769525970
SN - 9780769525976
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 1455
EP - 1462
BT - Proceedings - 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2006
T2 - 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2006
Y2 - 17 June 2006 through 22 June 2006
ER -