TY - GEN
T1 - Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel
T2 - 12th International Conference on Computer Vision, ICCV 2009
AU - Wu, Jianxin
AU - Rehg, James M.
PY - 2009
Y1 - 2009
N2 - Common visual codebook generation methods used in a Bag of Visual Words model, e.g., k-means or Gaussian Mixture Models, use the Euclidean distance to cluster features into visual code words. However, most popular visual descriptors are histograms of image measurements. It has been shown that the Histogram Intersection Kernel (HIK) is more effective than the Euclidean distance in supervised learning tasks with histogram features. In this paper, we demonstrate that HIK can also be used in an unsupervised manner to significantly improve the generation of visual codebooks. We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as k-means. The HIK codebook consistently achieves 2-4% higher recognition accuracy than k-means codebooks. In addition, we propose a one-class SVM formulation to create more effective visual code words, which can achieve even higher accuracy. The proposed method has established new state-of-the-art performance numbers for 3 popular benchmark datasets on object and scene recognition. In addition, we show that the standard k-median clustering method can be used for visual codebook generation and can act as a compromise between the HIK and k-means approaches.
UR - http://www.scopus.com/inward/record.url?scp=77953226691&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953226691&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2009.5459178
DO - 10.1109/ICCV.2009.5459178
M3 - Conference contribution
AN - SCOPUS:77953226691
SN - 9781424444205
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 630
EP - 637
BT - 2009 IEEE 12th International Conference on Computer Vision, ICCV 2009
Y2 - 29 September 2009 through 2 October 2009
ER -