Automatic tagging with existing and novel tags

Junhui Wang, Xiaotong Shen, Yiwen Sun, Annie Qu

Research output: Contribution to journal › Article

Abstract

Automatic tagging by key words and phrases is important in multi-label classification of a document. In this paper, we first introduce a tagging loss to measure the discrepancy between predicted and actual tag sets, which is expressed in terms of a sum of weighted pairwise margins between two tags by their degree of similarity. We then construct a regularized empirical loss to incorporate linguistic knowledge, and identify a tagger maximizing the separations between the pairwise margins. One salient feature of the proposed method is its capability to identify novel tags absent from a training sample by using their similarity to existing tags. Computationally, the proposed method is implemented by an alternating direction method of multipliers, integrated with a difference convex algorithm. This permits scalable computation. We show that the method achieves accurate tagging, and that it compares favourably with existing methods. Finally, we apply the proposed method to tagging a Reuters news dataset.
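The two ideas in the abstract, a similarity-weighted pairwise loss and scoring of unseen tags through their similarity to known tags, can be illustrated with a minimal sketch. This is not the paper's exact formulation: the hinge form, the weight `1 - similarity`, the weighted-average rule for novel tags, and the function names `pairwise_tagging_loss` and `novel_tag_score` are all assumptions made here for illustration.

```python
import numpy as np


def pairwise_tagging_loss(scores, relevant, similarity):
    """Toy similarity-weighted pairwise hinge loss (illustrative assumption).

    scores:     (K,) real-valued score for each of K tags
    relevant:   (K,) boolean mask, True for tags in the actual tag set
    similarity: (K, K) symmetric matrix with entries in [0, 1]
    """
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    similarity = np.asarray(similarity, dtype=float)

    pos = np.flatnonzero(relevant)    # indices of actual tags
    neg = np.flatnonzero(~relevant)   # indices of all other tags

    loss = 0.0
    for j in pos:
        for k in neg:
            # Assumed weighting: confusing two dissimilar tags costs more.
            weight = 1.0 - similarity[j, k]
            margin = scores[j] - scores[k]
            loss += weight * max(0.0, 1.0 - margin)
    return loss


def novel_tag_score(scores, sim_to_existing):
    """Score an unseen tag as a similarity-weighted average of the
    scores of existing tags (an assumed rule, for illustration only)."""
    scores = np.asarray(scores, dtype=float)
    sim = np.asarray(sim_to_existing, dtype=float)
    total = sim.sum()
    if total == 0.0:
        return 0.0  # no similar existing tag, so no evidence either way
    return float(sim @ scores / total)
```

In this sketch a document whose actual tag is scored below a dissimilar tag incurs a large loss, while confusion between near-synonymous tags is penalized lightly; a tag never seen in training inherits a score from the existing tags it most resembles.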

Original language: English (US)
Pages (from-to): 273-290
Number of pages: 18
Journal: Biometrika
Volume: 104
Issue number: 2
DOIs: https://doi.org/10.1093/biomet/asx016
State: Published - Jun 1 2017

Keywords

  • Alternating direction method of multipliers
  • Large margin
  • Multi-label classification
  • Scalability
  • Social bookmarking system
  • Text mining

ASJC Scopus subject areas

  • Statistics and Probability
  • Mathematics (all)
  • Agricultural and Biological Sciences (miscellaneous)
  • Agricultural and Biological Sciences (all)
  • Statistics, Probability and Uncertainty
  • Applied Mathematics

Cite this

Wang, J., Shen, X., Sun, Y., & Qu, A. (2017). Automatic tagging with existing and novel tags. Biometrika, 104(2), 273-290. https://doi.org/10.1093/biomet/asx016