Automatic navbox generation by interpretable clustering over linked entities

Chenhao Xie, Lihan Chen, Jiaqing Liang, Kezun Zhang, Yanghua Xiao, Hanghang Tong, Haixun Wang, Wei Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Few efforts have been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table on a Wikipedia article page that provides a consistent navigation system for related entities. Navboxes are critical for the readership and editing efficiency of Wikipedia. In this paper, we target the automatic generation of Navboxes for Wikipedia articles. Instead of performing information extraction directly over unstructured natural language text, we explore an alternative avenue that focuses on a rich set of semi-structured data in Wikipedia articles: linked entities. The core idea of this paper is as follows: if we cluster the linked entities and interpret them appropriately, we can construct a high-quality Navbox for the article entity. We propose a clustering-then-labeling algorithm to realize this idea. Experiments show that the proposed solutions are effective. Ultimately, our approach enriches Wikipedia with 1.95 million new high-quality Navboxes.
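
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general clustering-then-labeling idea, not the authors' method. It assumes each linked entity comes with a set of category-like tags, groups entities greedily by Jaccard similarity, and labels each group with its most frequent tag. The entity names, tags, threshold, and all function names are hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical toy input: linked entities of one article, each with made-up tags.
LINKED = {
    "film_1": {"films", "1990s works"},
    "film_2": {"films", "2000s works"},
    "person_1": {"actors", "people"},
    "person_2": {"directors", "people"},
}


def jaccard(a, b):
    """Jaccard similarity of two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(entities, threshold=0.3):
    """Greedy single-pass clustering (an assumed stand-in technique).

    Each entity joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []
    for name in sorted(entities):
        for members in clusters:
            if jaccard(entities[name], entities[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def label(members, entities):
    """Label a cluster with its most frequent tag; ties break alphabetically."""
    counts = Counter(tag for m in members for tag in entities[m])
    return min(counts, key=lambda t: (-counts[t], t))


def build_navbox(entities, threshold=0.3):
    """Return a list of (label, members) groups, a toy stand-in for a Navbox."""
    return [(label(c, entities), c) for c in cluster(entities, threshold)]


if __name__ == "__main__":
    for group_label, members in build_navbox(LINKED):
        print(group_label, "->", ", ".join(members))
```

The label step is what makes the clusters interpretable: every group is named by a tag a reader can understand, which is the property a Navbox row heading needs.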

Original language: English (US)
Title of host publication: CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 1857-1865
Number of pages: 9
ISBN (Electronic): 9781450349185
DOIs
State: Published - Nov 6 2017
Externally published: Yes
Event: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: Nov 6 2017 to Nov 10 2017

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
Volume: Part F131841

Other

Other: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Country/Territory: Singapore
City: Singapore
Period: 11/6/17 to 11/10/17

Keywords

  • Clustering-then-labeling
  • Interpretable clustering
  • Knowledge extraction
  • Navbox generation

ASJC Scopus subject areas

  • General Decision Sciences
  • General Business, Management and Accounting
