TY - GEN
T1 - Automatic navbox generation by interpretable clustering over linked entities
AU - Xie, Chenhao
AU - Chen, Lihan
AU - Liang, Jiaqing
AU - Zhang, Kezun
AU - Xiao, Yanghua
AU - Tong, Hanghang
AU - Wang, Haixun
AU - Wang, Wei
N1 - Publisher Copyright: © 2017 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.
PY - 2017/11/6
Y1 - 2017/11/6
AB - Little effort has been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table in a Wikipedia article page that provides a consistent navigation system for related entities. Navboxes are critical for the readership and editing efficiency of Wikipedia. In this paper, we target the automatic generation of Navboxes for Wikipedia articles. Instead of performing information extraction directly over unstructured natural language text, we explore an alternative avenue by focusing on a rich set of semi-structured data in Wikipedia articles: linked entities. The core idea of this paper is as follows: if we cluster the linked entities and interpret them appropriately, we can construct a high-quality Navbox for the article entity. We propose a clustering-then-labeling algorithm to realize this idea. Experiments show that the proposed solutions are effective. Ultimately, our approach enriches Wikipedia with 1.95 million new high-quality Navboxes.
KW - Clustering-then-labeling
KW - Interpretable clustering
KW - Knowledge extraction
KW - Navbox generation
UR - http://www.scopus.com/inward/record.url?scp=85037371535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85037371535&partnerID=8YFLogxK
U2 - 10.1145/3132847.3132899
DO - 10.1145/3132847.3132899
M3 - Conference contribution
AN - SCOPUS:85037371535
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1857
EP - 1865
BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Y2 - 6 November 2017 through 10 November 2017
ER -