TY - GEN
T1 - NetTaxo
T2 - 29th International World Wide Web Conference, WWW 2020
AU - Shang, Jingbo
AU - Zhang, Xinyang
AU - Liu, Liyuan
AU - Li, Sha
AU - Han, Jiawei
N1 - Funding Information:
This work was sponsored in part by DARPA under Agreements No. W911NF-17-C-0099 and FA8750-19-2-1004, National Science Foundation IIS 16-18481, IIS 17-04532, and IIS-17-41317, and DTRA HDTRA11810026. Any opinions, findings, and conclusions or recommendations expressed in this document are those of the author(s) and should not be interpreted as the views of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes not withstanding any copyright notation hereon.
Publisher Copyright:
© 2020 ACM.
PY - 2020/4/20
Y1 - 2020/4/20
N2 - The automated construction of topic taxonomies can benefit numerous applications, including web search, recommendation, and knowledge discovery. One of the major advantages of automatic taxonomy construction is the ability to capture corpus-specific information and adapt to different scenarios. To better reflect the characteristics of a corpus, we take the meta-data of documents into consideration and view the corpus as a text-rich network. In this paper, we propose NetTaxo, a novel automatic topic taxonomy construction framework, which goes beyond the existing paradigm and allows text data to collaborate with network structure. Specifically, we learn term embeddings from both text and network as contexts. Network motifs are adopted to capture appropriate network contexts. We conduct an instance-level selection for motifs, which further refines term embedding according to the granularity and semantics of each taxonomy node. Clustering is then applied to obtain sub-topics under a taxonomy node. Extensive experiments on two real-world datasets demonstrate the superiority of our method over the state-of-the-art, and further verify the effectiveness and importance of instance-level motif selection.
AB - The automated construction of topic taxonomies can benefit numerous applications, including web search, recommendation, and knowledge discovery. One of the major advantages of automatic taxonomy construction is the ability to capture corpus-specific information and adapt to different scenarios. To better reflect the characteristics of a corpus, we take the meta-data of documents into consideration and view the corpus as a text-rich network. In this paper, we propose NetTaxo, a novel automatic topic taxonomy construction framework, which goes beyond the existing paradigm and allows text data to collaborate with network structure. Specifically, we learn term embeddings from both text and network as contexts. Network motifs are adopted to capture appropriate network contexts. We conduct an instance-level selection for motifs, which further refines term embedding according to the granularity and semantics of each taxonomy node. Clustering is then applied to obtain sub-topics under a taxonomy node. Extensive experiments on two real-world datasets demonstrate the superiority of our method over the state-of-the-art, and further verify the effectiveness and importance of instance-level motif selection.
UR - http://www.scopus.com/inward/record.url?scp=85086584453&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086584453&partnerID=8YFLogxK
U2 - 10.1145/3366423.3380259
DO - 10.1145/3366423.3380259
M3 - Conference contribution
AN - SCOPUS:85086584453
T3 - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
SP - 1908
EP - 1919
BT - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
PB - Association for Computing Machinery, Inc
Y2 - 20 April 2020 through 24 April 2020
ER -