TY - GEN
T1 - OA-Mine
T2 - 31st ACM World Wide Web Conference, WWW 2022
AU - Zhang, Xinyang
AU - Zhang, Chenwei
AU - Li, Xian
AU - Dong, Xin Luna
AU - Shang, Jingbo
AU - Faloutsos, Christos
AU - Han, Jiawei
N1 - Funding Information:
Research was supported in part by US DARPA KAIROS Program No. FA8750-19-2-1004, SocialSim Program No. W911NF-17-C-0099, and INCAS Program No. HR001121C0165, National Science Foundation IIS-19-56151, IIS-17-41317, and IIS 17-04532, and the Molecule Maker Lab Institute: An AI Research Institutes program supported by NSF under Award No. 2019897. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily represent the views, either expressed or implied, of DARPA or the U.S. Government. Yu Meng is supported by the Google PhD Fellowship. We thank anonymous reviewers for valuable and insightful feedback.
Publisher Copyright:
© 2022 ACM.
PY - 2022/4/25
Y1 - 2022/4/25
N2 - Automatic extraction of product attributes from their textual descriptions is essential for online shopper experience. One inherent challenge of this task is the emerging nature of e-commerce products - we see new types of products with their unique set of new attributes constantly. Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arose from constantly changing data. In this work, we study the attribute mining problem in an open-world setting to extract novel attributes and their values. Instead of providing comprehensive training data, the user only needs to provide a few examples for a few known attribute types as weak supervision. We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes. The candidate generation step probes a pre-trained language model to extract phrases from product titles. Then, an attribute-aware fine-tuning method optimizes a multitask objective and shapes the language model representation to be attribute-discriminative. Finally, we discover new attributes and values through the self-ensemble of our framework, which handles the open-world challenge. We run extensive experiments on a large distantly annotated development set and a gold standard human-annotated test set that we collected. Our model significantly outperforms strong baselines and can generalize to unseen attributes and product types.
AB - Automatic extraction of product attributes from their textual descriptions is essential for online shopper experience. One inherent challenge of this task is the emerging nature of e-commerce products - we see new types of products with their unique set of new attributes constantly. Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arose from constantly changing data. In this work, we study the attribute mining problem in an open-world setting to extract novel attributes and their values. Instead of providing comprehensive training data, the user only needs to provide a few examples for a few known attribute types as weak supervision. We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes. The candidate generation step probes a pre-trained language model to extract phrases from product titles. Then, an attribute-aware fine-tuning method optimizes a multitask objective and shapes the language model representation to be attribute-discriminative. Finally, we discover new attributes and values through the self-ensemble of our framework, which handles the open-world challenge. We run extensive experiments on a large distantly annotated development set and a gold standard human-annotated test set that we collected. Our model significantly outperforms strong baselines and can generalize to unseen attributes and product types.
KW - Open-world product attribute mining
KW - weak supervision
UR - http://www.scopus.com/inward/record.url?scp=85129789740&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129789740&partnerID=8YFLogxK
U2 - 10.1145/3485447.3512035
DO - 10.1145/3485447.3512035
M3 - Conference contribution
AN - SCOPUS:85129789740
T3 - WWW 2022 - Proceedings of the ACM Web Conference 2022
SP - 3153
EP - 3161
BT - WWW 2022 - Proceedings of the ACM Web Conference 2022
PB - Association for Computing Machinery, Inc
Y2 - 25 April 2022 through 29 April 2022
ER -