OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision

Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Automatic extraction of product attributes from their textual descriptions is essential for online shopper experience. One inherent challenge of this task is the emerging nature of e-commerce products - we see new types of products with their unique set of new attributes constantly. Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arose from constantly changing data. In this work, we study the attribute mining problem in an open-world setting to extract novel attributes and their values. Instead of providing comprehensive training data, the user only needs to provide a few examples for a few known attribute types as weak supervision. We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes. The candidate generation step probes a pre-trained language model to extract phrases from product titles. Then, an attribute-aware fine-tuning method optimizes a multitask objective and shapes the language model representation to be attribute-discriminative. Finally, we discover new attributes and values through the self-ensemble of our framework, which handles the open-world challenge. We run extensive experiments on a large distantly annotated development set and a gold standard human-annotated test set that we collected. Our model significantly outperforms strong baselines and can generalize to unseen attributes and product types.

Original languageEnglish (US)
Title of host publicationWWW 2022 - Proceedings of the ACM Web Conference 2022
PublisherAssociation for Computing Machinery
Pages3153-3161
Number of pages9
ISBN (Electronic)9781450390965
DOIs
StatePublished - Apr 25 2022
Event31st ACM World Wide Web Conference, WWW 2022 - Virtual, Online, France
Duration: Apr 25 2022Apr 29 2022

Publication series

NameWWW 2022 - Proceedings of the ACM Web Conference 2022

Conference

Conference31st ACM World Wide Web Conference, WWW 2022
Country/TerritoryFrance
CityVirtual, Online
Period4/25/224/29/22

Keywords

  • Open-world product attribute mining
  • weak supervision

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Fingerprint

Dive into the research topics of 'OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision'. Together they form a unique fingerprint.

Cite this