AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types

Xin Luna Dong, Xiang He, Andrey Kan, Xian Li, Yan Liang, Jun Ma, Yifan Ethan Xu, Chenwei Zhang, Tong Zhao, Gabriel Blanco Saldana, Saurabh Deshpande, Alexandre Michetti Manduca, Jay Ren, Surender Pal Singh, Fan Xiao, Haw Shiuan Chang, Giannis Karamanolakis, Yuning Mao, Yaqing Wang, Christos FaloutsosAndrew McCallum, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across large number of categories, as well as large and constantly growing number of products. We describe AutoKnow, our automatic (self-driving) system that addresses these challenges. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs. AutoKnow has been operational in collecting product knowledge for over 11K product types.

Original languageEnglish (US)
Title of host publicationKDD 2020 - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages2724-2734
Number of pages11
ISBN (Electronic)9781450379984
DOIs
StatePublished - Aug 23 2020
Event26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020 - Virtual, Online, United States
Duration: Aug 23 2020Aug 27 2020

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020
Country/TerritoryUnited States
CityVirtual, Online
Period8/23/208/27/20

Keywords

  • attribute importance
  • data cleaning
  • data imputation
  • knowledge graphs
  • synonym finding
  • taxonomy enrichment

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types'. Together they form a unique fingerprint.

Cite this