Leveraging pattern semantics for extracting entities in enterprises

Fangbo Tao, Bo Zhao, Ariel Fuxman, Yang Li, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Entity Extraction is a process of identifying meaningful entities from text documents. In enterprises, extracting entities improves enterprise efficiency by facilitating numerous applications, including search, recommendation, etc. However, the problem is particularly challenging on enterprise domains due to several reasons. First, the lack of redundancy of enterprise entities makes previous web-based systems like NELL and OpenIE not effective, since using only high-precision/low-recall patterns like those systems would miss the majority of sparse enterprise entities, while using more low-precision patterns in sparse setting also introduces noise drastically. Second, semantic drift is common in enterprises ("Blue" refers to "Windows Blue"), such that public signals from the web cannot be directly applied on entities. Moreover, many internal entities never appear on the web. Sparse internal signals are the only source for discovering them. To address these challenges, we propose an end-to-end framework for extracting entities in enterprises, taking the input of enterprise corpus and limited seeds to generate a high-quality entity collection as output. We introduce the novel concept of Semantic Pattern Graph to leverage public signals to understand the underlying semantics of lexical patterns, reinforce pattern evaluation using mined semantics, and yield more accurate and complete entities. Experiments on Microsoft enterprise data show the effectiveness of our approach.

Original language: English (US)
Title of host publication: WWW 2015 - Proceedings of the 24th International Conference on World Wide Web
Publisher: Association for Computing Machinery, Inc
Pages: 1078-1088
Number of pages: 11
ISBN (Electronic): 9781450334693
State: Published - May 18 2015
Event: 24th International Conference on World Wide Web, WWW 2015 - Florence, Italy
Duration: May 18 2015 to May 22 2015

Publication series

Name: WWW 2015 - Proceedings of the 24th International Conference on World Wide Web

Other

Other: 24th International Conference on World Wide Web, WWW 2015
Country/Territory: Italy
City: Florence
Period: 5/18/15 to 5/22/15

Keywords

  • Enterprise Taxonomy
  • Enterprise Entity Extraction
  • Semantic Pattern Graph

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
