Unsupervised concept categorization and extraction from scientific document titles

Adit Krishnan, Aravind Sankar, Shi Zhi, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper studies the automated categorization and extraction of scientific concepts from titles of scientific articles, in order to gain a deeper understanding of their key contributions and facilitate the construction of a generic academic knowledgebase. Towards this goal, we propose an unsupervised, domain-independent, and scalable two-phase algorithm to type and extract key concept mentions into aspects of interest (e.g., Techniques, Applications, etc.). In the first phase of our algorithm we propose PhraseType, a probabilistic generative model which exploits textual features and limited POS tags to broadly segment text snippets into aspect-typed phrases. We extend this model to simultaneously learn aspect-specific features and identify academic domains in multi-domain corpora, since the two tasks mutually enhance each other. In the second phase, we propose an approach based on adaptor grammars to extract fine grained concept mentions from the aspect-typed phrases without the need for any external resources or human e.ort, in a purely data-driven manner. We apply our technique to study literature from diverse scientific domains and show significant gains over state-of-the-art concept extraction techniques. We also present a qualitative analysis of the results obtained.

Original languageEnglish (US)
Title of host publicationCIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages1339-1348
Number of pages10
ISBN (Electronic)9781450349185
DOIs
StatePublished - Nov 6 2017
Event26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: Nov 6 2017Nov 10 2017

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings
VolumePart F131841

Other

Other26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Country/TerritorySingapore
CitySingapore
Period11/6/1711/10/17

Keywords

  • Adaptor grammar
  • Concept extraction
  • Probabilistic model

ASJC Scopus subject areas

  • Decision Sciences(all)
  • Business, Management and Accounting(all)

Fingerprint

Dive into the research topics of 'Unsupervised concept categorization and extraction from scientific document titles'. Together they form a unique fingerprint.

Cite this