Entity set search of scientific literature

An unsupervised ranking approach

Jiaming Shen, Jinfeng Xiao, Xinwei He, Jingbo Shang, Saurabh Sinha, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Literature search is critical for any scientific research. Unlike Web or general-domain search, a large portion of queries in scientific literature search are entity-set queries, that is, queries made up of multiple entities of possibly different types. Entity-set queries reflect users' need to find documents that contain multiple entities and reveal inter-entity relationships, and thus pose non-trivial challenges to existing search algorithms that model each entity separately. Moreover, entity-set queries are usually sparse (i.e., rarely repeated), which renders ineffective many supervised ranking models that rely heavily on associated click history. To address these challenges, we introduce SetRank, an unsupervised ranking framework that models inter-entity relationships and captures entity type information. Furthermore, we develop a novel unsupervised model selection algorithm, based on the technique of weighted rank aggregation, to automatically choose the parameter settings in SetRank without resorting to a labeled validation set. We evaluate our proposed unsupervised approach using datasets from TREC Genomics Tracks and Semantic Scholar's query log. The experiments demonstrate that SetRank significantly outperforms the baseline unsupervised models, especially on entity-set queries, and that our model selection algorithm effectively chooses suitable parameter settings.
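
As a rough illustration of the general idea of weighted rank aggregation mentioned in the abstract, the minimal Python sketch below combines several ranked lists using a weighted Borda count. This is not the SetRank model selection algorithm, whose details are not given in this record; the function name `weighted_borda`, the document ids, and the weights are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def weighted_borda(rankings, weights):
    """Aggregate several ranked lists into one consensus ranking.

    rankings: list of lists, each a ranking of document ids (best first).
    weights:  one non-negative weight per ranking (hypothetical values).
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, doc in enumerate(ranking):
            # Borda points: n - 1 for the top item, 0 for the last one,
            # scaled by how much this ranking is trusted.
            scores[doc] += weight * (n - 1 - position)
    # Sort by descending score; break ties by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Three hypothetical rankings of the same three documents.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d3"],
    ["d1", "d3", "d2"],
]
weights = [1.0, 0.5, 1.0]
consensus = weighted_borda(rankings, weights)
print(consensus)
```

In a model selection setting, each input ranking could come from a different parameter setting, and a setting whose ranking agrees closely with the consensus might be preferred; that use is an assumption about how such a technique is typically applied, not a description of the paper's procedure.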

Original language: English (US)
Title of host publication: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
Publisher: Association for Computing Machinery, Inc
Pages: 565-574
Number of pages: 10
ISBN (Electronic): 9781450356572
DOIs: https://doi.org/10.1145/3209978.3210055
State: Published - Jun 27 2018
Event: 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018 - Ann Arbor, United States
Duration: Jul 8 2018 to Jul 12 2018

Publication series

Name: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018

Other

Other: 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
Country: United States
City: Ann Arbor
Period: 7/8/18 to 7/12/18

Fingerprint

Agglomeration
Semantics
Experiments
Genomics

Keywords

  • Entity-set aware search
  • Literature search
  • Unsupervised model selection
  • Unsupervised ranking model

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Shen, J., Xiao, J., He, X., Shang, J., Sinha, S., & Han, J. (2018). Entity set search of scientific literature: An unsupervised ranking approach. In 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018 (pp. 565-574). (41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018). Association for Computing Machinery, Inc. https://doi.org/10.1145/3209978.3210055

@inproceedings{60ae963a15e847d98ad21d0bed690288,
title = "Entity set search of scientific literature: An unsupervised ranking approach",
abstract = "Literature search is critical for any scientific research. Different from Web or general domain search, a large portion of queries in scientific literature search are entity-set queries, that is, multiple entities of possibly different types. Entity-set queries reflect user's need for finding documents that contain multiple entities and reveal inter-entity relationships and thus pose non-trivial challenges to existing search algorithms that model each entity separately. However, entity-set queries are usually sparse (i.e., not so repetitive), which makes ineffective many supervised ranking models that rely heavily on associated click history. To address these challenges, we introduce SetRank, an unsupervised ranking framework that models inter-entity relationships and captures entity type information. Furthermore, we develop a novel unsupervised model selection algorithm, based on the technique of weighted rank aggregation, to automatically choose the parameter settings in SetRank without resorting to a labeled validation set. We evaluate our proposed unsupervised approach using datasets from TREC Genomics Tracks and Semantic Scholar's query log. The experiments demonstrate that SetRank significantly outperforms the baseline unsupervised models, especially on entity-set queries, and our model selection algorithm effectively chooses suitable parameter settings.",
keywords = "Entity-set aware search, Literature search, Unsupervised model selection, Unsupervised ranking model",
author = "Jiaming Shen and Jinfeng Xiao and Xinwei He and Jingbo Shang and Saurabh Sinha and Jiawei Han",
year = "2018",
month = "6",
day = "27",
doi = "10.1145/3209978.3210055",
language = "English (US)",
series = "41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018",
publisher = "Association for Computing Machinery, Inc",
pages = "565--574",
booktitle = "41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018",

}

TY - GEN

T1 - Entity set search of scientific literature

T2 - An unsupervised ranking approach

AU - Shen, Jiaming

AU - Xiao, Jinfeng

AU - He, Xinwei

AU - Shang, Jingbo

AU - Sinha, Saurabh

AU - Han, Jiawei

PY - 2018/6/27

Y1 - 2018/6/27

N2 - Literature search is critical for any scientific research. Different from Web or general domain search, a large portion of queries in scientific literature search are entity-set queries, that is, multiple entities of possibly different types. Entity-set queries reflect user's need for finding documents that contain multiple entities and reveal inter-entity relationships and thus pose non-trivial challenges to existing search algorithms that model each entity separately. However, entity-set queries are usually sparse (i.e., not so repetitive), which makes ineffective many supervised ranking models that rely heavily on associated click history. To address these challenges, we introduce SetRank, an unsupervised ranking framework that models inter-entity relationships and captures entity type information. Furthermore, we develop a novel unsupervised model selection algorithm, based on the technique of weighted rank aggregation, to automatically choose the parameter settings in SetRank without resorting to a labeled validation set. We evaluate our proposed unsupervised approach using datasets from TREC Genomics Tracks and Semantic Scholar's query log. The experiments demonstrate that SetRank significantly outperforms the baseline unsupervised models, especially on entity-set queries, and our model selection algorithm effectively chooses suitable parameter settings.

KW - Entity-set aware search

KW - Literature search

KW - Unsupervised model selection

KW - Unsupervised ranking model

UR - http://www.scopus.com/inward/record.url?scp=85051487504&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051487504&partnerID=8YFLogxK

U2 - 10.1145/3209978.3210055

DO - 10.1145/3209978.3210055

M3 - Conference contribution

T3 - 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018

SP - 565

EP - 574

BT - 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018

PB - Association for Computing Machinery, Inc

ER -