TY - GEN
T1 - Entity set search of scientific literature
T2 - 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
AU - Shen, Jiaming
AU - Xiao, Jinfeng
AU - He, Xinwei
AU - Shang, Jingbo
AU - Sinha, Saurabh
AU - Han, Jiawei
N1 - Publisher Copyright: © 2018 ACM.
PY - 2018/6/27
Y1 - 2018/6/27
AB - Literature search is critical for any scientific research. Unlike Web or general-domain search, a large portion of queries in scientific literature search are entity-set queries, that is, queries consisting of multiple entities of possibly different types. Entity-set queries reflect users' need to find documents that contain multiple entities and reveal inter-entity relationships, and thus pose non-trivial challenges to existing search algorithms that model each entity separately. Moreover, entity-set queries are usually sparse (i.e., not very repetitive), which renders ineffective many supervised ranking models that rely heavily on associated click history. To address these challenges, we introduce SetRank, an unsupervised ranking framework that models inter-entity relationships and captures entity type information. Furthermore, we develop a novel unsupervised model selection algorithm, based on the technique of weighted rank aggregation, to automatically choose the parameter settings in SetRank without resorting to a labeled validation set. We evaluate our proposed unsupervised approach using datasets from the TREC Genomics Tracks and Semantic Scholar's query log. The experiments demonstrate that SetRank significantly outperforms the baseline unsupervised models, especially on entity-set queries, and that our model selection algorithm effectively chooses suitable parameter settings.
KW - Entity-set aware search
KW - Literature search
KW - Unsupervised model selection
KW - Unsupervised ranking model
UR - http://www.scopus.com/inward/record.url?scp=85051487504&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051487504&partnerID=8YFLogxK
U2 - 10.1145/3209978.3210055
DO - 10.1145/3209978.3210055
M3 - Conference contribution
AN - SCOPUS:85051487504
T3 - 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
SP - 565
EP - 574
BT - 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
PB - Association for Computing Machinery
Y2 - 8 July 2018 through 12 July 2018
ER -