TY - JOUR
T1 - Capisco: low-cost concept-based access to digital libraries
AU - Hinze, Annika
AU - Bainbridge, David
AU - Cunningham, Sally Jo
AU - Taube-Schock, Craig
AU - Matamua, Rangi
AU - Downie, J. Stephen
AU - Rasmussen, Edie
N1 - Publisher Copyright: © 2018, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2019/12/1
Y1 - 2019/12/1
AB - In this article, we present the conceptual design and report on the implementation of Capisco—a low-cost approach to concept-based access to digital libraries. Capisco avoids the need for complete semantic document markup using ontologies by leveraging an automatically generated Concept-in-Context (CiC) network. The network is seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system disambiguates the semantics of terms in the documents by their semantics and context and identifies the relevant CiC concepts. Supplementary to this, the disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. For established digital library systems, completely replacing, or even making significant changes to the document retrieval mechanism (document analysis, indexing strategy, query processing, and query interface) would require major technological effort and would most likely be disruptive. In addition to presenting Capisco, we describe ways to harness the results of our developed semantic analysis and disambiguation, while retaining the existing keyword-based search and lexicographic index. We engineer this so the output of semantic analysis (performed off-line) is suitable for import directly into existing digital library metadata and index structures, and thus incorporated without the need for architecture modifications.
KW - Disambiguation
KW - Indexing
KW - Metadata enrichment
KW - Semantic analysis
KW - Semantic enrichment
UR - http://www.scopus.com/inward/record.url?scp=85043689836&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043689836&partnerID=8YFLogxK
U2 - 10.1007/s00799-018-0232-3
DO - 10.1007/s00799-018-0232-3
M3 - Article
AN - SCOPUS:85043689836
SN - 1432-5012
VL - 20
SP - 307
EP - 334
JO - International Journal on Digital Libraries
JF - International Journal on Digital Libraries
IS - 4
ER -