Capisco: low-cost concept-based access to digital libraries

Annika Hinze, David Bainbridge, Sally Jo Cunningham, Craig Taube-Schock, Rangi Matamua, J. Stephen Downie, Edie Rasmussen

Research output: Contribution to journal › Article

Abstract

In this article, we present the conceptual design and report on the implementation of Capisco—a low-cost approach to concept-based access to digital libraries. Capisco avoids the need for complete semantic document markup using ontologies by leveraging an automatically generated Concept-in-Context (CiC) network. The network is seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system disambiguates the semantics of terms in the documents by their semantics and context and identifies the relevant CiC concepts. Supplementary to this, the disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. For established digital library systems, completely replacing, or even making significant changes to the document retrieval mechanism (document analysis, indexing strategy, query processing, and query interface) would require major technological effort and would most likely be disruptive. In addition to presenting Capisco, we describe ways to harness the results of our developed semantic analysis and disambiguation, while retaining the existing keyword-based search and lexicographic index. We engineer this so the output of semantic analysis (performed off-line) is suitable for import directly into existing digital library metadata and index structures, and thus incorporated without the need for architecture modifications.
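The context-based disambiguation and metadata export outlined in the abstract can be illustrated with a minimal sketch. It is not the Capisco implementation: the `Concept` class, the toy network entries, the word-overlap scoring rule, and the `concepts` metadata field name are all assumptions made for illustration. The idea shown is only that an ambiguous term is resolved to the candidate concept whose context terms best match the document, and that the resulting concept identifiers are plain strings an existing keyword index could store as an extra metadata field.

```python
from dataclasses import dataclass

PUNCTUATION = ".,;:!?()\"'"


@dataclass(frozen=True)
class Concept:
    concept_id: str      # hypothetical identifier, not a real Capisco ID
    label: str           # surface term that may refer to this concept
    context: frozenset   # terms assumed to co-occur with this sense


# Toy stand-in for a Concept-in-Context network (entries are invented).
NETWORK = [
    Concept("C_jaguar_animal", "jaguar",
            frozenset({"cat", "jungle", "prey", "species"})),
    Concept("C_jaguar_car", "jaguar",
            frozenset({"car", "engine", "driver", "speed"})),
]


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def disambiguate(term, doc_tokens, network=NETWORK):
    """Pick the candidate sense sharing the most context terms with the document."""
    candidates = [c for c in network if c.label == term]
    if not candidates:
        return None
    words = set(doc_tokens)
    best = max(candidates, key=lambda c: len(c.context & words))
    # Without any supporting context, leave the term unresolved.
    return best if best.context & words else None


def concept_metadata(text, network=NETWORK):
    """Return concept IDs as a metadata record suitable for a keyword index."""
    tokens = tokenize(text)
    labels = {c.label for c in network}
    found = []
    for term in sorted(labels & set(tokens)):
        concept = disambiguate(term, tokens, network)
        if concept is not None:
            found.append(concept.concept_id)
    return {"concepts": found}
```

Because the output is an ordinary field of string tokens, a keyword-based system could in principle index it alongside the full text without changes to its retrieval machinery, which is the integration property the abstract emphasizes.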

Original language: English (US)
Pages (from-to): 307-334
Number of pages: 28
Journal: International Journal on Digital Libraries
Volume: 20
Issue number: 4
DOIs: https://doi.org/10.1007/s00799-018-0232-3
State: Published - Dec 1 2019

Fingerprint

semantics
costs
Wikipedia
document analysis
indexing
ontology
import
engineer
knowledge

Keywords

  • Disambiguation
  • Indexing
  • Metadata enrichment
  • Semantic analysis
  • Semantic enrichment

ASJC Scopus subject areas

  • Library and Information Sciences

Cite this

Hinze, A., Bainbridge, D., Cunningham, S. J., Taube-Schock, C., Matamua, R., Downie, J. S., & Rasmussen, E. (2019). Capisco: low-cost concept-based access to digital libraries. International Journal on Digital Libraries, 20(4), 307-334. https://doi.org/10.1007/s00799-018-0232-3

@article{e142ddae1c44425ea93213162e213b53,
title = "Capisco: low-cost concept-based access to digital libraries",
abstract = "In this article, we present the conceptual design and report on the implementation of Capisco—a low-cost approach to concept-based access to digital libraries. Capisco avoids the need for complete semantic document markup using ontologies by leveraging an automatically generated Concept-in-Context (CiC) network. The network is seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system disambiguates the semantics of terms in the documents by their semantics and context and identifies the relevant CiC concepts. Supplementary to this, the disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. For established digital library systems, completely replacing, or even making significant changes to the document retrieval mechanism (document analysis, indexing strategy, query processing, and query interface) would require major technological effort and would most likely be disruptive. In addition to presenting Capisco, we describe ways to harness the results of our developed semantic analysis and disambiguation, while retaining the existing keyword-based search and lexicographic index. We engineer this so the output of semantic analysis (performed off-line) is suitable for import directly into existing digital library metadata and index structures, and thus incorporated without the need for architecture modifications.",
keywords = "Disambiguation, Indexing, Metadata enrichment, Semantic analysis, Semantic enrichment",
author = "Annika Hinze and David Bainbridge and Cunningham, {Sally Jo} and Craig Taube-Schock and Rangi Matamua and Downie, {J. Stephen} and Edie Rasmussen",
year = "2019",
month = "12",
day = "1",
doi = "10.1007/s00799-018-0232-3",
language = "English (US)",
volume = "20",
pages = "307--334",
journal = "International Journal on Digital Libraries",
issn = "1432-5012",
publisher = "Springer Verlag",
number = "4",

}

TY - JOUR

T1 - Capisco

T2 - low-cost concept-based access to digital libraries

AU - Hinze, Annika

AU - Bainbridge, David

AU - Cunningham, Sally Jo

AU - Taube-Schock, Craig

AU - Matamua, Rangi

AU - Downie, J. Stephen

AU - Rasmussen, Edie

PY - 2019/12/1

Y1 - 2019/12/1

AB - In this article, we present the conceptual design and report on the implementation of Capisco—a low-cost approach to concept-based access to digital libraries. Capisco avoids the need for complete semantic document markup using ontologies by leveraging an automatically generated Concept-in-Context (CiC) network. The network is seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system disambiguates the semantics of terms in the documents by their semantics and context and identifies the relevant CiC concepts. Supplementary to this, the disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. For established digital library systems, completely replacing, or even making significant changes to the document retrieval mechanism (document analysis, indexing strategy, query processing, and query interface) would require major technological effort and would most likely be disruptive. In addition to presenting Capisco, we describe ways to harness the results of our developed semantic analysis and disambiguation, while retaining the existing keyword-based search and lexicographic index. We engineer this so the output of semantic analysis (performed off-line) is suitable for import directly into existing digital library metadata and index structures, and thus incorporated without the need for architecture modifications.

KW - Disambiguation

KW - Indexing

KW - Metadata enrichment

KW - Semantic analysis

KW - Semantic enrichment

UR - http://www.scopus.com/inward/record.url?scp=85043689836&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85043689836&partnerID=8YFLogxK

U2 - 10.1007/s00799-018-0232-3

DO - 10.1007/s00799-018-0232-3

M3 - Article

AN - SCOPUS:85043689836

VL - 20

SP - 307

EP - 334

JO - International Journal on Digital Libraries

JF - International Journal on Digital Libraries

SN - 1432-5012

IS - 4

ER -