Using bootstrapping to identify protein locations

Catherine Blake, Wu Zheng

Research output: Contribution to journal › Article › peer-review


Automated methods that leverage both large quantities of text and knowledge resources are increasingly being used to identify relations from the web. One challenge with these approaches is that the sheer quantity of relations identified makes system performance difficult to evaluate. Our first goal in this paper is to demonstrate how bootstrapping can be used to identify subcellular location relations of proteins, which are critically important to biologists seeking to understand protein function. Specifically, we use protein-location pairs in the UniProt knowledge base and dependency paths from a collection of text to infer new proteins, new locations, and new protein-location pairs. Our second goal is to conduct a detailed manual analysis of the first iteration of the bootstrapping process. This analysis reveals pitfalls of the approach and enables us to conclude with specific recommendations for improving bootstrapping performance in subsequent iterations.
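A single bootstrapping iteration of the kind described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each sentence has already been dependency-parsed and reduced to hypothetical `Mention(protein, path, location)` triples, uses toy seed pairs standing in for UniProt subcellular-location annotations, and keeps a dependency path if it co-occurs with seed pairs in at least `min_support` mentions (a hypothetical threshold). The protein IDs, path strings, and function names are all illustrative.

```python
from collections import Counter
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical pre-parsed input: a protein and a location linked by a
    dependency path within one sentence (parsing itself is not shown)."""
    protein: str
    path: str
    location: str


def learn_paths(mentions, seeds, min_support=1):
    """Keep dependency paths that link a seed (protein, location) pair
    in at least `min_support` mentions."""
    counts = Counter(m.path for m in mentions if (m.protein, m.location) in seeds)
    return {path for path, n in counts.items() if n >= min_support}


def apply_paths(mentions, paths):
    """Return every (protein, location) pair linked by a learned path."""
    return {(m.protein, m.location) for m in mentions if m.path in paths}


def bootstrap_iteration(mentions, seeds, min_support=1):
    """One iteration: learn paths from seeds, then infer new pairs,
    new proteins, and new locations not present in the seeds."""
    paths = learn_paths(mentions, seeds, min_support)
    new_pairs = apply_paths(mentions, paths) - seeds
    known_proteins = {p for p, _ in seeds}
    known_locations = {loc for _, loc in seeds}
    return {
        "paths": paths,
        "new_pairs": new_pairs,
        "new_proteins": {p for p, _ in new_pairs} - known_proteins,
        "new_locations": {loc for _, loc in new_pairs} - known_locations,
    }


# Toy data: seed pairs stand in for UniProt annotations; IDs and paths are made up.
seeds = {("P1", "nucleus"), ("P2", "mitochondrion")}
mentions = [
    Mention("P1", "nsubj<localized>prep_to", "nucleus"),
    Mention("P2", "nsubj<localized>prep_to", "mitochondrion"),
    Mention("P3", "nsubj<localized>prep_to", "cytoplasm"),
    Mention("P1", "nsubj<localized>prep_to", "cytoplasm"),
    Mention("P4", "nsubj<interacts>prep_with", "nucleus"),
]

result = bootstrap_iteration(mentions, seeds)
print(sorted(result["new_pairs"]))  # [('P1', 'cytoplasm'), ('P3', 'cytoplasm')]
```

Feeding `seeds | result["new_pairs"]` back into `bootstrap_iteration` would give a second iteration. Because this sketch accepts every pair matched by a learned path without any confidence scoring, errors from one iteration would propagate into the seeds of the next, which is one reason to inspect an early iteration closely before continuing.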

Original language: English (US)
Pages (from-to): 1-2
Number of pages: 2
Journal: Proceedings of the ASIST Annual Meeting
Issue number: 1
State: Published - 2012


Keywords

  • Bioinformatics
  • Bootstrapping
  • Information extraction

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

