Using bootstrapping to identify protein locations

Catherine Blake, Wu Zheng

Research output: Contribution to journal › Article › peer-review

Abstract

Automated methods that leverage both large quantities of text and knowledge resources are increasingly being used to identify relations from the web. One of the challenges with these approaches is that the quantity of relations identified makes it difficult to evaluate system performance. Our goal in this paper is to demonstrate how bootstrapping can be used to identify subcellular location relations of proteins, which are critically important to biologists for understanding protein function. Specifically, we use protein-location pairs in the UniProt knowledge base and dependency paths from a collection of text to infer new proteins, new locations, and new protein-location pairs. Our second goal is to conduct a detailed manual analysis of the first iteration of the bootstrapping process. Such an analysis reveals pitfalls of this approach and enables us to conclude with specific recommendations to improve bootstrapping performance in subsequent iterations.
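
As a rough illustration of the bootstrapping idea described in the abstract, the sketch below runs one iteration in Python: dependency paths that connect known seed pairs are collected as patterns, and those patterns are then applied to the remaining text to propose new pairs. It is a minimal sketch under stated assumptions, not the authors' implementation. The function name `bootstrap_iteration`, the `min_support` threshold, the triple representation, and the path strings are all hypothetical, and the data is a toy example rather than anything taken from UniProt or the paper's corpus.

```python
from collections import Counter


def bootstrap_iteration(seed_pairs, extractions, min_support=1):
    """Run one bootstrapping pass (illustrative only).

    seed_pairs:  set of (protein, location) tuples already known.
    extractions: list of (entity_a, dependency_path, entity_b) triples,
                 assumed to have been produced by a parser beforehand.
    Returns (patterns, new_pairs).
    """
    # Step 1: count the dependency paths that link a known seed pair.
    path_counts = Counter(
        path for a, path, b in extractions if (a, b) in seed_pairs
    )
    patterns = {p for p, n in path_counts.items() if n >= min_support}

    # Step 2: any unseen pair linked by a learned path becomes a candidate.
    new_pairs = {
        (a, b)
        for a, path, b in extractions
        if path in patterns and (a, b) not in seed_pairs
    }
    return patterns, new_pairs


# Toy data: the seed pair, entity names, and path strings are made up
# for illustration.
seeds = {("histone H3", "nucleus")}
extractions = [
    ("histone H3", "nsubj<-localizes->prep_to", "nucleus"),
    ("cytochrome c", "nsubj<-localizes->prep_to", "mitochondrion"),
    ("cytochrome c", "nsubj<-binds->dobj", "Apaf-1"),
]

patterns, new_pairs = bootstrap_iteration(seeds, extractions)

# New proteins and locations are those not already present in the seeds.
new_proteins = {p for p, _ in new_pairs} - {p for p, _ in seeds}
new_locations = {loc for _, loc in new_pairs} - {loc for _, loc in seeds}
```

In a later iteration the candidate pairs would be added to the seed set, which is also how errors can compound: a single overly general path may admit pairs that are not protein-location relations at all, one reason a manual check of the first iteration is useful.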

Original language: English (US)
Journal: Proceedings of the ASIST Annual Meeting
Volume: 49
Issue number: 1
State: Published - Dec 1 2012

Keywords

  • Bioinformatics
  • Bootstrapping
  • Information extraction

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
