Automated methods that leverage both large quantities of text and knowledge resources are increasingly being used to identify relations from the web. One of the challenges with these approaches is that the quantity of relations identified makes it difficult to evaluate system performance. Our first goal in this paper is to demonstrate how bootstrapping can be used to identify subcellular location relations of proteins, information that is critically important to biologists seeking to understand protein function. Specifically, we use protein-location pairs in the UniProt knowledge base and dependency paths from a collection of text to infer new proteins, new locations, and new protein-location pairs. Our second goal is to conduct a detailed manual analysis of the first iteration of the bootstrapping process. This analysis reveals pitfalls of the approach and enables us to conclude with specific recommendations for improving bootstrapping performance in subsequent iterations.
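To make the bootstrapping idea concrete, the following is a minimal sketch of a single iteration, not the authors' implementation. It assumes the text has already been reduced to (protein, dependency path, location) triples; the function name `bootstrap_iteration`, the `min_support` threshold, and the path strings are hypothetical, and the example entities are invented placeholders rather than UniProt records.

```python
from collections import defaultdict


def bootstrap_iteration(seed_pairs, instances, min_support=1):
    """Run one bootstrapping pass over (protein, path, location) triples.

    seed_pairs: set of known (protein, location) pairs (hypothetical input).
    instances: iterable of (protein, dependency_path, location) triples.
    min_support: assumed threshold on distinct seed pairs per path.
    """
    instances = list(instances)

    # Step 1: collect dependency paths that connect a known seed pair.
    path_support = defaultdict(set)
    for protein, path, location in instances:
        if (protein, location) in seed_pairs:
            path_support[path].add((protein, location))
    trusted_paths = {
        path for path, pairs in path_support.items() if len(pairs) >= min_support
    }

    # Step 2: apply the trusted paths to propose pairs not in the seeds.
    new_pairs = {
        (protein, location)
        for protein, path, location in instances
        if path in trusted_paths and (protein, location) not in seed_pairs
    }

    # Step 3: separate out proteins and locations unseen in the seeds.
    known_proteins = {protein for protein, _ in seed_pairs}
    known_locations = {location for _, location in seed_pairs}
    new_proteins = {protein for protein, _ in new_pairs} - known_proteins
    new_locations = {location for _, location in new_pairs} - known_locations
    return trusted_paths, new_pairs, new_proteins, new_locations


# Invented toy data, for illustration only.
seeds = {("ProteinA", "nucleus")}
triples = [
    ("ProteinA", "nsubj<-localizes->prep_to", "nucleus"),
    ("ProteinB", "nsubj<-localizes->prep_to", "mitochondrion"),
    ("ProteinA", "nsubj<-localizes->prep_to", "cytoplasm"),
    ("ProteinC", "nsubj<-binds->dobj", "nucleus"),
]
paths, pairs, proteins, locations = bootstrap_iteration(seeds, triples)
```

In this sketch a path is trusted as soon as it links one seed pair, so every pair it proposes would still need the kind of manual checking the abstract describes before being fed into a further iteration.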
- Information extraction
ASJC Scopus subject areas
- Information Systems
- Library and Information Sciences