Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives

Prateek Jindal, Dan Roth

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: This paper presents a coreference resolution system for clinical narratives. Coreference resolution aims at clustering all mentions in a single document into coherent entities.

Materials and methods: A knowledge-intensive approach to coreference resolution is employed. The domain knowledge used includes several domain-specific lists, knowledge-intensive mention parsing, and a task-informed discourse model. Mention parsing allows us to abstract over the surface form of a mention and represent each mention using a higher-level representation, which we call the mention's semantic representation (SR). SR reduces the mention to a standard form and hence provides better support for comparing and matching. Existing coreference resolution systems tend to ignore discourse aspects and rely heavily on lexical and structural cues in the text. The authors break from this tradition and present a discourse model for "person" type mentions in clinical narratives, which greatly simplifies coreference resolution.

Results: The system was evaluated on four different datasets made available in the 2011 i2b2/VA coreference challenge. The unweighted average of F1 scores (over B-cubed, MUC and CEAF) varied from 84.2% to 88.1%. These experiments show that domain knowledge is effective for different mention types across all the datasets.

Discussion: Error analysis shows that most of the recall errors made by the system can be handled by further addition of domain knowledge. The precision errors, on the other hand, are more subtle and indicate the need to understand the relations in which mentions participate in order to build a robust coreference system.

Conclusion: This paper presents an approach that makes extensive use of domain knowledge to significantly improve coreference resolution. The authors state that their system and the knowledge sources developed will be made publicly available.
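The reported score is an unweighted average of three F1 values (B-cubed, MUC and CEAF). The sketch below illustrates that arithmetic only. It is not the official i2b2/VA scorer and not the authors' code. All function names are hypothetical, and it rests on three assumptions. First, gold and system clusters are given as sets of mention ids over the same mention set. Second, the CEAF variant is the mention-based one (similarity |K ∩ R|), since the abstract does not say which variant was used. Third, the best entity alignment is found by exhaustive search, which only works for tiny inputs.

```python
from itertools import permutations


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def muc_recall(key, response):
    """Link-based MUC recall; swapping the arguments gives MUC precision."""
    num = den = 0
    for k in key:
        # Partitions of k induced by the response clusters; mentions of k
        # that appear in no response cluster count as singleton partitions.
        overlaps = [k & r for r in response if k & r]
        covered = set().union(*overlaps)
        parts = len(overlaps) + len(k - covered)
        num += len(k) - parts
        den += len(k) - 1
    return num / den if den else 0.0


def muc_f1(key, response):
    return f1(muc_recall(response, key), muc_recall(key, response))


def b_cubed_f1(key, response):
    """Per-mention precision and recall, averaged over all mentions.
    Assumes key and response cover exactly the same mentions."""
    key_of = {m: c for c in key for m in c}
    resp_of = {m: c for c in response for m in c}
    mentions = list(key_of)
    p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
    r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
    return f1(p, r)


def ceaf_m_f1(key, response):
    """Mention-based CEAF: best one-to-one entity alignment scored by
    |K & R|, found here by brute force over all permutations."""
    n = max(len(key), len(response))
    k = key + [set()] * (n - len(key))          # pad with empty entities
    r = response + [set()] * (n - len(response))
    best = max(sum(len(k[i] & r[p[i]]) for i in range(n))
               for p in permutations(range(n)))
    recall = best / sum(len(c) for c in key)
    precision = best / sum(len(c) for c in response)
    return f1(precision, recall)


def average_f1(key, response):
    """Unweighted mean of the MUC, B-cubed and CEAF F1 scores."""
    return (muc_f1(key, response) + b_cubed_f1(key, response)
            + ceaf_m_f1(key, response)) / 3


# Toy clusters for illustration only (not data from the paper).
gold = [{1, 2, 3}, {4, 5}]
system = [{1, 2}, {3, 4, 5}]
print(round(average_f1(gold, system), 4))  # prints 0.7333
```

On this toy example the three metrics disagree: MUC gives 2/3, B-cubed 11/15, and CEAF 0.8. That disagreement is one reason to report their average. Real scorers replace the permutation search with an assignment solver such as the Kuhn-Munkres algorithm.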

Original language: English (US)
Pages (from-to): 356-362
Number of pages: 7
Journal: Journal of the American Medical Informatics Association
Volume: 20
Issue number: 2
State: Published - 2013

ASJC Scopus subject areas

  • Health Informatics

