TY - GEN
T1 - Annotating named entities in consumer health questions
AU - Kilicoglu, Halil
AU - Abacha, Asma Ben
AU - Mrabet, Yassine
AU - Roberts, Kirk
AU - Rodriguez, Laritza
AU - Shooshan, Sonya E.
AU - Demner-Fushman, Dina
PY - 2016
Y1 - 2016
N2 - We describe a corpus of consumer health questions annotated with named entities. The corpus consists of 1548 de-identified questions about diseases and drugs, written in English. We defined 15 broad categories of biomedical named entities for annotation. A pilot annotation phase in which a small portion of the corpus was double-annotated by four annotators was followed by a main phase in which double annotation was carried out by six annotators, and a reconciliation phase in which all annotations were reconciled by an expert. We conducted the annotation in two modes, manual and assisted, to assess the effect of automatic pre-annotation, and calculated inter-annotator agreement. We obtained moderate inter-annotator agreement; assisted annotation yielded slightly better agreement and fewer missed annotations than manual annotation. Due to the complex nature of biomedical entities, we paid particular attention to nested entities, for which we obtained slightly lower inter-annotator agreement, confirming that annotating nested entities is somewhat more challenging. To our knowledge, the corpus is the first of its kind for consumer health text and is publicly available.
AB - We describe a corpus of consumer health questions annotated with named entities. The corpus consists of 1548 de-identified questions about diseases and drugs, written in English. We defined 15 broad categories of biomedical named entities for annotation. A pilot annotation phase in which a small portion of the corpus was double-annotated by four annotators was followed by a main phase in which double annotation was carried out by six annotators, and a reconciliation phase in which all annotations were reconciled by an expert. We conducted the annotation in two modes, manual and assisted, to assess the effect of automatic pre-annotation, and calculated inter-annotator agreement. We obtained moderate inter-annotator agreement; assisted annotation yielded slightly better agreement and fewer missed annotations than manual annotation. Due to the complex nature of biomedical entities, we paid particular attention to nested entities, for which we obtained slightly lower inter-annotator agreement, confirming that annotating nested entities is somewhat more challenging. To our knowledge, the corpus is the first of its kind for consumer health text and is publicly available.
KW - Assisted annotation
KW - Biomedical named entities
KW - Consumer health questions
KW - Nested entities
UR - http://www.scopus.com/inward/record.url?scp=85032822677&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032822677&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85032822677
T3 - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
SP - 3325
EP - 3332
BT - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Mazo, Helene
A2 - Moreno, Asuncion
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Grobelnik, Marko
A2 - Odijk, Jan
A2 - Piperidis, Stelios
A2 - Maegaard, Bente
A2 - Mariani, Joseph
PB - European Language Resources Association (ELRA)
T2 - 10th International Conference on Language Resources and Evaluation, LREC 2016
Y2 - 23 May 2016 through 28 May 2016
ER -