Abstract
Named Entity Recognition (NER), the automated identification and tagging of entities in text, is a popular natural language processing task that can transform restricted data into open datasets of entities for further research. This project benchmarks four NER models (Stanford NER, BookNLP, spaCy-trf, and RoBERTa) to identify the most accurate approach and to generate an open-access, gold-standard dataset of human-annotated entities. To address a real-world use case, we benchmark these models on a sample dataset of sentences from Native American-authored literature, identifying edge cases and areas for improvement in future NER work.
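The abstract does not state which metric was used to rank the four models against the human-annotated gold standard. The following is only a minimal sketch of one common convention, exact-match entity-level precision, recall, and F1. It assumes that an entity counts as correct only when its sentence, character span, and label all match a gold annotation. The `Entity` tuple layout, the `entity_prf` and `best_model` helpers, and the model names in the usage example are hypothetical illustrations, not the authors' code.

```python
from typing import Iterable, Mapping

# Hypothetical representation: (sentence_id, start_char, end_char, label).
Entity = tuple[int, int, int, str]


def entity_prf(gold: Iterable[Entity],
               predicted: Iterable[Entity]) -> tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 (assumed metric)."""
    gold_set = set(gold)
    pred_set = set(predicted)
    # A prediction is a true positive only if span AND label match exactly.
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_model(gold: Iterable[Entity],
               outputs: Mapping[str, Iterable[Entity]]) -> str:
    """Return the name of the model with the highest F1 against the gold set."""
    gold_set = set(gold)
    return max(outputs, key=lambda name: entity_prf(gold_set, outputs[name])[2])


if __name__ == "__main__":
    gold = {(0, 0, 7, "PERSON"), (0, 20, 28, "LOC")}
    outputs = {
        "model_a": {(0, 0, 7, "PERSON"), (0, 20, 28, "ORG")},  # one label error
        "model_b": {(0, 0, 7, "PERSON"), (0, 20, 28, "LOC")},  # both correct
    }
    for name, preds in outputs.items():
        print(name, entity_prf(gold, preds))
    print("best:", best_model(gold, outputs))
```

Exact matching is strict. A mislabeled or slightly mis-bounded span counts as both a false positive and a false negative, so partial-match or per-label scores may better expose the kinds of edge cases the abstract mentions.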
Original language | English (US) |
---|---|
Pages (from-to) | 681-685 |
Number of pages | 5 |
Journal | Proceedings of the Association for Information Science and Technology |
Volume | 60 |
Issue number | 1 |
DOIs | |
State | Published - Oct 2023 |
Keywords
- HathiTrust
- Named entity recognition
- Native American studies
- cultural analytics
- machine learning
ASJC Scopus subject areas
- General Computer Science
- Library and Information Sciences