TY - GEN
T1 - An Evaluation of NLP Methods to Extract Mathematical Token Descriptors
AU - Hamel, Emma
AU - Zheng, Hongbo
AU - Kani, Nickvash
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - Mathematical formulae are a foundational component of information in all scientific and mathematical papers. Parsing meaning from these expressions by extracting textual descriptors of their variable tokens is a unique challenge that requires semantic and grammatical knowledge. In this work, we present a new manually labeled dataset (called the MTDE dataset) of mathematical objects, the contexts in which they are defined, and their textual definitions. With this dataset, we evaluate the accuracy of several modern neural network models on two definition extraction tasks. While this is not a solved task, modern language models such as BERT perform well (∼90%). Both the dataset and neural network models (implemented in PyTorch Jupyter notebooks) are available online to aid future researchers in this space.
KW - Dataset
KW - Mathematical language processing
KW - Named entity recognition
KW - Text summarization
UR - http://www.scopus.com/inward/record.url?scp=85138781592&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138781592&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-16681-5_23
DO - 10.1007/978-3-031-16681-5_23
M3 - Conference contribution
AN - SCOPUS:85138781592
SN - 9783031166808
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 329
EP - 343
BT - Intelligent Computer Mathematics - 15th International Conference, CICM 2022, Proceedings
A2 - Buzzard, Kevin
A2 - Kutsia, Temur
PB - Springer
T2 - 15th Conference on Intelligent Computer Mathematics, CICM 2022
Y2 - 19 September 2022 through 23 September 2022
ER -