TY - JOUR
T1 - Illinois CCG LoReHLT 2016 named entity recognition and situation frame systems
AU - Tsai, Chen Tse
AU - Mayhew, Stephen
AU - Song, Yangqiu
AU - Sammons, Mark
AU - Roth, Dan
N1 - Funding Information:
This work was supported by Contract HR0011-15-2-0025 with the U.S. Defense Advanced Research Projects Agency (DARPA). Approved for Public Release, Distribution Unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Publisher Copyright:
© 2017, Springer Science+Business Media B.V., part of Springer Nature.
PY - 2018/6/1
Y1 - 2018/6/1
N2 - This paper describes the Illinois Cognitive Computation Group’s systems for the 2016 NIST Low Resource Human Language Technology (LoReHLT) evaluation, in which the target language is Uyghur. We participate in two tasks: named entity recognition (NER) and situation frame (SF). For NER, we develop two models. The first is a rule-based model built on knowledge obtained by inspecting the monolingual documents, reading the Uyghur grammar book, and interacting with the native informants. The second is a transfer model trained on labeled Uzbek data. Combining the outputs of these two models yields a significant improvement and achieves an F1 score of 60.4 on the official evaluation set. For the new SF task, we apply the dataless classification technique to build an English classifier for eight situation types, and use a Uyghur-to-English dictionary to translate the Uyghur documents. Using this classifier, we propose two frameworks for grounding situations to the locations mentioned in the text.
KW - Cross-lingual transfer
KW - Dataless classification
KW - Low-resource language
KW - Named entity recognition
KW - Situation frame
KW - Uyghur language
UR - http://www.scopus.com/inward/record.url?scp=85040706599&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040706599&partnerID=8YFLogxK
U2 - 10.1007/s10590-017-9211-5
DO - 10.1007/s10590-017-9211-5
M3 - Article
AN - SCOPUS:85040706599
SN - 0922-6567
VL - 32
SP - 91
EP - 103
JO - Machine Translation
JF - Machine Translation
IS - 1-2
ER -