Illinois CCG LoReHLT 2016 named entity recognition and situation frame systems

Chen Tse Tsai, Stephen Mayhew, Yangqiu Song, Mark Sammons, Dan Roth

Research output: Contribution to journal › Article › peer-review

Abstract

This paper describes the Illinois Cognitive Computation Group’s system for the 2016 NIST Low Resource Human Language Technology (LoReHLT) evaluation, in which the target language is Uyghur. We participate in two tasks: named entity recognition (NER) and situation frame (SF). For NER, we develop two models. The first is a rule-based model built on knowledge obtained by inspecting the monolingual documents, reading a Uyghur grammar book, and interacting with native informants. The second is a transfer model trained on labeled Uzbek data. Combining the outputs of these two models yields a significant improvement and achieves an F1 score of 60.4 on the official evaluation set. For the new SF task, we apply the dataless classification technique to build an English classifier for eight situation types, and use a Uyghur-to-English dictionary to translate the Uyghur documents. Using this classifier, we propose two frameworks for grounding situations to the locations mentioned in the text.
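The SF pipeline described in the abstract (dictionary translation followed by an English classifier that needs no labeled training documents) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the label names and their keyword descriptions, the toy dictionary with placeholder source tokens (`w1`, `w2`, ...), and the use of plain bag-of-words cosine similarity in place of whatever semantic representation the paper actually uses. It is not the authors' implementation.

```python
from collections import Counter
from math import sqrt

# Hypothetical label descriptions (illustrative only, not the official
# inventory of eight situation types used in the evaluation).
LABEL_DESCRIPTIONS = {
    "water": "water drinking supply thirst well",
    "medical": "medical hospital doctor injury medicine",
    "shelter": "shelter housing tent homeless roof",
}

# Hypothetical toy dictionary: placeholder source tokens mapped to English
# glosses. These are NOT real Uyghur words.
TOY_DICTIONARY = {
    "w1": ["water"],
    "w2": ["drinking"],
    "w3": ["hospital", "clinic"],
    "w4": ["tent"],
}


def translate(tokens, dictionary):
    """Word-by-word lookup; tokens missing from the dictionary are dropped."""
    return [gloss for tok in tokens for gloss in dictionary.get(tok, [])]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(tokens, dictionary=TOY_DICTIONARY, labels=LABEL_DESCRIPTIONS):
    """Return (best_label_or_None, scores) without any labeled training data."""
    doc = Counter(translate(tokens, dictionary))
    scores = {name: cosine(doc, Counter(desc.split())) for name, desc in labels.items()}
    best = max(scores, key=scores.get)
    # If nothing overlaps with any label description, report no situation.
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    label, scores = classify(["w1", "w2", "unknown_token"])
    print(label, scores)
```

The point of the design is that the classifier is defined entirely by label descriptions on the English side, so the only target-language resource it needs is the bilingual dictionary.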

Original language: English (US)
Pages (from-to): 91-103
Number of pages: 13
Journal: Machine Translation
Volume: 32
Issue number: 1-2
State: Published - Jun 1 2018

Keywords

  • Cross-lingual transfer
  • Dataless classification
  • Low-resource language
  • Named entity recognition
  • Situation frame
  • Uyghur language

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence
