Narrowing the Gap Between Termbases and Corpora in Commercial Environments

Kara Warburton

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Terminological resources offer potential to support applications beyond translation, such as controlled authoring and indexing, which are increasingly of interest to commercial enterprises. The ad-hoc semasiological approach adopted by commercial terminographers diverges considerably from methodologies prescribed by conventional theory. The notion of termhood in such production-oriented environments is driven by pragmatic criteria such as frequency and repurposability of the terminological unit. A high degree of correspondence between the commercial corpus and the termbase is desired. Research carried out at the City University of Hong Kong using four IT companies as case studies revealed a large gap between corpora and termbases. Problems in selecting terms and in encoding them properly in termbases account for a significant portion of this gap. A rigorous corpus-based approach to term selection would significantly reduce this gap and improve the effectiveness of commercial termbases. In particular, single-word terms (keywords) identified by comparison to a reference corpus offer great potential for identifying important multi-word terms in this context.

Original languageEnglish (US)
Title of host publicationProceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
EditorsNicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
PublisherEuropean Language Resources Association (ELRA)
Pages722-727
Number of pages6
ISBN (Electronic)9782951740884
StatePublished - 2014
Externally publishedYes
Event9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: May 26 2014May 31 2014

Other

Other9th International Conference on Language Resources and Evaluation, LREC 2014
Country/TerritoryIceland
CityReykjavik
Period5/26/145/31/14

Keywords

  • Corpora
  • Terminography

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Education
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'Narrowing the Gap Between Termbases and Corpora in Commercial Environments'. Together they form a unique fingerprint.

Cite this