Rapidly building domain-specific entity-centric language models using semantic web knowledge sources

Murat Akbacak, Dilek Hakkani-Tür, Gokhan Tur

Research output: Contribution to journalConference articlepeer-review

Abstract

For domain-specific speech recognition tasks, it is best if the statistical language model component is trained with text data that is content-wise and style-wise similar to the targeted domain for which the application is built. For state-of-the-art language modeling techniques that can be used in real-time within speech recognition engines during first-pass decoding (e.g., N-gram models), the above constraints have to be fulfilled in the training data. However collecting such data, even through crowd sourcing, is expensive and time consuming, and can still be not representative of how a much larger user population would interact with the recognition system. In this paper, we address this problem by employing several semantic web sources that already contain the domain-specific knowledge, such as query click logs and knowledge graphs. We build statistical language models that meet the requirements listed above for domain-specific recognition tasks where natural language is used and the user queries are about name entities in a specific domain. As a case study, in the movies domain where users' voice queries are movie related, compared to a generic web language model, a language model trained with the above resources not only yields significant perplexity and word-errorrate improvements, but also presents an approach where such language models can be rapidly developed for other domains.

Original languageEnglish (US)
Pages (from-to)2872-2876
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
StatePublished - 2014
Externally publishedYes
Event15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: Sep 14 2014Sep 18 2014

Keywords

  • Knowledge graphs
  • Language modeling
  • Name entities
  • Query click graphs
  • Semantic web
  • Speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Fingerprint

Dive into the research topics of 'Rapidly building domain-specific entity-centric language models using semantic web knowledge sources'. Together they form a unique fingerprint.

Cite this