Aligning texts and knowledge bases with semantic sentence simplification

Yassine Mrabet, Pavlos Vougiouklis, Halil Kilicoglu, Claire Gardent, Dina Demner-Fushman, Jonathon Hare, Elena Simperl

Research output: Contribution to conference › Paper › peer-review


Finding the natural language equivalent of structured data is both a challenging and promising task. In particular, an efficient alignment of knowledge bases with texts would benefit many applications, including natural language generation, information retrieval and text simplification. In this paper, we present an approach to build a dataset of triples aligned with equivalent sentences written in natural language. Our approach consists of three main steps. First, target sentences are annotated automatically with knowledge base (KB) concepts and instances. The triples linking these elements in the KB are extracted as candidate facts to be aligned with the annotated sentence. Second, we use textual mentions referring to the subject and object of these facts to semantically simplify the target sentence via crowdsourcing. Third, the sentences provided by different contributors are post-processed to keep only the most relevant simplifications for the alignment with KB facts. We present different filtering methods, and share the constructed datasets in the public domain. These datasets contain 1,050 sentences aligned with 1,885 triples. They can be used to train natural language generators as well as semantic or contextual text simplifiers.
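The first step described above (collecting the KB triples that link entities annotated in the same sentence) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Triple` type, the `candidate_triples` helper, and the toy knowledge base are all hypothetical, and the entity annotations are assumed to be already available as plain strings.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A hypothetical (subject, predicate, object) KB fact."""
    subject: str
    predicate: str
    obj: str


def candidate_triples(annotated_entities: List[str], kb: List[Triple]) -> List[Triple]:
    """Keep the KB triples whose subject and object are both annotated in the sentence."""
    mentioned = set(annotated_entities)
    return [t for t in kb if t.subject in mentioned and t.obj in mentioned]


# Toy knowledge base, invented for illustration only.
kb = [
    Triple("Aspirin", "treats", "Headache"),
    Triple("Aspirin", "is_a", "NSAID"),
    Triple("Ibuprofen", "treats", "Headache"),
]

# Entities assumed to have been annotated in one target sentence.
entities = ["Aspirin", "Headache"]

candidates = candidate_triples(entities, kb)
for triple in candidates:
    print(triple)
```

In this toy setting only the triple linking the two annotated entities is kept as a candidate fact. The crowdsourced simplification and filtering steps are not modeled here.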

Original language: English (US)
Number of pages: 8
State: Published - 2016
Externally published: Yes
Event: 2nd International Workshop on Natural Language Generation and the Semantic Web, WebNLG 2016 - Edinburgh, United Kingdom
Duration: Sep 6 2016 → …


Conference: 2nd International Workshop on Natural Language Generation and the Semantic Web, WebNLG 2016
Country/Territory: United Kingdom
Period: 9/6/16 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Networks and Communications
  • Computer Science Applications
  • Linguistics and Language
  • Media Technology

