TextFlow: A text similarity measure based on continuous sequences

Yassine Mrabet, Halil Kilicoglu, Dina Demner-Fushman

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Text similarity measures are used in multiple tasks such as plagiarism detection, information ranking and recognition of paraphrases and textual entailment. While recent advances in deep learning highlighted further the relevance of sequential models in natural language generation, existing similarity measures do not fully exploit the sequential nature of language. Examples of such similarity measures include n-grams and skip-grams overlap which rely on distinct slices of the input texts. In this paper we present a novel text similarity measure inspired from a common representation in DNA sequence alignment algorithms. The new measure, called TextFlow, represents input text pairs as continuous curves and uses both the actual position of the words and sequence matching to compute the similarity value. Our experiments on eight different datasets show very encouraging results in paraphrase detection, textual entailment recognition and ranking relevance.

Original languageEnglish (US)
Title of host publicationACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
PublisherAssociation for Computational Linguistics (ACL)
Pages763-772
Number of pages10
ISBN (Electronic)9781945626753
DOIs
StatePublished - 2017
Externally publishedYes
Event55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Vancouver, Canada
Duration: Jul 30 2017Aug 4 2017

Publication series

NameACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Volume1

Other

Other55th Annual Meeting of the Association for Computational Linguistics, ACL 2017
Country/TerritoryCanada
CityVancouver
Period7/30/178/4/17

ASJC Scopus subject areas

  • Language and Linguistics
  • Artificial Intelligence
  • Software
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'TextFlow: A text similarity measure based on continuous sequences'. Together they form a unique fingerprint.

Cite this