Detecting and modeling local text reuse

David A. Smith, Ryan Cordell, Elizabeth Maddock Dillon, Nick Stramp, John Wilkerson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Texts propagate through many social networks and provide evidence for their structure. We describe and evaluate efficient algorithms for detecting clusters of reused passages embedded within longer documents in large collections. We apply these techniques to two case studies: analyzing the culture of free reprinting in the nineteenth-century United States and the development of bills into legislation in the U.S. Congress. Using these divergent case studies, we evaluate both the efficiency of the approximate local text reuse detection methods and the accuracy of the results. These techniques allow us to explore how ideas spread, which ideas spread, and which subgroups shared ideas.

Original language: English (US)
Title of host publication: 2014 IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 183-192
Number of pages: 10
ISBN (Electronic): 9781479955695
State: Published - Dec 2014
Externally published: Yes
Event: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014 - London, United Kingdom
Duration: Sep 8 2014 to Sep 12 2014

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Country/Territory: United Kingdom
City: London
Period: 9/8/14 to 9/12/14

ASJC Scopus subject areas

  • Engineering(all)
