TY - GEN
T1 - Infectious texts
T2 - 2013 IEEE International Conference on Big Data, Big Data 2013
AU - Smith, David A.
AU - Cordell, Ryan
AU - Dillon, Elizabeth Maddock
PY - 2013
Y1 - 2013
AB - Texts propagate through many social networks and provide evidence for their structure. We present efficient algorithms for detecting clusters of reused passages embedded within longer documents in large collections. We apply these techniques to analyzing the culture of reprinting in the United States before the Civil War. Without substantial copyright enforcement, stories, poems, news, and anecdotes circulated freely among newspapers, magazines, and books. From a collection of OCR'd newspapers, we extract a new corpus of reprinted texts, explore the geographic spread and network connections of different publications, and analyze the time dynamics of different genres.
KW - clustering algorithms
KW - nearest neighbor searches
UR - http://www.scopus.com/inward/record.url?scp=84893253373&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893253373&partnerID=8YFLogxK
U2 - 10.1109/BigData.2013.6691675
DO - 10.1109/BigData.2013.6691675
M3 - Conference contribution
AN - SCOPUS:84893253373
SN - 9781479912926
T3 - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
SP - 86
EP - 94
BT - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
PB - IEEE Computer Society
Y2 - 6 October 2013 through 9 October 2013
ER -