TY - GEN
T1 - Impact of entity disambiguation errors on social network properties
AU - Diesner, Jana
AU - Evans, Craig S.
AU - Kim, Jinseok
N1 - Publisher Copyright:
© Copyright 2015, Association for the Advancement of Artificial Intelligence. All rights reserved.
PY - 2015
Y1 - 2015
N2 - Entities in social networks may need to be consolidated when they are inconsistently indexed, and split when multiple entities share the same name. How much do errors or shortfalls in entity disambiguation distort network properties? We show empirically how network analysis results and derived implications can change tremendously depending solely on entity resolution techniques. We present a series of controlled experiments in which we vary disambiguation accuracy to study error propagation and the robustness of common network metrics, topologies, and key players. Our results suggest that for email data, not conducting deduplication (e.g., when operating on the level of email addresses instead of individuals) can make organizational communication networks appear less coherent and integrated, as well as bigger, than they truly are. For co-publishing networks, improper merging, as caused by the commonly used initial-based disambiguation techniques, can make a scientific sector seem denser and more cohesive than it really is, and make individual authors appear more productive, collaborative, and diversified than they actually are. Disambiguation errors can also lead to the false detection of power-law distributions of node degree, suggesting preferential attachment processes that might not apply.
AB - Entities in social networks may need to be consolidated when they are inconsistently indexed, and split when multiple entities share the same name. How much do errors or shortfalls in entity disambiguation distort network properties? We show empirically how network analysis results and derived implications can change tremendously depending solely on entity resolution techniques. We present a series of controlled experiments in which we vary disambiguation accuracy to study error propagation and the robustness of common network metrics, topologies, and key players. Our results suggest that for email data, not conducting deduplication (e.g., when operating on the level of email addresses instead of individuals) can make organizational communication networks appear less coherent and integrated, as well as bigger, than they truly are. For co-publishing networks, improper merging, as caused by the commonly used initial-based disambiguation techniques, can make a scientific sector seem denser and more cohesive than it really is, and make individual authors appear more productive, collaborative, and diversified than they actually are. Disambiguation errors can also lead to the false detection of power-law distributions of node degree, suggesting preferential attachment processes that might not apply.
UR - http://www.scopus.com/inward/record.url?scp=84960956815&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84960956815&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84960956815
T3 - Proceedings of the 9th International Conference on Web and Social Media, ICWSM 2015
SP - 81
EP - 90
BT - Proceedings of the 9th International Conference on Web and Social Media, ICWSM 2015
PB - Association for the Advancement of Artificial Intelligence (AAAI) Press
T2 - 9th International Conference on Web and Social Media, ICWSM 2015
Y2 - 26 May 2015 through 29 May 2015
ER -