TY - JOUR
T1 - From Texts to Networks
T2 - Detecting and Managing the Impact of Methodological Choices for Extracting Network Data from Text Data
AU - Diesner, Jana
N1 - Publisher Copyright: © 2013, Springer-Verlag Berlin Heidelberg.
PY - 2013/2/1
Y1 - 2013/2/1
N2 - This thesis (Diesner in Technical Report CMU-ISR-12-101, 2012) addresses a series of methodological problems related to extracting information on socio-technical networks from natural language text data. Theories and models from the social sciences are leveraged and combined with computational approaches to (a) construct, analyze and compare network data and (b) combine text data and network data for analysis. This thesis entails various projects that serve three purposes: First, the impact of various common coding choices, including reference resolution and co-occurrence-based link formation, on network data and analysis results is empirically identified across multiple types of text data and domains. Second, different relation extraction methods are compared across various over-time, open-source, large-scale datasets with respect to the resulting network data and analysis results. This study offers a complement to traditional strategies for accuracy assessment. The relation extraction methods considered include network data construction based on (a) manually versus automatically built thesauri, (b) meta-data, and (c) collaboration with subject matter experts. Third, the concepts of grouping and roles from network analysis are integrated with text mining methods to enable the theoretically grounded, joint consideration of text data and network data for real-world applications. Overall, in this thesis, an interdisciplinary and computationally rigorous approach is used, thereby advancing the intersection of network analysis, natural language processing and computing. The contributions made with this work help people to utilize text data for network analysis, and to collect, manage and interpret rich network data at any scale. These steps are preconditions for asking substantive and graph-theoretic questions, testing hypotheses, and advancing theories about networks.
AB - This thesis (Diesner in Technical Report CMU-ISR-12-101, 2012) addresses a series of methodological problems related to extracting information on socio-technical networks from natural language text data. Theories and models from the social sciences are leveraged and combined with computational approaches to (a) construct, analyze and compare network data and (b) combine text data and network data for analysis. This thesis entails various projects that serve three purposes: First, the impact of various common coding choices, including reference resolution and co-occurrence-based link formation, on network data and analysis results is empirically identified across multiple types of text data and domains. Second, different relation extraction methods are compared across various over-time, open-source, large-scale datasets with respect to the resulting network data and analysis results. This study offers a complement to traditional strategies for accuracy assessment. The relation extraction methods considered include network data construction based on (a) manually versus automatically built thesauri, (b) meta-data, and (c) collaboration with subject matter experts. Third, the concepts of grouping and roles from network analysis are integrated with text mining methods to enable the theoretically grounded, joint consideration of text data and network data for real-world applications. Overall, in this thesis, an interdisciplinary and computationally rigorous approach is used, thereby advancing the intersection of network analysis, natural language processing and computing. The contributions made with this work help people to utilize text data for network analysis, and to collect, manage and interpret rich network data at any scale. These steps are preconditions for asking substantive and graph-theoretic questions, testing hypotheses, and advancing theories about networks.
KW - Entity extraction
KW - Network clustering
KW - Reference resolution
KW - Relation extraction
KW - Semantic networks
KW - Socio-technical networks
UR - http://www.scopus.com/inward/record.url?scp=84897761231&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84897761231&partnerID=8YFLogxK
U2 - 10.1007/s13218-012-0225-0
DO - 10.1007/s13218-012-0225-0
M3 - Article
AN - SCOPUS:84897761231
SN - 0933-1875
VL - 27
SP - 75
EP - 78
JO - KI - Künstliche Intelligenz
JF - KI - Künstliche Intelligenz
IS - 1
ER -