Formational bounds of link prediction in collaboration networks

Jinseok Kim, Jana Diesner

Research output: Contribution to journal › Article

Abstract

Link prediction in collaboration networks is often solved by identifying structural properties of existing nodes that are disconnected at one point in time and that share a link later on. The maximum possible recall, or upper bound of this approach’s success, is capped by the proportion of links that are formed among existing nodes embedded in these properties. Consequently, sustained links, as well as links that involve one or two new network participants, are typically not predicted. The purpose of this study is to highlight formational constraints that need to be considered to increase the practical value of link prediction methods targeted for collaboration networks. In this study, we identify the distribution of basic link formation types based on four large-scale, over-time collaboration networks, showing that, roughly speaking, 25% of links represent continued collaborations, 25% of links are new collaborations between existing authors, and 50% are formed between an existing author and a new network member. This implies that for collaboration networks, increasing the accuracy of computational link prediction solutions may not be a reasonable goal when the ratio of collaboration links that are eligible for the classic link prediction process is low.
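The abstract's argument rests on classifying each link observed in a later period by how it formed, and on noting that classic link prediction only scores pairs of existing, previously unconnected authors. The Python sketch below illustrates that bookkeeping under stated assumptions; it is not the authors' implementation. The two-snapshot setup (one list of papers for an earlier period, one for a later period), the rule that an author counts as "existing" if they appear on any earlier paper, the category labels, and the function names `coauthor_links`, `classify_links`, and `recall_upper_bound` are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def coauthor_links(papers):
    """Return the set of undirected coauthorship links implied by a list of author lists."""
    links = set()
    for authors in papers:
        # Every pair of distinct coauthors on a paper shares one undirected link.
        for a, b in combinations(sorted(set(authors)), 2):
            links.add(frozenset((a, b)))
    return links


def classify_links(earlier_papers, later_papers):
    """Count later-period links by formation type (labels are hypothetical)."""
    old_links = coauthor_links(earlier_papers)
    # Assumption: an author is "existing" if they appear on any earlier paper.
    old_nodes = {a for authors in earlier_papers for a in authors}
    counts = Counter()
    for link in coauthor_links(later_papers):
        n_existing = sum(1 for a in link if a in old_nodes)
        if link in old_links:
            counts["repeated"] += 1              # continued collaboration
        elif n_existing == 2:
            counts["new_between_existing"] += 1  # the only type classic link prediction scores
        elif n_existing == 1:
            counts["existing_to_newcomer"] += 1  # one new network member
        else:
            counts["between_newcomers"] += 1     # two new network members
    return counts


def recall_upper_bound(counts):
    """Upper bound on recall, over all later links, for a method that only
    predicts new links among existing, previously unconnected authors."""
    total = sum(counts.values())
    return counts["new_between_existing"] / total if total else 0.0


# Toy example: A, B, C exist in the earlier period; D and E are newcomers.
earlier = [["A", "B"], ["B", "C"]]
later = [["A", "B"], ["A", "C"], ["C", "D"], ["D", "E"]]
counts = classify_links(earlier, later)
print(recall_upper_bound(counts))  # 0.25
```

On this toy input each formation type occurs exactly once, so only one of the four later links is even eligible for classic link prediction, and no such method can exceed a recall of 0.25 here. The roughly 25/25/50 split reported in the abstract comes from the authors' four real networks, not from this example.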

Original language: English (US)
Pages (from-to): 687-706
Number of pages: 20
Journal: Scientometrics
Volume: 119
Issue number: 2
DOI: 10.1007/s11192-019-03055-6
State: Published (May 15, 2019)

Keywords

  • Collaboration network
  • Link formation primitives
  • Link prediction
  • Network evolution
  • Preferential attachment

ASJC Scopus subject areas

  • Social Sciences(all)
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Formational bounds of link prediction in collaboration networks. / Kim, Jinseok; Diesner, Jana.

In: Scientometrics, Vol. 119, No. 2, 15.05.2019, p. 687-706.

@article{45dfa05533324a01ad3b1a4243ef692c,
title = "Formational bounds of link prediction in collaboration networks",
abstract = "Link prediction in collaboration networks is often solved by identifying structural properties of existing nodes that are disconnected at one point in time, and that share a link later on. The maximally possible recall rate or upper bound of this approach’s success is capped by the proportion of links that are formed among existing nodes embedded in these properties. Consequentially, sustained links as well as links that involve one or two new network participants are typically not predicted. The purpose of this study is to highlight formational constraints that need to be considered to increase the practical value of link prediction methods targeted for collaboration networks. In this study, we identify the distribution of basic link formation types based on four large-scale, over-time collaboration networks, showing that roughly speaking, 25{\%} of links represent continued collaborations, 25{\%} of links are new collaborations between existing authors, and 50{\%} are formed between an existing author and a new network member. This implies that for collaboration networks, increasing the accuracy of computational link prediction solutions may not be a reasonable goal when the ratio of collaboration links that are eligible to the classic link prediction process is low.",
keywords = "Collaboration network, Link formation primitives, Link prediction, Network evolution, Preferential attachment",
author = "Jinseok Kim and Jana Diesner",
year = "2019",
month = may,
day = "15",
doi = "10.1007/s11192-019-03055-6",
language = "English (US)",
volume = "119",
pages = "687--706",
journal = "Scientometrics",
issn = "0138-9130",
publisher = "Springer Netherlands",
number = "2",

}

TY  - JOUR
T1  - Formational bounds of link prediction in collaboration networks
AU  - Kim, Jinseok
AU  - Diesner, Jana
PY  - 2019/5/15
Y1  - 2019/5/15
N2  - Link prediction in collaboration networks is often solved by identifying structural properties of existing nodes that are disconnected at one point in time, and that share a link later on. The maximally possible recall rate or upper bound of this approach’s success is capped by the proportion of links that are formed among existing nodes embedded in these properties. Consequentially, sustained links as well as links that involve one or two new network participants are typically not predicted. The purpose of this study is to highlight formational constraints that need to be considered to increase the practical value of link prediction methods targeted for collaboration networks. In this study, we identify the distribution of basic link formation types based on four large-scale, over-time collaboration networks, showing that roughly speaking, 25% of links represent continued collaborations, 25% of links are new collaborations between existing authors, and 50% are formed between an existing author and a new network member. This implies that for collaboration networks, increasing the accuracy of computational link prediction solutions may not be a reasonable goal when the ratio of collaboration links that are eligible to the classic link prediction process is low.
AB  - Link prediction in collaboration networks is often solved by identifying structural properties of existing nodes that are disconnected at one point in time, and that share a link later on. The maximally possible recall rate or upper bound of this approach’s success is capped by the proportion of links that are formed among existing nodes embedded in these properties. Consequentially, sustained links as well as links that involve one or two new network participants are typically not predicted. The purpose of this study is to highlight formational constraints that need to be considered to increase the practical value of link prediction methods targeted for collaboration networks. In this study, we identify the distribution of basic link formation types based on four large-scale, over-time collaboration networks, showing that roughly speaking, 25% of links represent continued collaborations, 25% of links are new collaborations between existing authors, and 50% are formed between an existing author and a new network member. This implies that for collaboration networks, increasing the accuracy of computational link prediction solutions may not be a reasonable goal when the ratio of collaboration links that are eligible to the classic link prediction process is low.
KW  - Collaboration network
KW  - Link formation primitives
KW  - Link prediction
KW  - Network evolution
KW  - Preferential attachment
UR  - http://www.scopus.com/inward/record.url?scp=85062768884&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85062768884&partnerID=8YFLogxK
U2  - 10.1007/s11192-019-03055-6
DO  - 10.1007/s11192-019-03055-6
M3  - Article
AN  - SCOPUS:85062768884
VL  - 119
SP  - 687
EP  - 706
JO  - Scientometrics
JF  - Scientometrics
SN  - 0138-9130
IS  - 2
ER  - 