TY - GEN
T1 - Paper2repo
T2 - 29th International World Wide Web Conference, WWW 2020
AU - Shao, Huajie
AU - Sun, Dachun
AU - Wu, Jiahao
AU - Zhang, Zecheng
AU - Zhang, Aston
AU - Yao, Shuochao
AU - Liu, Shengzhong
AU - Wang, Tianshi
AU - Zhang, Chao
AU - Abdelzaher, Tarek
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/4/20
Y1 - 2020/4/20
AB - GitHub has become a popular social application platform, where a large number of users post their open source projects. In particular, an increasing number of researchers release repositories of source code related to their research papers in order to attract more people to follow their work. Motivated by this trend, we describe a novel item-item cross-platform recommender system, paper2repo, that recommends relevant repositories on GitHub that match a given paper in an academic search system such as Microsoft Academic. The key challenge is to identify the similarity between an input paper and its related repositories across the two platforms, without the benefit of human labeling. Towards that end, paper2repo integrates text encoding and constrained graph convolutional networks (GCN) to automatically learn and map the embeddings of papers and repositories into the same space, where proximity offers the basis for recommendation. To make our method more practical in real-life systems, labels used for model training are computed automatically from features of user actions on GitHub. In machine learning, such automatic labeling is often called distant supervision. To the authors' knowledge, this is the first distant-supervised cross-platform (paper to repository) matching system. We evaluate the performance of paper2repo on real-world data sets collected from GitHub and Microsoft Academic. Results demonstrate that it outperforms other state-of-the-art recommendation methods.
KW - Recommender system
KW - constrained graph convolutional networks
KW - cross-platform recommendation
KW - text encoding
UR - http://www.scopus.com/inward/record.url?scp=85086576698&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086576698&partnerID=8YFLogxK
U2 - 10.1145/3366423.3380145
DO - 10.1145/3366423.3380145
M3 - Conference contribution
AN - SCOPUS:85086576698
T3 - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
SP - 629
EP - 639
BT - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
PB - Association for Computing Machinery
Y2 - 20 April 2020 through 24 April 2020
ER -