Abstract
In recent years, large networks have been routinely used to represent data from many scientific fields. Statistical analysis of these networks, such as estimation and hypothesis testing, has received considerable attention. However, most of the methods proposed in the literature are computationally expensive for large networks. In this article, we propose a subsampling-based method to reduce the computational cost of estimation and two-sample hypothesis testing. The idea is to divide the network into smaller subgraphs that share an overlap region, draw inference on each subgraph, and finally combine the results. We first develop the subsampling method for random dot product graph models and establish the theoretical consistency of the proposed method. We then extend the subsampling method to a more general setup and establish similar theoretical properties. We demonstrate the performance of our methods through simulation experiments and real data analysis. The code is available in the following GitHub repository: https://github.com/kchak19/SubsampleTestingNetwork. Supplementary materials for this article are available online.
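The abstract describes the divide, infer, and combine idea only at a high level. The sketch below illustrates one way it could look for latent-position estimation in a random dot product graph, under our own assumptions: each subgraph is the shared overlap nodes plus one disjoint block of the remaining nodes, latent positions are estimated on each subgraph by adjacency spectral embedding (ASE), and each subgraph's embedding is aligned to the first one by orthogonal Procrustes on the overlap nodes. The function names (`ase`, `procrustes`, `subsample_ase`, `simulate_rdpg`) and the simulation settings are hypothetical and are not taken from the authors' repository. This is an illustration of the general idea, not the authors' estimator or test.

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding: top-d eigenpairs by |eigenvalue|, scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def procrustes(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F (solved via the SVD of X^T Y)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def subsample_ase(A, d, n_parts, n_overlap, rng):
    """Hypothetical subsampling estimator: embed overlap+block subgraphs, align on the overlap, stitch."""
    n = A.shape[0]
    perm = rng.permutation(n)
    overlap = perm[:n_overlap]                         # nodes shared by every subgraph
    parts = np.array_split(perm[n_overlap:], n_parts)  # disjoint blocks of the remaining nodes
    X_hat = np.zeros((n, d))
    ref = None
    for part in parts:
        nodes = np.concatenate([overlap, part])
        Xs = ase(A[np.ix_(nodes, nodes)], d)           # ASE on the smaller subgraph only
        if ref is None:
            ref = Xs[:n_overlap]                       # first subgraph fixes the reference frame
            X_hat[overlap] = ref
        else:
            Xs = Xs @ procrustes(Xs[:n_overlap], ref)  # rotate into the reference frame
        X_hat[part] = Xs[n_overlap:]
    return X_hat

def simulate_rdpg(X, rng):
    """Symmetric adjacency matrix with independent edges A_ij ~ Bernoulli(x_i^T x_j), zero diagonal."""
    P = X @ X.T
    A = np.triu((rng.uniform(size=P.shape) < P).astype(float), k=1)
    return A + A.T

# Usage: a two-group RDPG with 600 nodes, embedded through two overlapping 400-node subgraphs.
rng = np.random.default_rng(0)
n = 600
labels = rng.integers(0, 2, size=n)
X = np.where(labels[:, None] == 0, [0.8, 0.1], [0.1, 0.8])
A = simulate_rdpg(X, rng)
X_hat = subsample_ase(A, d=2, n_parts=2, n_overlap=200, rng=rng)

# Latent positions are identifiable only up to an orthogonal transformation, so align before comparing.
W = procrustes(X_hat, X)
rel_err = np.linalg.norm(X_hat @ W - X) / np.linalg.norm(X)
print(f"relative error after alignment: {rel_err:.3f}")
```

In this sketch each eigendecomposition runs on a 400 × 400 subgraph instead of the full 600 × 600 adjacency matrix, which is where a cost saving would come from on much larger networks. The overlap region is what allows the separately estimated embeddings to be placed in a common coordinate system.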
Original language | English (US) |
---|---|
Journal | Journal of Computational and Graphical Statistics |
Early online date | Jan 27 2025 |
DOIs | |
State | E-pub ahead of print - Jan 27 2025 |
Keywords
- Estimation
- Hypothesis testing
- Network data
- Random dot product graph model
- Subsampling
ASJC Scopus subject areas
- Statistics and Probability
- Discrete Mathematics and Combinatorics
- Statistics, Probability and Uncertainty