TY - JOUR
T1 - Towards a unified approach to industry recovery
T2 - Insights from intraday stock data and advanced community detection methods
AU - Bracht, Eamon
AU - Brunner, Robert
AU - McMullin, Jeff
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025/7/1
Y1 - 2025/7/1
AB - In this paper, we explore how time series parameters, such as sampling frequency, sample period, and time series length, affect the ability to recover industry classifications within financial networks. Using high-frequency data on S&P 500 stocks from 2005 to 2012, we construct information connection networks using normalized mutual information (NMI) and employ the Planar Maximally Filtered Graph (PMFG) to filter noise. We apply both Leiden and spectral clustering algorithms to identify communities of stocks and compare them with the Global Industry Classification Standard (GICS), using the Adjusted Rand Index (ARI) to assess clustering accuracy. Our analysis reveals that industry structure is best recovered at sampling intervals much shorter than daily: ARI values peak at intervals between 4 min and 48 min and decrease at longer intervals. We observe that higher sampling frequencies introduce noise, leading to weaker clustering performance, likely due to the Epps effect. Additionally, the results indicate that ARI is sensitive to market conditions, with higher clustering accuracy during and after periods of market volatility, such as the 2008 financial crisis.
KW - Adjusted Rand index
KW - Community detection
KW - Correlation networks
KW - GICS
KW - Normalized mutual information
KW - Stock market
UR - http://www.scopus.com/inward/record.url?scp=105002564953&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105002564953&partnerID=8YFLogxK
U2 - 10.1016/j.physa.2025.130501
DO - 10.1016/j.physa.2025.130501
M3 - Article
AN - SCOPUS:105002564953
SN - 0378-4371
VL - 669
JO - Physica A: Statistical Mechanics and its Applications
JF - Physica A: Statistical Mechanics and its Applications
M1 - 130501
ER -