TY - GEN
T1 - Your Causal Self-Attentive Recommender Hosts a Lonely Neighborhood
AU - Wang, Yueqi
AU - He, Zhankui
AU - Yue, Zhenrui
AU - McAuley, Julian
AU - Wang, Dong
N1 - This research is supported in part by the National Science Foundation under Grant Nos. CNS-2427070, IIS-2331069, IIS-2130263, and CNS-2131622. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.
PY - 2025/3/10
Y1 - 2025/3/10
AB - In the context of sequential recommendation, a pivotal issue is the comparison between bi-directional/auto-encoding (AE) and uni-directional/auto-regressive (AR) attention mechanisms, where conclusions regarding architectural and performance superiority remain inconclusive. Previous efforts at such comparisons primarily involve summarizing existing works to identify a consensus or conducting ablation studies on peripheral modeling techniques, such as the choice of loss function. However, far fewer efforts have been made toward (1) theoretical and (2) extensive empirical analysis of the self-attention module, the pivotal structure on which performance and design insights should be anchored. In this work, we first provide a comprehensive theoretical analysis of the AE/AR attention matrices with respect to (1) sparse local inductive bias, a.k.a. neighborhood effects, and (2) low-rank approximation. Analytical metrics reveal that AR attention exhibits sparse neighborhood effects suited to generally sparse recommendation scenarios. Second, to support our theoretical analysis, we conduct extensive empirical experiments comparing AE/AR attention on five popular benchmarks, with AR performing better overall. The reported empirical results are based on our experimental pipeline, Modularized Design Space for Self-Attentive Recommender (ModSAR), which supports adaptive hyperparameter tuning, a modularized design space, and Huggingface plug-ins. We invite the recommendation community to use and contribute to ModSAR to (1) conduct further module- and model-level examination beyond the AE/AR comparison and (2) accelerate state-of-the-art model design. Lastly, we shed light on future design choices for performant self-attentive recommenders. We make our pipeline implementation and data available at https://github.com/yueqirex/SAR-Check.
KW - Auto-Encoding
KW - Auto-Regression
KW - BERT4Rec
KW - Matrix Analysis
KW - SASRec
KW - Self-Attention
KW - Sequential Recommendation
UR - http://www.scopus.com/inward/record.url?scp=105001670872&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105001670872&partnerID=8YFLogxK
U2 - 10.1145/3701551.3703587
DO - 10.1145/3701551.3703587
M3 - Conference contribution
AN - SCOPUS:105001670872
T3 - WSDM 2025 - Proceedings of the 18th ACM International Conference on Web Search and Data Mining
SP - 688
EP - 696
BT - WSDM 2025 - Proceedings of the 18th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery
T2 - 18th ACM International Conference on Web Search and Data Mining, WSDM 2025
Y2 - 10 March 2025 through 14 March 2025
ER -