TY - GEN
T1 - Modeling task relationships in multi-task learning with multi-gate mixture-of-experts
AU - Ma, Jiaqi
AU - Zhao, Zhe
AU - Yi, Xinyang
AU - Chen, Jilin
AU - Hong, Lichan
AU - Chi, Ed H.
N1 - Publisher Copyright:
© 2018 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.
PY - 2018/7/19
Y1 - 2018/7/19
N2 - Neural-based multi-task learning has been successfully used in many real-world large-scale applications such as recommendation systems. For example, in movie recommendations, beyond providing users movies which they tend to purchase and watch, the system might also optimize for users liking the movies afterwards. With multi-task learning, we aim to build a single model that learns these multiple goals and tasks simultaneously. However, the prediction quality of commonly used multi-task models is often sensitive to the relationships between tasks. It is therefore important to study the modeling tradeoffs between task-specific objectives and inter-task relationships. In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data. We adapt the Mixture-of-Experts (MoE) structure to multi-task learning by sharing the expert submodels across all tasks, while also having a gating network trained to optimize each task. To validate our approach on data with different levels of task relatedness, we first apply it to a synthetic dataset where we control the task relatedness. We show that the proposed approach performs better than baseline methods when the tasks are less related. We also show that the MMoE structure results in an additional trainability benefit, depending on different levels of randomness in the training data and model initialization. Furthermore, we demonstrate the performance improvements by MMoE on real tasks including a binary classification benchmark, and a large-scale content recommendation system at Google.
AB - Neural-based multi-task learning has been successfully used in many real-world large-scale applications such as recommendation systems. For example, in movie recommendations, beyond providing users movies which they tend to purchase and watch, the system might also optimize for users liking the movies afterwards. With multi-task learning, we aim to build a single model that learns these multiple goals and tasks simultaneously. However, the prediction quality of commonly used multi-task models is often sensitive to the relationships between tasks. It is therefore important to study the modeling tradeoffs between task-specific objectives and inter-task relationships. In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data. We adapt the Mixture-of-Experts (MoE) structure to multi-task learning by sharing the expert submodels across all tasks, while also having a gating network trained to optimize each task. To validate our approach on data with different levels of task relatedness, we first apply it to a synthetic dataset where we control the task relatedness. We show that the proposed approach performs better than baseline methods when the tasks are less related. We also show that the MMoE structure results in an additional trainability benefit, depending on different levels of randomness in the training data and model initialization. Furthermore, we demonstrate the performance improvements by MMoE on real tasks including a binary classification benchmark, and a large-scale content recommendation system at Google.
KW - Mixture of experts
KW - Multi-task learning
KW - Neural network
KW - Recommendation system
UR - http://www.scopus.com/inward/record.url?scp=85051473411&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051473411&partnerID=8YFLogxK
U2 - 10.1145/3219819.3220007
DO - 10.1145/3219819.3220007
M3 - Conference contribution
AN - SCOPUS:85051473411
SN - 9781450355520
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1930
EP - 1939
BT - KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018
Y2 - 19 August 2018 through 23 August 2018
ER -
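
Note: the abstract above describes the MMoE architecture in one sentence: expert submodels shared across all tasks, with a separate gating network per task that mixes the experts' outputs before a task-specific tower. The following is a minimal NumPy sketch of that forward pass, added for illustration only; the layer sizes, ReLU experts, softmax gates, and linear towers are assumptions, not the paper's exact configuration.

# Minimal sketch of a Multi-gate Mixture-of-Experts (MMoE) forward pass.
# Shapes and layer choices are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

input_dim, expert_dim = 8, 4
num_experts, num_tasks = 3, 2

# Expert networks (here: single ReLU layers) are shared across all tasks.
expert_weights = [rng.normal(size=(input_dim, expert_dim)) for _ in range(num_experts)]
# Each task has its own gating network producing a softmax over experts.
gate_weights = [rng.normal(size=(input_dim, num_experts)) for _ in range(num_tasks)]
# Task-specific towers map the gated mixture to a scalar prediction.
tower_weights = [rng.normal(size=(expert_dim, 1)) for _ in range(num_tasks)]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mmoe_forward(x):
    # Shared expert outputs: (num_experts, batch, expert_dim)
    expert_out = np.stack([np.maximum(x @ w, 0.0) for w in expert_weights])
    outputs = []
    for t in range(num_tasks):
        gates = softmax(x @ gate_weights[t])                # (batch, num_experts)
        mixed = np.einsum("be,ebd->bd", gates, expert_out)  # per-task weighted sum of experts
        outputs.append(mixed @ tower_weights[t])            # task-specific tower
    return outputs

x = rng.normal(size=(5, input_dim))
y1, y2 = mmoe_forward(x)
print(y1.shape, y2.shape)  # (5, 1) (5, 1)

# Because each task owns its gate, the model can learn task-specific mixtures
# of the shared experts, which is the mechanism the abstract credits for
# handling loosely related tasks better than a fully shared bottom layer.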