TY - GEN
T1 - Decouple and Decompose
T2 - 19th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2025
AU - Xu, Zhiying
AU - Yu, Minlan
AU - Yan, Francis Y.
N1 - We thank our anonymous reviewers and shepherd for their insightful comments. We also thank Yang Zhou, Justin Chiu, Alexander Rush, and Victor Bahl for their valuable feedback and support. We extend special thanks to Suvan Chatakondu for enhancing our code after the paper's acceptance. This work is supported in part by ACE, one of the seven centers in JUMP 2.0, a Semiconductor Research Corporation (SRC) program sponsored by DARPA. This work is also supported by National Science Foundation grant CCF-2326605 (Expedition in Computing).
PY - 2025
Y1 - 2025
N2 - Efficient resource allocation is essential in cloud systems to facilitate resource sharing among tenants. However, the growing scale of these optimization problems have outpaced commercial solvers commonly employed in production. To accelerate resource allocation, prior approaches either customize solutions for narrow domains or impose workload-specific assumptions. In this work, we revisit real-world resource allocation problems and uncover a common underlying structure: the vast majority of these problems are inherently separable, i.e., they optimize the aggregate utility of individual resource and demand allocations, under separate constraints for each resource and each demand. Building on this observation, we develop DEDE, a scalable and theoretically rooted optimization framework for large-scale resource allocation. At the core of DEDE is a decouple-and-decompose approach: it decouples entangled resource and demand constraints and thereby decomposes the overall optimization into alternating per-resource and per-demand subproblems that can be solved efficiently and in parallel. We have implemented and released DEDE as a Python package with a familiar modeling interface. Our experiments on three representative resource allocation tasks-cluster scheduling, traffic engineering, and load balancing-demonstrate that DEDE delivers significant speedups while generating higher-quality allocations.
AB - Efficient resource allocation is essential in cloud systems to facilitate resource sharing among tenants. However, the growing scale of these optimization problems have outpaced commercial solvers commonly employed in production. To accelerate resource allocation, prior approaches either customize solutions for narrow domains or impose workload-specific assumptions. In this work, we revisit real-world resource allocation problems and uncover a common underlying structure: the vast majority of these problems are inherently separable, i.e., they optimize the aggregate utility of individual resource and demand allocations, under separate constraints for each resource and each demand. Building on this observation, we develop DEDE, a scalable and theoretically rooted optimization framework for large-scale resource allocation. At the core of DEDE is a decouple-and-decompose approach: it decouples entangled resource and demand constraints and thereby decomposes the overall optimization into alternating per-resource and per-demand subproblems that can be solved efficiently and in parallel. We have implemented and released DEDE as a Python package with a familiar modeling interface. Our experiments on three representative resource allocation tasks-cluster scheduling, traffic engineering, and load balancing-demonstrate that DEDE delivers significant speedups while generating higher-quality allocations.
UR - https://www.scopus.com/pages/publications/105011593412
UR - https://www.scopus.com/pages/publications/105011593412#tab=citedBy
M3 - Conference contribution
AN - SCOPUS:105011593412
T3 - Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2025
SP - 393
EP - 409
BT - Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2025
PB - USENIX Association
Y2 - 7 July 2025 through 9 July 2025
ER -