TY - GEN
T1 - Taiji
T2 - 27th ACM Symposium on Operating Systems Principles, SOSP 2019
AU - Chou, David
AU - Xu, Tianyin
AU - Veeraraghavan, Kaushik
AU - Newell, Andrew
AU - Margulis, Sonia
AU - Xiao, Lin
AU - Ruiz, Pol Mauri
AU - Meza, Justin
AU - Ha, Kiryong
AU - Padmanabha, Shruti
AU - Cole, Kevin
AU - Perelman, Dmitri
N1 - Publisher Copyright:
© 2019 Copyright held by the owner/author(s).
PY - 2019/10/27
Y1 - 2019/10/27
N2 - We present Taiji, a new system for managing user traffic for large-scale Internet services that accomplishes two goals: 1) balancing the utilization of data centers and 2) minimizing network latency of user requests. Taiji models edge-to-datacenter traffic routing as an assignment problem—assigning traffic objects at the edge to the data centers to satisfy service-level objectives. Taiji uses a constraint optimization solver to generate an optimal routing table that specifies the fractions of traffic each edge node will distribute to different data centers. Taiji continuously adjusts the routing table to accommodate the dynamics of user traffic and failure events that reduce capacity. Taiji leverages connections among users to selectively route traffic of highly-connected users to the same data centers based on fractions in the routing table. This routing strategy, which we term connection-aware routing, allows us to reduce query load on our backend storage by 17%. Taiji has been used in production at Facebook for more than four years and routes global traffic in a user-aware manner for several large-scale product services across dozens of edge nodes and data centers.
UR - http://www.scopus.com/inward/record.url?scp=85076763406&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076763406&partnerID=8YFLogxK
U2 - 10.1145/3341301.3359655
DO - 10.1145/3341301.3359655
M3 - Conference contribution
AN - SCOPUS:85076763406
T3 - SOSP 2019 - Proceedings of the 27th ACM Symposium on Operating Systems Principles
SP - 430
EP - 446
BT - SOSP 2019 - Proceedings of the 27th ACM Symposium on Operating Systems Principles
PB - Association for Computing Machinery, Inc
Y2 - 27 October 2019 through 30 October 2019
ER -