TY - GEN
T1 - Scalable termination detection for distributed actor systems
AU - Plyukhin, Dan
AU - Agha, Gul
N1 - Funding Information:
This work was supported in part by the National Science Foundation under Grant No. SHF 1617401, and in part by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
Publisher Copyright:
© Dan Plyukhin and Gul Agha; licensed under Creative Commons License CC-BY. 31st International Conference on Concurrency Theory (CONCUR 2020).
PY - 2020/8/1
Y1 - 2020/8/1
N2 - Automatic garbage collection (GC) prevents certain kinds of bugs and reduces programming overhead. GC techniques for sequential programs are based on reachability analysis. However, testing reachability from a root set is inadequate for determining whether an actor is garbage because an unreachable actor may send a message to a reachable actor. Instead, it is sufficient to check termination (sometimes also called quiescence): an actor is terminated if it is not currently processing a message and cannot receive a message in the future. Moreover, many actor frameworks provide all actors with access to file I/O or external storage; without inspecting an actor's internal code, it is necessary to check that the actor has terminated to ensure that it may be garbage collected in these frameworks. Previous algorithms to detect actor garbage require coordination mechanisms such as causal message delivery or nonlocal monitoring of actors for mutation. Such coordination mechanisms adversely affect concurrency and are therefore expensive in distributed systems. We present a low-overhead reference listing technique (called DRL) for termination detection in actor systems. DRL is based on asynchronous local snapshots and message-passing between actors. This enables a decentralized implementation and transient network partition tolerance. The paper provides a formal description of DRL, shows that all actors identified as garbage have indeed terminated (safety), and that all terminated actors, under certain reasonable assumptions, will eventually be identified (liveness).
KW - Actors
KW - Concurrency
KW - Distributed systems
KW - Garbage collection
KW - Quiescence detection
KW - Termination detection
UR - http://www.scopus.com/inward/record.url?scp=85091560461&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091560461&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.CONCUR.2020.11
DO - 10.4230/LIPIcs.CONCUR.2020.11
M3 - Conference contribution
AN - SCOPUS:85091560461
T3 - Leibniz International Proceedings in Informatics, LIPIcs
SP - 11:1
EP - 11:23
BT - 31st International Conference on Concurrency Theory, CONCUR 2020
A2 - Konnov, Igor
A2 - Kovács, Laura
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 31st International Conference on Concurrency Theory, CONCUR 2020
Y2 - 1 September 2020 through 4 September 2020
ER -