TY - JOUR
T1 - funcX
T2 - Federated Function as a Service for Science
AU - Li, Zhuozhao
AU - Chard, Ryan
AU - Babuji, Yadu
AU - Galewsky, Ben
AU - Skluzacek, Tyler J.
AU - Nagaitsev, Kirill
AU - Woodard, Anna
AU - Blaiszik, Ben
AU - Bryan, Josh
AU - Katz, Daniel S.
AU - Foster, Ian
AU - Chard, Kyle
N1 - Publisher Copyright: © 1990-2012 IEEE.
PY - 2022/12/1
Y1 - 2022/12/1
AB - funcX is a distributed function-as-a-service (FaaS) platform that enables flexible, scalable, and high-performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX's endpoint software can be deployed, by users or administrators, on arbitrary laptops, clouds, clusters, and supercomputers, in effect turning them into function-serving systems. funcX's cloud-hosted service provides a single location for registering, sharing, and managing both functions and endpoints. It allows for transparent, secure, and reliable function execution across the federated ecosystem of endpoints, enabling users to route functions to endpoints based on specific needs. funcX uses containers (e.g., Docker, Singularity, and Shifter) to provide common execution environments across endpoints. funcX implements various container management strategies to execute functions with high performance and efficiency on diverse funcX endpoints. funcX also integrates with an in-memory data store and Globus for managing data that may span endpoints. We motivate the need for funcX, present our prototype design and implementation, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130,000 concurrent workers. We show that funcX's container warming-aware routing algorithm can reduce the completion time for 3,000 functions by up to 61% compared to a randomized algorithm, and that the in-memory data store can speed up data transfers by up to 3x compared to a shared file system.
KW - Function-as-a-service
KW - cyberinfrastructure
KW - distributed computing
UR - http://www.scopus.com/inward/record.url?scp=85139402053&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139402053&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2022.3208767
DO - 10.1109/TPDS.2022.3208767
M3 - Article
AN - SCOPUS:85139402053
SN - 1045-9219
VL - 33
SP - 4948
EP - 4963
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 12
ER -