TY - GEN
T1 - Transactional Python for Durable Machine Learning
T2 - 7th Workshop on Data Management for End-To-End Machine Learning, DEEM 2023
AU - Chockchowwat, Supawit
AU - Li, Zhaoheng
AU - Park, Yongjoo
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/6/18
Y1 - 2023/6/18
N2 - In machine learning (ML), Python serves as a convenient abstraction for working with key libraries such as PyTorch, scikit-learn, and others. Unlike DBMS, however, Python applications may lose important data, such as trained models and extracted features, due to machine failures or human errors, leading to a waste of time and resources. Specifically, they lack four essential properties that could make ML more reliable and user-friendly: durability, atomicity, replicability, and time-versioning (DART). This paper presents our vision of Transactional Python that provides DART without any code modifications to user programs or the Python kernel, by non-intrusively monitoring application states at the object level and determining a minimal amount of information sufficient to reconstruct a whole application. Our evaluation of a proof-of-concept implementation with public PyTorch and scikit-learn applications shows that DART can be offered with overheads ranging from 1.5% to 15.6%.
AB - In machine learning (ML), Python serves as a convenient abstraction for working with key libraries such as PyTorch, scikit-learn, and others. Unlike DBMS, however, Python applications may lose important data, such as trained models and extracted features, due to machine failures or human errors, leading to a waste of time and resources. Specifically, they lack four essential properties that could make ML more reliable and user-friendly: durability, atomicity, replicability, and time-versioning (DART). This paper presents our vision of Transactional Python that provides DART without any code modifications to user programs or the Python kernel, by non-intrusively monitoring application states at the object level and determining a minimal amount of information sufficient to reconstruct a whole application. Our evaluation of a proof-of-concept implementation with public PyTorch and scikit-learn applications shows that DART can be offered with overheads ranging from 1.5% to 15.6%.
UR - http://www.scopus.com/inward/record.url?scp=85168360163&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168360163&partnerID=8YFLogxK
U2 - 10.1145/3595360.3595855
DO - 10.1145/3595360.3595855
M3 - Conference contribution
AN - SCOPUS:85168360163
T3 - Proceedings of the 7th Workshop on Data Management for End-To-End Machine Learning, DEEM 2023
BT - Proceedings of the 7th Workshop on Data Management for End-To-End Machine Learning, DEEM 2023
PB - Association for Computing Machinery
Y2 - 18 June 2023
ER -