TY - GEN
T1 - New algorithms for planning bulk transfer via internet and shipping networks
AU - Cho, Brian
AU - Gupta, Indranil
PY - 2010/8/27
Y1 - 2010/8/27
N2 - Cloud computing is enabling groups of academic collaborators, groups of business partners, etc., to come together in an ad-hoc manner. This paper focuses on the group-based data transfer problem in such settings. Each participant source site in such a group has a large dataset, which may range in size from gigabytes to terabytes. This data needs to be transferred to a single sink site (e.g., AWS, Google datacenters, etc.) in a manner that reduces both total dollar costs incurred by the group as well as the total transfer latency of the collective dataset. This paper is the first to explore the problem of planning a group-based deadline-oriented data transfer in a scenario where data can be sent over both: (1) the internet, and (2) by shipping storage devices (e.g., external or hot-plug drives, or SSDs) via companies such as Fedex, UPS, USPS, etc. We first formalize the problem and prove its NP-Hardness. Then, we propose novel algorithms and use them to build a planning system called Pandora (People and Networks Moving Data Around). Pandora uses new concepts of time-expanded networks and delta-time-expanded networks, combining them with integer programming techniques and optimizations for both shipping and internet edges. Our experimental evaluation using real data from Fedex and from PlanetLab indicate the Pandora planner manages to satisfy deadlines and reduce costs significantly.
AB - Cloud computing is enabling groups of academic collaborators, groups of business partners, etc., to come together in an ad-hoc manner. This paper focuses on the group-based data transfer problem in such settings. Each participant source site in such a group has a large dataset, which may range in size from gigabytes to terabytes. This data needs to be transferred to a single sink site (e.g., AWS, Google datacenters, etc.) in a manner that reduces both total dollar costs incurred by the group as well as the total transfer latency of the collective dataset. This paper is the first to explore the problem of planning a group-based deadline-oriented data transfer in a scenario where data can be sent over both: (1) the internet, and (2) by shipping storage devices (e.g., external or hot-plug drives, or SSDs) via companies such as Fedex, UPS, USPS, etc. We first formalize the problem and prove its NP-Hardness. Then, we propose novel algorithms and use them to build a planning system called Pandora (People and Networks Moving Data Around). Pandora uses new concepts of time-expanded networks and delta-time-expanded networks, combining them with integer programming techniques and optimizations for both shipping and internet edges. Our experimental evaluation using real data from Fedex and from PlanetLab indicate the Pandora planner manages to satisfy deadlines and reduce costs significantly.
UR - http://www.scopus.com/inward/record.url?scp=77955914560&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955914560&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2010.59
DO - 10.1109/ICDCS.2010.59
M3 - Conference contribution
AN - SCOPUS:77955914560
SN - 9780769540597
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 305
EP - 314
BT - ICDCS 2010 - 2010 International Conference on Distributed Computing Systems
T2 - 30th IEEE International Conference on Distributed Computing Systems, ICDCS 2010
Y2 - 21 June 2010 through 25 June 2010
ER -