TY - GEN
T1 - Streaming satellite data to cloud workflows for on-demand computing of environmental data products
AU - Zinn, Daniel
AU - Hart, Quinn
AU - Ludäscher, Bertram
AU - Simmhan, Yogesh
PY - 2010
Y1 - 2010
AB - Environmental data arriving constantly from satellites and weather stations are used to compute weather coefficients that are essential for agriculture and viticulture. For example, the reference evapotranspiration (ET0) coefficient, overlaid on regional maps, is provided each day by the California Department of Water Resources to local farmers and turf managers to plan daily water use. Scaling out single-processor, compute- and data-intensive applications that operate on real-time data, so that they support more users and higher-resolution data, poses data engineering challenges. Cloud computing helps data providers expand resource capacity to meet growing demand, while also supporting scientific needs such as reprocessing historical data with new models. In this article, we examine the migration of a legacy script used by CIMIS for daily ET0 computation to a workflow model that eases deployment to, and scaling on, the Windows Azure Cloud. Our architecture streams data directly into Cloud virtual machines (VMs), which improves the performance of our workflow by 130% to 160% over the commonly used approach of staging data through Cloud storage. The streaming workflows achieve runtimes comparable to desktop execution on a single VM and a linear speed-up on multiple VMs, enabling environmental coefficients to be computed at a much higher resolution than is currently done.
UR - http://www.scopus.com/inward/record.url?scp=78751491053&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78751491053&partnerID=8YFLogxK
U2 - 10.1109/WORKS.2010.5671841
DO - 10.1109/WORKS.2010.5671841
M3 - Conference contribution
AN - SCOPUS:78751491053
SN - 9781424489893
T3 - 2010 5th Workshop on Workflows in Support of Large-Scale Science, WORKS 2010
BT - 2010 5th Workshop on Workflows in Support of Large-Scale Science, WORKS 2010
T2 - 2010 5th Workshop on Workflows in Support of Large-Scale Science, WORKS 2010
Y2 - 14 November 2010 through 14 November 2010
ER -