Data movement between the host and accelerator devices is one of the most challenging aspects of developing applications for heterogeneous systems. Most existing runtime systems for GPGPU programming require developers to perform data movement manually in the source code, while also adapting that code to different hardware and software environments. In this paper, we present a novel way to perform data movement for distributed applications based on the Charm++ programming system. We extend Charm++'s support for migration across memory address spaces to accelerator devices by making use of the description of data contained in Charm++'s parallel objects. This allows the Charm++ runtime to handle data movement largely automatically, while transparently supporting different hardware platforms, which increases both developer productivity and the portability of Charm++ applications. We demonstrate our proposal with a Charm++ application that runs offloaded CUDA code on three different hardware platforms using a single data movement specification.