The coming generation of supercomputing architectures will require fundamental changes in programming models to exploit the expected million- to billion-way concurrency while accommodating a thousand-fold reduction in per-core memory. Most current parallel analysis and visualization tools achieve scalability by partitioning the data, either spatially or temporally, and running serial computational kernels on each data partition, using message passing as needed. These techniques lack the level of data parallelism needed to execute effectively on this hardware. This paper introduces a framework that enables analysis and visualization algorithms to be expressed for memory-efficient execution in a hybrid distributed and data-parallel manner on both multi-core and many-core processors. We demonstrate results on scientific data using CPUs and GPUs in scalable heterogeneous systems.