The increasing availability of access to large-scale computing clusters -for example via the cloud -is changing the way scientific research can be conducted, enabling experiments of a scale and scope that would have been inconceivable several years ago. An ambitious data scientist today can carry out projects involving several million CPU hours. In the near future, we anticipate a typical Ph.D. in computational science may be expected or even required to offer findings based on at least 1 million CPU hours of computations. The massive scale of these soon-to-be-upon-us computational experiments demands that we change how we organize our experimental practices. Traditionally, and still the dominant paradigm today, the end-to-end process of experiment design and execution involves a significant amount of manual intervention and situational tweaking, cutting and pasting, and the use of disparate disconnected tools, much of which is undocumented and easily lost. This makes it difficult to detect and understand possible failure points in the computational workflow, making it virtually impossible to correct, let alone simply rerun the experiment. This is an amazing state of affairs, considering the ubiquity of error in scientific computation and in research generally. Following such unstructured and undocumented research practices limits the ability of the researcher to exploit cluster and cloud-based paradigms, as each increase in scale under the dominant paradigm is likely to lead to ever more errors and misunderstandings. A better paradigm will integrate the design of large experiments seamlessly with job management, output harvesting, data analysis, reporting, and publication of code and data. In particular such a paradigm would submerge the details of all the processing, harvesting, and management while exposing transparently the description of the discovery process itself, including details such as the parameter space exploration. Reproducing any job would be a push-button affair, and creating a new experiment from a previous one might involve only changes of a line or two of code followed again by push-button execution and reporting. Even though such experiments would be operating at a much greater scale than today, under such a paradigm they would be easier to conduct, obtain a lower error rate, and offer a much greater opportunity for 'outsiders' to understand the results. In this article, we discuss the challenges of massive computational experimentation and present a taxonomy of some of the desiderata which such paradigms should offer. We then present ClusterJob (CJ), an efficient computing environment that we and other researchers have used to conduct and share million-CPU-hour experiments in a painless and reproducible way.