A processor laid out vertically in stacked layers can benefit from reduced wire delays, low energy consumption, and a small footprint. Such a design can be enabled by Monolithic 3D (M3D), a technology that provides short wire lengths, good thermal properties, and high integration. In current M3D technology, due to manufacturing constraints, the layers in the stack are asymmetric: the bottom-most one has a relatively higher performance. In this paper, we examine how to partition a processor for M3D. We partition logic and storage structures into two layers, taking into account that the top layer has lower-performance transistors. In logic structures, we place the critical paths in the bottom layer. In storage structures, we partition the hardware unequally, assigning to the top layer fewer ports with larger access transistors, or a shorter bitcell subarray with larger bitcells. We find that, with conservative assumptions on M3D technology, an M3D core executes applications on average 25% faster than a 2D core, while consuming 39% less energy. With aggressive technology assumptions, the M3D core performs even better: it is on average 38% faster than a 2D core and consumes 41% less energy. Further, under a similar power budget, an M3D multicore can use twice as many cores as a 2D multicore, executing applications on average 92% faster with 39% less energy. Finally, an M3D core is thermally efficient.