The most straightforward methodology for designing a multi-core architecture is to replicate an off-the-shelf core design multiple times and then connect the cores using an interconnect mechanism. However, this methodology is "multi-core oblivious": subsystems are designed and optimized without awareness of the overall chip-multiprocessing system they will become part of. The chapter demonstrates that this methodology is inefficient in terms of area and power. It recommends a holistic approach in which the subsystems are designed from the ground up to be effective components of a complete system. The inefficiency in "multi-core oblivious" designs comes from many sources. One source is the replication of identical cores, which leaves the processor unable to adapt to the demands of the workloads it executes and results in either underutilization or overutilization of processor resources. Single-ISA (instruction-set architecture) heterogeneous multi-core architectures address this by hosting cores with varying power/performance characteristics on the same die, all of which execute the same ISA. Such a processor can deliver significant power savings and performance improvements if applications are mapped to cores judiciously. The chapter also presents holistic design methodologies for such architectures. Another source of inefficiency is blind replication of over-provisioned hardware structures. Conjoined-core chip multiprocessing allows adjacent cores of a multi-core architecture to share some resources, which can yield significant area savings with little performance degradation. Yet another source of inefficiency is the interconnect. Interconnection overheads can be very significant for a "multi-core oblivious" design, especially as the number of cores increases and the pipelines get deeper. The chapter demonstrates the need to co-design the cores, the memory, and the interconnect to address this inefficiency, and makes several suggestions regarding co-design.