Recently-proposed processor microarchitectures for high Memory Level Parallelism (MLP) promise substantial performance gains. Unfortunately, current cache hierarchies have Miss-Handling Architectures (MHAs) that are too limited to support the required MLP - they need to be redesigned to support 1-2 orders of magnitude more outstanding misses. Yet, designing scalable MHAs is challenging: designs must minimize cache lock-up time and deliver high bandwidth while keeping the area consumption reasonable. This paper presents a novel scalable MHA design for high-MLP processors. Our design introduces two main innovations. First, it is hierarchical, with a small MSHR file per cache bank, and a larger MSHR file shared by all banks. Second, it uses a Bloom filter to reduce searches in the larger MSHR file. The result is a high-performance, area-efficient design. Compared to a state-of-the-art MHA on a high-MLP processor, our design speeds-up some SPECint, SPECfp, and multiprogrammed workloads by a geometric mean of 32%, 50%, and 95%, respectively. Moreover, compared to two extrapolations of current MHA designs, namely a large monolithic MSHR file and a large banked MSHR file, all consuming the same area, our design speeds-up the workloads by a geometric mean of 1-18% and 10-21%, respectively. Finally, our design performs very close to an unlimited-size, ideal MHA.