State space search problems abound in the artificial intelligence, planning and optimization literature. Solving such problems is generally NP-hard. Therefore, a brute-force approach to state space search must be employed. It is instructive to solve them on large parallel machines with significant computational power. However, writing efficient and scalable parallel programs has traditionally been a challenging undertaking. In this paper, we analyze several performance characteristics common to all parallel state space search applications. In particular, we focus on the issues of grain size, the prioritized execution of tasks and the balancing of load among processors in the system. We demonstrate the techniques that are used to scale such applications to large scale. We have incorporated these techniques into a general search engine framework that is designed to solve a broad class of state space search problems. We demonstrate the efficiency and scalability of our design using three example applications, and present scaling results up to 16,384 processors.