Both technology mapping and circuit clustering have a large impact on FPGA designs in terms of circuit performance, area, and power dissipation. Existing FPGA design flows carry out these two synthesis steps sequentially. Such a two-step approach cannot guarantee that the final delay of the circuit is optimal, because the quality of clustering depends significantly on the initial mapping result. To address this problem, we develop an algorithm that performs mapping and clustering simultaneously and optimally under a widely used clustering delay model. To our knowledge, our algorithm, named SMAC (simultaneous mapping and clustering) is the first delay-optimal algorithm to generate a synthesis solution that considers a combination of both steps. Compared to a synthesis flow using state-of-the-art mapping and clustering algorithms DAOmap  + T-VPACK  - SMAC achieves a 25% performance gain with a 22% area overhead under the clustering delay model. After placement and routing, SMAC is 12% better in performance.