Control-intensive scalar programs pose a very different challenge to highly pipelined supercomputers than vectorizable numeric applications do. Function call/return and branch instructions disrupt the flow of instructions through the pipeline, degrading the utilization of the pipelined datapaths. This paper describes control flow optimization for scalar processing using an optimizing compiler. To obtain program control flow information, a system-independent profiler has been integrated into the IMPACT-I C compiler. The control flow information obtained is converted into a weighted control graph. Based on the weighted control graph, function inline expansion, multi-way branch layout, and software branch prediction can be implemented. With better compiler technology, a very low-cost hardware control unit (architecture) suffices for high-performance scalar processors.
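As a rough illustration of how profile data can become a weighted control graph and drive software branch prediction, the following minimal sketch counts transitions between basic blocks in an execution trace and predicts, for each block, its most frequently taken successor. The trace format, the block names, and the functions `build_weighted_graph` and `predict_branches` are assumptions made for this example; they are not taken from the compiler or profiler described here.

```python
from collections import Counter


def build_weighted_graph(trace):
    """Count how often each (source, destination) block transition occurs.

    `trace` is a hypothetical profile: the sequence of basic-block names
    visited during one profiled run.
    """
    weights = Counter()
    for src, dst in zip(trace, trace[1:]):
        weights[(src, dst)] += 1
    return weights


def predict_branches(weights):
    """For each block, predict the successor with the largest edge weight."""
    best = {}
    for (src, dst), weight in weights.items():
        if src not in best or weight > best[src][1]:
            best[src] = (dst, weight)
    return {src: dst for src, (dst, _) in best.items()}


# Hypothetical trace of a loop that runs its body three times, then exits.
trace = ["entry", "loop", "body", "loop", "body", "loop", "body", "loop", "exit"]
weights = build_weighted_graph(trace)
prediction = predict_branches(weights)
```

In this sketch the edge from `loop` to `body` carries weight 3 and the edge from `loop` to `exit` carries weight 1, so the branch at `loop` is predicted to continue into `body`. The same edge weights could, under similar assumptions, rank call sites for inline expansion or order the targets of a multi-way branch.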