The GPU has provide higher throughput by integrating more execution resources into a single chip without unduly compromising power efficiency. With the power wall challenge, however, increasing the throughput will require significant improvement in power efficiency. To accomplish this goal, we propose G-Scalar, a cost-effective generalized scalar execution architecture for GPUs in this paper. G-Scalar offers two key advantages over prior architectures supporting scalar execution for only non-divergent arithmetic/logic instructions. First, G-Scalar is more power-efficient as it can also support scalar execution of divergent and special-function instructions, the fraction of which in contemporary GPU applications has notably increased. Second, G-Scalar is less expensive as it can share most of its hardware resources with register value compression, of which adoption has been strongly promoted to reduce high power consumption of accessing the large register file. Compared with the baseline and previous scalar architectures, G-Scalar improves power efficiency by 24% and 15%, respectively, at a negligible cost.