This paper investigates the performance and scalability of a shared-memory multiprocessor, the Cedar supercomputer, in multiprogrammed environments. For the 4-cluster Cedar system, the overhead due to multiprogramming is 115% for the fine-grain loop-parallel applications considered, and it does not vary appreciably as the multiprogramming level is increased. While a single application executing on a dedicated system achieves significant speedups as the system is scaled up, in multiprogrammed environments scaling yields no performance improvement. The interplay of two factors, unequal distribution of parallel work among the tasks and increased waiting times at the barriers, is found to be the major cause of the performance loss and poor scalability in multiuser environments. The barrier wait time increases sharply in going from a dedicated environment to one where 2 applications are multiprogrammed. However, it decreases as the multiprogramming level is increased further, due to the increasing inequality in the parallel work distribution.