A number of optimistic synchronization schemes for parallel simulation rely upon a global synchronization. The problem is to determine when every processor has completed all its work and no messages that will cause more work remain in transit in the system. Most previous solutions to the problem have used distributed termination algorithms, which are inherently serial; other parallel mechanisms may be inefficient. In this paper we describe an efficient parallel algorithm derived from a common `barrier' synchronization algorithm used in parallel processing. The algorithm's principal attractions are speed and generality: it is designed to be used in contexts more general than parallel discrete-event simulation. To establish our claim to speed, we compare our algorithm's performance with that of the standard barrier algorithm and find that its additional costs are not excessive. Our experiments are conducted using up to 256 processors on the Intel Touchstone Delta.
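The termination condition can be stated as "all processors idle and zero messages in transit." The sketch below is an illustrative assumption, not the algorithm described in this paper. It shows how a butterfly (recursive-doubling) barrier, one common barrier pattern, can be extended to sum each processor's (sent minus received) message count. The result is that every processor learns the global in-transit count after log2(P) pairwise exchange stages. The function names `butterfly_in_transit` and `terminated` are hypothetical. The sketch simulates the exchanges sequentially, assumes P is a power of two, and assumes every processor is idle when it enters the barrier. It does not model messages that arrive while the barrier is in progress, which a complete detector must handle.

```python
from typing import List


def butterfly_in_transit(sent: List[int], received: List[int]) -> List[int]:
    """Simulate a butterfly all-reduce of per-processor (sent - received)
    message counts; return the value each processor holds at the end.

    Hypothetical illustration: assumes the number of processors is a power
    of two and that each processor is idle when it enters the barrier.
    """
    p = len(sent)
    if p == 0 or p & (p - 1):
        raise ValueError("number of processors must be a power of two")
    # Each processor starts with its local contribution to the in-transit count.
    partial = [s - r for s, r in zip(sent, received)]
    stage = 1
    while stage < p:
        # In this stage, processor i exchanges with partner i XOR stage, and
        # both add the partner's partial sum to their own. The comprehension
        # reads only the previous stage's values, mimicking a synchronous swap.
        partial = [partial[i] + partial[i ^ stage] for i in range(p)]
        stage <<= 1
    return partial


def terminated(sent: List[int], received: List[int]) -> bool:
    """Each processor holds the same global sum, so each can decide locally:
    zero messages in transit means (under the idle assumption) termination."""
    return all(total == 0 for total in butterfly_in_transit(sent, received))


if __name__ == "__main__":
    # Four processors: 6 messages sent and 6 received overall, so none in transit.
    print(terminated([3, 1, 0, 2], [1, 2, 2, 1]))
    # One message still in transit: termination must not be declared.
    print(terminated([3, 1, 0, 2], [1, 2, 2, 0]))
```

The butterfly pattern lets every processor obtain the global sum directly, without a separate broadcast phase. The counting adds only one integer to each pairwise exchange of the barrier, which is consistent in spirit with the claim that the extra cost over a plain barrier is modest.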