This paper explores the performance of decentralized reinforcement learning agents with communication capabilities for the operation of traffic signals in an oversaturated network. An explicit coordination mechanism based on the max-plus algorithm is embedded in the agents' reward structure, aiming to improve network-wide performance. Results from a simulated network with realistic features showed that Q-learning agents could process more vehicles than signal timings optimized with the state-of-practice software TRANSYT7F, even under varying oversaturation conditions. The effect of adding the max-plus algorithm was limited but positive, increasing total throughput and reducing the number of stops per vehicle. Ongoing research evaluates the conditions under which coordination should be emphasized to further enhance results, as well as alternative implementations of the max-plus algorithm.
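To make the coordination mechanism concrete, the following is a minimal sketch of max-plus message passing between two neighboring signal agents. The payoff matrix, action sets, and iteration count are illustrative assumptions, not values from the paper; in the paper's setting the pairwise payoffs would come from the agents' learned reward structure.

```python
import numpy as np

# Illustrative joint payoff f(a1, a2) for one edge of a coordination
# graph: rows index agent 1's signal phases, columns agent 2's.
# Higher values reward coordinated phase choices (assumed numbers).
f = np.array([[4.0, 1.0],
              [0.0, 3.0]])

mu_12 = np.zeros(2)  # message from agent 1 to agent 2, indexed by a2
mu_21 = np.zeros(2)  # message from agent 2 to agent 1, indexed by a1

for _ in range(10):
    # mu_ij(a_j) = max over a_i of [ f(a_i, a_j) + other incoming
    # messages to i ]; with only two agents there are no other
    # incoming messages, so the update reduces to a max over f.
    mu_12 = f.max(axis=0)
    mu_21 = f.max(axis=1)
    # mean-centering keeps messages bounded, standard in max-plus
    mu_12 -= mu_12.mean()
    mu_21 -= mu_21.mean()

# each agent picks the action that maximizes its incoming messages
a1 = int(np.argmax(mu_21))
a2 = int(np.argmax(mu_12))
print(a1, a2)
```

Here both agents select phase 0, the jointly optimal pair under the assumed payoffs; on larger coordination graphs the messages propagate local payoff information so that each intersection's choice accounts for its neighbors.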