Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems, due to the large scale of the data sets, the data and computation must be distributed over multiple processors resulting in the need for distributed algorithms. In this paper, we consider a popular distributed gradient-based consensus algorithm, which only requires local computation and communication. An important problem in this area is to analyze the convergence rate of such algorithms in the presence of communication delays that are inevitable in distributed systems. We prove the convergence of the gradient-based consensus algorithm in the presence of uniform, but possibly arbitrarily large, communication delays between the processors. Moreover, we obtain an upper bound on the rate of convergence of the algorithm as a function of the network size, topology, and the inter-processor communication delays.