Train delay is a critical problem in railroad operations and has motivated the development of analytical and simulation-based approaches to estimate it. With recent advances in sensing and communication technologies, train positioning information is now available to support new data-driven methods for train delay estimation. In this work, two data-driven approaches are proposed to estimate train delays from historical and real-time information. First, a historical regression model estimates future train delays at each station using only the past performance of the train along its route. Second, several variants of an online regression model estimate delay using the delays of the train at earlier stations on the current trip, together with the delays of other trains that share the same corridor. The proposed methods are tested on data collected from 282 Amtrak trains (Amtrak is the largest passenger railroad service in the US) between 2011 and 2013, comprising more than 100,000 train trips. Compared with predictions based on the published timetable, the historical regression model reduces the root mean squared error (RMSE) of the delay estimates by 12%, while the online model reduces it by 60%.
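To make the comparison of the three predictors concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy setting with hypothetical delay values (not drawn from the Amtrak dataset): the timetable baseline predicts zero delay, a simplified stand-in for the historical model predicts the mean past delay at the station, and a single-predictor stand-in for the online model regresses the delay at station k on the delay observed at station k-1 using ordinary least squares. The helper names `fit_linear` and `rmse` are illustrative.

```python
import math

# Hypothetical toy data (minutes of delay), NOT taken from the Amtrak dataset:
# each pair is (delay at station k-1, delay at station k) for one past trip.
past_trips = [(0.0, 2.0), (5.0, 6.0), (10.0, 13.0), (15.0, 17.0)]


def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # fall back to a constant model if x never varies
    a = mean_y - b * mean_x
    return a, b


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(actual))


prev_delays = [p for p, _ in past_trips]
actual_delays = [y for _, y in past_trips]

# Timetable baseline: assume the train arrives exactly on schedule (zero delay).
schedule_pred = [0.0] * len(actual_delays)

# Simplified historical-style predictor: the mean past delay at this station.
hist_mean = sum(actual_delays) / len(actual_delays)
historical_pred = [hist_mean] * len(actual_delays)

# Simplified online-style predictor: use the delay already observed at station k-1.
a, b = fit_linear(prev_delays, actual_delays)
online_pred = [a + b * x for x in prev_delays]

for name, pred in [("schedule", schedule_pred),
                   ("historical", historical_pred),
                   ("online", online_pred)]:
    print(f"{name:>10}: RMSE = {rmse(pred, actual_delays):.2f} min")
```

This toy example is evaluated in-sample on four made-up trips, so the printed RMSE values say nothing about the reported 12% and 60% improvements. A realistic evaluation would hold out trips for testing and, for the online variants, add predictors such as the delays of other trains sharing the corridor.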