Spatio-temporal localization of human actions in video has been a popular topic over the past few years. The task is to localize the bounding boxes, the time span, and the class of an action, which summarizes the information in a video and helps humans understand it. Although many approaches have been proposed to solve this problem, they have focused only on perspective videos. Unfortunately, perspective videos cover only a small field of view (FOV), which limits the capability of action localization. In this paper, we develop a comprehensive approach to real-time spatio-temporal localization that can be used to detect actions in 360 videos. We create two datasets, UCF-101-24-360 and JHMDB-21-360, for our evaluation. Our experiments show that our method consistently outperforms competing approaches and achieves a real-time processing speed of 15 fps for 360 videos.