Scheduling tasks under the three-phase execution model (load-execute-unload) can effectively reduce contention on shared resources in real-time systems. Due to system and program constraints, a task is generally divided into segments that execute over multiple intervals. Several works have shown that co-scheduling memory phases (unload and load) with computation phases can improve system schedulability by hiding the memory transfer time. However, this overlap is limited to segments of different tasks, so segments of the same task cannot be executed back-to-back. In this paper, we propose a new streaming model that allows the memory and execution phases of segments of the same task to overlap. This is accomplished by a segmentation framework implemented within an LLVM-based compiler-level tool, along with a Real-Time Operating System (RTOS) API that handles load/unload requests. Memory phases are processed by a DMA engine that loads task content into, and unloads it from, ScratchPad Memory (SPM). We provide a schedulability analysis of the proposed model under a fixed-priority partitioned scheduling scheme and an RTOS implementation of the API on a latest-generation Multiprocessor System-on-Chip (MPSoC).
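
To illustrate why overlapping memory and execution phases within a single task can hide transfer time, the following minimal sketch compares the completion time of one task's segments run strictly in sequence against a simplified streaming schedule. It is an illustration only, not the analysis from the paper: it assumes one CPU, one DMA engine, and two SPM partitions, so that while segment k executes, the DMA unloads segment k-1 and then loads segment k+1. The function names and the segment durations are hypothetical.

```python
from typing import List, Tuple

# One segment is (load, execute, unload), in arbitrary time units.
Segment = Tuple[int, int, int]


def serial_time(segments: List[Segment]) -> int:
    """Each segment loads, executes, and unloads with no overlap."""
    return sum(load + execute + unload for load, execute, unload in segments)


def streaming_time(segments: List[Segment]) -> int:
    """Simplified streaming schedule (assumed model, two SPM partitions).

    The first load and the last unload cannot be hidden. In between,
    interval k lasts as long as the longer of:
      - the execution of segment k on the CPU, and
      - the DMA work done meanwhile: unload of k-1 plus load of k+1.
    """
    if not segments:
        return 0
    n = len(segments)
    total = segments[0][0]  # initial load of the first segment
    for k in range(n):
        unload_prev = segments[k - 1][2] if k > 0 else 0
        load_next = segments[k + 1][0] if k + 1 < n else 0
        total += max(segments[k][1], unload_prev + load_next)
    total += segments[-1][2]  # final unload of the last segment
    return total


# Hypothetical segment durations, chosen only for illustration.
segs = [(2, 5, 1), (2, 4, 1), (3, 6, 2)]
print(serial_time(segs))     # 26
print(streaming_time(segs))  # 19
```

In this example the memory phases of the middle intervals are fully hidden behind execution, so only the first load and the last unload add to the total execution time.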