TY - GEN
T1 - R-Storm
T2 - 16th International Middleware Conference, Middleware 2015
AU - Peng, Boyang
AU - Hosseini, Mohammad
AU - Hong, Zhihao
AU - Farivar, Reza
AU - Campbell, Roy
PY - 2015/11/24
Y1 - 2015/11/24
N2 - The era of big data has led to the emergence of new systems for real-time distributed stream processing; Apache Storm, for example, is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks an intelligent scheduling mechanism. The default round-robin scheduling currently deployed in Storm disregards resource demands and availability, and can therefore be inefficient at times. We present R-Storm (Resource-Aware Storm), a system that implements resource-aware scheduling within Storm. R-Storm is designed to increase overall throughput by maximizing resource utilization while minimizing network latency. When scheduling tasks, R-Storm can satisfy both soft and hard resource constraints as well as minimize network distance between components that communicate with each other. We evaluate R-Storm on a set of micro-benchmark Storm applications as well as Storm applications used in production at Yahoo! Inc. From our experimental results we conclude that R-Storm achieves 30-47% higher throughput and 69-350% better CPU utilization than default Storm for the micro-benchmarks. For the Yahoo! Storm applications, R-Storm outperforms default Storm by around 50% based on overall throughput. We also demonstrate that R-Storm performs much better when scheduling multiple Storm applications than default Storm.
AB - The era of big data has led to the emergence of new systems for real-time distributed stream processing; Apache Storm, for example, is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks an intelligent scheduling mechanism. The default round-robin scheduling currently deployed in Storm disregards resource demands and availability, and can therefore be inefficient at times. We present R-Storm (Resource-Aware Storm), a system that implements resource-aware scheduling within Storm. R-Storm is designed to increase overall throughput by maximizing resource utilization while minimizing network latency. When scheduling tasks, R-Storm can satisfy both soft and hard resource constraints as well as minimize network distance between components that communicate with each other. We evaluate R-Storm on a set of micro-benchmark Storm applications as well as Storm applications used in production at Yahoo! Inc. From our experimental results we conclude that R-Storm achieves 30-47% higher throughput and 69-350% better CPU utilization than default Storm for the micro-benchmarks. For the Yahoo! Storm applications, R-Storm outperforms default Storm by around 50% based on overall throughput. We also demonstrate that R-Storm performs much better when scheduling multiple Storm applications than default Storm.
KW - Resource-aware scheduling
KW - Storm
KW - Stream
UR - http://www.scopus.com/inward/record.url?scp=84967211111&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84967211111&partnerID=8YFLogxK
U2 - 10.1145/2814576.2814808
DO - 10.1145/2814576.2814808
M3 - Conference contribution
AN - SCOPUS:84967211111
T3 - Middleware 2015 - Proceedings of the 16th Annual Middleware Conference
SP - 149
EP - 161
BT - Middleware 2015 - Proceedings of the 16th Annual Middleware Conference
PB - Association for Computing Machinery, Inc
Y2 - 7 December 2015 through 11 December 2015
ER -