TY - CPAPER
T1 - Enhancing Neural Adaptive Wireless Video Streaming via Lower-Layer Information Exposure
AU - Zhao, Lingzhi
AU - Cui, Ying
AU - Jia, Yuhang
AU - Zhang, Yunfei
AU - Nahrstedt, Klara
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deep reinforcement learning (DRL) has shown promising potential for adaptive video streaming. However, existing DRL-based methods use only application (APP) layer information and rely on heuristic training methods. This paper aims to boost the quality of experience (QoE) of adaptive wireless video streaming by exploiting lower-layer information and deriving a rigorous training method. First, we formulate adaptive wireless video streaming more comprehensively and accurately as an infinite-stage discounted Markov decision process (MDP) by additionally incorporating past and lower-layer information, allowing a flexible tradeoff between QoE and the computational and memory costs of solving the problem. Then, we propose an enhanced asynchronous advantage actor-critic (eA3C) method that jointly optimizes the parameters of the parameterized policy and value function. Specifically, we build an eA3C network consisting of a policy network and a value network that can utilize cross-layer, past, and current information, and we jointly train the eA3C network using pre-collected samples. Finally, experimental results show that the proposed eA3C method improves QoE by 6.8% to 14.4% over state-of-the-art methods.
AB - Deep reinforcement learning (DRL) has shown promising potential for adaptive video streaming. However, existing DRL-based methods use only application (APP) layer information and rely on heuristic training methods. This paper aims to boost the quality of experience (QoE) of adaptive wireless video streaming by exploiting lower-layer information and deriving a rigorous training method. First, we formulate adaptive wireless video streaming more comprehensively and accurately as an infinite-stage discounted Markov decision process (MDP) by additionally incorporating past and lower-layer information, allowing a flexible tradeoff between QoE and the computational and memory costs of solving the problem. Then, we propose an enhanced asynchronous advantage actor-critic (eA3C) method that jointly optimizes the parameters of the parameterized policy and value function. Specifically, we build an eA3C network consisting of a policy network and a value network that can utilize cross-layer, past, and current information, and we jointly train the eA3C network using pre-collected samples. Finally, experimental results show that the proposed eA3C method improves QoE by 6.8% to 14.4% over state-of-the-art methods.
UR - http://www.scopus.com/inward/record.url?scp=85202829484&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85202829484&partnerID=8YFLogxK
U2 - 10.1109/ICC51166.2024.10622554
DO - 10.1109/ICC51166.2024.10622554
M3 - Conference contribution
AN - SCOPUS:85202829484
T3 - IEEE International Conference on Communications
SP - 3383
EP - 3388
BT - ICC 2024 - IEEE International Conference on Communications
A2 - Valenti, Matthew
A2 - Reed, David
A2 - Torres, Melissa
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 59th Annual IEEE International Conference on Communications, ICC 2024
Y2 - 9 June 2024 through 13 June 2024
ER -