This paper presents a resilient, model-free reinforcement learning solution for linear quadratic regulator control of cyber-physical systems under sensor attacks. To guarantee resiliency to sensor attacks, a sparse least-squares optimization is introduced to solve the Bellman equation. Although the Bellman equation does not explicitly involve the system dynamics, it implicitly solves a Lyapunov equation that depends on them; consequently, corrupted data that do not follow the dynamics induce an error in the Bellman equation. Under a strong observability assumption, namely s-sparse observability, the proposed sparse optimization therefore ensures that data from compromised sensors, which lead to a sizable error in the Bellman equation, have no effect on reconstructing the state of the system and, thus, on the policy evaluation. That is, only sensory outputs that result in a small Bellman-equation error affect the policy evaluation. Once the optimal control policy is found, it is applied to the system until a surprise signal, driven by the Bellman error, is activated to indicate a change caused by a new attack or by a change in the system dynamics.
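The core resiliency mechanism described above, discarding the measurements of any small set of sensors whose data cannot be explained by the rest, can be sketched for the static state-reconstruction step as a brute-force sparse least squares. This is a minimal toy illustration, not the paper's algorithm: the observation matrix C, the attack on sensor 3, and all dimensions are assumptions chosen for the example, and it requires that the state remains recoverable from any m − s sensors.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

n, m, s = 2, 5, 1                  # state dim, number of sensors, max compromised
C = rng.standard_normal((m, n))    # hypothetical observation matrix (assumption)
x_true = np.array([1.0, -2.0])
y = C @ x_true
y[3] += 10.0                       # attacker corrupts sensor 3 (assumption)

# Brute-force sparse least squares: among all (m - s)-sensor subsets, keep the
# one whose least-squares fit leaves the smallest residual; data from the
# compromised sensor produce a large residual and so are excluded.
best = None
for keep in itertools.combinations(range(m), m - s):
    Ck, yk = C[list(keep)], y[list(keep)]
    x_hat = np.linalg.lstsq(Ck, yk, rcond=None)[0]
    r = float(np.linalg.norm(Ck @ x_hat - yk))
    if best is None or r < best[0]:
        best = (r, x_hat, keep)

residual, x_hat, keep = best
print("kept sensors:", keep)       # the corrupted sensor 3 should be excluded
print("estimate:", x_hat)          # should be close to x_true
```

The same selection principle carries over to the policy-evaluation step in the paper: there, the residual being minimized is the Bellman-equation error rather than a static measurement residual.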