Deep neural networks are becoming increasingly popular in mobile sensing and computing applications. Their ability to fuse multiple sensor inputs and extract temporal relationships can enhance intelligence in a wide range of applications. One key problem, however, is that on-device sensors are noisy, and their noise characteristics are heterogeneous and vary over time. Existing mobile deep learning frameworks usually treat every sensor input equally over time and lack the ability to identify and exploit the heterogeneity of sensor noise. In this work, we propose QualityDeepSense, a deep learning framework that automatically balances the contributions of sensor inputs over time according to their sensing quality. We propose a sensor-temporal attention mechanism to learn the dependencies among sensor inputs over time. These dependencies are used to infer the quality of each sensor input and reassign its contribution. QualityDeepSense can thus focus on more informative sensor inputs for prediction. We demonstrate the effectiveness of QualityDeepSense on a noise-augmented heterogeneous human activity recognition task. QualityDeepSense outperforms state-of-the-art methods by a clear margin. In addition, we show that QualityDeepSense imposes only a limited resource-consumption burden on embedded devices.
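
To make the idea of reweighting sensor inputs by inferred quality concrete, the following is a minimal sketch and not the QualityDeepSense formulation itself. It assumes that each sensor input at a given time step has already been encoded as a fixed-length feature vector, and it scores each input by its average scaled dot-product agreement with the others before applying a softmax. The function names `attention_weights` and `fuse` are hypothetical, and the learned parameters that a trained attention module would contain are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features):
    """Assumed scoring rule: an input that agrees with the other inputs
    (high mean scaled dot product) receives a larger weight.

    features: list of n >= 2 equal-length vectors, one per sensor input.
    Returns n non-negative weights that sum to 1.
    """
    n = len(features)
    scale = math.sqrt(len(features[0]))
    agreement = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j != i:
                dot = sum(a * b for a, b in zip(features[i], features[j]))
                total += dot / scale
        agreement.append(total / (n - 1))
    return softmax(agreement)


def fuse(features):
    """Weighted sum of the input vectors using the attention weights."""
    weights = attention_weights(features)
    dim = len(features[0])
    return [
        sum(w * vec[k] for w, vec in zip(weights, features))
        for k in range(dim)
    ]


# Illustrative data only: two inputs that agree and one that disagrees.
inputs = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
weights = attention_weights(inputs)
fused = fuse(inputs)
```

In this toy example the third input disagrees with the other two, so it receives the smallest weight and contributes least to the fused representation. A trained model would learn the scoring function from data instead of using raw dot products.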