Deep neural networks have become an increasingly crucial component of intelligent applications. The excessive resource consumption of state-of-the-art neural networks, however, remains a major impediment to their widespread deployment in the Internet of Things (IoT). In this paper, we propose Stardust, an IoT-oriented deep learning serving system that accelerates neural network inference to improve the quality of IoT services. Stardust integrates several joint contributions from both the system and AI perspectives: a system performance predictor, model compression, and compressive offloading. On the one hand, the performance predictor profiles and predicts the runtime characteristics of neural network operations on a particular device in its target runtime environment, which enables hardware- and software-aware performance optimization during model compression and offloading. On the other hand, model compression minimizes the computation time of neural networks on different devices, and compressive offloading reduces the data transfer time over the network during mobile-edge offloading. Moreover, all of these optimizations can be applied with almost no compromise on inference accuracy. Together, these modules therefore reduce the end-to-end latency of deep learning services that span embedded/mobile devices and edge servers. We deploy illustrative applications on Stardust that perform human perception tasks with on-device cameras, microphones, and motion sensors to demonstrate the capabilities of the Stardust serving system.