This paper presents a real-time computing framework for intelligent edge services that run on behalf of local embedded devices unable to support extensive computation themselves. The work contributes to a new direction in real-time computing: scheduling algorithms for machine intelligence tasks that enable anytime prediction. We show that deep neural network workflows can be cast as imprecise computations, each consisting of a mandatory part and one or several optional parts whose execution utility depends on the input data. With our design, deep neural networks can be preempted before completion and thus support anytime inference. The goal of the real-time scheduler is to maximize the average accuracy of deep neural network outputs while meeting task deadlines, which it achieves by opportunistically shedding the least necessary optional parts. The work is motivated by the proliferation of increasingly ubiquitous but resource-constrained embedded devices (in applications ranging from autonomous cars to the Internet of Things) and by the desire to develop services that endow them with intelligence. Experiments on recent GPU hardware with a state-of-the-art deep neural network for machine vision show that our scheme can increase overall accuracy by 10% to 20% while incurring (nearly) no deadline misses.
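To make the imprecise-computation model concrete, the following is a minimal Python sketch under stated assumptions; it is not the paper's scheduler. It assumes a single processor, a batch of tasks that share one common deadline, and that each optional part's execution time and expected accuracy gain are known estimates (in the paper, utility depends on input data; here it is simply given). The `Task` class, the `schedule` function, and the greedy rule (repeatedly run the next optional stage with the highest estimated gain per unit time that still fits) are hypothetical illustrations. Mandatory parts always run, and optional stages are ordered, so a task's stage k can run only after its earlier stages, mirroring how a preempted network returns its latest available output.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Task:
    """Hypothetical imprecise-computation task: one mandatory part plus
    ordered optional stages (e.g., successively deeper DNN stages)."""
    name: str
    mandatory_time: float
    # (execution time, estimated accuracy gain) for each optional stage, in order.
    optional: List[Tuple[float, float]] = field(default_factory=list)


def schedule(tasks: List[Task], deadline: float) -> Dict[str, int]:
    """Greedy sketch (assumed heuristic): return how many optional stages
    each task runs so that total execution time fits within `deadline`."""
    # Mandatory parts are reserved first; whatever remains is the optional budget.
    budget = deadline - sum(t.mandatory_time for t in tasks)
    if budget < 0:
        raise ValueError("mandatory parts alone would miss the deadline")
    chosen = {t.name: 0 for t in tasks}
    while True:
        best, best_ratio = None, 0.0
        for t in tasks:
            k = chosen[t.name]
            if k < len(t.optional):
                cost, gain = t.optional[k]
                # Only the *next* stage of each task is eligible (stages are ordered).
                if 0 < cost <= budget and gain / cost > best_ratio:
                    best, best_ratio = t, gain / cost
        if best is None:
            # Every remaining optional stage is shed.
            return chosen
        cost, _ = best.optional[chosen[best.name]]
        budget -= cost
        chosen[best.name] += 1


# Usage: two hypothetical camera tasks sharing a deadline of 7 time units.
tasks = [
    Task("cam0", 2.0, [(1.0, 0.10), (1.0, 0.02)]),
    Task("cam1", 2.0, [(1.0, 0.08), (2.0, 0.05)]),
]
print(schedule(tasks, deadline=7.0))  # {'cam0': 2, 'cam1': 1}
```

Here cam1's second optional stage is shed because it no longer fits, yet every task still delivers its mandatory-part result by the deadline. A greedy gain-per-time rule is only a simple heuristic and is not guaranteed to be optimal when stage costs differ.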