In many real applications, the input data are naturally expressed as tensors, such as virtual metrology in semiconductor manufacturing, face recognition and gait recognition in computer vision, etc. In this paper, we propose a general optimization framework for dealing with tensor inputs. Most existing methods for supervised tensor learning use only rank-one weight tensors in the linear model and cannot readily incorporate domain knowledge. In our framework, we obtain the weight tensor in a hierarchical way - we first approximate it by a low-rank tensor, and then estimate the low-rank approximation using the prior knowledge from various sources, e.g., different domain experts. This is motivated by wafer quality prediction in semiconductor manufacturing. Furthermore, we propose an effective algorithm named H-MOTE for solving this framework, which is guaranteed to converge. The time complexity of H-MOTE is linear with respect to the number of examples as well as the size of the weight tensor. Experimental results show the superiority of H-MOTE over state-of-the-art techniques on both synthetic and real data sets.