Many real-world multi-task learning problems exhibit hierarchical task relatedness: the tasks are partitioned into multiple groups, such that tasks within the same group are related at the task level, whereas different groups are related at the group level. For example, in semiconductor manufacturing, wafer quality prediction can be cast as hierarchical multi-task learning, where each task corresponds to a single side of a chamber with side-level relatedness, and each group of tasks corresponds to a chamber with multiple sides, with chamber-level relatedness. Motivated by this application, we propose an optimization framework for hierarchical multi-task learning that partitions the input features into two sets based on their characteristics and models task-level and group-level relatedness by imposing different constraints on the coefficient vectors of the two sets. This differs from existing work on task clustering, where the goal is to uncover the grouping of tasks, the tasks do not exhibit group-level relatedness, and the prediction model does not distinguish between input features in order to model task-level and group-level relatedness. To solve the resulting optimization problem, we propose the HEAR algorithm, which is based on block coordinate descent, and demonstrate its effectiveness on both synthetic and real data sets from the domains of semiconductor manufacturing and document classification.
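As a reading aid, the structure described above can be sketched in generic form. The notation here is entirely an assumption made for illustration and is not the formulation of the paper: groups are indexed by $g$, the tasks in group $g$ by $t \in T_g$, the two feature sets give design matrices $X^{(1)}_{g,t}$ and $X^{(2)}_{g,t}$ with coefficient vectors $u_{g,t}$ and $v_{g,t}$, the model is taken to be linear, $\ell$ is an unspecified loss, and $\mathcal{C}_1$, $\mathcal{C}_2$ are placeholders for the two different constraint sets, whose exact form is not stated here.

```latex
% Illustrative sketch only: symbols, the linear model, and the constraint
% sets C_1, C_2 are assumptions, not the formulation used by HEAR.
\[
\min_{U \in \mathcal{C}_1,\; V \in \mathcal{C}_2} \; F(U, V)
  \;=\; \sum_{g} \sum_{t \in T_g}
  \ell\!\left( y_{g,t},\; X^{(1)}_{g,t}\, u_{g,t} + X^{(2)}_{g,t}\, v_{g,t} \right),
\qquad U = \{u_{g,t}\},\quad V = \{v_{g,t}\}.
\]
% Generic block coordinate descent: update one block with the other held fixed.
\[
U^{(k+1)} \in \arg\min_{U \in \mathcal{C}_1} F\!\left(U, V^{(k)}\right),
\qquad
V^{(k+1)} \in \arg\min_{V \in \mathcal{C}_2} F\!\left(U^{(k+1)}, V\right).
\]
```

In a scheme of this kind, each update solves a smaller problem over one block of coefficients while the other block is held fixed, which is the general idea behind block coordinate descent.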