This paper presents a novel hierarchical density estimation approach to image classification. We first build a collection of randomized decision trees in a discriminative manner to partition the feature space into small regions. For each region, class-conditional Gaussians are then learned to characterize the "local" distribution of the feature vectors falling into it. The Gaussian parameters are reliably estimated through hierarchical maximum a posteriori (MAP) estimation and smoothed across the multiple randomized trees in the forest. Compared with the widely used Gaussian Mixture Model (GMM), the new approach not only yields more reliable parameter estimates but also greatly reduces the computational cost at the testing stage. Experiments on scene classification demonstrate the effectiveness and efficiency of the proposed approach.
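The pipeline described above can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the forest construction is delegated to scikit-learn, the covariances are simplified to diagonal, and the shrinkage weight `tau`, the Laplace-smoothed leaf priors, and all function names are assumptions introduced here for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy 2-class, 2-D data standing in for image feature vectors.
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(2.0, 1.0, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

# Step 1: discriminatively built randomized trees partition the feature space.
forest = RandomForestClassifier(n_estimators=5, max_depth=3,
                                random_state=0).fit(X, y)

tau = 10.0  # assumed prior strength for the MAP (shrinkage) estimate

# Root-level class-conditional Gaussians serve as the hierarchical prior.
global_stats = {c: (X[y == c].mean(axis=0), X[y == c].var(axis=0) + 1e-6)
                for c in (0, 1)}

def leaf_gaussians(tree, X, y, n_classes=2):
    """MAP-smoothed diagonal Gaussian for each (leaf, class) pair:
    leaf statistics are shrunk toward the global class statistics."""
    leaves = tree.apply(X)
    stats = {}
    for leaf in np.unique(leaves):
        in_leaf = leaves == leaf
        for c in range(n_classes):
            Xlc = X[in_leaf & (y == c)]
            n = len(Xlc)
            mu0, var0 = global_stats[c]
            if n < 2:  # too few samples: fall back to the prior
                mu, var = mu0, var0
            else:
                mu = (n * Xlc.mean(axis=0) + tau * mu0) / (n + tau)
                var = (n * Xlc.var(axis=0) + tau * var0) / (n + tau) + 1e-6
            log_prior = np.log((n + 1) / (in_leaf.sum() + n_classes))
            stats[(leaf, c)] = (mu, var, log_prior)
    return stats

per_tree_stats = [leaf_gaussians(est, X, y) for est in forest.estimators_]

def log_gauss(x, mu, var):
    # Log-density of a diagonal Gaussian.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def predict(x):
    """Route x to one leaf per tree, score the local class-conditional
    Gaussians, and average the posteriors across the forest."""
    total = np.zeros(2)
    for est, stats in zip(forest.estimators_, per_tree_stats):
        leaf = int(est.apply(x.reshape(1, -1))[0])
        logp = np.array([stats[(leaf, c)][2]
                         + log_gauss(x, *stats[(leaf, c)][:2]) for c in (0, 1)])
        p = np.exp(logp - logp.max())
        total += p / p.sum()
    return int(np.argmax(total))
```

At test time each sample visits only one leaf per tree and evaluates a handful of Gaussians, which is where the cost saving over evaluating a full GMM comes from.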