With the rapid growth of multimedia data, feature fusion has become increasingly important for image and video retrieval, indexing, and annotation. Existing feature fusion techniques either simply concatenate a pair of different features or use canonical correlation analysis (CCA) based methods for joint dimensionality reduction in the feature space. However, how to fuse multiple features in a generalized way remains an open problem. In this paper, we reformulate multiple feature fusion as a general subspace learning problem. The objective of the framework is to find a general linear subspace in which the cumulative pairwise canonical correlation between every pair of feature sets is maximized after dimension normalization and subspace projection. The learned subspace couples dimensionality reduction with feature fusion and can be applied to both unsupervised and supervised learning. In the supervised case, the pairwise canonical correlations of feature sets within the same class are also included in the objective function to be maximized. To better model the high-order feature structure and overcome the computational difficulty, the features extracted from the same pattern source are represented by a single 2D tensor. Tensor-based dimensionality reduction methods are then used to further extract low-dimensional discriminative features from the fused feature ensemble. Extensive experiments on visual data classification demonstrate the effectiveness and robustness of the proposed methods.
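
To make the idea of maximizing cumulative pairwise correlation across several feature sets concrete, the following is a minimal sketch of a standard multiset canonical correlation analysis relaxation, solved as a generalized eigenproblem. It is not the formulation proposed in this paper: it learns one projection per feature set, and it omits the dimension normalization, the supervised within-class terms, and the 2D tensor representation. The function name `multiset_cca_fusion`, the regularizer `reg`, and the choice to fuse by summing the projected sets are all assumptions made for illustration.

```python
import numpy as np

def multiset_cca_fusion(feature_sets, n_components=2, reg=1e-3):
    """Hypothetical helper: fuse several feature sets with multiset CCA.

    feature_sets: list of arrays, each (n_samples, d_i), rows aligned by sample.
    Returns (projections, fused): one (d_i, n_components) matrix per set and
    the (n_samples, n_components) sum of the projected sets.
    """
    # Center every feature set so the products below are covariances.
    Xs = [np.asarray(X, dtype=float) for X in feature_sets]
    Xs = [X - X.mean(axis=0) for X in Xs]
    n = Xs[0].shape[0]
    dims = [X.shape[1] for X in Xs]
    bounds = np.concatenate(([0], np.cumsum(dims)))
    blocks = [slice(bounds[i], bounds[i + 1]) for i in range(len(Xs))]

    # Full block covariance of the stacked features.
    Z = np.hstack(Xs)
    C = Z.T @ Z / n

    # D keeps the within-set blocks (regularized, an assumption of this sketch);
    # A keeps only the cross-set blocks, i.e. the pairwise covariances.
    D = np.zeros_like(C)
    A = C.copy()
    for s in blocks:
        D[s, s] = C[s, s] + reg * np.eye(s.stop - s.start)
        A[s, s] = 0.0

    # Solve A w = lambda D w by whitening with the Cholesky factor of D.
    L = np.linalg.cholesky(D)
    M = np.linalg.solve(L, np.linalg.solve(L, A).T)
    M = (M + M.T) / 2.0  # remove numerical asymmetry before eigh
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(eigvals)[::-1][:n_components]
    W = np.linalg.solve(L.T, eigvecs[:, top])

    # Split the stacked solution back into one projection per feature set.
    projections = [W[s] for s in blocks]
    fused = sum(X @ P for X, P in zip(Xs, projections))
    return projections, fused
```

Calling `multiset_cca_fusion([X1, X2, X3], n_components=2)` on three sample-aligned feature matrices returns the per-set projections and a two-dimensional fused representation. Summing the projected sets is only one simple fusion choice; concatenating them is an equally plausible alternative.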