Deep convolutional neural networks (CNNs) have shown promise as universal representations for a variety of image recognition tasks. One of their properties is the ability to transfer knowledge from a large annotated source dataset (e.g., ImageNet) to a (typically smaller) target dataset. This is usually accomplished through supervised fine-tuning on labeled target data. In this work, we address 'unsupervised fine-tuning', which transfers a pre-trained network to target tasks with unlabeled data, such as image clustering. To this end, we introduce group-sparse non-negative matrix factorization (GSNMF), a variant of NMF, to identify a rich set of high-level latent variables that are informative for the target task. The resulting 'factorized convolutional network' (FCN) can itself be seen as a feed-forward model that combines a CNN with a two-layer structured NMF. We empirically validate our approach and demonstrate state-of-the-art image clustering performance on challenging scene (MIT-67) and fine-grained (Birds-200, Flowers-102) benchmarks. We further show that, when used as unsupervised initialization, our approach also improves image classification performance.
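
For readers unfamiliar with the NMF building block, the following is a minimal sketch of plain NMF with the standard multiplicative updates for the Frobenius objective, applied to a non-negative feature matrix. It is not the GSNMF introduced in this work: there is no group-sparsity term and no two-layer structure. The function name `nmf_multiplicative`, the rank, the iteration count, and the random matrix standing in for post-ReLU CNN activations are all assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF, V ~= W @ H, via multiplicative updates (hypothetical helper).

    V is an (n_samples, n_features) non-negative matrix. This is the generic
    baseline only; it does not include any group-sparsity penalty.
    """
    rng = np.random.default_rng(seed)
    n, d = V.shape
    # Strictly positive initialization keeps the multiplicative updates well defined.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, d)) + eps
    errors = []
    for _ in range(n_iter):
        # Update the basis H, then the per-sample coefficients W.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


# Assumption: random rectified values stand in for real CNN activations.
rng = np.random.default_rng(1)
features = np.maximum(rng.standard_normal((60, 32)), 0.0)

W, H, errors = nmf_multiplicative(features, rank=5)

# One simple way to read clusters off the factorization: assign each sample
# to its dominant latent factor.
clusters = W.argmax(axis=1)
```

In this sketch each row of `W` holds the latent-variable activations of one image, so `clusters` is only a crude assignment; running a clustering algorithm on the rows of `W` would be an equally reasonable choice.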