ICD coding aims to automatically assign International Classification of Diseases (ICD) codes to unstructured clinical notes or discharge summaries, saving human labor and reducing errors. Although several studies have addressed this challenging task, none distinguishes the importance of different phrases within a word window. Intuitively, informative phrases should be more useful for prediction. This paper proposes a feature-compressed ICD coding model named Fusion to address this issue. In particular, we propose an attentive soft-pooling approach that compresses sparse and redundant word representations into dense, informative ones, which serve as local features. In addition, we use a key-query attention mechanism to model the inner relations among local features and generate global features, which are then used to predict ICD codes. Experiments on two widely used datasets demonstrate that Fusion outperforms the baselines. However, on the MIMIC-III Full dataset, we find that none of the state-of-the-art approaches performs significantly better than the others. Thus, automated ICD coding remains a challenging task.
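
As a rough illustration of the two steps named above, the following sketch pools word vectors within fixed-size windows using softmax attention weights, then applies scaled dot-product (key-query) attention across the pooled local features. It is a minimal sketch under stated assumptions, not the Fusion implementation: the function names, the fixed window size, the single scoring vector `score_vec`, and the use of identity projections for queries and keys are all hypothetical choices made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    # Convex combination of equally sized vectors.
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def attentive_soft_pool(words, score_vec, window):
    # Assumption: each word is scored by a dot product with one scoring
    # vector; scores are normalized inside each window, so every window
    # is compressed into a single local feature.
    local = []
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        weights = softmax([dot(score_vec, h) for h in chunk])
        local.append(weighted_sum(weights, chunk))
    return local

def key_query_attention(local):
    # Assumption: queries, keys, and values are the local features
    # themselves (identity projections), with the usual 1/sqrt(d) scaling.
    scale = math.sqrt(len(local[0]))
    out = []
    for q in local:
        weights = softmax([dot(q, k) / scale for k in local])
        out.append(weighted_sum(weights, local))
    return out

# Toy input: five 2-dimensional word representations (made-up values).
words = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5], [1.0, 1.0]]
score_vec = [1.0, -1.0]  # hypothetical scoring vector

local_features = attentive_soft_pool(words, score_vec, window=2)
global_features = key_query_attention(local_features)
print(len(words), "words ->", len(local_features), "local features")
```

With a window of 2, the five word vectors are compressed into three local features, and the attention step returns one global feature per local feature. In a trained model the scoring vector and the query and key projections would be learned rather than fixed.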