Keypoint matching in images of indoor scenes traditionally relies on features such as SIFT, GIST, and HOG. While these features work very well for two images related by a small camera transformation, we commonly observe a drop in performance for patches that depict scene elements viewed from a very different perspective. Since enlarging the space of local transformations considered during feature matching reduces the discriminative power of the features, we propose a more global approach inspired by the recent success of monocular scene understanding. In particular, we propose to reconstruct a box-like model of the scene from each individual image and to use it to rectify the image before matching. We show that monocular scene model reconstruction and rectification, applied before standard feature matching, significantly improves keypoint matching and dramatically improves the reconstruction of difficult indoor scenes.
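As an illustration of the rectification step only, the following minimal sketch shows one common way to make a planar scene face (for example a wall of the box model) fronto-parallel once its normal is known. It is not the pipeline of this paper: the function names `rotation_aligning`, `rectifying_homography`, and `apply_homography` are hypothetical, the intrinsic matrix `K` and the plane normal `n` are assumed to be given (in practice they would come from the estimated box model), and the camera is assumed to have square pixels and zero skew. The homography used is the standard pure-rotation warp H = K R K^{-1}.

```python
import numpy as np


def rotation_aligning(a, b):
    """Return the 3x3 rotation that maps the direction of a onto the direction of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):
        # Opposite directions: rotate by 180 degrees about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    v = np.cross(a, b)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    # Rodrigues formula written in terms of the cross product and the cosine.
    return np.eye(3) + vx + (vx @ vx) / (1.0 + c)


def rectifying_homography(K, n):
    """Homography that virtually rotates the camera so the plane with normal n
    (given in camera coordinates, assumed known) becomes fronto-parallel."""
    R = rotation_aligning(n, [0.0, 0.0, 1.0])
    return K @ R @ np.linalg.inv(K)


def apply_homography(H, pts):
    """Apply H to an (N, 2) array of pixel coordinates and dehomogenize."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = (H @ homog.T).T
    return mapped[:, :2] / mapped[:, 2:3]
```

After this warp, a square lying on the plane projects to a square again (up to scale and translation), so descriptors computed on the rectified image no longer have to absorb the perspective distortion themselves. A full image would be resampled with the same `H`; here only point coordinates are mapped to keep the sketch dependency-free beyond NumPy.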