Accurately recognizing learner affect is critical for enabling affect-responsive learning environments to support student learning and engagement. Multimodal affect detection that combines sensor-based and sensor-free approaches has shown significant promise in both laboratory and classroom settings. However, important questions remain regarding which data channels are most predictive and how they should be combined. In this paper, we investigate a multimodal affect detection framework that integrates posture data from motion tracking with interaction-based trace data to recognize the affective states of students engaged with a game-based learning environment for emergency medical training. We compare several machine learning-based affective models using competing feature-level and decision-level multimodal data fusion approaches. Results indicate that multimodal affect detectors induced using joint feature representations from the posture-based and interaction-based data channels yield improved accuracy relative to unimodal models across several learner-centered affective states. These findings have implications for the design of multimodal affect-responsive learning environments that support learning and engagement.
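
To make the distinction between the two fusion strategies concrete, the following minimal sketch contrasts them on toy data. It is an illustration under stated assumptions, not the models evaluated in this paper: the function names, feature values, affective-state labels, and the equal channel weighting are all hypothetical. Feature-level fusion joins the posture and interaction features into one vector before a single model is trained, whereas decision-level fusion combines the outputs of separate per-channel models.

```python
from typing import Dict, List, Sequence


def fuse_features(posture: Sequence[float], interaction: Sequence[float]) -> List[float]:
    """Feature-level (early) fusion: join both channels into one joint vector."""
    return list(posture) + list(interaction)


def fuse_decisions(
    posture_probs: Dict[str, float],
    interaction_probs: Dict[str, float],
    posture_weight: float = 0.5,  # hypothetical weighting, not taken from the paper
) -> Dict[str, float]:
    """Decision-level (late) fusion: weighted average of per-channel class probabilities."""
    if posture_probs.keys() != interaction_probs.keys():
        raise ValueError("both channels must score the same affective states")
    w = posture_weight
    return {
        state: w * posture_probs[state] + (1.0 - w) * interaction_probs[state]
        for state in posture_probs
    }


# Hypothetical feature values for one observation window of one student.
posture_features = [0.12, 0.80]
interaction_features = [3.0, 0.25, 1.0]

# Early fusion: a single classifier would be trained on this joint vector.
joint_vector = fuse_features(posture_features, interaction_features)

# Late fusion: each channel's classifier has already produced probabilities.
fused = fuse_decisions(
    {"bored": 0.6, "engaged": 0.4},   # hypothetical posture-model output
    {"bored": 0.2, "engaged": 0.8},   # hypothetical interaction-model output
)
predicted_state = max(fused, key=fused.get)
```

In this sketch the joint vector simply has one entry per posture feature followed by one per interaction feature, and the late-fusion prediction is the state with the highest averaged probability. Other decision-level schemes, such as majority voting or a learned combiner, would fit the same interface.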