Convolutional neural networks (CNNs) are a deep learning technique that has achieved state-of-the-art prediction performance in computer vision and robotics, but they assume that the input data can be formatted as an image or video (e.g., predicting a robot grasping location from an RGB-D image). This paper considers the problem of augmenting a traditional CNN, which handles image-like input (called main-channel input), with additional, highly predictive, non-image-like input (called side-channel input). An example of such a task is predicting whether a robot path is collision-free given an occupancy grid of the environment and the path's start and goal configurations; the occupancy grid is the main-channel input, and the start and goal configurations are the side-channel input. This paper presents several candidate network architectures for combining the two kinds of input. Empirical tests on robot collision prediction and control problems compare the proposed architectures in terms of learning speed, memory usage, learning capacity, and susceptibility to overfitting.
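
To make the main-channel/side-channel split concrete, the following is a minimal sketch of one plausible fusion scheme: convolutional features computed from the occupancy grid are flattened and concatenated with the start/goal vector before a small fully connected head. This is an assumption made for illustration and is not claimed to be one of the candidate architectures. The function names (`conv2d_valid`, `init_params`, `predict_collision_free`), the layer sizes, the 2-D start and goal coordinates, and the random untrained weights are all hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation, as computed by a CNN layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def init_params(grid_shape, side_dim, hidden=16, k=3, seed=0):
    """Random, untrained weights (hypothetical sizes, illustration only)."""
    rng = np.random.default_rng(seed)
    n_feat = (grid_shape[0] - k + 1) * (grid_shape[1] - k + 1)
    return {
        "kernel": rng.normal(scale=0.1, size=(k, k)),
        "W1": rng.normal(scale=0.1, size=(hidden, n_feat + side_dim)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.1, size=hidden),
        "b2": 0.0,
    }


def predict_collision_free(grid, side, params):
    """Return a probability-like score in (0, 1) for 'path is collision-free'."""
    # Main channel: convolution + ReLU over the occupancy grid, then flatten.
    features = np.maximum(conv2d_valid(grid, params["kernel"]), 0.0).ravel()
    # Fusion (assumed scheme): concatenate image features with the side channel.
    fused = np.concatenate([features, side])
    # Fully connected head: one hidden ReLU layer, then a sigmoid output.
    hidden = np.maximum(params["W1"] @ fused + params["b1"], 0.0)
    logit = params["w2"] @ hidden + params["b2"]
    return 1.0 / (1.0 + np.exp(-logit))


# Main-channel input: an 8x8 occupancy grid (1 = occupied, 0 = free).
grid = (np.random.default_rng(1).random((8, 8)) < 0.2).astype(float)
# Side-channel input: start (x, y) and goal (x, y), assumed 2-D here.
side = np.array([0.1, 0.1, 0.9, 0.9])

params = init_params(grid.shape, side.size)
p = predict_collision_free(grid, side, params)
print(p)
```

Because the weights are untrained, the printed score carries no meaning; the sketch only shows where the side-channel vector enters the network relative to the convolutional features.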