Modeling the spatial structure of crop inputs is important for accurate yield prediction and is a fundamental step towards optimizing the spatial allocation of resources such as seed and fertilizer. We propose two architectures of Multi-Stream Convolutional Neural Network (MSCNN), Late Fusion (LF) and Early Fusion (EF), to model yield response to seed and nutrient management. We compare the proposed models with conventional 2D and 3D CNN architectures and with existing agronomy methods. The dataset used to train and test the models is constructed from on-farm experiment data from nine cornfields across the US, together with multispectral satellite images. Results show that MSCNN-LF reduces the mean squared error of the predictions by 20% compared to a 3D CNN and by 26% compared to a 2D CNN. An optimization algorithm then uses the gradient of the MSCNN-LF model to adjust the manageable input variables so that expected profit is maximized subject to resource constraints. This yields an increase of up to 5.2% in expected crop yield return compared to usual management practices.
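The idea of using a model's gradient to adjust manageable inputs under a resource constraint can be illustrated with a minimal sketch. It does not reproduce the MSCNN-LF model or the optimization algorithm of this work: a toy concave yield response stands in for the trained network, projected gradient ascent is swapped in as one simple way to respect a budget, and the prices, the budget, the `RESPONSE` values, and the function names (`profit`, `profit_gradient`, `project`, `optimize_inputs`) are all hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical per-cell responsiveness; a toy stand-in for a trained yield model.
RESPONSE = np.array([4.0, 6.0, 8.0, 10.0])
CROP_PRICE = 1.0   # assumed price per unit of yield
INPUT_PRICE = 0.5  # assumed cost per unit of input


def profit(x):
    """Expected profit: revenue from the toy yield minus input cost."""
    return CROP_PRICE * np.sum(RESPONSE * np.sqrt(x)) - INPUT_PRICE * np.sum(x)


def profit_gradient(x):
    """Analytic gradient of `profit`; a neural model would supply this via backpropagation."""
    return CROP_PRICE * RESPONSE / (2.0 * np.sqrt(x)) - INPUT_PRICE


def project(y, budget, lower):
    """Euclidean projection onto {x >= lower, sum(x) <= budget}.

    Assumes budget >= lower * len(y), so the feasible set is non-empty.
    """
    x = np.maximum(y, lower)
    if x.sum() <= budget:
        return x
    # Bisection on the shift tau so that sum(max(y - tau, lower)) meets the budget.
    tau_lo, tau_hi = 0.0, float(y.max() - lower)
    for _ in range(100):
        tau = 0.5 * (tau_lo + tau_hi)
        if np.maximum(y - tau, lower).sum() > budget:
            tau_lo = tau
        else:
            tau_hi = tau
    # tau_hi is always on the feasible side of the bisection bracket.
    return np.maximum(y - tau_hi, lower)


def optimize_inputs(x0, budget, lower=0.01, step=2.0, iters=2000):
    """Projected gradient ascent on profit, starting from allocation x0."""
    x = project(np.asarray(x0, dtype=float), budget, lower)
    for _ in range(iters):
        x = project(x + step * profit_gradient(x), budget, lower)
    return x


x0 = np.full(4, 10.0)  # uniform baseline allocation across four cells
x_opt = optimize_inputs(x0, budget=40.0)
print("baseline profit: ", round(float(profit(x0)), 2))
print("optimized profit:", round(float(profit(x_opt)), 2))
print("optimized allocation:", np.round(x_opt, 2))
```

In this toy setting the optimizer moves input away from the less responsive cells and towards the more responsive ones while keeping the total within the budget. With a trained network, `profit_gradient` would instead be obtained by automatic differentiation with respect to the manageable input channels.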