The use of convolutional neural networks (CNNs) for establishing anthropomorphic numerical observers (ANOs) is being actively explored. In these data-driven approaches, CNNs are trained in a standard supervised way with human-labeled training data; hence, the anthropomorphic component of the procedure resides only in the training labels. However, it is well known that such traditionally trained CNNs can rely on image features that are highly specific to the training distribution and may not align with the features exploited by human perception. Although they can predict human observer performance under certain specified conditions, traditionally trained CNNs lack the interpretability and robustness that may be desired for an ANO. To address this, we investigate the use of an adversarial robust training strategy for training CNN-based observers. As recently demonstrated in the computer vision literature, this training strategy can result in CNNs that exploit more human-interpretable features than a standard CNN would employ. Robustly trained CNNs are systematically investigated for performing a signal-known-exactly (SKE) and background-known-statistically (BKS) binary detection task. Additionally, a differential evolution-based optimization procedure is developed to establish robustly trained CNNs that achieve a specified performance, which may provide a new approach to establishing ANOs. 2022 SPIE.
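The adversarial robust training idea named above can be illustrated with a minimal sketch. This is not the authors' implementation: a linear logistic observer stands in for the CNN so that the inner maximization over L-infinity perturbations has a closed form, and the toy one-dimensional SKE/BKS data, the function names (`make_data`, `worst_case`, `train_observer`, `accuracy`), and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np


def make_data(n, rng, dim=64, amplitude=1.0):
    """Toy SKE/BKS data: a fixed, known signal on a randomly varying background."""
    grid = np.arange(dim)
    signal = amplitude * np.exp(-0.5 * ((grid - dim / 2) / 3.0) ** 2)
    labels = rng.integers(0, 2, size=n)              # 1 = signal present
    level = rng.normal(0.0, 0.2, size=(n, 1))        # random background level
    images = level + rng.normal(0.0, 1.0, size=(n, dim))
    images += labels[:, None] * signal
    return images, labels.astype(float)


def worst_case(x, y, w, eps):
    """Exact worst-case L-infinity perturbation of size eps for a linear model."""
    sign_y = 2.0 * y - 1.0                           # {0, 1} -> {-1, +1}
    return x - eps * sign_y[:, None] * np.sign(w)[None, :]


def train_observer(x, y, eps, steps=300, lr=0.1):
    """Gradient descent on the logistic loss evaluated at worst-case inputs."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        x_adv = worst_case(x, y, w, eps)             # inner maximization
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        err = p - y
        w -= lr * x_adv.T @ err / len(y)             # outer minimization step
        b -= lr * err.mean()
    return w, b


def accuracy(x, y, w, b, eps=0.0):
    """Fraction of correct decisions on inputs perturbed with budget eps."""
    x_eval = worst_case(x, y, w, eps)
    return float(np.mean(((x_eval @ w + b) > 0) == (y == 1)))


# Compare a standard (eps=0) and a robustly trained (eps=0.1) observer.
rng = np.random.default_rng(0)
x, y = make_data(2000, rng)
for eps in (0.0, 0.1):
    w, b = train_observer(x, y, eps)
    print(eps, accuracy(x, y, w, b), accuracy(x, y, w, b, eps=0.1))
```

Setting `eps=0.0` recovers standard supervised training, so the perturbation budget is the single knob that separates the two regimes in this sketch. For a CNN the inner maximization has no closed form and is usually approximated iteratively, which this example does not attempt.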