To defend against collaborative cheating in code writing questions, instructors of courses with online, asynchronous exams can use the strategy of question variants. These question variants are manually written questions to be selected at random during exam time to assess the same learning goal. In order to create these variants, currently the instructors have to rely on intuition to accomplish the competing goals of ensuring that variants are different enough to defend against collaborative cheating, and yet similar enough where students are assessed fairly. In this paper, we propose data-driven investigation into these variants. We apply our data-driven investigation into a dataset of three midterm exams from a large introductory programming course. Our results show that (1) observable inequalities of student performance exist between variants and (2) these differences are not just limited to score. Our results also show that the information gathered from our data-driven investigation can be used to provide recommendations for improving design of future variants.