TY - JOUR
T1 - Evaluation of LLMs and Other Machine Learning Methods in the Analysis of Qualitative Survey Responses for Accessible Engineering Education Research
AU - Ding, Xiuhao
AU - Gopannagari, Meghana
AU - Sun, Kang
AU - Tao, Alan
AU - Zhao, Delu Louis
AU - Varadhan, Sujit
AU - Hardy, Bobbi Lee Battleson
AU - Dalpiaz, David
AU - Vogiatzis, Chrysafis
AU - Angrave, Lawrence
AU - Liu, Hongye
N1 - This work was funded in part by the Institute for Inclusion, Diversity, Equity, and Access in the Grainger College of Engineering, University of Illinois Urbana-Champaign (Grants GIANT2021-03 and GIANT2022-08), and by a Microsoft grant for developing UDL practices on college campuses.
PY - 2024/6/23
Y1 - 2024/6/23
AB - This research paper provides insights and guidance for selecting appropriate analytical tools in engineering education research. Currently, educators and researchers face difficulties in effectively gaining insights from free-response survey data. We evaluate the effectiveness and accuracy of Large Language Models (LLMs) alongside existing methods that employ topic modeling and document clustering coupled with Support Vector Machine (SVM) and Random Forest (RF) approaches, as well as the unsupervised Latent Dirichlet Allocation (LDA) method. Free responses to open-ended questions from student surveys in multiple courses at the University of Illinois Urbana-Champaign were previously collected by engineering education accessibility researchers. The data (N=129, with seven free-response questions per student) were previously analyzed to assess the effectiveness, satisfaction, and quality of adding accessible digital notes to multiple engineering courses, as well as students' perceived belongingness and self-efficacy. Manual codes for the seven open-ended questions were generated for the qualitative tasks of sentiment analysis, topic modeling, and summarization, and were used in this study as the gold standard for evaluating automated text-analytic approaches. Raw text from the open-ended questions was converted into numerical vectors using text vectorization and word embeddings, and unsupervised analysis using document clustering and topic modeling was performed with LDA and BERT-based methods. In addition to conventional machine learning models, multiple pre-trained, open-source local LLMs (BART and LLaMA) were evaluated for summarization. The remote, closed-model ChatGPT services from OpenAI (ChatGPT-3.5 and ChatGPT-4) were excluded due to subject data privacy concerns. By comparing accuracy, recall, and the depth of thematic insights derived, we evaluated how effectively the method based on each model categorized and summarized students' responses across the educational research interests of effectiveness, satisfaction, and quality of educational materials. The paper presents these results and discusses the implications of our findings and conclusions.
UR - http://www.scopus.com/inward/record.url?scp=85202055258&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85202055258&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85202055258
SN - 2153-5965
JO - ASEE Annual Conference and Exposition, Conference Proceedings
JF - ASEE Annual Conference and Exposition, Conference Proceedings
T2 - 2024 ASEE Annual Conference and Exposition
Y2 - 23 June 2024 through 26 June 2024
ER -