Description
Impact assessment is an evolving area of research that aims to measure and predict the potential effects of projects or programs. Measuring the impact of scientific research is a vibrant subdomain, closely intertwined with impact assessment. A recurring obstacle is the lack of an efficient framework for analyzing lengthy reports and labeling text. To address this issue, we propose a framework for automatically assessing the impact of scientific research projects by identifying sections in project reports that indicate potential impacts. We use a mixed-method approach, combining manual annotation with supervised machine learning, to extract these passages from project reports (see the sketch below). This repository holds the datasets and code related to this project.
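As a rough illustration of this kind of pipeline, the sketch below trains a TF-IDF-based random forest classifier on annotated passages. It is a minimal sketch, not the project's actual training script: the column names (`passage`, `impact_label`) and all hyperparameters are assumptions made for illustration; see the linked texttransfer repository for the real preprocessing and training code.

```python
# Minimal sketch of a supervised impact-passage classifier.
# Assumed (not the dataset's documented schema): a "passage" text column
# and a binary "impact_label" column in the annotated CSV.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("incl_translation_all.csv")  # annotated training passages

X_train, X_test, y_train, y_test = train_test_split(
    df["passage"], df["impact_label"], test_size=0.2, random_state=42
)

# TF-IDF features feeding a random forest, mirroring the released
# rf_model.joblib in spirit; the actual feature set may differ.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
])
pipeline.fit(X_train, y_train)
print(f"Held-out accuracy: {pipeline.score(X_test, y_test):.3f}")
```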
Please read and cite the following paper if you would like to use the data:
Becker, M., Han, K., Werthmann, A., Rezapour, R., Lee, H., Diesner, J., and Witt, A. (2024). Detecting Impact Relevant Sections in Scientific Research. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).
This folder contains the following files:
- evaluation_20220927.ods: Annotated German passages (Artificial Intelligence, Linguistics, and Music) - training data
- annotated_data.big_set.corrected.txt: Annotated German passages (Mobility) - training data
- incl_translation_all.csv: Annotated English passages (Artificial Intelligence, Linguistics, and Music) - training data
- incl_translation_mobility.csv: Annotated English passages (Mobility) - training data
- ttparagraph_addmob.txt: German corpus (unannotated passages)
- model_result_extraction.csv: Extracted impact-relevant passages from the German corpus based on the model we trained
- rf_model.joblib: The random forest model we trained to extract impact-relevant passages (see the usage sketch below)
Data processing code can be found at: https://github.com/khan1792/texttransfer
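For completeness, here is a minimal sketch of how the released model might be applied to the unannotated corpus. It assumes rf_model.joblib stores a full scikit-learn Pipeline that accepts raw text, that the corpus file holds one passage per line, and that impact-relevant passages are labeled 1; if the file stores only a bare classifier, passages must first be vectorized with the project's own preprocessing (see the texttransfer repository).

```python
import joblib

# Load the released random forest model. Whether it expects raw text or
# pre-vectorized features is an assumption here; consult the texttransfer
# repository for the actual preprocessing steps.
model = joblib.load("rf_model.joblib")

# One passage per non-empty line is assumed for the corpus file.
with open("ttparagraph_addmob.txt", encoding="utf-8") as f:
    passages = [line.strip() for line in f if line.strip()]

# Works directly only if the joblib file stores a full Pipeline
# (vectorizer + classifier); otherwise vectorize the passages first.
predictions = model.predict(passages)
impact_relevant = [p for p, label in zip(passages, predictions) if label == 1]
print(f"{len(impact_relevant)} impact-relevant passages extracted")
```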
Date made available: March 21, 2024
Publisher: University of Illinois Urbana-Champaign
Keywords
- annotation
- machine learning
- mixed-methods
- impact detection
- project reports