Conceptual novelty scores for PubMed articles



Conceptual novelty analysis data based on PubMed Medical Subject Headings
Created by Shubhanshu Mishra and Vetle I. Torvik on April 16th, 2018

## Introduction

This dataset was created as part of the publication: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib magazine : the magazine of the Digital Library Forum. 2016;22(9-10). doi:10.1045/september2016-mishra.
It contains the final data generated in our experiments, based on the MEDLINE 2015 baseline and the MeSH tree from 2015.
The dataset is distributed as the following tab-separated text files:

* PubMed2015_NoveltyData.tsv - Novelty scores for each paper in PubMed. The file contains 22,349,417 rows and 6 columns, as follows:
- PMID: PubMed ID
- Year: year of publication
- TimeNovelty: time novelty score of the paper based on individual concepts (see paper)
- VolumeNovelty: volume novelty score of the paper based on individual concepts (see paper)
- PairTimeNovelty: time novelty score of the paper based on pairs of concepts (see paper)
- PairVolumeNovelty: volume novelty score of the paper based on pairs of concepts (see paper)

* mesh_scores.tsv - Temporal profiles for each MeSH term for all years. The file contains 1,102,831 rows and 5 columns, as follows:
- MeshTerm: Name of the MeSH term
- Year: year
- AbsVal: Total publications with that MeSH term in the given year
- TimeNovelty: age (in years since first publication) of MeSH term in the given year
- VolumeNovelty: age (in number of papers since first publication) of MeSH term in the given year

* meshpair_scores.txt.gz (36 GB uncompressed) - Temporal profiles for each pair of MeSH terms for all years, with the following columns:
- Mesh1: Name of the first MeSH term (alphabetically sorted)
- Mesh2: Name of the second MeSH term (alphabetically sorted)
- Year: year
- AbsVal: Total publications with that MeSH pair in the given year
- TimeNovelty: age (in years since first publication) of MeSH pair in the given year
- VolumeNovelty: age (in number of papers since first publication) of MeSH pair in the given year

* README.txt file
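
The files listed above can be read with any tab-separated parser. The following is a minimal Python sketch for PubMed2015_NoveltyData.tsv, using only the standard library. It assumes the file begins with a header row, which this README does not state, and it keeps all values as strings because the numeric formatting is not documented here. The function name `read_novelty_rows` and the sample rows are hypothetical and for illustration only.

```python
import csv
import io

# Column names as listed above for PubMed2015_NoveltyData.tsv.
NOVELTY_COLUMNS = ["PMID", "Year", "TimeNovelty", "VolumeNovelty",
                   "PairTimeNovelty", "PairVolumeNovelty"]


def read_novelty_rows(handle, has_header=True):
    """Yield one dict per row from an open tab-separated text handle.

    Assumption: the file starts with a header row; pass has_header=False
    if it does not. Values stay as strings because the exact numeric
    formatting (integers, floats, missing values) is not specified.
    """
    # QUOTE_NONE: treat quote characters as ordinary text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header line
    for fields in reader:
        if len(fields) != len(NOVELTY_COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(NOVELTY_COLUMNS, fields))


# Made-up sample rows for illustration only; not real dataset values.
sample = io.StringIO(
    "PMID\tYear\tTimeNovelty\tVolumeNovelty\tPairTimeNovelty\tPairVolumeNovelty\n"
    "1\t1975\t3\t120\t0\t1\n"
    "2\t1975\t10\t4500\t2\t35\n"
)
rows = list(read_novelty_rows(sample))
print(rows[0]["PMID"], rows[0]["Year"])
```

To read the real file, pass `open("PubMed2015_NoveltyData.tsv", newline="", encoding="utf-8")` instead of the sample. Because the function yields rows lazily, the same pattern, with a different column list, should also work for meshpair_scores.txt.gz through `gzip.open(path, "rt")`, which avoids writing the 36 GB uncompressed file to disk.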

## Dataset creation

This dataset was constructed using multiple datasets described in the following locations:
* MEDLINE 2015 baseline: <a href=""></a>
* MeSH tree 2015: <a href=""></a>
* Source code provided at: <a href=""></a>

Note: The dataset is based on a snapshot of PubMed (which includes MEDLINE and PubMed-not-MEDLINE records) taken in the first week of October 2016.
Check <a href="">here</a> for information on obtaining PubMed/MEDLINE and for NLM's data Terms and Conditions.

Additional data-related updates can be found at: <a href="">Torvik Research Group</a>

## Acknowledgments

This work was made possible in part with funding to VIT from <a href="">NIH grant P01AG039347</a> and <a href="">NSF grant 1348742</a>. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## License

Conceptual novelty analysis data based on PubMed Medical Subject Headings by Shubhanshu Mishra and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License.
Permissions beyond the scope of this license may be available at <a href=""></a>
Date made available: Apr 23 2018
Publisher: University of Illinois Urbana-Champaign


  • PubMed
  • bibliometrics
  • Conceptual novelty
  • Medical Subject Headings
  • MeSH
  • Analysis
