In the past two decades, digital libraries (DLs) have increasingly supported computational studies of digitized books (Jett et al. The HathiTrust Research Center extracted features dataset (2.0), 2020; Underwood, Distant horizons: digital evidence and literary change, University of Chicago Press, Chicago, 2019; Organisciak et al. J Assoc Inf Sci Technol 73:317–332, 2022; Michel et al. Science 331:176–182, 2011). Nonetheless, there remains a dearth of DL data provisions or infrastructures for research on book reception, and user-generated book reviews have opened up unprecedented research opportunities in this area. However, insufficient attention has been paid to real-world complexities and limitations of using these datasets in scholarly research, which may cause analytical oversights (Crawford and Finn, Geo J 80:491–502, 2015), methodological pitfalls (Olteanu et al. Front Big Data 2:13, 2019), and ethical concerns (Hu et al. Research with user-generated book review data: legal and ethical pitfalls and contextualized mitigations, Springer, Berlin, 2023; Diesner and Chin, Gratis, libre, or something else? regulations and misassumptions related to working with publicly available text data, 2016). In this paper, we present three case studies that contextually and empirically investigate book reviews for their temporal, cultural, and socio-participatory complexities: (1) a longitudinal analysis of a ranked book list across ten years and over one month; (2) a text classification of 20,000 sponsored and 20,000 non-sponsored book reviews; and (3) a comparative analysis of 537 book ratings from Anglophone and non-Anglophone readerships. Our work reflects on both (1) data curation challenges that researchers may encounter (e.g., platform providers’ lack of bibliographic control) when studying book reviews and (2) mitigations that researchers might adopt to address these challenges (e.g., how to align data from various platforms).
Taken together, our findings illustrate some of the sociotechnical complexities of working with user-generated book reviews by revealing the transiency, power dynamics, and cultural dependency in these datasets. This paper explores some of the limitations and challenges of using user-generated book reviews for scholarship and calls for critical and contextualized usage of user-generated book reviews in future scholarly research.

Original language: English (US)
Journal: International Journal on Digital Libraries
State: Accepted/In press - 2023


Keywords

  • Book reviews
  • Critical data science
  • Digital humanities
  • Digital libraries
  • Social media
  • User-generated content

ASJC Scopus subject areas

  • Library and Information Sciences


