Rescuing lost history: Using big data to recover black women's lived experiences

Ruby Mendenhall, Mark W Vanmoer, Malaika McKee, Nicole Brown, Ismini Lourentzou, Assata Zerai, Michael L. Black, Karen Flynn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This study employs Latent Dirichlet allocation (LDA) algorithms and comparative text mining to search 800,000 periodicals in JSTOR (Journal Storage) and HathiTrust from 1746 to 2014 to identify the types of conversations that emerge about Black women's shared experience over time and the resulting knowledge that developed, called standpoint. We used MALLET to interrogate various genres of text (poetry, science, psychology, sociology, African American Studies, policy, etc.). We also used comparative text mining (CTM) to explore latent themes across collections written in different time periods by analyzing the common and expert models. We used data visualization techniques, such as tree maps, to identify spikes in certain topics during various historical contexts such as slavery, Reconstruction, Jim Crow, etc. We identified a subset of our corpus (20,000) composed of articles by or about Black women and compared patterns of words in the subset against the larger corpus of 800,000. Preliminary findings indicate that when we pulled 300,000 volumes, about 80,000 (~27%) did not have subject metadata. This suggests that a researcher searching for volumes about Black women may not have access to a significant amount of data on the topic. When volumes are not tagged properly, researchers would have to already know that these texts exist when they do their searches. The recovery nature of this project involves identifying these untagged volumes and making the corpus publicly available to librarians and others with copyright considerations.
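The metadata gap described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (an `id` field and an optional `subjects` list) and the function names `has_subject_metadata` and `untagged_share` are assumptions made for illustration, and the sample records are invented rather than taken from JSTOR or HathiTrust.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def has_subject_metadata(record: Mapping[str, object]) -> bool:
    """Return True if the record carries at least one non-empty subject tag."""
    # A missing key, None, or an empty list all count as "no subject metadata".
    subjects = record.get("subjects") or []
    return any(str(s).strip() for s in subjects)


def untagged_share(
    records: Iterable[Mapping[str, object]],
) -> tuple[list[str], float]:
    """Return the ids of records lacking subject metadata and their share of the total."""
    records = list(records)
    untagged = [str(r["id"]) for r in records if not has_subject_metadata(r)]
    share = len(untagged) / len(records) if records else 0.0
    return untagged, share


# Toy records, invented for illustration only.
sample = [
    {"id": "vol-001", "subjects": ["African American women"]},
    {"id": "vol-002", "subjects": []},        # empty subject list
    {"id": "vol-003", "subjects": ["Poetry"]},
    {"id": "vol-004"},                        # subject field missing entirely
    {"id": "vol-005", "subjects": ["  "]},    # whitespace-only tag
]

untagged_ids, share = untagged_share(sample)
print(untagged_ids, f"{share:.0%}")
```

A subject-based catalog search would never return the flagged volumes, which is why they have to be identified by other means such as topic modeling. At the scale the abstract reports, a share of about 27% of 300,000 pulled volumes corresponds to roughly 81,000 volumes.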

Original language: English (US)
Title of host publication: Proceedings of XSEDE 2016
Subtitle of host publication: Diversity, Big Data, and Science at Scale
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450347556
DOIs: https://doi.org/10.1145/2949550.2949642
State: Published - Jul 17 2016
Event: Conference on Diversity, Big Data, and Science at Scale, XSEDE 2016 - Miami, United States
Duration: Jul 17 2016 to Jul 21 2016

Publication series

Name: ACM International Conference Proceeding Series
Volume: 17-21-July-2016

Other

Other: Conference on Diversity, Big Data, and Science at Scale, XSEDE 2016
Country: United States
City: Miami
Period: 7/17/16 to 7/21/16

Fingerprint

Data visualization
Metadata
Recovery
Big data

Keywords

  • Black women
  • Comparative text mining
  • Intermediate reading
  • Standpoint theory
  • Topic modeling

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Mendenhall, R., Vanmoer, M. W., McKee, M., Brown, N., Lourentzou, I., Zerai, A., ... Flynn, K. (2016). Rescuing lost history: Using big data to recover black women's lived experiences. In Proceedings of XSEDE 2016: Diversity, Big Data, and Science at Scale [a56] (ACM International Conference Proceeding Series; Vol. 17-21-July-2016). Association for Computing Machinery. https://doi.org/10.1145/2949550.2949642

@inproceedings{b679a8c39e744e5ab095014c51605ff2,
title = "Rescuing lost history: Using big data to recover black women's lived experiences",
abstract = "This study employs Latent Dirichlet allocation (LDA) algorithms and comparative text mining to search 800,000 periodicals in JSTOR (Journal Storage) and HathiTrust from 1746 to 2014 to identify the types of conversations that emerge about Black women's shared experience over time and the resulting knowledge that developed, called standpoint. We used MALLET to interrogate various genres of text (poetry, science, psychology, sociology, African American Studies, policy, etc.). We also used comparative text mining (CTM) to explore latent themes across collections written in different time periods by analyzing the common and expert models. We used data visualization techniques, such as tree maps, to identify spikes in certain topics during various historical contexts such as slavery, Reconstruction, Jim Crow, etc. We identified a subset of our corpus (20,000) composed of articles by or about Black women and compared patterns of words in the subset against the larger corpus of 800,000. Preliminary findings indicate that when we pulled 300,000 volumes, about 80,000 (~27{\%}) did not have subject metadata. This suggests that a researcher searching for volumes about Black women may not have access to a significant amount of data on the topic. When volumes are not tagged properly, researchers would have to already know that these texts exist when they do their searches. The recovery nature of this project involves identifying these untagged volumes and making the corpus publicly available to librarians and others with copyright considerations.",
keywords = "Black women, Comparative text mining, Intermediate reading, Standpoint theory, Topic modeling",
author = "Ruby Mendenhall and Vanmoer, {Mark W} and Malaika McKee and Nicole Brown and Ismini Lourentzou and Assata Zerai and Black, {Michael L.} and Karen Flynn",
year = "2016",
month = "7",
day = "17",
doi = "10.1145/2949550.2949642",
language = "English (US)",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
booktitle = "Proceedings of XSEDE 2016",

}

TY - GEN

T1 - Rescuing lost history

T2 - Using big data to recover black women's lived experiences

AU - Mendenhall, Ruby

AU - Vanmoer, Mark W

AU - McKee, Malaika

AU - Brown, Nicole

AU - Lourentzou, Ismini

AU - Zerai, Assata

AU - Black, Michael L.

AU - Flynn, Karen

PY - 2016/7/17

Y1 - 2016/7/17

AB - This study employs Latent Dirichlet allocation (LDA) algorithms and comparative text mining to search 800,000 periodicals in JSTOR (Journal Storage) and HathiTrust from 1746 to 2014 to identify the types of conversations that emerge about Black women's shared experience over time and the resulting knowledge that developed, called standpoint. We used MALLET to interrogate various genres of text (poetry, science, psychology, sociology, African American Studies, policy, etc.). We also used comparative text mining (CTM) to explore latent themes across collections written in different time periods by analyzing the common and expert models. We used data visualization techniques, such as tree maps, to identify spikes in certain topics during various historical contexts such as slavery, Reconstruction, Jim Crow, etc. We identified a subset of our corpus (20,000) composed of articles by or about Black women and compared patterns of words in the subset against the larger corpus of 800,000. Preliminary findings indicate that when we pulled 300,000 volumes, about 80,000 (~27%) did not have subject metadata. This suggests that a researcher searching for volumes about Black women may not have access to a significant amount of data on the topic. When volumes are not tagged properly, researchers would have to already know that these texts exist when they do their searches. The recovery nature of this project involves identifying these untagged volumes and making the corpus publicly available to librarians and others with copyright considerations.

KW - Black women

KW - Comparative text mining

KW - Intermediate reading

KW - Standpoint theory

KW - Topic modeling

UR - http://www.scopus.com/inward/record.url?scp=84989204302&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84989204302&partnerID=8YFLogxK

U2 - 10.1145/2949550.2949642

DO - 10.1145/2949550.2949642

M3 - Conference contribution

AN - SCOPUS:84989204302

T3 - ACM International Conference Proceeding Series

BT - Proceedings of XSEDE 2016

PB - Association for Computing Machinery

ER -