Topic Modeling and Visualization for Big Data in Social Sciences

Nitin Sukhija, Mahidhar Tatineni, Nicole Brown, Mark Van Moer, Paul Rodriguez, Spencer Callicott

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Topic modeling is a widely used approach for analyzing large text collections. In particular, Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling approaches; it aggregates vocabulary from a document corpus to form latent 'topics'. However, learning meaningful topic models from massive collections that contain millions of documents and billions of tokens is challenging, given the complexity of the data involved and the difficulty of distributing the computation across multiple computing nodes. In recent years, some data processing frameworks, such as Spark, Mallet, and others, have been developed to address the issues associated with analyzing large volumes of unlabeled text from various domains in a scalable and efficient manner. In this paper, we present a preliminary case study demonstrating the scholarship achieved in the study of political consumerism using XSEDE resources. The experimental study showcases the use of digitized social sciences data and text analytics toolkits to generate topic models and visualize topics, empowering intersectional research on the relationship between consumption, race, class, and gender in the area of sociology. Consequently, this comparative big data textual analysis, which involves JSTOR data, LDA modeling toolkits, visualization techniques, and computational components, is of paramount importance, especially for researchers in academic domains who work on social science applications involving big data.
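
As an illustration of how LDA forms latent topics from a corpus, the following is a minimal sketch of a collapsed Gibbs sampler written with the Python standard library only. It is not the method or tooling used in the paper, which relies on Spark and Mallet at a far larger scale. The function names (`lda_gibbs`, `top_words`), the hyperparameter values, and the four-document toy corpus are all assumptions made for this example.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # total tokens in topic k
    assignments = []                                      # topic of every token

    # Start from a random topic assignment for each token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic, skipping words with zero count."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Hypothetical toy corpus; real inputs would be tokenized article text.
docs = [
    "boycott brand consumer boycott purchase".split(),
    "consumer purchase brand market".split(),
    "race class gender inequality".split(),
    "gender class race identity inequality".split(),
]

doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

Each sweep visits every token once, so a single-process sampler of this kind does not scale to millions of documents. That cost is the scalability problem the abstract describes.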

Original language: English (US)
Title of host publication: Proceedings - 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016
Editors: Didier El Baz, Julien Bourgeois
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1198-1205
Number of pages: 8
ISBN (Electronic): 9781509027705
DOI: https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0183
State: Published - Jan 12 2017
Event: 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016 - Toulouse, France
Duration: Jul 18 2016 to Jul 21 2016

Publication series

Name: Proceedings - 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016

Keywords

  • Big Data
  • LDA
  • Machine learning
  • Mallet
  • Scalability
  • Social Science
  • Spark
  • Text Analytics
  • Topic Modeling
  • Visualization

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Cite this

Sukhija, N., Tatineni, M., Brown, N., Van Moer, M., Rodriguez, P., & Callicott, S. (2017). Topic Modeling and Visualization for Big Data in Social Sciences. In D. El Baz & J. Bourgeois (Eds.), Proceedings - 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016 (pp. 1198-1205). [7816979] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0183