How did the data extraction business model come to dominate? Changes in the web use ecosystem before mobiles surpassed personal computers

Angela Xiao Wu, Harsh Taneja

Research output: Contribution to journal › Article


It is widely believed that the spread of data extraction technologies on the Internet has led to the erosion of traditional professional content providers and the transformation of the online media ecosystem. To investigate this shift in media ecology, we conduct relational analyses of actual user behavior, departing from existing research that primarily focuses on business institutions and designs of technology. We assess the prevalence of the data extraction business model by grouping websites along two architectural traits that afford data extraction (user content generation and curation) and analyzing how some website architectures get privileged in the web use ecosystem. Since data extraction is relational, we advocate a network measure to capture shared usage in addition to individual popularity of websites. Our analyses of the world’s 850 most popular websites in 2009, 2011, and 2013 reveal that data extraction fostered a two-tier hierarchical web use ecosystem, marked by interdependence between professional content producers and data extractors. Our study thereby shows that the dynamics in play are more complicated than what is captured by explanations centered on either capabilities of platform giants or the decline of traditional journalism and media organizations.

Original language: English (US)
Pages (from-to): 272-285
Number of pages: 14
Journal: Information Society
Issue number: 5
State: Published - Oct 20 2019



Keywords

  • Advertising
  • curation
  • data extraction
  • intermediation
  • media industries
  • platformization
  • user-generated content
  • web usage

ASJC Scopus subject areas

  • Management Information Systems
  • Cultural Studies
  • Information Systems
  • Political Science and International Relations
