Generating Small Areal Synthetic Microdata from Public Aggregated Data Using an Optimization Method

Yue Lin, Ningchuan Xiao

Research output: Contribution to journal › Article › peer-review

Abstract

Small area microdata contain attributes and locations of individual members of a population in small census geographies. This type of data is critical in research and policymaking, but it is often not publicly available due to confidentiality concerns. The limited access to small area microdata can result in insufficient data for certain research (data scarcity). Even for researchers qualified to access the small area microdata, their research can hardly be reproduced by others (method irreproducibility). To address these issues, we develop a method to generate small area synthetic microdata (SASM) that is suitable for public use. Specifically, an optimization approach is proposed to minimize the difference between published census tables and the SASM. Two counties in Ohio are used as case studies to test the efficacy of the proposed method and the validity of the resulting SASM. The results show that the SASM aligns not only with the census tables, but also with an external data source that contains a sample of the small area microdata. We also illustrate how the SASM can be used to address data scarcity and method irreproducibility in demographic research.
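
The idea of minimizing the difference between published tables and the synthetic records can be illustrated with a toy sketch. This is not the authors' formulation: the abstract does not specify the objective or the solver. The sketch assumes two hypothetical one-way tables for a single small area, uses total absolute deviation as the difference measure, and applies a simple greedy search. All names (`tabulate`, `total_error`, `synthesize`) and all counts are made up for illustration.

```python
from itertools import product

def tabulate(people, tables):
    """Aggregate synthetic individuals into the same layout as the published tables."""
    counts = {name: {cat: 0 for cat in target} for name, target in tables.items()}
    for person in people:
        for name in tables:
            counts[name][person[name]] += 1
    return counts

def total_error(people, tables):
    """Total absolute deviation between tabulated synthetic data and the targets."""
    counts = tabulate(people, tables)
    return sum(abs(counts[n][c] - tables[n][c]) for n in tables for c in tables[n])

def synthesize(tables):
    """Greedy search: keep adding the individual that lowers the error the most."""
    names = list(tables)
    # Every combination of categories is a candidate type of individual.
    candidates = [dict(zip(names, combo))
                  for combo in product(*(tables[n] for n in names))]
    people = []
    error = total_error(people, tables)
    while True:
        best = min(candidates, key=lambda p: total_error(people + [p], tables))
        new_error = total_error(people + [best], tables)
        if new_error >= error:  # no candidate improves the fit, so stop
            break
        people.append(best)
        error = new_error
    return people

# Hypothetical published tables for one small area (illustrative counts only).
tables = {
    "sex": {"male": 3, "female": 2},
    "age": {"under_18": 1, "18_plus": 4},
}
people = synthesize(tables)
print(len(people), total_error(people, tables))
```

A greedy search is used here only because it is short and deterministic; a realistic setting with many cross-tabulated tables and many areas would likely call for a proper integer optimization model.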

Original language: English (US)
Pages (from-to): 905-915
Number of pages: 11
Journal: Professional Geographer
Volume: 75
Issue number: 6
State: Published - 2023
Externally published: Yes

Keywords

  • census data
  • open and reproducible research
  • spatial microdata
  • spatial open data
  • synthetic population

ASJC Scopus subject areas

  • Geography, Planning and Development
  • Earth-Surface Processes
