VL-Con: Vision-Language Dataset for Deep Learning-based Construction Monitoring Applications

Shun Hsiang Hsu, Junryu Fu, Mani Golparvar-Fard

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recently, vision-language research has gained significant interest by successfully connecting visual concepts to natural language, advancing computer vision-based construction monitoring using a wide variety of text queries. While vision-language models demonstrate high capability, performance degradation can be expected when adapting these models to ever-changing construction scenarios. Compared with the source-domain image-text pairs, it is more challenging to cover the multitude of potentially involved objects and their naming conventions for construction activities. To bridge this domain gap, this study collects construction-specific image-text pairs of building elements and related site work based on ASTM Uniformat II. The image-text pairs of 641 activities in Uniformat are retrieved from the LAION-5B dataset based on the image and text embeddings using CLIP with two different prompts. The collected images are then labeled at the image level to identify the requirements of vision-language datasets for further development. Based on the results, a vision-language dataset, VL-Con, consisting of image-text pairs for construction monitoring applications, is proposed with the aid of a construction semantic predictor and prompt engineering. The proposed VL-Con dataset can be accessed through https://github.com/huhuman/VL-Con.
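
The retrieval step described in the abstract (ranking candidate image-text pairs against activity queries built from two prompts) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the two prompt templates, the example captions, and the `embed` function, which is a toy bag-of-words stand-in rather than a CLIP encoder. It does not query LAION-5B and is not the authors' pipeline; it only shows the general shape of embedding-based nearest-neighbor retrieval with cosine similarity.

```python
import math
from collections import Counter

# Hypothetical prompt templates; the abstract does not state the two prompts used.
PROMPT_TEMPLATES = ("{}", "construction site photo of {}")


def embed(text):
    """Toy stand-in for a text encoder: an L2-normalized bag-of-words vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors (a dot product)."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def retrieve(activity, captions, k=2):
    """Return the top-k captions for each prompt template, ranked by similarity."""
    results = {}
    for template in PROMPT_TEMPLATES:
        query = embed(template.format(activity))
        ranked = sorted(captions, key=lambda c: cosine(query, embed(c)), reverse=True)
        results[template] = ranked[:k]
    return results


# Made-up captions standing in for the text side of image-text pairs.
captions = [
    "workers pouring a concrete slab on grade",
    "a cat sleeping on a sofa",
    "steel roof framing installed by crane",
]
hits = retrieve("slab on grade", captions, k=1)
```

In a real setting the `embed` function would be replaced by an actual image or text encoder, and the sorted scan by an approximate nearest-neighbor index, since a linear scan does not scale to billions of pairs.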

Original language: English (US)
Title of host publication: Proceedings of the 41st International Symposium on Automation and Robotics in Construction, ISARC 2024
Publisher: International Association for Automation and Robotics in Construction (IAARC)
Pages: 1128-1135
Number of pages: 8
ISBN (Electronic): 9780645832211
State: Published - 2024
Event: 41st International Symposium on Automation and Robotics in Construction, ISARC 2024 - Lille, France
Duration: Jun 3 2024 to Jun 5 2024

Publication series

Name: Proceedings of the International Symposium on Automation and Robotics in Construction
ISSN (Electronic): 2413-5844

Conference

Conference: 41st International Symposium on Automation and Robotics in Construction, ISARC 2024
Country/Territory: France
City: Lille
Period: 6/3/24 to 6/5/24

Keywords

  • Construction Monitoring
  • Foundation Model
  • Vision-Language Dataset

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Civil and Structural Engineering
  • Building and Construction
  • Safety, Risk, Reliability and Quality
  • Computer Science Applications
  • Artificial Intelligence
